Text Mining: removing “Nuggets” of ambiguity from FrameMaker content

On May 10th, we hosted an extremely informative webinar with guest John Smart, “Text Mining from Adobe FrameMaker: How to find lost Terminology in seconds”.

This webinar focused on how one of Smart Software’s tools (Text Miner) can be used to “mine” terminology from FrameMaker documents, which can make both language translation and reading comprehension much more effective.

Goal: finding the right 1,500 words

English has over 1,000,000 words and grows daily. However, most of us use only about 1,500 words in speech or writing on a daily basis. Many of these 1,500 words are poorly suited to global communications, because many words in our common vocabularies have no equivalent in other languages.

John Smart’s webinar included the “cloud” image of words below. Believe it or not, most of these words (in various contexts) do not translate well (or at all) into most languages:

cloud of words

One word that stands out in the cloud is “using”, a gerund. English is one of the few languages that use “-ing” words, and these are to be avoided.
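As a rough illustration of flagging “-ing” words for review, the sketch below counts candidate words in a block of text. It is not how Text Miner works; the function name `ing_candidates` and the length cutoff are assumptions made for this example, and words such as “during” or “string” would still be flagged and need a human decision.

```python
import re
from collections import Counter

def ing_candidates(text):
    """Count words that end in "-ing" so a reviewer can inspect them.

    Heuristic only: words of four letters or fewer (e.g. "ring") are skipped,
    but non-gerunds such as "during" are still reported.
    """
    words = re.findall(r"[A-Za-z]+", text.lower())
    return Counter(w for w in words if len(w) > 4 and w.endswith("ing"))

counts = ing_candidates(
    "Using the wrench, start tightening the ring. Avoid using force."
)
for word, n in counts.most_common():
    print(f"{word}: {n}")
```

The output is a list of candidates, not a list of errors; each hit still needs a reviewer to decide whether a rewrite is warranted.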

What text mining can find

The following are major text components that text mining can identify in your existing FrameMaker content:

Using this tool and technology can make it possible for you to identify your terminology in days rather than years. (The on-demand recorded webinar makes the reasons for this much more evident.)

Sample results of text mining from FrameMaker content

As the webinar made clear, existing terminology is extracted as key words and displayed with “left text context” and “right text context.” The screen shot below shows a typical example:

sample extracts
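The left-context and right-context display described above is commonly called a keyword-in-context (KWIC) listing. The following is a minimal sketch of that idea in Python, not Text Miner's actual implementation; the function name `kwic`, the `width` parameter, and the sample sentence are all assumptions made for illustration.

```python
import re

def kwic(text, keyword, width=30):
    """Return (left context, matched word, right context) for each hit."""
    rows = []
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        rows.append((left, m.group(0), right))
    return rows

sample = (
    "Do not apply excessive force to the lever. "
    "Use force only when the lever is locked."
)
rows = kwic(sample, "force", width=20)
for left, word, right in rows:
    # Right-align the left context so the keywords line up in one column.
    print(f"{left:>20} | {word.upper()} | {right}")
```

Lining the keyword up in a single column makes it easier to scan the surrounding words and spot qualifiers, such as “excessive”, that have no measurable meaning.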

One example of ambiguous terminology that could use a replacement is “excessive FORCE.” This raises the question, “What amount of force is excessive?” Although the meaning of the original text may seem obvious to an engineer or a seasoned group of native English speakers who have read several versions of previous documentation, this type of terminology would be extremely difficult to translate with accuracy.

In addition, the true meaning in English is wide open to interpretation, which could have legal ramifications for a failure with aircraft or similar hardware.

The essential human factor in text mining

As John Smart made clear, although Text Miner does an admirable job of identifying most of your terminology “automatically,” carefully qualified staff members are required to ensure that desired results are achieved.

You will need to designate a “Head Text Miner” who has the following qualifications:

The goal: one word, one meaning

A huge number of historic events and trends led to English becoming one of the most expressive languages on the planet. By its nature, English often offers a dozen ways (or words) to “say the same thing.” Of course, different words or phrases have different nuances and connotations.

The goal in effective technical communications for a global audience is, whenever possible, to have one primary word with one meaning. The illustration shows “Simplified Spanish” that resulted from a clean terminology base after data mining. Because there was one English word for “electrical,” the choices in Spanish were simplified as a result.

Simplified Spanish
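To show what “one word, one meaning” could look like as an automated check, the sketch below scans text for non-preferred variants and reports the preferred term for each. The terminology base `TERM_BASE`, its entries, and the function name `find_variants` are hypothetical examples, not data or features from the webinar or from Text Miner.

```python
import re

# Hypothetical terminology base: preferred term -> variants to replace.
TERM_BASE = {
    "electrical": ["electric"],
    "tighten": ["snug up", "torque down"],
}

def find_variants(text, term_base):
    """Return (position, variant found, preferred term), sorted by position."""
    findings = []
    for preferred, variants in term_base.items():
        for variant in variants:
            # Whole-word match so "electric" does not match "electrical".
            pattern = re.compile(r"\b" + re.escape(variant) + r"\b", re.IGNORECASE)
            for m in pattern.finditer(text):
                findings.append((m.start(), m.group(0), preferred))
    return sorted(findings)

findings = find_variants(
    "Check the electric connector, then snug up the clamp.", TERM_BASE
)
for position, found, preferred in findings:
    print(f'at {position}: "{found}" -> use "{preferred}"')
```

A check like this only reports candidates; the review team still decides whether each variant really carries the same meaning as the preferred term.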

Review and refine terminology

The webinar gave sensible guidelines for creating a review team to ensure that correct terminology is saved or rejected. Recommended steps include:

Using less text with the right words

Several of our past blog posts have touched on Simplified English and other tools that can be used to “reshape content for the small screen” to achieve better content for mobile devices. Text mining to refine your terminology can be equally essential to ensure that your message is crystal clear, in all languages, and to avoid legal issues due to missing cautions or warnings.