The magazine of the Melbourne PC User Group

Assault on the Tower of Babel
Major Keary

The Old Testament tells us an attempt to reach heaven by way of a tower was frustrated by the imposition on mankind of multifarious languages. One wonders what role was played by the Melanesians' ancestors, for they have more identified languages (as distinct from dialects) than the rest of the world put together.

For almost half a century computer technology has been perceived, with varying degrees of interest, as a means of overcoming the Babelonian impediment. Machine Translation (MT) of text from one language to another was a goal which led to enthusiastic attacks on the Tower of Babel. Each assault met with limited success and then petered out. Eventually the campaign slowed almost to a halt, but has been revived with new attacks on two fronts.

Briefly, the history of MT began in 1946, when people started casting about for applications for the new computer technology. It seemed logical that war-time code and cipher breaking successes could be repeated in the field of automated translation. During World War II the British and Americans were able to read German and Japanese encrypted radio communications. The highly successful code-breaking operation, known as Ultra, was kept secret until the early 70's. As a result there has been little public awareness of the fact that the first practical application of electronic computer technology was for cryptanalysis.

It is not surprising that those who developed Ultra perceived automated translation as a natural progression. MT was given impetus by the cold war. By 1951 programs had been written to handle Russian-to-English for the translation of scientific and technological material. Crude by any standard, they relied on a dictionary of Russian words with English equivalents. An expert in a given field could make sense of such a translation, but at the cost of excessive time in rewriting if others were to use the product. The Pentagon used MT to produce daily translations of Pravda which were rough, but sufficient for general immediate distribution. Anything which required more detailed scrutiny was passed over to human translators.

Progress was hampered by the minuscule memory available in those days. However, there were achievements: techniques of handling syntactical structures and idiom were improved, and methods were developed for the compression of dictionaries.

About 1954 the Americans staged a demonstration of MT. While it produced nothing new, the surrounding publicity attracted international attention and triggered a race. Russia saw the advantages of making English-language technical publications available to its own professionals and students. They used MT for translation to a level sufficient for specially trained writers to massage output into publishable form. The need diminished with the emergence of their own professional and technical authors, and so did interest in MT. On the other side of the Iron Curtain lack of cost effectiveness became the inhibiting factor. Techniques of training linguists had improved and human translators became more readily available.

And so MT remained virtually static. Some commercial programs were developed, but were either for limited application or required considerable human editing. In Japan, and later Taiwan, several commercial packages appeared for specific commercial fields.

Recently new challenges have sparked fresh campaigns separately in Europe and Asia. Artificial intelligence has provided an effective weapon for a new generation of crusaders. However, the Europeans and Asians are employing quite different strategies: meta-language and straight transfer.

Europe is preparing for a unified market which will bring with it the need for multiple translation on a massive scale. Current membership of the EEC represents 121 possible linguistic pairs! The deadline is 1992, by which time pertinent regulations and official information are to be available in each and every one of the languages of market members.

The strategy is to create a meta-language: an artificial core language which will accept input from any one of the real languages and produce output in any other of them. The meta-language does not have to be comprehensible to anyone or anything other than the program. While it has the benefit of one size fits all, context insensitivity is its weakness, which can result in gross errors.

The alternative is a straight transfer system limited to single source and target languages. Early work in MT was confined to straight transfer, which is a one way street. An English to French program, for example, will not work in reverse. For Europe's many languages, development of the meta-language system is seen as the only viable option.
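
The trade-off between the two strategies can be put in numbers. The following is only a rough sketch in Python. It assumes one transfer program for each direction of each language pair and, for the meta-language, one analyser into the core language plus one generator out of it for each real language. The function names and the language counts are invented for illustration and are not figures from any actual project.

```python
def transfer_programs(n_languages: int) -> int:
    """One-way transfer programs needed to cover every ordered pair."""
    return n_languages * (n_languages - 1)


def meta_language_modules(n_languages: int) -> int:
    """Modules needed with a core language: one analyser and
    one generator for each real language."""
    return 2 * n_languages


# Compare the two strategies for a few illustrative language counts.
for n in (2, 5, 10):
    print(n, transfer_programs(n), meta_language_modules(n))
# 2 2 4
# 5 20 10
# 10 90 20
```

On these assumptions straight transfer needs fewer components for a single pair of languages, but the count grows roughly with the square of the number of languages, while the meta-language count grows in step with it.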

Transfer type MT programs do, however, have their place. Those available in Japan are fairly effective packages for limited commercial use. The existing English-to-Chinese programs available in Taiwan are slow and prone to error. An additional drawback to currently available packages is the need for considerable storage space for files which contain dictionaries and the like. On mainland China at least one program has been developed, but it is said to be cumbersome and unreliable.

In the Republic of China on Taiwan important new advances are being made in an English/Chinese translation project. With direct government encouragement several companies are independently pursuing their own lines of research and development. Because they are dealing with one pair of languages, the strategy is to develop straight transfer systems using the latest techniques of artificial intelligence and utilising high speed processors.

The most promising results are coming from Behaviour Tech Computer Corporation (BTCC) which has been working on a program, Arch Tran, since 1985. It is being used commercially, but is not likely to be ready for sale as a package for a while yet. In the meantime, it is reported, Unisys and Hewlett Packard are among clients translating manuals with Arch Tran.

Arch Tran is reputed to be capable of translating from English into Chinese at a rate of up to 6,000 words per hour. That is in the order of five times faster than the best human translator, even including the need to edit output.

The problem of error-free machine translation has so far been intractable. MT is presently able to handle simple, formal text, but once any degree of complexity is introduced errors appear. BTCC's Arch Tran is the closest anyone has come to producing clean output and the system is improving. A 200,000 word dictionary looks for matches. The program then tags and assesses possible ambiguities in context to clarify intended meaning, and identifies non-words or expressions (symbols, abbreviations, equations, etc.). Each sentence is then analysed. Artificial intelligence techniques enable Arch Tran to abandon lines of fruitless analysis rather than trudge through to the end of a particular linguistic pattern.

Instead of producing garbled output Arch Tran, when unable to cope, leaves words or phrases in the source language. That is an improvement in technique which sets Arch Tran apart. Garbage has always bedevilled MT. I recollect seeing output from an early English-to-French program which could not cope with encore, producing a string of numerals.
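
The pass-through behaviour can be illustrated with a toy word-for-word translator. This is a minimal sketch only and makes no claim about how Arch Tran works internally: the tiny English-to-French lexicon, the function name `translate` and the `garble` switch are all invented for illustration.

```python
import re

# Hypothetical toy lexicon; a real system would hold a very large dictionary.
LEXICON = {"the": "le", "cat": "chat", "drinks": "boit", "milk": "lait"}

# Tokens made up only of letters are treated as words. Anything else
# (numbers, symbols, formulas) counts as a non-word and is never looked up.
WORD = re.compile(r"[A-Za-z]+")


def translate(sentence: str, garble: bool = False) -> str:
    """Word-for-word lookup with a fallback for unknown words.

    garble=False leaves unknown words in the source language.
    garble=True mimics old-style garbage by emitting character codes.
    Punctuation handling and grammar are deliberately left out.
    """
    out = []
    for token in sentence.split():
        if not WORD.fullmatch(token):
            out.append(token)                    # non-word: pass through
        elif token.lower() in LEXICON:
            out.append(LEXICON[token.lower()])   # dictionary match
        elif garble:
            out.append("".join(str(ord(c)) for c in token))
        else:
            out.append(token)                    # unknown: keep source word
    return " ".join(out)


print(translate("The cat drinks lemonade"))  # le chat boit lemonade
print(translate("The cat drinks H2O"))       # le chat boit H2O
```

A reader or editor can spot and repair an untranslated source word at a glance, which is much harder when the unknown word has been replaced by a string of numerals.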

Translating is a difficult task. Every language has words or expressions which simply do not translate. For example, "nuts and bolts" is a common expression in English, but literal translation into another language would make nonsense. Such usages find their way into formal text and can present problems even for skilled human translators. It is little wonder that machines, when they encounter misused words (such as anticipate instead of expect), are confused.

Artificial intelligence may give MT systems the capacity to identify the intended meaning of words such as anticipate. However, I suspect final editing by human translators will be necessary for a long time yet. The difference between early MT and the new generation programs is that the need for human input is reduced considerably.

As an example of the problems which presently face MT, take the sentence, "My dog is big like a horse". That is quite likely to translate as, "My dog is fond of a big horse", which might be true but is not the same as the original sentence.
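
The mistranslation turns on the word "like", which can be a verb (to be fond of) or a preposition (similar to). The small Python sketch below shows how a context-blind lookup goes wrong and how even one crude contextual rule helps. The sense table, the adjective list and the function names are hypothetical, and a real system would need far richer analysis than this.

```python
# Hypothetical sense table: the first entry is the default sense.
SENSES = {"like": ["fond of", "similar to"]}
ADJECTIVES = {"big", "small", "fast"}


def naive_sense(words, i):
    """Always take the first dictionary sense, ignoring context."""
    return SENSES[words[i]][0]


def contextual_sense(words, i):
    """Prefer the comparison sense when 'like' directly follows an adjective."""
    if words[i] == "like" and i > 0 and words[i - 1] in ADJECTIVES:
        return "similar to"
    return SENSES[words[i]][0]


words = "my dog is big like a horse".split()
i = words.index("like")
print(naive_sense(words, i))       # fond of
print(contextual_sense(words, i))  # similar to
```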

That does not detract from the value of MT, which has the capacity to fill a very useful role in communication. Even though it has been around for almost fifty years, the latest developments make MT as exciting now as it was then. The rapid development of CD-ROM technology, availability of hardware which can access virtually unlimited RAM, gigabyte hard disks, and the increasing speed of processors are very likely to place sophisticated translation packages within the reach of PC users by 1995.

I think it will be a few years beyond that before MT can cope with something like this: 
"Mr Bennet's property consisted almost entirely in an estate of two thousand a year, which, unfortunately for his daughters, was entailed, in default of heirs-male, on a distant relation; and their mother's fortune, though ample for her situation in life, could but ill supply the deficiency of his."
(Jane Austen: Pride and Prejudice)

While MT is not an application the average user group member is likely to place on a wish list, it is growing in importance and might well point to a profitable area of development for local software writers. A bit esoteric, you say? Well, at least two Chinese word processing packages are Australian products, and one of them is regarded as being at the leading edge in that field.

Reprinted from the July 1991 issue of PC Update, the magazine of Melbourne PC User Group, Australia
