A Brief Introduction to Machine Translation
Whilst machine translation systems are great for getting a rough idea of what a text is about, virtually all serious translation work is still done by human beings hammering away at their keyboards. Indeed, the whole concept of machine translation rests on the assumption that a language can be broken down into a set of rules and that there is, in a sense, a "right" translation. In reality, however, translation is more an art than a science.
The idea of using machines to translate human languages is not new, with the first recorded proposal dating back to the 17th century. No such machine was feasible, however, until the development of computers during World War II. Work began in earnest after the war, driven in part by the advent of the Cold War and the need to translate vast volumes of classified Soviet documentation. The famed 1954 Georgetown experiment demonstrated the automatic translation of more than 60 sentences from Russian into English and created great excitement in US government circles. Vast funding programmes followed over the next decade.
The party came to an abrupt stop, however, following the damning 1966 ALPAC report, which concluded that machine translation had little practical future in either the short or medium term. As a result of the report, funding dried up and little or no progress was made on machine translation until the late 1980s, when computational power began to increase rapidly, making the use of artificial intelligence possible. Accuracy, estimated at around 60%, nevertheless remained a major issue. Ultimately, terminological, semantic and grammatical ambiguities derailed all such projects.
In more recent times, the US Defense Advanced Research Projects Agency (DARPA – the force behind the development of ARPANET in the 1960s, the forerunner of the Internet) launched the GALE programme (Global Autonomous Language Exploitation), with the goal of providing the Pentagon with a machine translation solution that generates an error rate of less than 10%. It is primarily targeted at Arabic and Mandarin Chinese, and initial reports seem to indicate that this target is being met.
It was the advent of the information society in the late 1990s that brought machine translation into the public domain. Accuracy took a back seat to access. Indeed, the free availability of such systems has fundamentally reshaped the web by allowing us all to browse websites written in a host of languages. Popular online machine translation systems include Yahoo's Babelfish and Google's Language Tools. Quality varies dramatically depending on the complexity of what you're translating. Below is a sample financial translation using both of these systems. Judge for yourself:
Original text: Les obligations à terme de deux ans ont gagné 0, 01 $ à 100,20 $; celles à 10 ans ont cédé 0,05 $ à 98,27 $; celles à long terme reculaient de 0,15 $ à 123,20 $.Le rendement des obligations canadiennes à 5,75 pour cent échéant en 2033, un indicateur du coût à long terme des emprunts, était de 4,27 pour cent. L'argent au jour le jour était à 4,25 pour cent. Ces données sont fournies par la Banque Royale.
Google's Language Tools: Bonds after two years have won 0, $ 01 to $ 100.20; those 10 years have yielded $ 0.05 to $ 98.27; those long-term decline of $ 0.15 to $ 123.20 . The return on Canadian bonds at 5.75 percent maturing in 2033, an indicator of long-term costs of borrowing, was 4.27 percent. The money from day to day was at 4.25 percent. These data are provided by the Royal Bank.
Yahoo's Babelfish: The obligations in the long term two years gained 0,01 $ to 100,20 $; those at 10 years yielded 0,05 $ to 98,27 $; those in the long run moved back of 0,15 $ to 123,20 $.Le output of the Canadian obligations to 5,75 percent falling due in 2033, an indicator of the long-term cost of the loans, was of 4,27 percent. L' money from day to day was with 4,25 percent. These data are provided by the Royal Bank.