I work at a large international organization translating speeches from French, Spanish, and Russian. When a rumor began spreading in my office that our jobs were to be "supplemented" by computer translation software, we mostly laughed it off.
Anybody who's played around with translation software knows how bad the technology can be. Everyone in my office knows the hoary classic in which "The spirit is willing, but the flesh is weak," translated into Russian and back, comes out "The vodka is good, but the steak is lousy." We all knew, or thought we knew, that computer translation—also known as machine translation, or MT—could never replace a human translator, with his vast cultural and linguistic experience, his ear for nuance, and his superior multilingual education. We all slept very well in the certainty of our indispensability.
Still, machine translation has been in development for almost 60 years, since it was conceived as an offshoot of the cryptographic technology developed during World War II. Grandiose prophecies of its perfectibility have been made ever since. Had MT evolved while I wasn't paying attention? Had it really improved to the extent that it could be a viable alternative to the human touch? In my off hours, I did a little research.
The first thing I found out, to my alarm, is that the machine translation industry is now pulling in something like $8 billion a year globally, and growing fast. For clients in national intelligence, MT research now represents a potential magical fix for the shortfall of Arabic translators. One developer, Language Weaver, has clients in the Department of Defense and law enforcement who pay anywhere from $20,000 to $100,000 for its software. If our leaders, in their infinite wisdom, have tapped translation software to defend our country, maybe my job wasn't quite as secure as I'd thought.
To put my mind at ease, I tried a simple experiment. I found a useful sample of text available in multiple languages: the Vatican's online biography of Pope Benedict XVI. I took one simple sentence—"His youthful years were not easy"—from each of the five foreign-language versions available on the Vatican Web site and ran it through eight translation programs, ranging from basic free software to expensive professional versions.
That might seem pretty straightforward, yet even this simple sentence caused insuperable problems for most of the programs tested. A lot of the free translation software, such as Babel Fish and World Lingo, was not able to recognize the Italian word for "youth," instead returning something along the lines of "The time of its giovinezza has not been easy." Systran, a so-called professional program, had the same problem. Perhaps the funniest was InterTran's contribution: "Not she was soft does the stop at time of the her youth." If they could do that to the pope, what would they do to Proust? And how could they ever hope to take on Osama?
The one that stood out from the pack was Language Weaver. Not only did it recognize the subject as a human being, rendering the sentence as "The period of his youth was not easy," but it also translated the rest of the paragraph with only one minor error. Intrigued, I began to put the software through its paces. A headline from El País: "A wave of attacks left more than 100 dead in several cities in Iraq." So far, so good. A speech from the United Nations: "The problem is to maintain the level of international attention and ensure the implementation of the commitments." Perfect. The first line of Don Quixote: "In a place of the Channel, whose name do not want to remember, has not much Time living a Hidalgo the spearheaded in shipyard, adarga Antigua, Rocín weak and galgo corridor." Clearly, in the world of machine translation, everything has its limits.
The problem with translation software is context. When you hear or read a sentence, your brain draws not only on the words themselves but also on its accumulated experience. The words "con" and "pen," for example, have various meanings and can represent different parts of speech. But when you read "the con is in the pen," you know instantly that you are dealing with an incarcerated criminal: your life experience allows each word to contextualize the other. A computer can't do that because it has no frame of reference to help it match the contingent sense of "con" as criminal to the contingent sense of "pen" as jail. Short of being endowed with a knowledge base as vast as the human mind's, a computer simply cannot read context.
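To see why context is so hard for a program, consider the crudest mechanical substitute for experience: hand-listing a few related words for each sense of a word, then picking the pair of senses whose lists overlap most. This is a simplified version of the Lesk overlap heuristic used in word-sense disambiguation, sketched here in Python. The sense names and word lists are hypothetical, invented for illustration, and are not taken from any real translation product.

```python
from itertools import product

# Hypothetical, hand-built sense inventory. Each sense carries a small
# "signature" of related words, a crude stand-in for world knowledge.
SENSES = {
    "con": {
        "swindle": {"trick", "fraud", "deceive", "scheme"},
        "convict": {"prisoner", "criminal", "inmate", "prison", "jail"},
        "argument_against": {"argument", "debate", "disadvantage"},
    },
    "pen": {
        "writing_tool": {"ink", "write", "paper"},
        "enclosure": {"animal", "fence", "livestock"},
        "penitentiary": {"prison", "jail", "inmate", "criminal"},
    },
}

def disambiguate(word_a, word_b):
    """Return the pair of senses whose signatures share the most words,
    together with the size of that overlap (a simplified Lesk heuristic)."""
    best_pair, best_score = None, -1
    # Try every combination of one sense of word_a with one sense of word_b.
    for (sense_a, sig_a), (sense_b, sig_b) in product(
        SENSES[word_a].items(), SENSES[word_b].items()
    ):
        score = len(sig_a & sig_b)  # number of shared signature words
        if score > best_score:
            best_pair, best_score = (sense_a, sense_b), score
    return best_pair, best_score

print(disambiguate("con", "pen"))  # → (('convict', 'penitentiary'), 4)
```

The program picks "convict" and "penitentiary" only because someone wrote "prison," "jail," "inmate," and "criminal" into both lists. Remove the words the two lists share and every pairing scores zero: the program has no frame of reference of its own to fall back on.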
There are three fundamental types of machine translation in use today. Basic, rule-based machine translation breaks each sentence down into its component words, which are further analyzed for their base forms and their grammatical and functional structures. The sentences are then transferred into the target language, sometimes by way of an "interlingua": an intermediate representation, such as an artificial language like Esperanto, that is in theory applicable to all languages. The interlingua idea has been around for a long time, but a truly viable one has yet to be written.
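The analysis, transfer, and generation steps of basic machine translation can be sketched in a few lines of Python. Everything here is a hypothetical toy: a five-word Spanish lexicon, a handful of made-up concept labels standing in for an interlingua, and a single hand-written word-order rule. The sketch is meant to show the shape of such a pipeline, not how any particular product works.

```python
# Hypothetical analysis lexicon: Spanish surface form -> (base form,
# part of speech, grammatical features).
ANALYSIS = {
    "los": ("el", "DET", {"num": "pl"}),
    "años": ("año", "NOUN", {"num": "pl"}),
    "no": ("no", "NEG", {}),
    "fueron": ("ser", "VERB", {"tense": "past", "num": "pl"}),
    "fáciles": ("fácil", "ADJ", {"num": "pl"}),
}

# Hypothetical interlingua: Spanish base form -> language-neutral concept.
TO_CONCEPT = {"el": "DEF", "año": "YEAR", "no": "NOT", "ser": "BE", "fácil": "EASY"}

def generate(concept, pos, feats):
    """Produce an English word for a concept, choosing the form from the
    grammatical features (pos is carried along but unused in this toy)."""
    if concept == "YEAR":
        return "years" if feats.get("num") == "pl" else "year"
    if concept == "BE":
        if feats.get("tense") == "past":
            return "were" if feats.get("num") == "pl" else "was"
        return "are" if feats.get("num") == "pl" else "is"
    return {"DEF": "the", "NOT": "not", "EASY": "easy"}[concept]

def translate(sentence):
    # 1. Analysis: split into words and look up base forms and features.
    words = sentence.lower().rstrip(".").split()
    analyzed = [ANALYSIS[w] for w in words]
    # 2. Transfer: map each base form to its interlingua concept.
    concepts = [(TO_CONCEPT[lemma], pos, feats) for lemma, pos, feats in analyzed]
    # 3. Structural rule: Spanish "no + verb" becomes English "be + not".
    for i in range(len(concepts) - 1):
        if concepts[i][0] == "NOT" and concepts[i + 1][0] == "BE":
            concepts[i], concepts[i + 1] = concepts[i + 1], concepts[i]
    # 4. Generation: produce English surface forms and tidy the sentence.
    out = " ".join(generate(*c) for c in concepts)
    return out[0].upper() + out[1:] + "."

print(translate("Los años no fueron fáciles."))  # → The years were not easy.
```

Every word and every rule had to be written by hand. A word missing from ANALYSIS stops this sketch with a KeyError, where a commercial system might instead pass it through untranslated, as happened with "giovinezza."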
Memory-based systems do not actually translate; instead, they draw on a large database of exact or near matches for sentences or phrases that have already been translated. They can be useful in areas with a lot of standard phrasing, such as business letters, boilerplate contracts, and medical diagnostics.
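A memory-based lookup can be sketched with Python's standard difflib module. The stored French sentences, their English versions, and the 0.7 similarity threshold are all hypothetical, and SequenceMatcher.ratio() is simply one convenient string-similarity measure, not necessarily what commercial translation-memory tools use.

```python
from difflib import SequenceMatcher

# Hypothetical translation memory: French sentences already translated
# by a human, paired with their approved English versions.
MEMORY = {
    "Veuillez trouver ci-joint la facture.": "Please find the invoice attached.",
    "Nous vous remercions de votre commande.": "Thank you for your order.",
    "Veuillez agréer nos salutations distinguées.": "Yours faithfully.",
}

def lookup(sentence, threshold=0.7):
    """Return (stored source, stored translation, similarity) for the
    closest stored sentence, or None if nothing is similar enough.
    Nothing new is ever translated; stored text is only retrieved."""
    # Exact match: reuse the stored translation as-is.
    if sentence in MEMORY:
        return sentence, MEMORY[sentence], 1.0
    # Fuzzy match: score every stored sentence by character similarity.
    scored = [(SequenceMatcher(None, sentence, src).ratio(), src) for src in MEMORY]
    score, best = max(scored)
    if score >= threshold:
        return best, MEMORY[best], score
    return None

match = lookup("Veuillez trouver ci-joint le contrat.")
print(match[1])  # → Please find the invoice attached.
```

Asked for "Veuillez trouver ci-joint le contrat." ("Please find the contract attached"), the sketch returns the stored invoice sentence and its English version as a close but inexact match. It never produces the word "contract"; a human still has to fix the difference, which is why such systems are said not to translate at all.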