Google Translate used to trip up more in its early days, when its data was skewed to the formal legalese of U.N. and EU documentation: Bellos recalls how searches for "avocado salad" in French (salade à l’avocat) would return "lawyer salad" in English—"avocat" being both avocado and lawyer in French, and in the corridors of the EU, "avocat" being more likely to mean lawyer.
This sort of family-resemblance world of meaning is a given for humans. It's also something we can clearly teach computers—and the key to doing so is simply gathering more data. Google was the first to really put this idea to use, and it marked a significant step from computers parsing strict syntax to reading the force of meaning with context-sensitive intelligence. Today the algorithm understands language roughly as well as a 10-year-old does, but its rate of improvement far outpaces human language learning. When I say that the computer "learns," that isn't just a metaphor. "In a very meaningful sense it's really learning language," says Och. It might not be able to form its own sentences, digest Gödel, Escher, Bach, and answer your questions—full-blown A.I.—but it teaches itself automatically from data alone.
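To make the data-driven idea concrete, here is a minimal, purely illustrative sketch of statistical phrase translation: count how often phrases align in a parallel corpus and pick the most frequent match, preferring longer phrases over single words. The corpus, names, and numbers are invented for illustration; Google's actual system is vastly more sophisticated.

```python
from collections import Counter, defaultdict

# Toy parallel corpus of (French phrase, English phrase) alignments.
# A legalese-heavy corpus skews "avocat" toward "lawyer" — the
# avocado-salad problem in miniature. All data here is made up.
corpus = [
    ("avocat", "lawyer"),
    ("avocat", "lawyer"),
    ("avocat", "lawyer"),
    ("avocat", "avocado"),
    ("salade à l'avocat", "avocado salad"),
]

# Count how often each source phrase aligns with each target phrase.
counts = defaultdict(Counter)
for src, tgt in corpus:
    counts[src][tgt] += 1

def translate(phrase):
    """Return the most frequent translation seen for the whole phrase,
    falling back to word-by-word lookup for unseen phrases."""
    if phrase in counts:
        return counts[phrase].most_common(1)[0][0]
    return " ".join(translate(w) if w in counts else w for w in phrase.split())

print(translate("avocat"))             # skewed data: "lawyer"
print(translate("salade à l'avocat"))  # the longer phrase match wins: "avocado salad"
```

The word in isolation comes out wrong, but the longer phrase supplies enough context to get it right: exactly the trade the statistical approach makes, buying context with data.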
With all its data, might the computer have some kind of advantage, where it understands language better than we do? Could it even beat us at translation? Any reflection on the difficulties and subtleties of human translators' work and art, from simultaneous interpreting to subtitling films, makes that suggestion almost laughable—almost. Bellos expressed great admiration for the innovations at Google, but pointed out its limitations: "Machines aren't—at least aren't currently—the same sort of thing as a human translator." To reach human levels a machine must understand context in all its forms, and be culturally aware in a particularly profound and deeply embedded way. It must understand the force of meaning, which of course comes from more than page-by-page data or facts it can be fed. Nonetheless a hefty chunk of language understanding does come from data, and data Google certainly has.
Perhaps the barrier is lower than we think. It's at least conceivable that the machines could become better linguists. Moore's law—that computer power doubles relative to price every 18 months—persists, and is not to be underestimated. More data and more tweaks to the model bring improvements year on year. The big question is whether or not Google's data-intensive method is determinately limited in some way. "It's very clear that our translation quality will continue to improve," says Och. "Where the limit is is a fundamental question."
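The compounding in Moore's law is easy to understate, so it is worth doing the arithmetic. Taking the doubling-every-18-months formulation at face value (a simplifying assumption, not a precise engineering claim), the growth factor over t years is 2^(t/1.5):

```python
# Moore's law as stated above: computing power per dollar
# doubles every 18 months (1.5 years). This is a back-of-the-
# envelope model, not a measured trend line.
def moore_factor(years, doubling_period_years=1.5):
    """Growth factor in power-per-dollar after the given number of years."""
    return 2 ** (years / doubling_period_years)

print(round(moore_factor(3)))   # 4x in three years
print(round(moore_factor(15)))  # 1024x in fifteen years
```

A thousandfold increase in a decade and a half is why "more data, more tweaks" is not a modest promise.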
In the meantime, Google Translate's enormous databank goes about learning languages as we actually speak them, taking in all their complexities and inconsistencies. It manages swear words and slang better than any rule-based system could, and keeps up to date with contemporary language. It speaks the English we speak, wherein infinitives are split and nonplussed means unperturbed—and linguistic conservatives have no say in what's correct. But Google does. And it does so for dozens of languages besides English.
Might all that knowledge about language and how we speak it be something to worry about, when in the hands of a single company? Translation can be a sensitive thing—laden with responsibility that can't be passed off to algorithms with a shrug of the shoulders. It's worth questioning the power this brings Google—especially since Google likes to shy away from the murky waters of culture and politics, where authority over language certainly places it. Before you know it, all this technical tinkering and grand ambition leads the company into public, social, and moral realms—something Evgeny Morozov has written about brilliantly.
In Wittgensteinian fashion, I shouldn't advance any thesis—but there is food for thought here. With some clever technology built by some clever people, Google has, in effect, ruled on language and translation, confirming that prescriptive rules are nonsense—a small victory for the descriptive grammarians, if an obvious one—and that Wittgenstein's remarks on language really do yield an effective model for learning it by artificial means. Google Translate might not become better than humans at translation any time soon—perhaps it can't be done. But at some point it will be good enough: cheaper and more convenient than hiring human translators (for many whose standards are lower, this is already true) or bothering to learn languages ourselves—at which point "good enough" becomes our standard. How do we want our machine translators to work for us?