It already speaks 57 languages as well as a 10-year-old. How good can it get?
Photograph by Karen Bleier/AFP/Getty Images.
A computer that translates "natural language" is the holy grail of artificial intelligence: language is so integral to our intelligence and to our humanness that to crack it would be to achieve artificial consciousness itself. But until relatively recently, attempts at it had mostly sucked. They tended to mix the words of one language with the grammar of the other, getting both wrong in the process. Mostly, this is the fault of literal translation, the kind of process that renders the German Kindergarten as "children garden." Newer methods, dominated by Google, turn the problem around: using data, statistics, and brute force, they succeed in part by refusing to "deconstruct" language and teach meaning to computers in the traditional way.
Google's approach is vastly outperforming the rule-based methods that have historically been used to teach language to computers. These classic methods work on the principle that language can be decoded, stripped down to its purest component parts of "meaning," and built back up again into another language. Linguists feed computers vocabularies, grammars, and endless rules about sentence structure, but language isn’t so easily formalized. There are more exceptions, qualifications, and ambiguities than rules and laws to follow. And, when you really think about it, this approach hardly respects the complexity of the problem.
Enter Google Translate, which avoids that reductive concept of language altogether. (Google didn’t invent this statistical method, but it certainly dominates it now.) Google mines existing translated material, recognizes how words or phrases typically correspond, and uses probability to deliver the best match based on context. This being Google, its digital Rosetta Stone amounts to trillions of words, ranging from U.N. documentation (in the organization’s six official languages, translated to a high standard) to company memos to Harry Potter novels. Although Google builds a "language model" that describes the basic look of a well-formed sentence, it doesn’t have linguists try to decode the languages at all. Wittgenstein’s maxim, "Don’t ask for the meaning, ask for the use," is an effective working mantra for Google’s statistical method.
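To make "uses probability to deliver the best match" a little more concrete, here is a minimal Python sketch of the textbook noisy-channel formulation of statistical translation: choose the English sentence e that maximizes P(f | e) × P(e), where P(f | e) is a translation model scoring how well e accounts for the foreign text f, and P(e) is a language model scoring how well-formed e looks. This is not Google's actual system. The toy corpus, the candidate translations and their probabilities, and the function names are all invented for illustration; a real system learns these quantities from enormous parallel and monolingual corpora.

```python
import math
from collections import Counter

# Hypothetical monolingual English text used to train the toy language model.
CORPUS = [
    "the children go to kindergarten",
    "my son goes to kindergarten",
    "the garden is green",
    "the children play in the garden",
]


def train_bigram_lm(sentences):
    """Count word histories and word pairs, padding each sentence with <s> and </s>."""
    histories, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        histories.update(tokens[:-1])            # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return histories, bigrams, len(vocab)


def lm_logprob(sentence, histories, bigrams, vocab_size):
    """Add-one-smoothed bigram log P(e): how much e looks like well-formed English."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(prev, word)] + 1) / (histories[prev] + vocab_size))
        for prev, word in zip(tokens, tokens[1:])
    )


def best_translation(candidates, lm):
    """Noisy-channel choice: the e maximizing log P(f|e) + log P(e)."""
    return max(candidates, key=lambda e: math.log(candidates[e]) + lm_logprob(e, *lm))


if __name__ == "__main__":
    lm = train_bigram_lm(CORPUS)
    # Hypothetical translation-model scores P(f|e) for the German word "Kindergarten".
    candidates = {"kindergarten": 0.4, "children garden": 0.6}
    print(best_translation(candidates, lm))  # kindergarten
```

Even though the invented translation-model scores lean toward the literal "children garden," the language model has never seen "children" followed by "garden," so it tips the decision to "kindergarten." No grammar rule is written down anywhere; the preference falls out of counts. Add-one smoothing is used here only because it is the simplest way to keep unseen word pairs from zeroing out a score.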
In his wonderful book Is That a Fish in Your Ear?, the Princeton linguist and translator David Bellos notes the link between early machine-translation pioneers and modern philosophers of language: their shared, hopeless pursuit of “the purely hypothetical language which all people really speak in the great basement of their souls.” When I spoke to Bellos about Google, he stressed that Google’s achievements don’t make Google Translate akin to how human translation actually works. Though a translation is what you get, translation isn’t really what Google Translate does. (That depends on what we understand by "translation," but let’s not get into that.) “It’s like the difference between engineering and knowledge,” says Bellos. “An engineering solution is to make something work, but the way you make it work doesn’t necessarily have anything to do with the underlying things. Airplanes do not work the way birds fly.”
Which is quite true. But even if Google Translate doesn’t translate language the way humans do, there are parallels in the effect, especially in the way Google Translate learns language. Children don’t learn language through prescriptive rules or by deconstructing sentence structure. Subjects, nouns, and verbs are drilled later, once we’re all but fluent. When I spoke to Franz Och, who heads Google Translate, he told me that, in hindsight, it seems almost obvious that rule-based methods aren’t necessarily as fruitful as data-driven ones. When children learn, “You just give examples, you interact with the child—grammar is something which is never explicit, it’s always implicit,” he says. “Just the same, when our system is learning, a lot of the grammar is not explicit—it’s implicit in the model parameters, in what comes out.”
Here Wittgenstein pops up again. Translation was one of the philosopher’s many examples of a "language game," a form of rule-following in which we take part in the game (here, translation) without directly using the rules implicit in it. Translation isn’t reducible to its rules (grammar, syntax, semantics), but they’re still there, in some sense, beneath the surface. In the same way, Google Translate doesn’t grasp the "rules" explicitly; they’re implicit, and learned implicitly, as Och says.
A metaphor, perhaps, but this isn’t the first time a little applied Wittgenstein has been put to work at Google, intentionally or not. Part of Google’s search power lies in its intelligent handling of context: searches for "hot dogs" yield results for the food rather than for puppies, drawing on the Wittgensteinian insight of family resemblance. In Steven Levy’s recent book about Google, In the Plex, an interview with search engineer Amit Singhal suggests that the Wittgenstein influence was deliberate and was a key breakthrough. Another example: “Today, if you type ‘Gandhi bio,’ we know that ‘bio’ means ‘biography,’ ” Levy quotes Singhal. “And if you type ‘bio warfare,’ it means ‘biological.’ ” In other words, Google’s search engine learns its semantics from human input and improves with more data, just as Google Translate does.
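Singhal's remark describes an outcome, not a mechanism. As a minimal sketch of the general idea that the surrounding words decide how an ambiguous term is read, here is a toy disambiguator in Python. The co-occurrence counts and the name `expand_abbreviation` are hypothetical, and this is not how Google's search engine is actually implemented.

```python
from collections import Counter

# Hypothetical co-occurrence counts: how often each reading of "bio" appeared
# alongside other words in some (made-up) body of past queries or pages.
COOCCURRENCE = {
    "biography": Counter({"gandhi": 50, "lincoln": 40, "author": 30}),
    "biological": Counter({"warfare": 60, "weapons": 45, "diversity": 20}),
}


def expand_abbreviation(query, term="bio", readings=COOCCURRENCE):
    """Replace `term` with whichever reading co-occurs most with the other query words."""
    words = query.lower().split()
    context = [w for w in words if w != term]

    def score(reading):
        # A Counter returns 0 for words never seen with this reading.
        return sum(readings[reading][w] for w in context)

    best = max(readings, key=score)  # ties go to the first reading listed
    return " ".join(best if w == term else w for w in words)


if __name__ == "__main__":
    print(expand_abbreviation("Gandhi bio"))   # gandhi biography
    print(expand_abbreviation("bio warfare"))  # biological warfare
```

Nothing in the sketch defines what "bio" means; the reading is inferred from use, and more (real) data behind the counts would make the guesses more reliable.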
Jeremy Kingsley is a writer in London.