Posted Thursday, Jan. 10, 2013, at 6:28 PM
IBM's Watson computing system at a press conference in 2011, before it appeared on Jeopardy!
Photo by Ben Hider/Getty Images
IBM’s Watson was a great fit for Jeopardy!. The machine, which took home the show’s $1 million prize in 2011, had the supreme intellect we’ve come to expect from Jeopardy! contestants—after all, it was designed specifically to answer the show’s questions. But it looks like Watson took a turn for the worse when researchers tried to teach it the intricacies of Internet slang by uploading the entirety of Urban Dictionary to its memory. What could possibly go wrong?
Arming Watson with this vast resource of vulgarity went about as you’d expect. It couldn’t tell which words in this fascinating new lexicon were appropriate for normal use—according to an account from Fortune magazine, Watson used the word “bullshit” in one answer to a query. Researchers promptly swept Urban Dictionary from its memory, and now Watson has a filter to prevent it from using profanity.
But it might be a good thing if Watson dropped an f-bomb every now and then. The whole endeavor was an exercise in beating the Turing test, which examines a machine’s ability to imitate a human’s written language capabilities. If Watson were to chat with a human, and that person couldn’t tell whether her correspondent was man or machine, then Watson would pass the test. To do that, as IBM scientist Eric Brown tells Fortune, Watson needs to know the subtleties of netspeak.
It’s unusual to talk on Gchat without an occasional “lol,” and many of the words and acronyms we use without thinking could trip up a machine that doesn’t understand them. They trip up people all the time, too, and Urban Dictionary is the best place to look them up quickly.
In 2010, IBM’s David Ferrucci told the New York Times that Watson relies on “any kind of reference material you can imagine getting your hands on or licensing.” That includes millions of documents from dictionaries, encyclopedias, bibles, and literature, and it should include Urban Dictionary. Traditional reference works are all too slow (and perhaps too highbrow) to keep up with the language you might drop in a chat window.
Now, Urban Dictionary’s characteristic trait is going way too far on just about everything. With 6.9 million definitions and who knows how many completely made-up words, there’s plenty to confuse even the smartest being. Even if Watson learns all these words, though, how often is it really going to have the chance to use a term like “Cosby sweater”? (Seriously, though, look that one up at your own risk.)
If Watson’s going to pass the Turing test, I expect it to appreciate the full value of my OMG. And if it needs to call out my bullshit to do that? Well, that’d be a big win, too.