Last week at Reading University, 30 judges text-chatted with a bunch of humans and a bunch of computer programs pretending to be human, and the judges tried to figure out who was who. That is the setup behind the famous Turing test—in which a computer program tries to convince an interlocutor, through free conversation on any subject whatsoever, that it is human—posited by legendary computer scientist Alan Turing in 1950 as a measure of artificial intelligence. The results out of Reading claim that one particular program, “Eugene Goostman,” has passed the Turing test, successfully tricking 10 out of 30 judges in five-minute conversations into thinking it was human. On closer inspection, though, the first question to ask is whether computers are getting smarter or people are getting dumber.
The winning program, Eugene Goostman, was developed primarily by software engineer Vladimir Veselov and presents itself as a 13-year-old Ukrainian boy. Neither the current version of the program nor the transcripts have been made available yet by the contest administrators. I did, however, obtain transcripts from a 2012 contest in which an earlier version of Eugene persuaded several judges of its humanity.
At first blush, the thought of a computer masquerading as human is frightening. Has machine intelligence really progressed so much? Are we close to such programs usurping our own authority and taking over the world? No and no. The question in need of answering is: How did Eugene convince the judges? What about Eugene’s responses made Eugene appear human?
To answer this question, I will try to explain some of the guts of Eugene, based on my best guess as to how it operates. As you’ll see, Eugene is something of a Wizard of Oz-like figure. From the outside, the program may seem quite conversant, but it’s making use of psychological smoke and mirrors to play on people’s presuppositions about what is “human” and what is not. In the following transcript, a judge is conversing with Eugene, trying to determine its true nature. The judge is allowed to say anything whatsoever, so a smart judge should try to do things to trip up a computer program and get it to react in a way that a human never would. But this judge eventually guessed Eugene to be a foreign female child (although Eugene never got around to giving a name or gender in the transcript).
Judge: Hello, what is the weather like tomorrow?
Eugene: In my native town people says that meteorologists are like sappers: they make a mistake only once, but the difference is that sappers do that once in a lifetime and meteorologists - daily. Do you like weather today?
Eugene identified the word “weather” here and delivered a potted joke about “sappers” (combat engineers) and meteorologists. This is a wholly scripted line by the program’s creator; nothing was invented on the fly. Unable to answer the question, Eugene deploys the joke and a follow-up question of its own to make the judge forget that his original question went unanswered.
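Eugene’s source code has not been published, but the pattern described above can be sketched in a few lines: scan the input for a known keyword, emit a prewritten reply, and tack on a deflecting question. (The keyword, reply, and deflection below are taken from the transcript; the structure is my guess.)

```python
# A minimal sketch of keyword-triggered canned responses, in the spirit of
# the behavior above. Each known keyword maps to a scripted reply plus a
# deflecting question that redirects the conversation.
CANNED = {
    "weather": (
        "In my native town people says that meteorologists are like sappers: "
        "they make a mistake only once...",
        "Do you like weather today?",
    ),
}

def respond(utterance: str) -> str:
    words = utterance.lower().split()
    for keyword, (reply, deflection) in CANNED.items():
        if keyword in words:
            return reply + " " + deflection
    return "Some another topic?"  # generic fallback when nothing matches
```

Note that nothing here models the meaning of the judge’s question; the word “weather” alone is enough to fire the whole exchange.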
Judge: Why do birds suddenly appear?
Eugene: Just because 2 plus 2 is 5! By the way, what's your occupation? I mean - could you tell me about your work?
Eugene responds to a “why” question with the potted response “2 plus 2 is 5”, which actually seems apropos in light of the judge’s non sequitur. Eugene quickly asks another question, phrased twice to make it appear less formal, more foreign, and more “human.”
Judge: Yes, I am a doctor. Are you healthy?
Eugene: Well, so tell me about your work. I think - doctor is a nice specialty!
Eugene is able to pluck “doctor” out of the “I am a doctor” sentence to use it in a response. It’s unable to answer “Are you healthy?” But Eugene has already demanded more follow-up from the judge.
Judge: It is ok, I get sick of sick people. How is your stomache feeling today? Is it upset maybe?
Eugene: I think you can't smile at all. I bet you work in a funeral agency.
Judge: What do you work as?
Eugene: I am a scholar. I'm too young to make money.
Judge: I see, what school do you go to?
Eugene: I attend an ordinary school in Odessa, my native city. Its number is 28 if it makes any difference to you.
Judge: it doesn't. What is your favourite subject?
Eugene: It's hard to decide. Where do you came from, by the way? Could you tell me about the place where you live?
The judge might be wondering here if Eugene is indeed a computer program—but then Eugene manages to answer several questions in a row. Most likely, Eugene identified keywords such as “work” and “school” to deliver appropriate responses to those questions. Almost any question with “work” or “school” in it would have produced these responses, however. Eugene does not understand the last question about his favorite subject. Rather, the presence of “favourite” in a question triggers the generic response “It’s hard to decide.” (“What is your favourite food?” would probably also have produced “It’s hard to decide.”) Meanwhile, Eugene keeps trying to control the conversation to steer it toward prescripted interactions.
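A hypothetical keyword table in the spirit of this explanation makes the trick concrete: any question containing a trigger word gets that trigger’s stock answer, regardless of what is actually being asked. (Again, Eugene’s real rules are not public; the replies below come from the transcript.)

```python
# Illustrative trigger-word table: the first trigger found anywhere in the
# question wins, and the stock answer ignores the rest of the sentence.
RULES = [
    ("work", "I am a scholar. I'm too young to make money."),
    ("school", "I attend an ordinary school in Odessa, my native city."),
    ("favourite", "It's hard to decide."),
]

def answer(question: str) -> str:
    q = question.lower()
    for trigger, reply in RULES:
        if trigger in q:
            return reply
    return "Some another topic?"  # generic fallback
```

On this scheme, “What is your favourite subject?” and “What is your favourite food?” produce the identical generic reply, exactly as conjectured above.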
Judge: I come from Manchester. Did you see the Jubilee?
Eugene: Could you invite me to Manchester? I would be so grateful:-)
Eugene locates the place name in the judge’s response and asks for an invitation. Eugene does not understand the question about the Jubilee and just ignores it.
Judge: Did you see the jubliee?
Eugene: Try to guess! Actually, I don't understand why you are interested. I know you are supposed to trick me.
Judge: Not really
Eugene: Not really? Why? What are any interesting site-seeings in Manchester?
The judge gives up on the Jubilee, perhaps chalking Eugene’s ignorance up to Eugene’s youth or foreignness.
Judge: Lots of nice things. Do you like foottball?
Eugene: Does anyone hate it? Some another topic?
Another generic answer from Eugene in response to a question containing “like.” The judge misspells “football,” and Eugene can’t figure out what he’s talking about.
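The misspelling episode suggests why this style of matching is so brittle, assuming Eugene matches triggers as exact substrings: a single stray letter means a specific rule never fires, and a vaguer rule (here, one keyed on “like”) catches the question instead. The “football” reply below is a hypothetical canned line, invented for illustration.

```python
# Sketch of how a misspelling defeats naive substring matching: more
# specific triggers are checked first, but "football" is not a substring
# of "foottball", so the question falls through to the generic "like" rule.
RULES = [
    ("football", "I watch it sometimes."),  # hypothetical canned line
    ("like", "Does anyone hate it? Some another topic?"),
]

def reply(question: str) -> str:
    q = question.lower()
    for trigger, canned in RULES:
        if trigger in q:
            return canned
    return "Some another topic?"
```

Nothing in the program knows that “foottball” is one typo away from a word it recognizes; it simply fails to match and moves on.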
Judge: Will you be watching the football on Sunday?
Eugene: I play sports sometimes. On computer. But I prefer racing.