Turing Test, Reading University: Did Eugene Goostman finally make the grade?

A Computer Program Finally Passed the Turing Test? Not So Fast.

Decoding the tech world.
June 10 2014 4:49 PM

(Continued from Page 1)

Here the word “football,” spelled correctly this time, triggers a potted line about sports. Had the judge spelled “football” correctly in the previous question, Eugene might have given this response the first time.

In summary, Eugene will not be taking over the world anytime soon. It does not understand anything of the conversation. Rather, it tries to avoid letting the judge control the conversation, so that it has to respond as little as possible. The more Eugene can monologue in a socially acceptable way, the more the program can just use prefabricated text to drag out the conversation until the time is up. The program does parse a few of the judge’s words (“Manchester,” “doctor”) so that some conversational context is present, but only barely. And by declaring itself to be both young and foreign, Eugene further lowers the judge’s expectations of a coherent conversation.
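To make the trick concrete, here is a minimal sketch of the general technique described above: match a few keywords exactly, return a canned line when one hits, and otherwise deflect with a question that hands the work back to the judge. This is not Eugene Goostman's actual code, which is not shown here; the keyword table, the canned lines, and the `reply` function are all invented for illustration.

```python
import random

# Hypothetical keyword -> canned-line table. These lines are invented for
# illustration and are not taken from the real Eugene Goostman program.
CANNED = {
    "football": "I'm not much of a sports fan, to be honest.",
    "doctor": "Doctors make me nervous. Are you a doctor?",
    "manchester": "I've never been to Manchester. Is it nice there?",
}

# Hypothetical stock deflections used when no keyword matches.
DEFLECTIONS = [
    "By the way, what do you do for a living?",
    "Could you tell me more about yourself?",
]


def reply(message, rng=random):
    """Return a canned line if a known keyword appears, else deflect."""
    # Lowercase each word and strip surrounding punctuation. Matching is
    # exact, so a misspelled keyword simply fails to trigger anything.
    words = {w.strip(".,!?\"'").lower() for w in message.split()}
    for keyword, line in CANNED.items():
        if keyword in words:
            return line
    # No keyword matched: change the subject instead of answering.
    return rng.choice(DEFLECTIONS)


print(reply("Do you like football?"))  # keyword hit: the canned sports line
print(reply("Do you like footbal?"))   # misspelling: falls through to a deflection
```

Nothing in this sketch models what the judge meant. A correctly spelled keyword gets the potted line, and anything else gets a change of subject, which is the behavior the transcript shows.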

Now, this isn’t the first time computer programs have come off as real through such chicanery. My favorite, Mark Humphrys’ chatbot MGonz, managed to convince an unsuspecting student (who, unlike a Turing test judge, wasn’t told that MGonz might be a computer program) that it was human for 90 minutes all the way back in 1989. Here is a representative excerpt:

MGONZ: ok honestly when was the last time you got laid
USER: what?
MGONZ: cut this cryptic shit speak in full sentences
USER: i didn't catch the last thing you sent.
MGONZ: ok thats it im not talking to you any more
USER: it probably was stupid anyway.
MGONZ: what are you talking about
USER: just a repeat. as i expected from an idiot virgin such as you.
MGONZ: do you think religion is harmful
USER: it obviously has been in your case.
MGONZ: ok honestly when was the last time you got laid
USER: i think that your brain has been permenently damaged.
MGONZ: ok thats it im not talking to you any more
USER: i refuse to continue answering the same stupid laid question,
MGONZ: you are obviously an asshole
USER: unless you can tell me why you keep asking it.
MGONZ: ok thats it im not talking to you any more

By adopting a hostile, combative pose, MGonz could veil his lack of real responsiveness (and lack of understanding), much as Eugene Goostman’s youth and foreignness excused his poor conversational skills.

Still, it’s worth looking at just what was accomplished here. The Turing test was proposed, in several different versions, by the brilliant computer scientist Alan Turing circa 1950. Turing’s greatest accomplishments include formulating the fundamental model of computation used in all theoretical computer science, the Turing machine, as well as helping to crack the German Enigma cipher in World War II. He created the famous test as part of his explorations into the philosophy of artificial intelligence.

In its original version, the test took the form of an “Imitation Game” in which a man and a woman each try to convince a judge, through written communication, that the man is a woman and the woman is a man. Turing then posits: What would happen if a computer were to replace the man pretending to be a woman? Could the computer pretend to be a woman as well as a man could?

Turing later redefined the Imitation Game to the standard version known today as the Turing test: In text-message-style written exchanges, could a computer persuade a judge that it was human with some degree of success? But it’s worth dwelling on the original formulation. Turing was looking for some way to measure computers not in terms of calculational ability or operations per second, but in terms of their ability to negotiate a specifically human problem—in this case, deceiving another human being.

Trashing the Reading results, Hunch CEO Chris Dixon tweeted, “The point of the Turing Test is that you pass it when you've built machines that can fully simulate human thinking.” No, that is precisely not how you pass the Turing test. You pass the Turing test by convincing judges that a computer program is human. That’s it. Turing was interested in one black-box metric for how we might gauge “human intelligence,” precisely because it has been so difficult to establish what it is to “simulate human thinking.” Turing’s test is only one measure.

So the Reading contest was not the travesty of the Turing test that Dixon claims. Dixon’s problem isn’t with the Reading contest—it’s with the Turing test itself. People are arguing over whether the test was conducted fairly and whether the metrics were right, but the problem is more fundamental than that.

“Intelligence” is a notoriously difficult concept to pin down. Statistician Cosma Shalizi has debunked the idea of any measurable general factor of intelligence like IQ. Nonetheless, the word exists, and so we search for some way to measure it. Turing’s question—how could we term a computer intelligent?—was more important than his answer. The Turing test, famous as it is, is only one possible concrete measure of human intelligence, and by no means the best one.

In fact, artificial intelligence researchers tend to disparage or ignore the Turing test. Marvin Minsky dismisses it as “a joke,” while Stuart Russell and Peter Norvig, in the standard textbook Artificial Intelligence, write, “[I]t is more important to study the underlying principles of intelligence than to duplicate an exemplar. The quest for ‘artificial flight’ succeeded when the Wright brothers and others stopped imitating birds and started using wind tunnels and learning about aerodynamics.” Turing was trying to establish a beachhead for conceiving of intelligence in a way that wasn’t restricted to humans. Now that we can conceive of it, the question shifts from whether a computer can trick people in one particular way to the vast variety of ways in which it can display intelligence, human or not.

The gut appeal of the Turing test remains, however. We are obviously fascinated and threatened by the thought that machines could uncannily imitate us. We like to think we’re special; what if we’re not? We like to think no machine could ever fool us, certainly not the likes of Eugene Goostman or MGonz. But actually, they already do. (Automated bots on Twitter are already capable of attracting followers and behaving in plausibly human ways; in a sufficiently regimented social context, such as Twitter, human behavior becomes sufficiently stupid that a computer can imitate it with little to no understanding.) No technological milestone was passed this past weekend at Reading University. But humans certainly didn’t come out looking any better for it.