April 24, 2009, 12:28 PM

I'm Human, Computer, I Swear!

It's time to move beyond those squiggly letter tests that Web sites use to weed out spam.

If only someone had listened to computer scientist Moni Naor in 1996, proving that you're human on the Internet would have been so much more interesting. Naor was among the first to propose that simple tests only humans can solve would prevent malicious bots from infiltrating the Web. In an unpublished manuscript, Naor proposed nine possible tests, including gender recognition in images, fill-in-the-blank sentences, and a "deciding nudity" quiz in which you're asked to identify which person isn't wearing any clothes.

Alas, rather than getting to play "find the naked person" every time we sign up for a webmail account, we're now stuck with those reviled squiggly letter tests known as CAPTCHAs. Let's give credit where credit's due: These tests have been incredibly effective in combating spam. But even CAPTCHA pioneer Luis von Ahn, who received a MacArthur genius grant on account of his squiggly-letter work, admitted to me that they won't be a solution forever. For all their success, these tests are a crude way to weed out the bots among us. And they have proliferated to so many sites that the task of proving your humanity on the Internet is beginning to feel like an imposition.


This guess-the-funny-letters approach has been the dominant strategy in bot warfare for the past decade or so. As spammers have gotten more sophisticated, the CAPTCHAs have gotten harder to solve. Now, it's not at all uncommon for flesh-and-blood people to botch the tests, failing to convince the computer of their Homo sapiens credentials.

There is something uniquely vexing about having your humanity disputed by a machine. Don't blame the computer. While humans are perfectly capable of spotting a machine masquerading as a human on the Internet (the classic definition of a Turing test), it gets much more difficult when you're asking a computer to be the judge, particularly as hackers get better at teaching computers to read. Tech publications regularly report that this or that CAPTCHA has been cracked by spammers, though there's often some dispute over whether the culprits are using optical character recognition or simply paying people in India to solve them en masse. (One such service, for example, charges $2 per 1,000 solutions.) An engineer at Google told me that the company has collected evidence of OCR attacks on its CAPTCHAs but believes the majority of illicit solving is being done by humans.

We all despise spam, but using CAPTCHAs as a first line of defense often amounts to killing a mosquito with a squiggly machete. Serving readers these Pictionary exercises might be called for in situations with high-value targets, like free webmail services that get conscripted to send more spam. But is it really necessary for me to fill out a CAPTCHA in order to send an e-mail to an English professor at Auburn University?

These days, most of the advances in human verification involve new and improved tests. Google is experimenting with rotated images, since computers still have trouble telling up from down. Von Ahn currently runs a system called reCAPTCHA that helps digitize books in the process of getting people to identify words. (He says reCAPTCHA, which is used by more than 100,000 sites, is still spammer-proof.) For some really nutty proposals, there is a paper that suggests a series of word association games and inkblot tests.

I don't doubt that these innovations can extend CAPTCHA's lifespan for at least a few years. But I don't think this should be the goal. Rather, developers should be moving away from a system where humans have to prove they're human, particularly for sites that are low-value targets. (No offense, Auburn English department.) Ideally, software should be able to figure out who's a human on its own.

