I'm Human, Computer, I Swear!
It's time to move beyond those squiggly letter tests that Web sites use to weed out spam.
To that end, there are a few interesting techniques that can at least weed out the dumbest spambots. Developer James Edwards offers a nice overview of noninteractive alternatives. My favorite is the "honeypot" defense: Since bots live inside the Internet and see HTML, not the pretty versions of Web pages our browsers make for us, they can have a difficult time figuring out what's visible to humans and what isn't. So when they see a submission form—say, to submit a comment to a blog—they're inclined to enter something in all the fields and try to post it. The honeypot here is an input field that is invisible to readers. As a human, you will never know this secret input box exists, and even if you did, there would be no way for you to access it. If the site receives a submission in the invisible field, then, it's probably coming from a bot and can be automatically discarded.
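For the curious, the server-side half of a honeypot can be sketched in a few lines of Python. This is a minimal illustration, not Edwards' code: the field name `website_url` and the function name are hypothetical, and it assumes the page hides that field from human visitors (with CSS, for example).

```python
# Hypothetical name for the hidden field; the article does not specify one.
HONEYPOT_FIELD = "website_url"


def is_probably_bot(form_data):
    """Return True when the hidden honeypot field contains any text.

    Humans never see the field, so it should arrive empty or missing.
    """
    value = form_data.get(HONEYPOT_FIELD, "")
    return bool(value.strip())


# A human leaves the invisible field blank; a naive bot fills in everything.
human_post = {"name": "Ada", "comment": "Nice post!", "website_url": ""}
bot_post = {"name": "x", "comment": "Buy R0lex", "website_url": "http://spam.example"}

for post in (human_post, bot_post):
    verdict = "discard" if is_probably_bot(post) else "accept"
    print(post["name"], verdict)
```

A bot written for one specific site could simply skip the field, which is why this only stops the indiscriminate kind.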
Spammers, of course, are dedicated, able, and not easily fooled. Anyone trying to target a specific site would not have much trouble bypassing this defense. But for sites whose main threat comes from roving bots that paint with a wide brush, these sorts of solutions are sensible.
For more robust protection, my hope lies in systems like Akismet, which applies a complex algorithm to blog comments to determine whether they are spam. It's in the same vein as e-mail spam filters that examine the content of a message and give it a thumbs up or thumbs down. These filters have gotten a lot better over the years: It's no longer possible to fool the e-mail watchdogs by spelling your product R0lex. Another automatic system, called Bad Behavior, boasts that it doesn't even bother with the content. Instead, it uses what it calls a "fingerprinting" strategy to identify spammers based on technical characteristics, like the IP address and the details of the HTTP request, exploiting the fact that most spammers are sloppy programmers who leave at least a few digital red flags waving.
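To make the fingerprinting idea concrete, here is a rough Python sketch of scoring a request on its technical traits alone. The specific red flags, the weights, and the function name are my own assumptions for illustration; they are not Bad Behavior's actual rules.

```python
def suspicion_score(ip, headers, blocked_ips):
    """Count red flags in a request without looking at the comment text.

    All checks and weights here are illustrative assumptions.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    normalized = {name.lower(): value for name, value in headers.items()}

    score = 0
    if ip in blocked_ips:
        score += 2  # address already known for spamming
    if not normalized.get("user-agent", "").strip():
        score += 1  # real browsers identify themselves
    if "accept" not in normalized:
        score += 1  # real browsers say what content they accept
    return score


# Addresses below come from ranges reserved for documentation.
blocked = {"203.0.113.9"}
browser = {"User-Agent": "Mozilla/5.0", "Accept": "text/html"}

print(suspicion_score("198.51.100.7", browser, blocked))
print(suspicion_score("203.0.113.9", {}, blocked))
```

A site would pick its own threshold for rejecting a request; the point is only that sloppy bots give themselves away before anyone reads a word they post.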
Herein lies the key to leaving squiggly letters behind. As Alan Turing laid out in the 1950 paper that postulated his test, the goal is to determine whether a computer can behave like a human, not perform tasks that a human can. The reason CAPTCHAs have a term limit is that they measure ability, not behavior. The history of computing shows us that machines will eventually learn how to perform all manner of tasks—like identifying words, for instance—that we currently assume only humans can solve.
How might it be possible to measure behavior rather than ability? The other day, I was writing a note to a company using the online form it provided for media requests, doing the usual amount of typing, backspacing, and retyping as I tried to phrase my note in a way that would make the company respond quickly. It occurred to me that the random, circuitous way that people interact with Web pages (the scrolling and highlighting and typing and retyping) would be very difficult for a bot to mimic. A system that could algorithmically capture the way humans interact with forms could eventually relieve humans of the need to prove anything at all.
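No such system is described here, so what follows is only a toy Python sketch of what measuring behavior might look like. It assumes the page records each keystroke as a (timestamp in milliseconds, key name) pair; the function name, the "Backspace" key label, and the 5-millisecond threshold are all hypothetical.

```python
from statistics import pstdev


def looks_scripted(events, min_jitter_ms=5.0):
    """Guess whether a typing log came from a script rather than a person.

    events: list of (timestamp_ms, key) tuples in the order they occurred.
    The rules and the threshold are illustrative assumptions only.
    """
    if len(events) < 3:
        # Almost no typing was observed; this sketch treats that as scripted.
        return True

    # People fumble and fix mistakes; naive scripts do not.
    corrections = sum(1 for _, key in events if key == "Backspace")

    # People type unevenly; naive scripts fire keys at a steady clip.
    gaps = [later[0] - earlier[0] for earlier, later in zip(events, events[1:])]
    jitter = pstdev(gaps)

    return corrections == 0 and jitter < min_jitter_ms


human_log = [(0, "H"), (180, "i"), (420, "Backspace"), (950, "e"), (1100, "y")]
bot_log = [(i * 10, "a") for i in range(10)]

print(looks_scripted(human_log))
print(looks_scripted(bot_log))
```

A determined spammer could fake hesitation and typos, of course, so a real system would need far richer signals than these two.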
Any solution that could replace CAPTCHAs en masse would have to be free, work across a wide variety of platforms, and be easy for the average blogger or Web admin to install. One of the reasons that CAPTCHAs have spread like kudzu, I suspect, is that they're so easy to implement—in some cases, as simple as checking a box on a site that helps you set up an input form. The more a bot-fighting algorithm can insinuate itself behind the scenes, the better. In the meantime, we'll all have to keep debating the eternal question: Is that a W, or is it a V and an I attached at the hip?
Chris Wilson is a Slate contributor.