Yes, Ill Matty You
How your cell phone's autocorrect software works, and why it's getting better.
Autocorrect gets no respect. Every day, you dash off dozens of messages on your mobile phone, and most of the time, you do it wrong—you mistype, misspell, or make some other kind of error that's bound to cause you great embarrassment. In the vast majority of cases, your phone steps in to save the day. Thanks to the genius of autocorrect, you can appear fully literate even when you type "im ar thw store," "thats so fibby," or "yes ill matty you."
But no one ever thanks autocorrect. Instead you focus on the few instances in which your phone, overwhelmed by your errors, makes a mistake of its own. True, some of these are spectacular: The iPhone turns "heard about garys internship at the whitehouse?" to "Heard about farts internship at the whorehouse?" On the Motorola Droid, you might aim for "mmm, I donno about that restaurant" but get, "Mommy, I donno" instead. The Web abounds with such gaffes; David Pogue's readers recently compiled a hilariously comprehensive list. Most errors, though, are relatively prosaic—the most common one I experience on the iPhone is its insistence that hell should be he'll. (That explains my recent preference for the schoolmarmish exclamation "What the heck?")
Perhaps due to the thanklessness of the job, nearly all the mobile phone companies I contacted about autocorrect were reluctant to discuss the software. Apple, Google, Microsoft, Research in Motion, and HTC all either did not respond or declined requests for interviews. Surreptitiousness seems to be the operating philosophy here: "You do your best not to be noticed," says Scott Taylor, the vice president of mobile solutions at Nuance, the one software company that was happy to talk about how your phone turns hapless tapping into something resembling readable English. Nuance makes T9, one of the oldest and most popular mobile text-entry systems. The software, which is often customized by handset makers and sometimes doesn't carry the T9 branding, has been bundled with more than 4 billion phones. In its earliest incarnation, T9 was simply a way to enter text via a nine-key numeric keypad. More recent versions automatically correct input from full-on QWERTY keyboards and can even recognize handwriting from styluses.
The basic algorithm behind autocorrection software like T9 is pretty simple. The system is essentially the same as a word processor's spell checker—as you type, the software checks each word against a built-in dictionary, and it suggests alternatives when it doesn't find a match. Many phones will also try to predict what you're going for and suggest a word before you've finished typing it.
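To make that loop concrete, here is a minimal sketch in Python. It is not how T9 or any particular phone does it: the six-word dictionary is made up, the function names suggest and complete are hypothetical, and the standard library's difflib string matching stands in for whatever similarity measure a real product uses.

```python
from difflib import get_close_matches

# Tiny illustrative word list (an assumption); a real phone dictionary is far larger.
DICTIONARY = ["at", "fever", "store", "the", "yes", "your"]

def suggest(word, dictionary=DICTIONARY, limit=3):
    """Return the word itself if it is known, else the closest dictionary entries."""
    lowered = word.lower()
    if lowered in dictionary:
        return [lowered]
    # difflib ranks entries by a similarity ratio; 0.6 is its default cutoff.
    return get_close_matches(lowered, dictionary, n=limit, cutoff=0.6)

def complete(prefix, dictionary=DICTIONARY, limit=3):
    """Guess at whole words that begin with what has been typed so far."""
    lowered = prefix.lower()
    return [w for w in dictionary if w.startswith(lowered)][:limit]

print(suggest("thw"))   # the typo from "im ar thw store"
print(complete("y"))    # words the typist might be heading toward
```

With this toy list, "thw" comes back as "the," and typing "y" offers both "yes" and "your." Everything interesting about a real system lies in what this sketch leaves out: which words go in the dictionary, and how the candidates get ranked.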
There are two difficulties in this process, Taylor says. The first is building the correct dictionary. The phone's list of words has to be both comprehensive and well-targeted for its audience, stuffed with colloquialisms that a modern mobile user might employ. The second problem is creating an accurate "language model," the system that determines which words to suggest. If a user types in fecer, did he mean fever or feces? The right answer depends on the context and the user: if you were e-mailing your boss about your absence from work, you'd be going for the former, while if you were a film critic who'd just attended The Last Airbender, you'd probably want the latter. The more sophisticated the autocorrection system, the more of these contextual factors get taken into account when suggesting alternatives.
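One way to picture a language model is as a scoring function: each candidate starts with a score based on how common the word is, and every nearby word that tends to keep it company nudges the score up. The Python sketch below assumes exactly that. All the counts are invented for illustration, and the rank function is hypothetical, not anything Nuance described.

```python
# Invented counts (assumptions): how often each word appears overall...
WORD_COUNTS = {"fever": 50, "feces": 5}

# ...and how often it appears near a given context word.
CONTEXT_COUNTS = {
    ("boss", "fever"): 6,
    ("work", "fever"): 8,
    ("film", "feces"): 4,
    ("review", "feces"): 9,
}

def rank(candidates, context_words):
    """Order candidates from most to least plausible given the surrounding words."""
    def score(word):
        total = WORD_COUNTS.get(word, 1)
        for context in context_words:
            # Each context word seen alongside the candidate multiplies its score.
            total *= 1 + CONTEXT_COUNTS.get((context, word), 0)
        return total
    return sorted(candidates, key=score, reverse=True)

print(rank(["fever", "feces"], ["boss", "work"]))
print(rank(["fever", "feces"], ["film", "review"]))
```

With these made-up numbers, the note to your boss ranks fever first and the film review ranks feces first, even though fever is the more common word overall.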
The most obvious way to build a dictionary suitable for phones would be to collect and analyze a large sample of words that people actually type into their devices. But privacy policies prohibit this sort of analysis, Taylor says. Indeed, many phone-based autocorrect systems aren't tied to the Web; they don't automatically learn new words or find more timely alternatives to old words in the same way that Google's search-engine spell checker does. Instead, autocorrection systems are usually seeded by a large body of text (what linguists call a "corpus") that's made up of articles from the popular media. "We analyze those for things like the structure of the language, frequency of word use, and other factors, and then we create this language model," Taylor says. The word-suggestion algorithm also considers the layout of your keyboard in order to predict which key you meant to hit when you mashed several of them at the same time.
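The keyboard-layout idea can be sketched in a few lines of Python. The assumptions here are mine, not Taylor's: keys sit on a rough grid with each QWERTY row shifted to the right, a slip onto a neighboring key costs half as much as any other wrong letter, and only same-length, lowercase words are compared. The names key_distance and typo_cost are hypothetical.

```python
import math

# Assumed layout: three QWERTY rows, each with a rough horizontal offset.
ROWS = [("qwertyuiop", 0.0), ("asdfghjkl", 0.5), ("zxcvbnm", 1.5)]
KEY_POSITIONS = {
    key: (offset + col, row)
    for row, (keys, offset) in enumerate(ROWS)
    for col, key in enumerate(keys)
}

def key_distance(a, b):
    """Straight-line distance between two lowercase letter keys on the assumed grid."""
    (ax, ay), (bx, by) = KEY_POSITIONS[a], KEY_POSITIONS[b]
    return math.hypot(ax - bx, ay - by)

def typo_cost(typed, candidate):
    """Lower cost means the candidate is a likelier intended word."""
    if len(typed) != len(candidate):
        raise ValueError("this sketch only compares words of equal length")
    cost = 0.0
    for t, c in zip(typed, candidate):
        if t != c:
            # A slip onto a neighboring key is cheaper than a distant one.
            cost += 0.5 if key_distance(t, c) < 1.5 else 1.0
    return cost

print(typo_cost("fecer", "fever"), typo_cost("fecer", "feces"))
```

On this grid, C sits right next to V while R is nowhere near S, so the keyboard alone would nudge fecer toward fever. A real system would have to weigh that signal against everything else it knows about the sentence.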
Most autocorrection systems, including those on the iPhone, Android, BlackBerry, and T9, also incorporate some kind of learning behavior. For instance, they'll pay attention to when you recorrect a corrected word, and learn never to offer that faulty choice again. They'll also note the proper nouns in your address book and avoid suggesting alternatives for those. T9 and Google's Android will also let you add your own words to the phone's dictionary. (The iPhone also allegedly has this option, but I haven't been able to get it to work.)
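Those three behaviors amount to a little bookkeeping on top of the basic lookup. The Python sketch below is a guess at the general shape, not any vendor's code: the LearningCorrector class, its method names, and the sample corrections are all hypothetical.

```python
class LearningCorrector:
    """Toy corrector that remembers rejections, contacts, and user-added words."""

    def __init__(self, corrections, contacts=()):
        self.corrections = dict(corrections)                 # typed word -> replacement
        self.protected = {name.lower() for name in contacts} # never "fix" these
        self.rejected = set()                                # (typed, replacement) pairs the user undid

    def add_word(self, word):
        """The user adds a word to the dictionary; stop correcting it."""
        self.protected.add(word.lower())

    def reject(self, typed, replacement):
        """The user changed a correction back; never offer it again."""
        self.rejected.add((typed.lower(), replacement))

    def correct(self, typed):
        key = typed.lower()
        if key in self.protected:
            return typed
        replacement = self.corrections.get(key)
        if replacement is None or (key, replacement) in self.rejected:
            return typed
        return replacement


phone = LearningCorrector({"hell": "he'll", "matty": "marry"}, contacts=["Matty"])
print(phone.correct("hell"))    # corrected at first
phone.reject("hell", "he'll")
print(phone.correct("hell"))    # left alone once the user objects
print(phone.correct("Matty"))   # a contact's name is never touched
```

The design choice worth noticing is that rejections are stored per pair, so undoing one bad suggestion for a word doesn't stop the phone from offering a different one later.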
While these efforts are helpful, I don't think I've ever encountered an autocorrect system that didn't bother me in some way; I couldn't tell you, for instance, whether the iPhone, Android, or BlackBerry offers a better system, because they're all pretty similar, and I mainly remember their goofs, not their successes. Indeed, I often find myself wishing for deeper autocorrection intelligence—instead of merely paying attention to what I re-correct, why can't the phone look at the context of my sentence, or my entire conversation, in order to guess what I'm going for? You try to type "meth addicts" into your phone, but it turns it into "method addicts." Certainly it's possible that you were referring to actors who couldn't get out of character, but shouldn't the phone infer that that's not likely—that in the overwhelming majority of cases, the word preceding "addicts" in that sentence should be "meth," not "method"?
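The inference I'm wishing for is, in its simplest form, a matter of counting word pairs. Here is a minimal Python sketch of that idea, assuming a tiny made-up corpus; the function names are hypothetical, and a real model would be trained on vastly more text and would look at more than one neighboring word.

```python
from collections import Counter

def bigram_counts(corpus):
    """Count every pair of adjacent words across the sentences in the corpus."""
    counts = Counter()
    for sentence in corpus:
        words = sentence.lower().split()
        counts.update(zip(words, words[1:]))
    return counts

def pick_by_next_word(candidates, next_word, counts):
    """Choose the candidate seen most often directly before next_word."""
    return max(candidates, key=lambda word: counts[(word, next_word)])

# Invented sample sentences (an assumption), standing in for a real corpus.
CORPUS = [
    "police say meth addicts broke in",
    "meth addicts need treatment",
    "method actors stay in character",
]

counts = bigram_counts(CORPUS)
print(pick_by_next_word(["method", "meth"], "addicts", counts))
```

In this toy corpus, "meth addicts" shows up twice and "method addicts" never does, so the sketch picks "meth" even though "method" is the word the dictionary would reach for first.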
Taylor says that these more advanced methods may be on the horizon. As phones get faster and can store larger dictionaries, their autocorrection systems will be able to take a deeper, more meaningful look at your sentence before offering a word choice.* The other promise is crowdsourcing: if phones begin to base their suggestions on what other people are typing on the Web, they could not only get better at correcting your misspellings but also predict the full phrases or sentences that you're aiming for.
Imagine you start an e-mail with the subject line "Sick today." As soon as you begin writing the first couple of words, "I'm not," your phone (which knows that it's a Monday and that you've been partying all weekend, and which has seen other people type such messages before) offers a suggestion: "I'm not feeling well today." The model is not that different from Google Suggest, except you'd see suggestions in e-mail, text messages, and everywhere else on your phone. Taylor notes that there are substantial privacy concerns with this approach, since you would essentially be sending everything you type to servers on the Web, and phone makers would likely incorporate such features only on an opt-in basis, if at all. Still, some users may be willing to make that trade-off. "We have some prototypes running here, and it's funny and freaky to watch those demos, at how good the phone can be at predicting what you mean," Taylor says.
One more note about autocorrection: Pretty much the first thing we all do when testing out a text-entry system is look up profanity. Does the phone offer suggestions for the word fuck? Will it suggest shit when you type in sgit? In most cases, the answer is no. "Our linguists spend a fair amount of time monitoring the current state of the art in objectionable words to make sure they stay out of our language models," Taylor says. "We know that we can put words in people's mouths, and we want to be careful not to put the wrong word." So perhaps political correctness explains why the iPhone will correct "hell" to "he'll" but not "well" to "we'll." To which I say, WTF?!
Correction, July 14, 2010: Originally this article incorrectly stated that cell phones store a "corpus" of words. While a corpus of text is used to seed phone autocorrection systems, phones themselves have a dictionary.
Farhad Manjoo is Slate's technology columnist and the author of True Enough: Learning To Live in a Post-Fact Society. You can email him at email@example.com and follow him on Twitter.