Google speech recognition software for your cellphone actually works.

April 6 2011 4:36 PM

Now You're Talking!

Google has developed speech-recognition technology that actually works.

If you've got an Android phone, try this: Hit the microphone icon on the home screen, then ask, "How many angstroms in a mile?" Use your normal speaking voice; don't speak slowly or strain to over-pronounce "angstrom." So long as you have a good Internet connection, the phone shouldn't take more than a second to recognize your question and shoot back a reply: 1.609344 × 10¹³.
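For readers who want to check the phone's answer, the arithmetic is easy to reproduce. The sketch below is only an illustration of the unit conversion, not anything Google's system is known to run; the function name `miles_to_angstroms` is made up for this example. It relies on two definitions: one international mile is exactly 1,609.344 meters, and one meter is 10^10 angstroms.

```python
# Illustrative sketch only: reproduces the unit conversion by hand.
# The names below are hypothetical and not part of any Google API.

METERS_PER_MILE = 1609.344       # international mile, exact by definition
ANGSTROMS_PER_METER = 10 ** 10   # 1 angstrom = 1e-10 meter


def miles_to_angstroms(miles: float) -> float:
    """Convert miles to angstroms by going through meters."""
    return miles * METERS_PER_MILE * ANGSTROMS_PER_METER


if __name__ == "__main__":
    # Print in scientific notation with six decimal places.
    print(f"{miles_to_angstroms(1):.6e}")
```

Run as a script, this prints the value in scientific notation, which should agree with the reply quoted above: roughly 1.609344 times ten to the thirteenth power.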

This works with all kinds of queries. Say "what's 10 times 10 divided by 5 billion" and the phone will do math for you. Say "directions to McDonald's" or read out an address—even a vague one like "33rd and Sixth, NYC"—and Android will pull up a map showing where you want to go. It works for other languages, too: Android's Translate app (also available for the iPhone) will not only convert your English into spoken French (among several other languages) but also has a "conversation mode" that will translate the French waiter's response back into English. And if that's not enough, Android lets you dictate your e-mail and text messages, too.

If you've tried speech-recognition software in the past, you may be skeptical of Android's capabilities. Older speech software required you to talk in a stilted manner, and it was so prone to error that it was usually easier just to give up and type. Today's top-of-the-line systems, like software made by Dragon, don't ask you to talk funny, but they tend to be slow and use up a lot of your computer's power when deciphering your words. Google's system, on the other hand, offloads its processing to the Internet cloud. Everything you say to Android goes back to Google's data centers, where powerful servers apply statistical modeling to determine what you're saying. The process is fast, can be done from anywhere, and is uncannily accurate. You can speak normally (though if you want punctuation in your e-mail, you've got to say "period" and "comma"), you can speak for as long as you'd like, and you can use the biggest words you can think of. It even works if you've got an accent.

How does Android's speech system work so well? The magic of data. Speech recognition is one of a handful of Google's artificial intelligence programs—the others are language translation and image search—that get their power by analyzing impossibly huge troves of information. For the speech system, the data are a large number of voice recordings. If you've used Android's speech recognition system, Google Voice's e-mail transcription service, Goog411 (a now-defunct information service), or some other Google speech-related service, there's a good chance that the company has your voice somewhere on its servers. And it's only because Google has your voice—and millions of others—that it can recognize mine.

Unless you've turned on Android's "personalized voice recognition" system, your recordings are stored anonymously; that is, Google can't tie your voice to your name. Still, the privacy implications of building a huge database of millions of people's utterances are fascinating, so fascinating that I'll devote my next column to discussing them. Leaving aside privacy concerns for a moment, it's undeniable that speech recognition is one of a number of programs that could only have come about because of our newfound capacity to store and analyze lots and lots of information. In some ways the future of software (and, thus, of the computer industry) depends on such databases. If The Graduate were filmed today, the job advice to Benjamin Braddock would go like this: "One word: data."

To understand why Google's stash of recorded voice snippets is necessary for speech recognition, it helps to understand the history of creating machines that can decipher speech. Late last year, I met Mike Cohen, the head of Google's speech system, in a nondescript conference room at Google's Mountain View, Calif., headquarters. Cohen is one of the world's experts in voice-recognition systems; he's been in the business for decades, and he's seen it evolve from a field dominated by linguists who were interested in computers to one dominated by engineers who are interested in linguistics.
