Hear Me

Inside the Internet.
Sept. 12 1997 3:30 AM

Rating the new voice-recognition software.

For several years I've had acute tendonitis in both my hands, making it painful to type for long stretches. Two months ago, I wrote the previous "Webhead" by slowly dictating to my computer. I was using IBM's VoiceType software, which required me to speak clearly and pause briefly between words. My wife was pleased with this development because, in normal speech, I enunciate so poorly and speak so quickly that I often run whole sentences into unintelligible garble. She hoped I would be forced to learn better speaking habits. But two new software packages--IBM's successor to VoiceType, called "ViaVoice" ($199), and Dragon Systems' Naturally Speaking ($349)--use new "continuous speech" technology, meaning that I can speak as quickly and poorly as I normally do, and my computer will still understand me. Sorry, sweetie.


The two programs work similarly. They break your speech into pieces of sound, which are first processed through a customized database of your voice, speech patterns, and accent, and then compared against a word list. Each comes with a fairly comprehensive vocabulary (250,000 words for Dragon; 200,000 for IBM), but keeps the working vocabulary much smaller (30,000 words for Dragon; 22,000 for IBM). This is because recognition is easier when you can limit the options. The words you use when writing a letter to Mom are different from those you use when writing a trashy novel (well, they ought to be, anyway). By excluding special-purpose words used in the latter (like "heaving" and "bodice"), the software ensures that a missive to Mummy will yield fewer potentially embarrassing errors. Both programs allow you to adjust the working vocabulary for any particular task by referring to similar documents such as, say, a folder of previous letters home. Dragon handles this somewhat more elegantly than IBM, making it easier to switch back and forth between different vocabulary sets. Dragon also adjusts the vocabulary as you speak, noting which words you use and how often you use them.
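The working-vocabulary idea can be illustrated with a toy sketch. This is not how either product is actually implemented; the function names, the tiny word lists, and the simple frequency ranking are all assumptions made for illustration. It only shows why limiting the options helps: a word that never appears in your sample documents is not available to be misheard.

```python
import re
from collections import Counter


def build_working_vocabulary(sample_texts, full_vocabulary, size):
    """Pick up to `size` of the most frequent sample words that the full vocabulary knows."""
    counts = Counter()
    for text in sample_texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word in full_vocabulary:
                counts[word] += 1
    return {word for word, _ in counts.most_common(size)}


def best_match(candidates, working_vocabulary):
    """Return the first sound-alike candidate found in the working vocabulary, else None."""
    for word in candidates:
        if word in working_vocabulary:
            return word
    return None


# Hypothetical data: a tiny "comprehensive" vocabulary and a folder of letters home.
full_vocabulary = {"dear", "mom", "the", "weather", "is", "fine",
                   "leaving", "heaving", "bodice"}
letters_home = ["Dear Mom, the weather is fine.",
                "Dear Mom, I am leaving Tuesday."]

working = build_working_vocabulary(letters_home, full_vocabulary, size=10)

# "heaving" and "leaving" sound alike, but only one occurs in letters to Mom.
choice = best_match(["heaving", "leaving"], working)
```

Here `choice` ends up as "leaving," because "heaving" never made it into the working set built from the letters.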

Before allowing you to start dictating, Dragon forces you to acclimate the software by reading aloud canned text for about 20 minutes. This is long and tedious enough that Dragon cleverly gives you a choice of texts: interesting (Arthur C. Clarke's 3001: The Final Odyssey) or funny (Dave Barry's Dave Barry in Cyberspace). IBM, which allows but does not require this process, has you read what amounts to marketing materials and the program's manual.

The software inevitably makes errors that you can correct, helping it to learn more about your particular vocal patterns. It's easy to feel superior about your computer's idiotic attempts to understand simple words, but IBM's software plays back your recorded voice every time you correct an error--a bracing experience. I'll be damned if I don't pronounce "vendors" just like "offenders," and if my rendition of "environment" sounds uncannily like "in dark," as the software insists it does.

Which has better recognition? My informal trials showed no clear winner. IBM seemed to do better off the bat, even without the preliminary training, but Dragon seemed to learn better, in line with its more flexible vocabulary. Dragon's knowledge of the English language seemed smarter. For example, Dragon was more likely than IBM to correctly distinguish "its" from "it's." However, Dragon sometimes seemed too smart, writing "NW" when I said "Northwest" and choosing "Seoul" instead of "soul." What's more, I could find no way of convincing it not to do that. Both programs failed to understand that "the store's goods" shouldn't be written as "the stores goods," even though simple, natural language rules should have made that clear.


Just for fun, I decided to see how much vocal variation the programs could handle. I said the same phrase normally; very quickly; in a radio announcer's voice; singing; in a redneck accent; and while eating Famous Amos® cookies (my wife will attest that the last is a realistic scenario). You can click the audio icon to hear how I sounded, then see what Dragon and IBM made of me.

Dragon has the clear edge when it comes to hands-off operation, particularly important for the typing-impaired like myself. A document can be dictated and completely edited using only voice commands. You can move the cursor around using statements like, "Move to end of sentence." If you say "Fred" but the computer types "dead," you can highlight the mistake by saying, "Select 'dead' "; bring up the correction dialogue by saying, "Correct that"; and then spell out the correct word letter by letter. When it all works perfectly, this is incredibly helpful and intuitive. Unfortunately, recognizing voice navigation is as unpredictable as recognizing words, and so, instead of moving to the end of the sentence, you may find that you have just typed "moved to end of sense." Fortunately, you can undo it by saying, "Scratch that." IBM's software forces you to use the mouse, first double-clicking on the offending word and then typing the correction.

Dragon also allows you to spell out words using natural spelling (e.g., "B-I-L-L"), where IBM forces you to use the "military" phonetic alphabet. I felt silly sitting there saying "Bravo India Lima Lima." IBM's program does understand the natural spelling of certain abbreviations, namely "IBM," which was useful in dictating this review. Dragon's way is again more intuitive, but when it stubbornly wouldn't distinguish "B" from "V," I found myself thinking, "Bravo for IBM."

Both packages come with customized versions of Windows 95's WordPad, a simple but functional word processor. However, IBM's software also works inside Microsoft Word, a huge advantage for anyone who owns and uses that word processor. Word's built-in automatic spelling and grammar checkers also provide useful hints that an error in recognition has occurred. Neither package lets you use your voice to control Windows or any other applications.

IBM's software can be used by multiple users, each with his or her own customized speech database. You'd have to pay for Dragon's Naturally Speaking Deluxe--a pricey $649--to get the same privilege. On the other hand, Dragon's requires only a 133MHz Pentium, while IBM's insists on a faster 150MHz Pentium with MMX. Both require a minimum of 32MB of RAM on Windows 95 (48MB on Windows NT), and neither is currently available on the Macintosh.

Overall, Dragon's software, which is more expensive, feels more polished and feature-rich. But IBM's more utilitarian approach, which attempts less but fails less, often won me over, and was the one I used to dictate this review. Both programs could use a little more maturity: Each suffered from minor bugs that forced me to reboot and (in the case of Dragon) to reinstall.

Will I use either of these to write in the future? The recognition is good enough. The main problem is that I'm not comfortable dictating. In the past 20 years I have learned to think with my fingers, constantly editing and re-editing as I type, and thinking out loud doesn't come easily. And that's a problem that the best voice-recognition software can't fix. I guess I'll have to wait for Microsoft Thought Recognizer®.