This week 281 students between the ages of 8 and 15 will assemble outside Washington, D.C., for the Scripps National Spelling Bee. Over the last 10 years, it’s taken an average of 645 words and 5,680 letters to weed out the wannabes from the one who outspells them all. Looking at past trends, we can take a shot at predicting which letters and sounds will cause contestants to go home D-E-F-E-A-T-E-D and brō-kən.
Thanks to the folks at the National Spelling Bee (who sent me complete records for the last decade) and Merriam-Webster (which provided its pronunciations), I’ve been able to compile statistics on all of the words that have been spelled correctly (there are 5,042 of them) and incorrectly (1,409) during the traditional oral rounds. (I didn’t look at words that were part of the bee’s written test.) So, what’s most likely to throw a speller off?
You might suspect that longer words are more likely to trip up contestants. The two longest words in the data set were 17 letters apiece: triboluminescence and idiosyncratically, both of which sent their spellers home. But long words aren’t always so tricky. Five of the eight 16-letter words were spelled correctly, Michelangelesque and sphygmomanometer among them. And of the two shortest words to appear in the spelling bee in the last 10 years, gbo and rya, only the former was spelled correctly.
Looking at length more systematically, the number of letters in a word seems to have little correlation with spelling difficulty. Roughly half of the words in the bee have nine or more letters. These words were spelled correctly 78 percent of the time. By comparison, those with eight or fewer letters were spelled correctly 79 percent of the time.
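The length comparison comes down to splitting the words at a cutoff and computing the share spelled correctly on each side. Here is a minimal Python sketch. It assumes each onstage attempt is stored as a word plus a correct/incorrect flag. The `Attempt` record and the `accuracy_by_length` and `letter_count` helpers are hypothetical names, not the bee's actual data format.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    word: str      # the word as given by the pronouncer (hypothetical record)
    correct: bool  # True if the speller got it right onstage


def letter_count(word: str) -> int:
    """Count letters only, ignoring any hyphens, spaces, or apostrophes."""
    return sum(ch.isalpha() for ch in word)


def accuracy_by_length(attempts, cutoff=9):
    """Return (share correct for words with >= cutoff letters,
               share correct for words with fewer letters)."""
    long_group = [a for a in attempts if letter_count(a.word) >= cutoff]
    short_group = [a for a in attempts if letter_count(a.word) < cutoff]

    def share_correct(group):
        # NaN flags an empty group instead of reporting a misleading 0 or 1
        return sum(a.correct for a in group) / len(group) if group else float("nan")

    return share_correct(long_group), share_correct(short_group)


# Only the six words singled out in the text, not the full data set
examples = [
    Attempt("triboluminescence", False),
    Attempt("idiosyncratically", False),
    Attempt("Michelangelesque", True),
    Attempt("sphygmomanometer", True),
    Attempt("gbo", True),
    Attempt("rya", False),
]
```

On those six words alone, `accuracy_by_length(examples)` returns `(0.5, 0.5)`. Run over every oral-round attempt with a nine-letter cutoff, this is the kind of split behind the 78 and 79 percent figures.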
In the first two oral rounds, which include a greater mix of weak and strong spellers, the effect of word length is more pronounced.
When you exclude the first two oral rounds and look only at the best spellers, words of nine letters or more are actually spelled correctly more often (70 percent of the time) than shorter words (65 percent).
It’s possible these statistics are the result of pure chance. Though more than 1,700 words have been spelled correctly in the third oral round and beyond, the difference between above-average-length words and below-average-length words barely misses out on statistical significance. Regardless, the fact that long words and short words are spelled correctly at roughly the same rate shows that, in general, the word pickers are doing a good job. Though word lengths can vary, ideally all words in a given round should be of the same difficulty.
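The article doesn't say which significance test was used. One common way to compare two success rates is a two-proportion z-test, and the sketch below assumes that test. It takes raw counts (correct and total attempts) for each group and uses only Python's standard library, with `math.erfc` supplying the two-sided normal tail probability. `two_proportion_z_test` is a hypothetical helper name.

```python
import math


def two_proportion_z_test(correct_a, total_a, correct_b, total_b):
    """Two-sided z-test for a difference between two success rates.

    Under the null hypothesis both groups share one underlying rate,
    estimated by pooling them. Returns (z statistic, p-value).
    """
    rate_a = correct_a / total_a
    rate_b = correct_b / total_b
    pooled = (correct_a + correct_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if std_err == 0:
        raise ValueError("test is undefined when every attempt (or none) succeeded")
    z = (rate_a - rate_b) / std_err
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

To use it, you would plug in the correct and total counts for longer and shorter words from the third round onward. Under this test, "barely misses out on statistical significance" would correspond to a p-value a little above the conventional 0.05 threshold.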
If not length, what causes the most spelling hiccups? To answer this, I grouped spelling mistakes into three categories of my own design. The first is a substitution, such as spelling atrabilious as atribilious, mistakenly subbing an I for the second A. The second is a deletion: spelling ecchymosis as echymosis, erroneously removing a C. The third is an insertion: spelling vacillant as vascillant, adding an S that shouldn’t be there. Multiple mistakes were recorded if a speller, for example, had a substitution and an insertion error in the same word.
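These categories were designed and applied by the author. The following is only one way to automate something similar. It aligns the misspelling against the correct spelling with a Levenshtein (minimum edit distance) table, then walks back through the table to label each edit. When several alignments tie, this sketch prefers diagonal moves, so it can attribute an error differently than a person would (for example, which of two C's was dropped). `classify_errors` is a hypothetical name.

```python
def classify_errors(correct, attempt):
    """Align a misspelling with the correct spelling at minimum edit distance.

    Returns a list of (kind, correct_letter, attempted_letter) tuples,
    where kind is "substitution", "deletion", or "insertion".
    """
    a, b = correct.lower(), attempt.lower()
    n, m = len(a), len(b)
    # dist[i][j] = fewest edits turning a[:i] into b[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + cost,  # match or substitution
                             dist[i - 1][j] + 1,         # deletion
                             dist[i][j - 1] + 1)         # insertion
    # Walk back from the bottom-right corner to recover one optimal alignment.
    errors = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (a[i - 1] != b[j - 1]):
            if a[i - 1] != b[j - 1]:
                errors.append(("substitution", a[i - 1], b[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            errors.append(("deletion", a[i - 1], None))
            i -= 1
        else:
            errors.append(("insertion", None, b[j - 1]))
            j -= 1
    return errors[::-1]
```

For the three examples in the text, this returns one substitution (A to I), one deletion (a C), and one insertion (an S). A word with two mistakes, such as dysbarism spelled dysporism, yields two tuples. If you run every (correct, attempted) pair through this function and tally the first element of each tuple with `collections.Counter`, you get category shares comparable in spirit to the hand-coded breakdown, though not necessarily identical to it.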
It was not possible to categorize every single mistake or every single word. In one case, for example, a speller began to spell idiosyncratically as I-O, immediately realized his mistake, and finished by spelling the word as I-O-Q-R-S-Z-3-cuatro-F-L-V-R-Q. That word was tossed from the analysis, but the vast majority of the 6,451 words from the last 10 years stayed in.
Most mistakes, by my categorization, were substitutions. Just short of 70 percent of spelling errors were caused by subbing in one letter for another, while 19 percent were deletions and 11 percent insertions.
More telling than the type of error are the letters involved. Proportional to how often the letter appears, J is the thorniest letter in the alphabet. Roughly 9 percent of the time, a J was incorrectly swapped out for another letter, as when jardiniere was incorrectly spelled as gardiniere. On the other end of the continuum, N was used more than 3,500 times in almost 2,900 different words and not once substituted for the wrong letter. The letter B was used 1,005 times and was only switched with the wrong letter once, when dysbarism was spelled dysporism.
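Ranking letters "proportional to how often the letter appears" means dividing each letter's substitution count by the number of times it appears in the words given. A minimal sketch, assuming you have the correct spellings plus a list with one entry per substitution error naming the letter that should have been written. `substitution_rates` is a hypothetical helper.

```python
from collections import Counter


def substitution_rates(correct_words, replaced_letters):
    """Share of each letter's appearances that was swapped for a wrong letter.

    correct_words: the correct spelling of every word given in the oral rounds.
    replaced_letters: one entry per substitution error, naming the letter
        in the correct spelling that the speller replaced.
    """
    appearances = Counter(
        ch for word in correct_words for ch in word.lower() if ch.isalpha()
    )
    misses = Counter(ch.lower() for ch in replaced_letters)
    # Only letters that actually appear get a rate, so nothing divides by zero.
    return {letter: misses[letter] / count for letter, count in appearances.items()}
```

Using the article's counts for B (one swap in 1,005 appearances), the rate is 1/1005, or about 0.1 percent. J's rate is roughly 9 percent.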
While J is a tricky letter, it appears in only a small fraction of spelling bee words (about 2 percent). Last year’s winner, Arvind Mahankali, spelled 14 words onstage and never received a word with a J, Q, or Y.
Most eliminated spellers are forced to give up their spelling dreams because of trouble with vowels. In the substitution category, the five letters most likely to be missed are E, I, A, O, and Y. While these letters are common to begin with, representing 39 percent of all letters in spelling bee words, they account for a disproportionate 74 percent of all errors.
Vowels also cause trouble in the deletion and insertion categories. The most common insertion was adding an extra E. The most common deletion: leaving out an E that should have been there.
Treating all deletions, insertions, and specific letter-for-letter substitutions as separate mistakes, I counted 140 unique error categories in the last 10 spelling bees. Below are the 30 most likely reasons a speller will hear the elimination ding. (If multiple mistakes occurred in one word, all were counted.)
To make sense of this data, I talked to Arjun Modi, a two-time National Spelling Bee participant who placed 17th in 2005. When I showed him the letters in the chart above, he offered a simple explanation: ə.
The ə is an orthographic representation of the schwa, a ubiquitous and bland vowel sound: it’s the uh in dull. Modi describes it as “the most difficult to get right, since it’s one sound and hard to pick up since it is unstressed.” What makes it particularly troubling for spellers is that it can take the form of every vowel. In last year’s bee, the schwa threw off spellers when it should’ve been an A (cyanophycean misspelled as cyanophycein), an E (zenaida misspelled as zaneida), an I (cabotinage misspelled as cabotonnage), an O (melocoton misspelled as melecaton), a U (kuruma misspelled as kurama), and a Y (doryline misspelled as doraline). The top three runners-up in last year’s bee were all eliminated when they used the wrong vowel to spell out the ə sound.
From the data provided, I matched 1,100-plus spelling mistakes in the last 10 years to the official Merriam-Webster pronunciations. (If there were multiple pronunciations, only the first was used.) Each individual character was counted as a separate sound, with the exception of ch, sh, th, and zh, each of which was treated as a single sound. Of the more than 1,100 mistakes in the data set, 35 percent occurred on the ə sound. The next-biggest offender was s, at 8 percent, followed closely by ē (e.g., the long E in beep), k, and i.
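The sound-counting rule can be expressed as a small tokenizer. This sketch makes several assumptions. It takes a Merriam-Webster-style respelling as a plain string. It treats syllable hyphens, stress marks, spaces, and the enclosing backslashes as non-sounds; that list is assumed, not taken from the source. It also keeps any leftover combining diacritic attached to the preceding sound. `pronunciation_sounds` is a hypothetical name.

```python
import unicodedata

# Two-character spellings counted as one sound each
DIGRAPHS = {"ch", "sh", "th", "zh"}
# Assumed non-sound characters: hyphen, backslash, space, stress marks, apostrophe
NOT_SOUNDS = set("-\\ \u02c8\u02cc'")


def pronunciation_sounds(respelling):
    """Split a respelling into sounds: one per character, except digraphs."""
    text = unicodedata.normalize("NFC", respelling)  # precompose accents where possible
    sounds = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in NOT_SOUNDS:
            i += 1
        elif unicodedata.combining(ch) and sounds:
            # A combining mark that didn't precompose modifies the previous sound
            sounds[-1] += ch
            i += 1
        elif text[i:i + 2] in DIGRAPHS:
            sounds.append(text[i:i + 2])
            i += 2
        else:
            sounds.append(ch)
            i += 1
    return sounds
```

For the word schwa itself, `pronunciation_sounds("ˈshwä")` gives `["sh", "w", "ä"]`: three sounds, and no ə among them. A hyphenated `"s-h"` stays two separate sounds, because the digraph check only looks at adjacent characters.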
The schwa is the most error-causing sound in terms of total mistakes triggered, but it’s also a very common sound. Does ə cause the most total mistakes because it is the most common sound or because it is especially likely to make spellers flub? It turns out the answer is both: The schwa causes the most total errors and also causes errors at a higher rate than any other sound.
The schwa causes a mistake about 7.5 percent of the time it appears. A few other sounds come close (ī triggers errors 6.7 percent of the time, while ē comes in at 6.2 percent), but most others don’t compare. The sounds s, k, and i each cause a speller to go home less than 3.5 percent of the time, and t and l (which are both in the top 10 for total errors caused) cause an incorrect spelling less than 2 percent of the time.
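Total errors and error rate differ only in the denominator. A minimal sketch, assuming each word's pronunciation is already split into a list of sounds and each mistake has been matched to a single sound. `sound_error_stats`, `worst_by_total`, and `worst_by_rate` are hypothetical helpers.

```python
from collections import Counter


def sound_error_stats(pronunciations, error_sounds):
    """Map each sound to (errors, appearances, error rate).

    pronunciations: one list of sounds per word given onstage.
    error_sounds: one entry per mistake, naming the sound it was matched to.
    """
    appearances = Counter(s for sounds in pronunciations for s in sounds)
    errors = Counter(error_sounds)
    return {
        sound: (errors[sound], n, errors[sound] / n)
        for sound, n in appearances.items()
    }


def worst_by_total(stats):
    """Sound with the most mistakes overall."""
    return max(stats, key=lambda s: stats[s][0])


def worst_by_rate(stats):
    """Sound that goes wrong most often relative to how often it appears."""
    return max(stats, key=lambda s: stats[s][2])
```

The two rankings can disagree. Take a toy input where ə appears three times with two mistakes and ē appears once with one mistake: `worst_by_total` picks ə, while `worst_by_rate` picks ē. In the bee data, ə tops both rankings.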
When America’s top spellers line up this week, it will only be a matter of time before the first one succumbs to the ə. If they’re lucky, perhaps they won’t get a ə and they’ll be able to sneak on by. Maybe they’ll even get the word schwa—using Merriam-Webster’s first pronunciation, it’s spoken with an ä.