I have in my possession two tiles from a prototype of the game that would become Scrabble. They were crafted by architect Alfred Butts circa 1938. They are made of plywood with the letters stenciled in India ink. Their point values are handwritten on tiny squares of paper and glued beneath the center of each letter. My Q is worth 10 points. My X is worth six.
Any Scrabble player can tell you that the X is actually worth eight points. But as Butts was creating the game, in a fifth-floor walkup in Queens, he tinkered—with the layout of the board, with the total number of tiles, with their distribution, and with their respective point values. “It’s not hit or miss,” Butts said long afterward. “It’s carefully worked out.”
Seventy-five years later, Butts’ carefully worked-out point values are under attack. Late last month, a University of California–San Diego cognitive science postdoc and casual player named Joshua Lewis conducted a computer analysis to recalibrate Scrabble’s letter values based on the game’s current lexicon. Lewis reposted his findings to Hacker News, and they were picked up by Digg and went viral. Around the same time, Sam Eifling, writing for Deadspin, asked a programmer friend to do the same. Both were inspired by the fact that while the language had changed dramatically from the time Butts performed his calculations, the game of Scrabble had not.
It’s a fair observation. Since Scrabble was adopted in chess parlors in New York in the 1950s, competitive players have dissected its strategic quirks. One early realization was that short words have outsized value, so players scoured the preferred source (the now-defunct Funk & Wagnalls Standard College Dictionary) and compiled lists of two- and three-letter words. They also recognized that the most common letters showed up in a lot of words, so they recorded and memorized seven- and eight-letter words—ones that would earn the 50-point bonus for using all seven tiles at once—that contained A, E, I, N, R, S, and T, among other single-point letters. You didn’t need a computer to see that the Q, though worth the most points, was a pain in the rack but the Z not so much.
Since the publication in 1978 of the Official Scrabble Players Dictionary, a compilation of several standard college dictionaries, the game’s word list has grown by tens of thousands of words. From a playing vantage, the additions of QI (a Chinese life force) and ZA (short for pizza) in the last lexicon update, in 2006, were game-changers. Players feared the new words would cheapen Scrabble, boosting scoring and elevating the role of chance. It didn’t happen. The Q became less of a hindrance, a slightly fairer tile than before, and players adjusted strategy to account for the new gimmes.
That need to adjust validates Lewis’ and Eifling’s suspicion that the values assigned to letters aren’t in perfect harmony with the frequency of their use in English or in its narrower subset, the Scrabble word list. The two approached the problem differently. Eifling and software developer Kyle Rimkus totaled the number of letters in Scrabble-eligible words (1.58 million), isolated the frequency of each letter, and then calculated how overvalued or undervalued each letter was compared to its existing point value. Lewis’ approach was more complicated. He weighted letters not only by appearance in the Scrabble lexicon but also by the frequency with which they appear in words of different lengths (with emphasis on two-, three-, seven-, and eight-letter words) and by their ability to “transition” into and out of other letters.
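For readers who want to try the simpler, Deadspin-style tally themselves, here is a minimal Python sketch. It counts each letter's share of all the letters in a word list and compares the standard face values with a value implied by rarity. The function names and the rarity-to-points scaling are assumptions made for illustration; the article does not give the formula Eifling and Rimkus actually used, and Lewis' weighting by word length and letter transitions is not modeled.

```python
from collections import Counter
import math

# Standard English-language Scrabble face values.
FACE_VALUES = {
    **dict.fromkeys("AEILNORSTU", 1),
    **dict.fromkeys("DG", 2),
    **dict.fromkeys("BCMP", 3),
    **dict.fromkeys("FHVWY", 4),
    "K": 5,
    **dict.fromkeys("JX", 8),
    **dict.fromkeys("QZ", 10),
}


def letter_shares(words):
    """Return each letter's share of all the letters in the word list."""
    counts = Counter(ch for w in words for ch in w.upper() if ch in FACE_VALUES)
    total = sum(counts.values())
    return {letter: n / total for letter, n in counts.items()}


def implied_values(shares, low=1, high=10):
    """Map rarity (-log of share) linearly onto low..high points.

    ASSUMPTION: this scaling is hypothetical, chosen only for illustration.
    """
    rarity = {letter: -math.log(s) for letter, s in shares.items()}
    lo, hi = min(rarity.values()), max(rarity.values())
    span = (hi - lo) or 1.0  # avoid dividing by zero when all shares match
    return {
        letter: round(low + (r - lo) / span * (high - low))
        for letter, r in rarity.items()
    }


def valuation_gaps(words):
    """Face value minus implied value; positive means overvalued here."""
    implied = implied_values(letter_shares(words))
    return {letter: FACE_VALUES[letter] - v for letter, v in implied.items()}
```

Calling `valuation_gaps` on a full tournament word list (one word per entry) would give a rough over/under figure per letter. The results depend heavily on the scaling chosen, which is part of why the two analyses in the article do not agree letter for letter.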
The findings don’t differ much. In both analyses, the values of about half the letters change by one or two points. One or the other found that B, C, F, H, K, M, P, X, Y, and Z are overvalued, which makes some intuitive sense. For instance, the X (eight points) and the Z (10) can be easy money, especially since they occur in a number of short words; bumping them down to six points apiece is a logical move. Similarly, the H was set by Butts at four points, but it now appears in nine two-letter words and combines beautifully with other letters, while the M appears in 12 two-letter words. Living-room players detest the C, but they haven’t studied seven- and eight-letter “bingos,” in which C’s abound. The clunky U and V, by contrast, are undervalued—ratcheting them up to two points and five points respectively seems reasonable.
While the media pounced on the story (I joined in), the Scrabble community has been largely unmoved. Why? Several reasons. One, the game’s owners, Hasbro in North America and Mattel overseas, aren’t changing anything. Two, such proposed rejiggerings aren’t new. Three, players understand that variances, in letter values and tile distribution (too many I’s, the Q without a U), are part of the game and strategize accordingly. Four, there are other, arguably more sophisticated ways to assess tile values. Five, and most important, adjusting any core variables would create a completely different game requiring different strategies. “It’s basically saying, Let’s change the game to make a new game,” Jason Katz-Brown, a software engineer who co-wrote Scrabble’s best computer player, Quackle, told me.
Let’s examine the potential effects of the “new” values. In both Lewis’ and Deadspin’s calculations, 10 tiles decrease in value, four increase, and 12 don’t change. More tiles fall in a band of two to five points, bringing the utility of each closer together. The consensus of my math-brained Scrabble colleagues is that this would be like a dose of lithium for the game, flattening scoring and eliminating swings that keep games interesting. Big but still-reasonable values for some tiles, especially the X and Z, are good because they improve the odds of comebacks, make tile positioning a compelling strategic consideration, and give players with lesser word knowledge a slightly better chance against those who are booked up. “Except for the Q, Josh [Lewis] basically squashes the volatility,” said Eric Chaikin, co-director of the Scrabble documentary Word Wars. “His values take the fun out.”
Quackle co-writer John O’Laughlin, a software engineer at Google, said the existing inequities also confer advantages on better players, who understand the “equity value” of each tile—that is, its “worth” in points compared with the average tile. That gives them an edge in balancing scoring versus saving letters for future turns, and in knowing which letters play well with others. “If we tried to equalize the letters, this part of the game wouldn't be eliminated, but it would definitely be muted,” O’Laughlin said. “Simply playing the highest score available every turn would be a much more fruitful strategy than it currently is.”
In response to Lewis’ findings, John Chew, co-president of the North American Scrabble Players Association and a mathematics doctoral student at the University of Toronto, wrote that Scrabble has always had an “intentional” imbalance between the face value and the equity value of the letters. Whenever the game’s lexicon changes—a fifth edition of the Scrabble dictionary is due in 2014—players adapt. “The tile values were chosen to make an interesting game, not to accurately represent the statistical properties of a particular lexicon,” he wrote.
In fact, for Alfred Butts, the face value of the letters was secondary to their distribution. That’s because Butts’ original research was for a word-formation game called Lexiko, which didn’t involve a board and didn’t assign points to individual letters.* In the early 1930s, on spreadsheets containing 26 rows, one for each letter, Butts tallied tens of thousands of letters from the pages of the New York Times, the New York Herald Tribune, and the Saturday Evening Post. Then he compared the frequency with which the letters occurred, both on their own and as part of words of particular lengths. From those tabulations, he determined how many of each letter should be included in his game.
Butts manufactured and sold a few hundred sets of Lexiko from his Queens walkup, but Milton Bradley and Parker Brothers, among others, rejected it. In 1938, Butts decided to add a board. That’s when he assigned values to letters roughly corresponding with their frequency. Butts’ files, which I read while researching my book Word Freak, contain pages of spreadsheets and notes about letter frequency and tile distribution. But there’s little about how he settled on the point values for his new game, which he named Criss-Cross Words. He did experiment, though. I found one iteration in which the Z was worth nine points; the K and V six; and the B, F, and W five. And there’s the plywood evidence of my six-point X.
My conclusion: Butts coupled intuition with direct observation of the game in action—he tested it on his wife and their friends—to arrive at values that he felt balanced equity and volatility. (Not that he would have used those words.) So, for instance, while Butts might have suspected the X was logically worth six points, he understood that eight would make the game more exciting. Similarly, while Butts included a lexicographically reasonable seven S’s in Lexiko, he knew the letter was so valuable for pluralizing words that he should reduce the number in his board game, which he did, to four.
(As with Lexiko, Butts made and sold Criss-Cross Words from his apartment with no success. In 1948, he sold its rights to a small businessman, James Brunot of Newtown, Conn., who renamed, redesigned, and manufactured the game. Five years later, Scrabble became a national sensation. Butts received royalties until 1976. He died in 1993.)
Lewis and Eifling performed a statistical exercise that can help lay players reconsider particular letters based on the overall lexicon. “Get rid of your J and your Q as quickly as possible, because they’re just damn hard to play and will clog your rack,” Eifling concluded, correctly. A deeper dive would involve valuing each letter based on how it is actually used in Scrabble, because words of certain lengths or lexicographic properties might be more or less useful when playing the game. Lewis’ program didn’t consider, for instance, longer combinations of letters, or the importance of four- and five-letter words. “The game isn’t about drawing words at random from the dictionary,” Chew wrote. “It’s about actually finding places to play them on the board.”
Chew’s and O’Laughlin’s approach to reconsidering the face values of the tiles involves adjusting their equity values. Equity value in Scrabble is similar to advanced baseball stats that compare players to an “average” replacement. Scrabble theorists have been calculating this stat—let’s call it VORT, or Value Over Replacement Tile—since the 1980s. “The Barry Bonds of the Scrabble set,” O’Laughlin said, is the blank, with a VORT of about 25 points. That means a blank plus six random tiles will likely net 25 more points than seven random tiles. At the other end, the Q has the lowest VORT, about -7 points.
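The VORT definition above can be sketched as a small simulation: score many racks made of one fixed tile plus six random ones, score many racks of seven random tiles, and take the difference of the averages. The Python below is an illustration under stated assumptions, not how Quackle or the Scrabble theorists compute it. `estimate_vort` and `toy_score` are hypothetical names, and `toy_score` is a stand-in that would have to be replaced by a real move evaluator to produce figures like the 25 and -7 quoted here.

```python
import random

# Standard 100-tile English distribution; "?" marks a blank.
TILE_COUNTS = {
    "A": 9, "B": 2, "C": 2, "D": 4, "E": 12, "F": 2, "G": 3, "H": 2, "I": 9,
    "J": 1, "K": 1, "L": 4, "M": 2, "N": 6, "O": 8, "P": 2, "Q": 1, "R": 6,
    "S": 4, "T": 6, "U": 4, "V": 2, "W": 2, "X": 1, "Y": 2, "Z": 1, "?": 2,
}
BAG = [tile for tile, n in TILE_COUNTS.items() for _ in range(n)]


def toy_score(rack):
    """Placeholder scorer, NOT a real move evaluator: 10 points per blank."""
    return 10 * rack.count("?")


def estimate_vort(tile, bag, score_rack, trials=2000, seed=0):
    """Mean score of (tile + 6 random tiles) minus mean score of 7 random tiles."""
    rng = random.Random(seed)
    rest = list(bag)
    rest.remove(tile)  # the fixed tile cannot also be drawn from the bag
    with_tile = 0.0
    baseline = 0.0
    for _ in range(trials):
        with_tile += score_rack([tile] + rng.sample(rest, 6))
        baseline += score_rack(rng.sample(bag, 7))
    return (with_tile - baseline) / trials
```

For example, `estimate_vort("?", BAG, toy_score)` compares racks guaranteed a blank against ordinary racks. With a realistic scorer in place of `toy_score`, the same loop is one way to approximate the kind of equity figure O’Laughlin describes.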
To perform a revaluation using VORT, Chew and O’Laughlin would shift the equity value of positive tiles downward and negative tiles upward, have Quackle play thousands of games against itself using those new values, and keep adjusting the values and making Quackle play until the equity of each tile approaches zero. At that point, the tiles could be given corresponding face values, which would be based on how Scrabble is played by the world’s best player.
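One reading of that loop, sketched in Python: measure each tile's equity, nudge the face values of positive-equity tiles down and negative-equity tiles up, and repeat until every equity is close to zero. The names `rebalance` and `measure_equity` are hypothetical. In the procedure described, the measuring step would be thousands of Quackle self-play games; here it is a caller-supplied function, and the toy version below uses made-up numbers purely so the loop can run.

```python
def rebalance(values, measure_equity, step=0.5, tol=0.05, max_rounds=200):
    """Adjust face values until every tile's measured equity is within tol of 0.

    measure_equity(values) must return {tile: equity}. ASSUMPTION: in a real
    run it would wrap self-play by a strong engine; that part is not shown.
    """
    values = dict(values)  # work on a copy, leave the caller's dict alone
    for _ in range(max_rounds):
        equity = measure_equity(values)
        if all(abs(e) <= tol for e in equity.values()):
            break
        for tile, e in equity.items():
            # Positive equity moves the face value down, negative moves it up.
            values[tile] -= step * e
    return values


# Toy stand-in with invented numbers: equity is face value plus a fixed
# "playability" offset. These offsets are NOT measured data.
TOY_PLAYABILITY = {"Q": -17.0, "X": -6.0}


def toy_equity(values):
    return {tile: values[tile] + TOY_PLAYABILITY[tile] for tile in values}
```

Running `rebalance({"Q": 10, "X": 8}, toy_equity)` pushes the Q up and the X down until both toy equities are near zero. Whether a real engine would converge this neatly is an open question, since changing the values changes how the best player plays.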
Chew and O’Laughlin said they aren’t interested in conducting that analysis because it would be time-consuming and wouldn’t add to the understanding of the game. But they have begun trading emails with Lewis, who told me he’s eager to work with the Scrabble quants to learn more and find ways to study the statistical properties of the tiles further. Which shows how, in contrast to the way this story is being framed in the media—Scrabble Controversy!—it’s really just a bunch of curious living-room players and super-smart computer guys swapping ideas about game theory and analysis. Lewis isn’t demanding that the game’s manufacturers change anything, just performing an interesting statistical and intellectual exercise.
Still, he does believe, as he wrote in a new post on Wednesday, that tweaking Scrabble’s tile values would “keep the intentional luck in the game and remove the unintentional luck that has crept in over time as the use of English has changed.” He’s perplexed as to why competitive Scrabble players wouldn’t favor mitigating some of the luck, because that might make the results of games and tournaments more accurate.
I can answer that. Because Scrabble players understand that the game’s inequities are on the margins, and that figuring them out is a crucial part of learning to play well. And we respect, and are in fact awed by, how Alfred Butts, without the benefit of computer programs and language databases, came damn close to nailing both letter distribution and letter valuation, and in the process created a game that exquisitely, often maddeningly, balances skill and luck. Making the X worth six points won’t improve on that.
Correction, Jan. 19, 2013: This article originally misspelled the name of Alfred Butts’ first word game, Lexiko.