In the run-up to Jeopardy's Watson competition last month, in which IBM's digital brain bested two human champions, Slate published my analysis of Jeopardy clues, answers, and strategy, all based on J-Archive.com's remarkable database. Since then, I've received a number of great questions and requests for more analyses. Let me answer some.
Commenter AJS posted one of the most interesting questions to investigate, about Jeopardy's "Pavlovian" clues. The game tends to link certain terms, regardless of category or context. "Polish composer is Chopin, Chinese architect is Pei, nonsense poet is always Lear, bachelor president is Buchanan," wrote AJS, who wondered which such pairings appeared most frequently on the show.
A small cadre of men and women has achieved such prominence in a field or genre as to be synonymous with it (on Jeopardy!, at least). The only silversmith you need to know is Paul Revere; he was the answer to 14 of the 15 clues that mention the word. "British sculptor" has been Henry Moore all six times. And Alexander Calder was the answer to all 11 clues containing the word "mobiles." Twenty-two of 28 clues mentioning "cubist" are looking for Pablo Picasso. When Trebek says "contralto," you should say Marian Anderson; you'd be right seven of 11 times. And, yes, James Buchanan has been the "bachelor president" all nine times, which makes sense, since he's the only bachelor president. ("Chinese architect," curiously, doesn't appear in any games transcribed at J-Archive.)
"Soup company" has been Campbell's, "sports drink" Gatorade, and "pungent bulb" garlic, six times out of six apiece. The only "Louisiana city" you need to know is New Orleans, the answer to 18 of those 19 clues (Baton Rouge was the other); Prince Edward Island is 10 for 11 as the "island province."
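The article doesn't spell out how these pairings were counted, but one plausible approach is easy to sketch. It assumes the clues are available as (clue text, correct response) pairs, for example after scraping J-Archive; that data format and the function name `pavlovian_pairing` are hypothetical. The function keeps the clues that contain a term as a whole word and reports the most common response.

```python
import re
from collections import Counter


def pavlovian_pairing(clues, term):
    """Return (top_response, hits, total) for clues containing `term` as a whole word.

    `clues` is an iterable of (clue_text, correct_response) pairs (an assumed format).
    Matching is case-insensitive; plurals such as "silversmiths" do not match "silversmith".
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    responses = Counter(response for text, response in clues if pattern.search(text))
    total = sum(responses.values())
    if total == 0:
        return None, 0, 0  # the term never appears, as with "Chinese architect"
    top_response, hits = responses.most_common(1)[0]
    return top_response, hits, total
```

On the full archive, a result of ("Paul Revere", 14, 15) for "silversmith" would correspond to the figure quoted above.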
Commenter Jason asks whether the percentage of questions answered correctly has trended up or down over the years. Not remarkably in either direction. From 1997 through 2010, the years with at least 6,000 (and often closer to 13,000) answers transcribed on J-Archive, the proportion of clues solved by at least one contestant ranged from roughly 82 percent to 88 percent, with no clear trend. The proportions for the five years from 2006 through 2010 were 82.5, 84.8, 85.4, 85.0, and 84.3 percent.
Rob Iracane e-mailed with this question, another of my favorites: "How often do none of the three contestants get the correct response in Final Jeopardy? There was a recent streak [Jan. 12, 13, 14, 17, and 18] of five consecutive triple stumpers, and my friends and I were trying to see how rare that was."
Rare, but not as rare as you might think. Just 49.2 percent of all Final Jeopardy guesses have been correct. About 20 percent of three-player Final Jeopardies (that is, excluding final rounds from which a contestant was barred because of a negative score) were triple-stumpers. Thirty-two percent were answered correctly by only one player, 29 percent by two players, and 19 percent by all three.
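As a rough consistency check, the distribution above implies an overall share of correct responses: weight each outcome by the number of players who got it right, then divide by three. The shares below are the rounded percentages from the text, and the 49.2 percent figure also covers finals with fewer than three players, so the two numbers should agree only approximately.

```python
# Rounded shares of three-player Final Jeopardies, keyed by number of correct responses
shares = {0: 0.20, 1: 0.32, 2: 0.29, 3: 0.19}

# Expected number of correct responses per Final, divided by the three players
correct_rate = sum(k * s for k, s in shares.items()) / 3
print(f"{correct_rate:.1%}")
```

This works out to about 49 percent, close to the 49.2 percent measured directly.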
If we assume that triple-stumper Final Jeopardies occur randomly and independently from one episode to the next, there's roughly a one in 3,000 chance that any given run of five consecutive episodes will all end in triple-stumpers. But the chance that such a five-game streak happens at some point in Jeopardy's history is much higher.
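The one-in-3,000 figure comes from raising the per-episode triple-stumper rate to the fifth power. The rate here is 20.10648 percent, the value used in the simulation, and the calculation relies on the same independence assumption.

```python
# Per-episode chance of a triple-stumper Final Jeopardy, as used in the simulation
P_TRIPLE = 0.2010648

# Assuming independent episodes, a specific five-episode window is all
# triple-stumpers with probability P_TRIPLE ** 5
p_five = P_TRIPLE ** 5
print(f"about 1 in {1 / p_five:,.0f}")
```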
To figure this out, I created a basic simulation of 100,000 Jeopardy "series." Each series contained 6,100 "episodes"—roughly the number that have aired since 1984, when Trebek started hosting the show. Each "episode" had a 20.10648 percent chance of ending in a triple-stumper. When I ran the simulation, I found that five-game triple-stumper streaks appeared in about 80 percent of the series, meaning that such a streak is roughly four times as likely to happen as not to happen. (You'd have even odds after about 2,650 episodes.)
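The simulation described above can be sketched in plain Python. This is a reconstruction under stated assumptions, not the original code. It treats episodes as independent, and it counts a run longer than five as a single streak, which is an assumed counting rule. The exact Markov-chain calculation (`prob_streak_exact`, `even_odds_length`) is an added cross-check that the original analysis did not use, and all function names are hypothetical.

```python
import random

P_TRIPLE = 0.2010648  # per-episode chance of a triple-stumper (from the article)
STREAK = 5            # streak length of interest
EPISODES = 6100       # roughly the number of episodes since 1984


def count_streaks(rng, episodes=EPISODES, p=P_TRIPLE, streak=STREAK):
    """Simulate one series; count runs that reach `streak` consecutive triple-stumpers.

    A run longer than `streak` is counted once (an assumption about the tally).
    """
    run = streaks = 0
    for _ in range(episodes):
        if rng.random() < p:
            run += 1
            if run == streak:
                streaks += 1
        else:
            run = 0
    return streaks


def simulate(n_series=1000, seed=1):
    """Monte Carlo estimates of P(at least one streak) and P(more than one streak)."""
    rng = random.Random(seed)
    counts = [count_streaks(rng) for _ in range(n_series)]
    return (sum(c >= 1 for c in counts) / n_series,
            sum(c >= 2 for c in counts) / n_series)


def _step(state, p):
    """Advance the current-run-length distribution by one episode.

    state[k] is the probability that the current run has length k and no streak
    has occurred yet. Returns (new_state, probability of completing a streak now).
    """
    new = [sum(state) * (1 - p)] + [state[k] * p for k in range(len(state) - 1)]
    return new, state[-1] * p


def prob_streak_exact(episodes=EPISODES, p=P_TRIPLE, streak=STREAK):
    """Exact P(at least one streak) in `episodes` independent episodes."""
    state, hit = [1.0] + [0.0] * (streak - 1), 0.0
    for _ in range(episodes):
        state, h = _step(state, p)
        hit += h
    return hit


def even_odds_length(p=P_TRIPLE, streak=STREAK):
    """Smallest number of episodes giving at least a 50 percent chance of a streak."""
    state, hit, n = [1.0] + [0.0] * (streak - 1), 0.0, 0
    while hit < 0.5:
        state, h = _step(state, p)
        hit += h
        n += 1
    return n
```

With these settings, `prob_streak_exact()` comes out near 80 percent and `even_odds_length()` near 2,650 episodes, in line with the figures above. `simulate()` gives noisier estimates of the same probability, and also of the chance of more than one streak in a series.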
In slightly fewer than half of the simulations, more than one five-game streak occurred. In the J-Archive data, there are a handful of four-game triple-stumper streaks, but no five-game streaks other than the one Rob pointed out. The J-Archive data, however, are incomplete; if you can recall another five-game streak, tell us in the comments.
Commenter Barry Forer asks, "I've always wondered how well contestants can predict whether they will get a Daily Double correct or not [...] are contestants who are more confident more likely to be correct on a Daily Double than those who are less confident?"
For consistency's sake, let's limit our focus to episodes that have aired since 2002 (the first year in which all episodes used the dollar values still in place today) and to the first round, when wagers are presumably influenced more by confidence than by accumulated score. The most popular first-round Daily Double wager is $1,000, accounting for more than one-third of wagers. (Even if you have $0, you're allowed to bid $1,000.) Contestants who made this baseline wager guessed correctly 68 percent of the time. At lower wagers, players were right only 65 percent of the time; above $1,000, they were right 70 percent of the time. The most common sub-$1,000 wager is $800, after which contestants guessed correctly just 56.5 percent of the time.
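The comparison above amounts to splitting first-round Daily Double wagers around the $1,000 baseline and computing the accuracy in each group. Here is a minimal sketch. It assumes the records have already been filtered to first-round Daily Doubles from 2002 on and are stored as (wager, answered correctly) pairs; both that format and the function name `accuracy_by_wager` are hypothetical.

```python
def accuracy_by_wager(records, baseline=1000):
    """Share of correct Daily Double responses below, at, and above `baseline`.

    `records` is an iterable of (wager, was_correct) pairs (an assumed format).
    A group with no wagers maps to None.
    """
    tallies = {"below": [0, 0], "at": [0, 0], "above": [0, 0]}  # [correct, total]
    for wager, was_correct in records:
        if wager < baseline:
            key = "below"
        elif wager == baseline:
            key = "at"
        else:
            key = "above"
        tallies[key][0] += int(was_correct)
        tallies[key][1] += 1
    return {key: (right / total if total else None)
            for key, (right, total) in tallies.items()}
```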
The largest Daily Double wager in the J-Archive database came in the episode that aired on September 6, 2002. Jake Maeroff ("a law student from Plantation, Florida") landed on a second-round Daily Double in the "Saints on a Map" category and wagered $14,000. His clue: "Jesse James was terminated in this city, once home to a terminus of the Pony Express." Maeroff answered correctly: James was killed in St. Joseph, Mo. Maeroff was playing against Myron Meyer, whose $50,000 one-game winnings in the previous episode set a record (broken several times since). In that game, Meyer wagered a then-record $13,000 on a Daily Double and also answered correctly.
Commenter Amanda Benjamin asks, "What is the most money a contestant could possibly win in one game?" A set of ambitious commenters chimed in: You'd have to answer every clue correctly, find each Daily Double last in its round, wager everything on each one, and bid all your cash in Final Jeopardy. The Daily Doubles would also all have to be in the top row, since a Daily Double cell pays only the wager and the top-row cells are the cheapest to give up. Commenter Dunk, who worked out the numbers most convincingly, writes: "The contestant would have $35,600 after Jeopardy, $283,200 after Double Jeopardy, and $566,400 after Final Jeopardy."
But, as Amanda notes, even the most brilliant player could never hope to achieve this score, since Jeopardy's creators would never place so many Daily Doubles so close to the top of the board. What's a more realistic score? It's hard to say, but here's a piece of the puzzle: In the J-Archive database, the highest that all three Daily Doubles have appeared together is the second row, which happened on Sept. 26, 1997, and Dec. 28, 2006. For these boards, the highest possible score at today's clue values is $556,800.
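The commenters' arithmetic can be checked with a short function. It assumes the current clue values ($200 to $1,000 in the first round, $400 to $2,000 in Double Jeopardy, six categories), with one Daily Double in the first round and two in the second. It also assumes each Daily Double is found last and bet in full. The name `max_score` is hypothetical.

```python
# Current clue values by row, top row first (assumed; boards before late 2001 paid half)
ROUND_VALUES = (
    [200, 400, 600, 800, 1000],    # Jeopardy round
    [400, 800, 1200, 1600, 2000],  # Double Jeopardy round
)
CATEGORIES = 6


def max_score(dd_rows_round1, dd_rows_round2):
    """Best-case one-game total, given the 1-indexed rows holding each round's Daily Doubles.

    Assumes every clue is answered correctly, each Daily Double is found last in its
    round with the entire score wagered, and everything is bet in Final Jeopardy.
    """
    score = 0
    for values, dd_rows in zip(ROUND_VALUES, (dd_rows_round1, dd_rows_round2)):
        # Ordinary clues pay face value; a Daily Double cell pays only the wager
        score += CATEGORIES * sum(values) - sum(values[row - 1] for row in dd_rows)
        # Finding each Daily Double last and betting it all doubles the running score
        score *= 2 ** len(dd_rows)
    # Betting everything in Final Jeopardy and answering correctly doubles it once more
    return score * 2
```

`max_score([1], [1, 1])` returns 566,400, matching Dunk's figure, and `max_score([2], [2, 2])` returns 556,800 for the second-row boards.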
As a few readers pointed out, Watson's very first pick in its first televised match against Ken Jennings and Brad Rutter was the leftmost cell in the fourth row from the top, precisely the board position I had identified as the most likely place to find a Daily Double. Of course, Jeopardy taped the episode long before I wrote the piece, and IBM's programmers are orders of magnitude more skilled than I am, so I won't take any credit. But I did ask the Watson team about the coincidence.
Daily Double distribution was indeed a "significant factor" in Watson's selection, wrote team leader David Ferrucci, but so were "other factors such as its predicted accuracy in each category and using lower-value clues to learn with less risk how the category works."
In case you were curious, here's a chart showing the full distribution of Daily Doubles: