The Ultimate Crossword Smackdown

Who writes better puzzles, humans or computers?

July 12, 2006, 4:41 PM


When people find out that I write crosswords for a living, they often ask, “Can’t you just write crosswords using a computer program now?” After I finish crying—some people really know how to hurt a guy—I respond that, yes, computers play a role in crossword design these days. There are three parts to constructing a crossword: coming up with a theme, filling in the grid, and writing the clues. Until artificial intelligence makes some serious leaps, humans will do the heavy lifting when it comes to theme creation and clue writing. But the second part, filling grids with words, is quite computer-friendly. It’s here that machines have revolutionized the construction of crossword puzzles.

Early efforts in computer-aided crossword design spat out marginal little grids filled with obscure words. But in the late 1980s, Boston computer programmer Eric Albert had an insight while tangling with this problem: A computer could generate high-quality crossword puzzles if each entry in its word database were ranked on, say, a scale from one to 10. An excellent puzzle word like JUKEBOX (gotta love all those high-scoring Scrabble letters) might be worth a nine or 10, while a hacky obscurity like UNAU (a type of sloth that has appeared in crosswords more times than it’s been spotted in real life) would be a one or a two. By ranking the words, the junk would be left out and just the good stuff would go in.
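The ranking idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word list, the scores, and the function name `usable_words` are hypothetical, not taken from Albert's actual database or software.

```python
# Hypothetical scored word list. The scores are illustrative guesses on the
# one-to-10 scale described above, not values from any real database.
WORD_SCORES = {
    "JUKEBOX": 10,
    "ZUCCHINI": 9,
    "EZINE": 6,
    "INE": 2,
    "UNAU": 1,
}


def usable_words(scores, min_score):
    """Return the words scoring at least min_score, best first.

    Ties are broken alphabetically so the order is predictable.
    """
    keep = [word for word, score in scores.items() if score >= min_score]
    return sorted(keep, key=lambda word: (-scores[word], word))


# With a cutoff of 5, the junk (INE, UNAU) never reaches the grid.
print(usable_words(WORD_SCORES, 5))
```

Raising or lowering the cutoff trades quality for flexibility: a high cutoff keeps the fill lively but leaves the computer fewer words to work with.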

This is how computer-aided crossword design still works today. The database operator has to place theme entries and black squares logically in the grid; this placement is done intuitively, based on what the human thinks the computer can handle. After the computer fills in the blanks, the human operator will likely do some further tweaking, such as marking off a corner of the grid he doesn’t like so the computer can take another shot at it.
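To give a feel for what "the computer fills in the blanks" might involve, here is a toy backtracking fill for a fully open 3-by-3 grid. It is a sketch, not a description of how Longo's or Gordon's software works: the word list, the scores, the cutoff, and the function name `fill_square` are all assumptions made for illustration.

```python
# Hypothetical three-letter word list with illustrative scores.
SCORES = {
    "SPA": 7, "ANT": 7, "HEN": 6, "EAT": 6, "PEA": 6, "SHE": 5,
    "INE": 2, "ATO": 1,  # low-ranked crosswordese, filtered out below
}


def fill_square(scores, size=3, min_score=3):
    """Fill a size-by-size open grid so every row and column is a word.

    Tries higher-scoring words first and backtracks whenever a column
    can no longer be completed. Returns the rows, or None if no fill exists.
    """
    words = sorted(
        (w for w, s in scores.items() if len(w) == size and s >= min_score),
        key=lambda w: (-scores[w], w),
    )
    # Every prefix of every allowed word, used to prune dead ends early.
    prefixes = {w[:k] for w in words for k in range(size + 1)}

    def columns(rows):
        return ["".join(row[j] for row in rows) for j in range(size)]

    def extend(rows):
        if len(rows) == size:
            # Reject fills that repeat an entry across and down.
            entries = rows + columns(rows)
            return rows if len(set(entries)) == 2 * size else None
        for word in words:
            if word in rows:
                continue
            trial = rows + [word]
            if all(col in prefixes for col in columns(trial)):
                result = extend(trial)
                if result:
                    return result
        return None

    return extend([])


for row in fill_square(SCORES):
    print(row)
```

A real 15-by-15 grid adds black squares, fixed theme entries, and a far larger word list, but the loop is the same in spirit: place a word, check the crossings, and back up when a crossing becomes impossible.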

The best databases belong to Frank Longo and Peter Gordon, who work in the service of the two best daily crosswords in the country, the New York Times and New York Sun. Longo’s database contains about 720,000 words—pretty much every word or combination of words that’s ever been used in a high-quality puzzle. Longo and Gordon use these massive word libraries in concert with their intuition to craft wonderful, tricky puzzles that are enjoyed by multitudes of crossword enthusiasts.

The number of humans who can write a better puzzle than the top databases is small and dwindling. New constructors who come into the field are likely to use computer assistance right off the bat. I’m one of the few holdouts—I don’t use a computer database to fill my grids. When am I going to be replaced by a computer? Are my days already numbered?

To answer that, I put together the Ultimate Crossword Smackdown: two top human constructors vs. Longo’s and Gordon’s databases. I humbly chose myself, the author of more than 2,500 crosswords in the past 20 years, to represent humanity. I also selected Santa Clara University math professor Byron Walden, one of the best constructors in the country. The panel of judges will be familiar if you’ve seen the documentary Wordplay. Ellen Ripstein, Tyler Hinman, Trip Payne, and Jon Delfin are all former champions of the American Crossword Puzzle Tournament. The fifth judge is esteemed Los Angeles Times crossword editor Rich Norris. They will evaluate the puzzles based on the quality of the words used, the amount of “crosswordese”—UNAU-type words you never see anywhere but the puzzle page—and what can best be described as the puzzle’s “overall feel.” These folks know a good crossword when they see one.

We all start with a standard 15-by-15 square crossword puzzle grid. I’ve created the puzzle’s theme (famous people with the initials B.B.) and placed the three 15-letter theme entries, which are unchangeable and unmovable.

Longo and Gordon feed the theme entries into their computers, then place black squares into the grid themselves based on their sense of what pattern their databases can handle. Walden and I likewise scatter a few black squares around the grid, and we’re off to the races.

My strategy is to fit as many X’s, Q’s, J’s, and Z’s in as possible. Hopefully, I won’t get penalized for the crosswordese that creeps in to accommodate those high-value letters.

Overall, I’m fairly happy with my grid. The puzzle has some lousy entries like ATO (as in “From ___ Z,” a legit-but-dull crossword standard), EEEEE (“Widest male shoe size”), and INE (“Chemistry suffix”). There are some really nice entries, though, like THX (short for “thanks”) and ZZ TOP and ZUCCHINI and SHORT I’S (“Some vowels”).

And here are the other three entries. For now, I’ll keep you in the dark about which one emerged from the mind of Byron Walden and which ones came from the computers. If you’d like to play along, give four points to the puzzle you think is best, three to your second favorite, and so forth.

GRID A:

GRID B:

GRID C:

The judges’ opinions were pretty scattershot. Each of the four participants got at least one last-place vote, and three of the four contestants got at least one first-place vote. Most amazingly, two of the judges ranked the four grids in the precise reverse order of each other. There’s clearly a sizable dose of subjectivity in assessing the quality of a crossword puzzle, especially at this, ahem, elite level.

The final results: GRID A prevailed with 14 points. Close behind were GRID B and GRID C, each with 13 points. I brought up the caboose (sigh) with 10 points.

In GRID A, the judges liked INJUN JOE, LEMON SOLE, I DIG, and ARTICLE V. IN TUXEDOS, though, was slammed as a contrived phrase. That didn’t keep the puzzle from taking two of the five first-place votes.

In GRID B, judges liked DAN MARINO, EAR CANDY, and EZINE. They weren’t partial to RAT ISLAND, which was deemed a little obscure.

GRID C got props for LEE J. COBB, N SYNC, and CRASH PAD. But POURBOIRES, a French word meaning “tips,” as in a restaurant or bar, was cited by two judges as being outside their ken.

My grid drew the praise and condemnation I expected: one first-place vote but two last places as well. One judge wrote: “This puzzle, to me, screams ‘trying too hard.’”

And now for the big reveal. GRID A (14 points) was written by Byron Walden, GRID B (13 points) by Frank Longo’s database, and GRID C (13 points) by Peter Gordon’s database. That makes the final score: Computers 26, Humans 24. I tip my hat to our cyber-champions but will not hesitate to unplug them if they get too smug in victory. They can’t get too smug anyway, because Byron’s top score is a reassuring note for human constructors.
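The tally is easy to double-check. The sketch below assumes the five judges used the same 4-3-2-1 scale offered to readers earlier; the individual ballots are not given, so only the published totals are used. The label "AUTHOR" for my own grid and the function name `team_totals` are made up for the example.

```python
# Published point totals for the four grids.
TOTALS = {"GRID A": 14, "GRID B": 13, "GRID C": 13, "AUTHOR": 10}

# Who filled each grid, per the reveal.
MAKERS = {
    "GRID A": "human",     # Byron Walden
    "GRID B": "computer",  # Frank Longo's database
    "GRID C": "computer",  # Peter Gordon's database
    "AUTHOR": "human",     # the author's own grid
}

JUDGES = 5
POINTS_PER_BALLOT = 4 + 3 + 2 + 1  # assumed scale: 4 for best down to 1


def team_totals(totals, makers):
    """Sum the grid totals by team (human or computer)."""
    teams = {}
    for grid, points in totals.items():
        team = makers[grid]
        teams[team] = teams.get(team, 0) + points
    return teams


# Five ballots of 10 points each should account for every point awarded.
print(sum(TOTALS.values()) == JUDGES * POINTS_PER_BALLOT)
print(team_totals(TOTALS, MAKERS))
```

The four totals add up to the 50 points that five such ballots would hand out, and the team sums match the 26-to-24 score.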

No sample test proves anything with certainty. Different competitors, different judges, a different day—anything could’ve pushed the results one way or the other. Still, the computers won this battle on points, and they have other advantages that make the future ominous for human constructors. Their databases are getting larger every year, and they work much more quickly than we do. A typical human constructor may take between two and five hours to fill a challenging 15-by-15 grid. Longo’s and Gordon’s databases can sometimes do it in under a minute.

Will I be jumping onboard the database ship? It may be a poor career move, but I will not. I’d rather figure out how to win a rematch than join the other team.

This piece has been adapted from Gridlock: Crossword Puzzles and the Mad Geniuses Who Create Them.