What is the longest word ever to appear in print in English? There's a lot of confusion and misinformation on the subject. But when I set out to write about the periodic table, I was hopeful I would get to the bottom of this mystery. Why? Because all the contenders for the longest word are chemistry terms. More specifically, carbon-based chemistry terms.
Carbon settled into a tough spot on the periodic table. To feel satisfied and content, elements need full "sets" of electrons in their outer shell. (Think of a shell as a storage container for electrons.) Elements can either steal electrons from other atoms to accumulate a full set or else shed electrons to get down to a full set. The number of electrons that constitutes a full set differs slightly from element to element; for carbon, a full set is eight electrons in its outer shell. But carbon has only four electrons in its outer shell, which means it needs to steal four other electrons to acquire a full set. Or it could fob its four electrons off on some other atom. For various reasons, both are almost impossible tasks, which leaves carbon restless and unsatisfied.
But there is another option—sharing electrons. Basically, two atoms put two or more electrons in between themselves, and each pretends the whole set belongs to it alone. It's not an ideal situation, but it's better than nothing. Carbon often loans its four electrons out to four partners simultaneously, forming bonds in all directions. And that promiscuity is actually carbon's virtue. Sharing electrons in all directions allows it to form rings and chains and complicated structures, like the structures found in living beings.
What does this have to do with the world's longest word? For a time in the 1800s, Germany dominated European science, and most chemists in the mid- and late 1800s wrote and published in German journals. The German language makes compound words like crazy, and it was only natural that this feature rubbed off on chemists when it came time to name their creations. Take proteins. Proteins are chains of amino acids, and the individual amino acids (all of which are carbon-based) have names like serine, taurine, and leucine. If you form a compound from those three, you'd call it seryltaurylleucine or taurylleucylserine. You can see where this is going. Proteins can be made up of hundreds or even thousands of amino acids, and the names get pretty unwieldy pretty quickly.
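For the curious, the naming pattern above can be sketched in a few lines of Python. This is only an illustration of the rule as it applies to the three amino acids named here: every unit except the last swaps its "-ine" ending for "-yl", and the pieces are run together. The function names `linking_form` and `chain_name` are made up for this sketch, and real chemical nomenclature has exceptions that this simple rule ignores.

```python
def linking_form(amino_acid: str) -> str:
    """Swap a trailing '-ine' for '-yl' (serine -> seryl).

    Assumption: the name ends in '-ine', as serine, taurine, and
    leucine do. Names with other endings are rejected, not guessed at.
    """
    if not amino_acid.endswith("ine"):
        raise ValueError(f"no simple '-yl' rule for {amino_acid!r}")
    return amino_acid[:-3] + "yl"


def chain_name(amino_acids: list[str]) -> str:
    """Join a chain into one word; only the last unit keeps its full name."""
    if not amino_acids:
        raise ValueError("need at least one amino acid")
    *linked, last = amino_acids
    return "".join(linking_form(name) for name in linked) + last


if __name__ == "__main__":
    for order in (["serine", "taurine", "leucine"],
                  ["taurine", "leucine", "serine"]):
        word = chain_name(order)
        print(word, len(word))
```

Each added unit contributes five or six letters, so a chain of a few hundred amino acids easily passes a thousand letters.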
For a long time, chemistry reference works published the full names of compounds. A company in Ohio called Chemical Abstracts Service, which prints reference books collectively called Chemical Abstracts, was the best example, dutifully publishing every letter of every long protein discovered through the 1950s. However, around 1965 the company gave up, changing its policy to discourage eye-glazing names that ran for dozens of lines. (I like to think it was a copy editors' revolt—imagine spell-checking these things!) Shorter names became the rule. After 1965, long carbon-based names stop appearing in Chemical Abstracts. So if we adopt a sensible definition of what we're after here—the longest word to appear in an English-language document whose purpose was not to set the record for the longest word ever—then the word almost certainly appeared in Chemical Abstracts around this time, since no one else was publishing words even close to this length.
Now for the confusion. Many sources list a 1,185-letter protein name (C785H1220N212O248S2) from the "tobacco mosaic virus" as the record holder. But there's also a tryptophan protein, a relative of the chemical that people think (wrongly) makes them sleepy when they eat turkey. The tryptophan protein (C1289H2051N343O375S8) runs 1,913 letters, 60 percent longer than the mosaic virus protein, and numerous sources—old editions of Guinness World Records and Mrs. Byrne's Dictionary of Unusual, Obscure, and Preposterous Words—list tryptophan as the champ. But no source seemed to realize that contradictory answers were out there.
Settling the debate required going back to the original source of the words, the 12-inch-tall bound volumes of Chemical Abstracts. These encyclopedias of chemistry are so voluminous that the Library of Congress dedicates an entire room to just these books, and my search required combing through 10 years' worth of what looked like about eight-point font. I won't bore you (too late?) by describing the many hours it took just to learn how to use Chemical Abstracts to look things up and hunt down leads. But eventually, I found the right page and settled on the exhaustive approach: photocopying pages and counting the letters one by one.
I found the name for the tobacco mosaic virus spelled out twice, first in 1964, then in 1966. But in all my searching, I never did locate the full, spelled-out version of the tryptophan molecule. It is mentioned, but only under the name "tryptophan synthetase protein a." So its full name has never appeared in print as far as I can tell, which perhaps explains why Guinness has stopped listing it as the longest word in its recent editions.
To throw in one more twist, I also hunted down the original 1964 tobacco mosaic virus paper that announced its decoding, and discovered it was in German. But Chemical Abstracts is an English-language document, in the fine reference-work tradition of Samuel Johnson and the OED; and it printed the name not to show off long words but to propagate knowledge, so it counts.
Incidentally, people sometimes wonder why DNA doesn't count as the longest word, since genomes can run for billions of "letters" (AGACTCGATA ...). For one, DNA often doesn't appear in print except in small bites. More important, even if the 3 billion-letter human genome were printed out, it still wouldn't really be a word. Try pronouncing it. However contrived and formulaic, the mosaic virus name is speakable. Take a deep breath:
People also wonder what would be the longest word today if Chemical Abstracts had stubbornly kept printing the full names of molecules. Well, there is one protein out there with 34,000 amino acids. Its full name runs 189,819 letters—47 single-spaced pages of a Microsoft Word document. Mercifully, it's known as titin.