Google Ngram Viewer: a language time machine.

Can a Language Time Machine Change How We View the Past?

The citizen’s guide to the future.
Sept. 9 2013 9:45 AM

The Language Time Machine

Google’s Ngram Viewer gave us a new way to explore history, but has it led to any real discoveries?

We’ve been to the moon and just about everywhere on Earth. So what’s left to discover? In September, Future Tense is publishing a series of articles in response to the question, “Is exploration dead?” Read more about modern-day exploration of the sea, space, land, and more unexpected areas.

Jessica Paré and Jon Hamm in Mad Men

Courtesy of AMC

At age 14, I began reading my first Shakespeare play—Romeo and Juliet—for English class. It only took a line or two before my first profound, literary thought began to percolate: “This is English?!”

It was an important early lesson in the evolution of language—how cultural understandings shift between eras, how time twists words and constructs new definitions.


Yet, it was an incomplete one. Back when I was in school, readers could see an antiquated expression sprinkled throughout early texts—but it was impossible to scientifically trace its rise and fall without studying thousands of volumes. Even the Oxford English Dictionary, with its rich recorded histories of just about every English word, couldn't tell us how many times journalist appeared in 1888 vs. 1978. Few attempted that task. Fewer survived.

But when Google’s Ngram Viewer launched in 2010, it transformed that Everest-level word exploration into a seconds-long bunny hill search—and illecebrous, captivating graphs. Using data from Google Books (the company has digitized about 20 million books and uses data from about 6 million in the viewer), the Ngram Viewer plots frequency of particular words and phrases across time.
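For readers curious about the arithmetic behind such a graph, here is a minimal Python sketch of the general idea: count how often a phrase appears in each year's texts, then divide by the total number of same-length phrases from that year. It is not Google's implementation. The function name `yearly_frequency`, the (year, text) corpus format, and the two sample sentences are all made up for illustration.

```python
from collections import Counter


def yearly_frequency(corpus, phrase):
    """Return {year: share of that year's n-grams that match `phrase`}.

    `corpus` is an iterable of (year, text) pairs. This toy format is an
    assumption for the sketch, not how Google Books stores its data.
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    hits = Counter()    # matches of the phrase per year
    totals = Counter()  # all n-grams of the same length per year
    for year, text in corpus:
        tokens = text.lower().split()
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        totals[year] += len(grams)
        hits[year] += sum(1 for gram in grams if gram == target)
    # Skip years with no n-grams to avoid dividing by zero.
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}


# Invented sample sentences, not real Google Books text.
TOY_CORPUS = [
    (1888, "the journalist wrote and the editor read"),
    (1978, "the journalist met another journalist at noon"),
]

freq = yearly_frequency(TOY_CORPUS, "journalist")
for year, share in freq.items():
    print(year, round(share, 3))
```

Dividing by the yearly total matters because far more books were printed in 1978 than in 1888, so raw counts alone would make almost every word look as if it were on the rise.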

Through this digital tool, we can explore the rise and fall of your great-great-grandmother’s vocabulary—the dustiest, rustiest words in the OED. During the viewer’s first day, language zealots performed more than 3 million word searches (parsing Google Books data) to see how use frequency ebbed and flowed across time. According to Google, the tool is now used about 50 times per minute. Those poetic word frequency graphs have sprouted everywhere.

Chart generated by Google Ngram Viewer.

But now that it’s been three years since the hype and fanfare, some spermologers want to know: Has this new tool led to any legitimate discoveries—knowledge that has upended our sense of events or eras? In other words, what have we unearthed after three years of exploring an untapped trove of recorded history? 

The Viewer has enabled many granular discoveries: think fossilized Mesopotamian cooking tool rather than Dead Sea Scrolls. “There are hundreds of little mysteries that one can resolve with the Ngram Viewer,” says Erez Lieberman Aiden, a founding father of the Viewer and the field of Culturomics (which studies human culture and history through the lens of massive datasets) and a fellow at the Harvard Society of Fellows. Take the mystery of donuts vs. doughnuts. When did the spelling change? Before the Ngram Viewer, “it would’ve taken a very long time to determine when that spelling transition took place,” Aiden explains. But according to the Viewer, the donut spelling starts to take off in the early 1950s, right around the time Dunkin’ Donuts opened its first store. Of course, that doesn’t prove that Dunkin’ Donuts alone changed the spelling, but it does add a compelling dimension to the story.

The Viewer exposes Nazi censorship of philosophers in German texts and Chinese suppression of references to the Tiananmen Square massacre of 1989. Its easily accessible, swith data could help writers of historical-fiction TV shows, like Mad Men and Downton Abbey, produce more accurate and convincing dialogue. And it dismantles the narrative that the Civil War caused Americans to refer to the United States as a singular entity (the United States is) rather than a plural (the United States are). That long-held belief “is a phenomenal story line, kind of an inspiring story line,” Aiden says. “It brings together grammar and battle and has figured prominently in the works of prominent historians.” But as the Viewer has demonstrated, the reality is more complex. Americans were using both versions before the Civil War, and the singular overtook the plural only in the early 1880s.
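Spotting the moment one usage overtakes another is simple once two frequency series are in hand. The sketch below is one hedged way to do it in Python: it returns the first year after which one series stays above the other for good. The function name `first_lasting_crossover` is hypothetical, and the numbers are invented placeholders, not Ngram Viewer data, so the year it prints says nothing about the real singular-versus-plural history.

```python
def first_lasting_crossover(years, a, b):
    """Return the first year from which `a` stays above `b` to the end.

    Returns None if `a` never ends up above `b`. A brief lead that later
    reverses is ignored, since only the final, lasting switch counts.
    """
    crossover = None
    for year, x, y in zip(years, a, b):
        if x > y:
            if crossover is None:
                crossover = year  # start of a possible lasting lead
        else:
            crossover = None      # lead was lost, so start over
    return crossover


# Made-up illustrative values (arbitrary units), NOT real frequencies.
years = [1870, 1875, 1880, 1885, 1890]
singular = [3, 4, 5, 7, 8]  # stands in for "the United States is"
plural = [6, 6, 6, 5, 4]    # stands in for "the United States are"

print(first_lasting_crossover(years, singular, plural))
```

Requiring the lead to last is a design choice: noisy year-to-year data can cross back and forth several times, and a historian usually cares about the switch that stuck.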

The Viewer also helps weigh in on larger debates about language: Do words actually evolve in the Darwinian sense?

That was the question that set Aiden and Jean-Baptiste Michel, another Viewer founding father and co-founder of the Culturomics field, on the path to creating such a tool in the first place. Back in 2007, Aiden, Michel, and a crew of undergraduate students decided to test the word evolution hypothesis by tracking irregular verbs over the past 1,000 years. They found 177 that were traceable (for instance, go and went, run and ran), plotted them manually, and discovered that the verbs did undergo a kind of evolutionary process. “The less frequent the verb, the more rapidly it becomes regular,” Aiden explains. “Our work became this demo of how evolution by natural selection might work in a cultural study.”

It was work that Ted Underwood, an associate professor of English at the University of Illinois, had been eager to do since the ’90s. “But the tools weren’t available to chart word frequency on the scale that you need,” he said. He acknowledges that the tool hasn’t led to any discoveries that radically changed our sense of the past. Still, “it can confirm things, make us notice details we hadn’t previously noticed.”