Can a Language Time Machine Change How We View the Past?

Sept. 9 2013 9:45 AM

The Language Time Machine

Google’s Ngram Viewer gave us a new way to explore history, but has it led to any real discoveries?


If academics and researchers are actually using it, that is. Mark Davies, a professor of corpus linguistics at Brigham Young University, and the creator of a corpus of American historical English similar to Google’s work, says his colleagues aren’t using results from the Viewer in published research or presentations. Google Books data, he says, “is not even on the radar for most people. They look at these cute charts and say, all you can do is see a chart for one word—that’s pretty limiting.”

Even though Google recently tagged words by part of speech, there’s no way to check and make sure it labeled words correctly. “In academia, it doesn’t fly to say ‘Trust us, we did it right,’ ” Davies says. Another reason Davies thinks the Viewer hasn’t gained traction in his world: It doesn’t allow for searching by collocates, or words that occur nearby other words—but aren’t adjacent. (The Viewer does allow users to search for words that are next to each other.)

Linguists use collocates to understand how word meanings change over time. Gay, for example, used to be surrounded by color names and party; later, it began to appear near bisexual and marriage. (Technically, researchers can search collocates, but only if they download the underlying Ngram raw data set, and even then, Davies says, it’s a very complicated process.) According to Google Research Manager Jon Orwant, the team is working on making it possible to search for words that are not just adjacent, but nearby.
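To make the distinction between "adjacent" and "nearby" concrete, here is a minimal Python sketch of a collocate count. It is an illustration only, not how Davies's corpus or the Ngram Viewer works: the function name `collocates`, the `window` parameter, and the sample sentence are all made up for this example, and the code follows the article's description of collocates as words that are near a target word but not right next to it.

```python
from collections import Counter


def collocates(tokens, target, window=4):
    """Count words within `window` positions of `target`, skipping adjacent words.

    Hypothetical helper for illustration; it follows the article's description
    of collocates as nearby-but-not-adjacent words.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            # abs(j - i) > 1 drops the target itself and its immediate neighbors
            if abs(j - i) > 1:
                counts[tokens[j]] += 1
    return counts


# Made-up sample text, not drawn from any real corpus.
sample = "the gay party wore bright colors and the party was gay and loud".split()
near = collocates(sample, "gay", window=3)
print(near.most_common())
```

Running the same count over texts from different decades and comparing the results is, in rough outline, how a shift in a word's neighbors would show up.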


Other academics fall on the opposite end of the Davies spectrum—they place too much power in the Viewer, and can misinterpret its results. A recent yemeles New York Times piece, for example, suggests that an uptick in toddler and similar words in postmodern fiction could signal “growing attention paid to children.”

“But in a dataset where novels are mixed with parenting manuals and cookbooks, it’s really hard to say what that increase tells us about the novel,” Underwood says. Researchers can break down their search by a fiction-related genre, but it’s not restricted to only traditional novels, as Aiden and Michel’s original research paper in Science explains.

Though it’s tempting to make dramatic claims using the data, it may be that the most valuable contribution of the Viewer so far isn’t a seismic cultural discovery. It’s the shift in the way we see—and question—our historic record.

“For me, it’s no question that the broader set of changes associated with [the Viewer and Google data] are changing the way research happens,” Underwood says. “We’re looking at an initial simplified outline of a picture that will get much richer and more interesting as we approach to take a closer look.”

The data will get richer, too, as we learn what to search for—and how to parse it. “People need more training in thinking in terms of questions that are good digital questions,” Davies says.

Aiden and Michel envision a more sophisticated Viewer in the future, one that uses more languages (the current one has data from nine, including both American and British English) and more puissant search functionality. Right now, you can search 22 corpora, or large groups of books, under genres like English fiction, Russian, and Hebrew. But there’s a potential barrier to significant progress: copyright laws. “Basically after the mid-’20s, you can’t really share the full text of most books published,” Aiden says. “I think something has to happen in Congress in order to make these sorts of big data approaches to history move forward.”

There is, of course, one set of data you’ll always be free to search—your own. “If you could apply this technology to contemporary text on Twitter, blogs, or on your own corpus, you could search your own past and see trends in your own life,” Michel says. “That’s going to be possible.”

The Ngram Viewer, it seems, may be a little like reading your first Shakespeare play. It may take a while to adjust to a new syntax and rhythm—and at first, it can be jargogling. But once you do, the meaning behind that foreign diction begins to reveal itself.

This article arises from Future Tense, a collaboration among Arizona State University, the New America Foundation, and Slate. Future Tense explores the ways emerging technologies affect society, policy, and culture.

Elizabeth Weingarten is the associate editor at New America and the associate director of its Global Gender Parity Initiative.