How future historians will use the Library of Congress' Twitter archives.

April 20 2010 7:03 PM

#Posterity

How future historians will use the Twitter archives.

The Library of Congress

Among the many criticisms of Twitter, the most common by far is that no one cares what you ate for breakfast.

In fact, quite a few people care. "I actually think it's very useful," says Paul Freedman, a professor at Yale University who studies the history of food. For him, a 140-character ode to your KFC Double Down—along with the worshipful photo you took before devouring it—could be a priceless historical document. "Historians are interested in ordinary life," Freedman says. "And Twitter is an incredible resource for ordinary life."

Hence the decision by the Library of Congress last week to store the complete archives of Twitter. Starting six months from now, every last tweet—currently produced at a rate of 50 million a day—will be saved on an LoC hard drive and will presumably be accessible to historians for … well, forever.

Digital archiving isn't anything new. A nonprofit digital library called the Internet Archive started collecting snapshots of the World Wide Web in 1996. University libraries regularly scan their research collections to make them accessible on the Web. Google Books is currently scanning the books of at least 20 major research libraries.

But the decision to archive Twitter takes digital preservation to a new level of detail. In the past, all archives, even digital ones, had to be selective. The Internet Archive doesn't preserve every last byte of the Web—only the seemingly important parts. The Twitter archive, by contrast, will be mind-numbingly complete. Everything from reactions to the uprising in Iran to Robert Gibbs' first tweet to your roommate's two-sentence analysis of Hot Tub Time Machine will be saved for posterity. Which is, from a historian's perspective, historic. Now that we've started logging all the stray thoughts hurled into cyberspace, the prospect of recording every last word ever published—to paraphrase archivist Brewster Kahle, we're "one-upping the Greeks"—doesn't seem especially crazy.

The question is, does the preservation of digital content, from tweets to Facebook updates to blog comments, make the job of historians easier or harder?

The answer is: both. On the one hand, there's more useful information for historians to sift. On the other, there's more useless information. And without the benefit of hindsight, it's impossible to tell which is which. It's like what John Wanamaker supposedly said about advertising: He knew half of it was wasted; he just didn't know which half.

The trick will be organization. Hashtags—the # symbols people use to create discussion threads, such as #ashtag for the Iceland volcano cloud and #snowpocalypse for the February snowstorm that swept Washington, D.C.—are a start. But many tweeters don't bother to tag their posts. Historians will probably be able to search by keyword. But that can lead them astray, too. How do you know if someone is complaining about the windows in their house or the Windows on their computer?

Data-mining has become sophisticated enough to make these distinctions based on context. Sometimes that means looking at other keywords surrounding a keyword. (If the word "laptop" appears near "Windows," for example, the author is probably talking about software.) It could also mean looking at metadata—when the tweet was sent, where it was sent from, whom the person is following and vice versa. Twitter has no plans to share public metadata with the LoC, but a spokesman says it would be "open to discussing this with them."
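For the curious, the "nearby keywords" idea can be sketched in a few lines of Python. This is only a toy illustration, not how Twitter, the Library of Congress, or any real data-mining system does it. The cue-word lists, the function name `classify_windows`, and the five-word radius are all assumptions made up for the example.

```python
import re

# Hypothetical cue words, invented for this illustration only.
SOFTWARE_CUES = {"laptop", "computer", "pc", "update", "install", "microsoft", "reboot"}
HOUSE_CUES = {"house", "glass", "curtains", "draft", "pane", "kitchen", "blinds"}


def classify_windows(tweet: str, radius: int = 5) -> str:
    """Guess which sense of "windows" a tweet uses from the words around it.

    Returns "software", "house", or "unknown" (no mention, no cues, or a tie).
    """
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z0-9']+", tweet.lower())
    software_hits = 0
    house_hits = 0
    for i, token in enumerate(tokens):
        if token != "windows":
            continue
        # Collect up to `radius` words on each side of the keyword.
        nearby = tokens[max(0, i - radius):i] + tokens[i + 1:i + 1 + radius]
        software_hits += sum(word in SOFTWARE_CUES for word in nearby)
        house_hits += sum(word in HOUSE_CUES for word in nearby)
    if software_hits > house_hits:
        return "software"
    if house_hits > software_hits:
        return "house"
    return "unknown"


if __name__ == "__main__":
    print(classify_windows("My laptop keeps crashing since the Windows update"))
    print(classify_windows("The windows in my house let in a terrible draft"))
```

A tweet with no cue words nearby comes back as "unknown," which is where metadata, or a human historian, would have to take over.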
