Nothing but Net

Inside the Internet.
Feb. 28 1997 3:30 AM

Preserving the Internet, 1 terabyte at a time.

The Internet is a moving target. Every minute, thousands of Web pages are updated or abandoned. Messages sent to newsgroups replace older postings. All but a fraction of the chat-room conversations and digital images that streak across the Net vanish after they're displayed.

Seeking to preserve the chaos of the Net for posterity is Brewster Kahle, a man with a mission, a server, and a lot of magnetic tape. Kahle, who once designed computers for Thinking Machines Corp., founded the Internet Archive in 1996 to collect and store all the disparate bits of the Internet. From offices in the Presidio, the former Army base adjacent to San Francisco's Golden Gate Bridge, the Internet Archive's powerful computer crawls the Internet at high speeds. Consulting intelligent algorithms about what information to store and how often, the archive's computer copies data to tape cassettes on a Quantum DLT4500 recorder. When each cassette is full, a robotic arm removes it, stores it in a carousel, and replaces it with a blank one.

The Internet may seem impossibly vast to users, but in fact it's quite finite. The entire World Wide Web is currently estimated to contain about 1.5 terabytes (or 1.5 million megabytes) of data. Newsgroups and other Internet subsystems account for another 5.5 or so terabytes. (Compare these numbers with the 20 terabytes of ASCII data contained in the Library of Congress' 20 million books or the 8 terabytes of data at the average video store.) With tape-cassette storage costing only $20 per gigabyte (1 billion bytes), archiving the Internet is practically economical. Already, the archivists have stockpiled more than 2 terabytes of the Net, and currently they're storing about 100 gigabytes of data every month. Faster connections to the Net promise to speed things up, and Kahle estimates that his group will be done by the end of 1997.
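For readers who want to check the arithmetic, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above and assumes decimal units (1 terabyte = 1,000 gigabytes), which is how the 1.5 million megabytes figure works out. The function and constant names are made up for illustration, and the "already stored" figure is treated as exactly 2 terabytes even though the archivists report "more than" that.

```python
# Figures quoted in the article (early 1997). All names here are
# illustrative; none come from the Internet Archive itself.
WEB_TB = 1.5              # estimated size of the World Wide Web
OTHER_TB = 5.5            # newsgroups and other Internet subsystems
COST_PER_GB_USD = 20      # tape-cassette storage cost
STORED_TB = 2.0           # assumption: "more than 2 terabytes" taken as 2
RATE_GB_PER_MONTH = 100   # current archiving rate
GB_PER_TB = 1000          # assumption: decimal units, as the article uses


def total_size_tb():
    """Estimated size of the whole publicly available Internet."""
    return WEB_TB + OTHER_TB


def tape_cost_usd(size_tb):
    """Cost of enough tape to hold size_tb terabytes once."""
    return size_tb * GB_PER_TB * COST_PER_GB_USD


def months_remaining(total_tb, stored_tb, rate_gb_per_month):
    """Months left to finish a single pass at a constant rate."""
    return (total_tb - stored_tb) * GB_PER_TB / rate_gb_per_month


if __name__ == "__main__":
    total = total_size_tb()
    print("Total size (TB):", total)
    print("Tape cost (USD):", tape_cost_usd(total))
    print("Months left at current rate:",
          months_remaining(total, STORED_TB, RATE_GB_PER_MONTH))
```

On these assumptions, one full copy comes to about 7 terabytes and roughly $140,000 in tape. At 100 gigabytes a month, the remaining 5 terabytes would take about 50 months, which is why an end-of-1997 finish depends on the faster connections.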

Storing the Internet once is only the beginning. As experienced Web surfers know, things change rapidly on the Net. The archive doesn't have the computer muscle to store the publicly available Internet every week, but even if it did, a lot of stuff would still fall through the cracks. On sites like MSNBC and CNN, breaking news comes and goes every minute, which means pages disappear faster than they can currently be squirreled away. Slate is updated daily. Shifting faster still are Web sites generated by databases, such as the online bookstore Amazon.com. Because the information these sites produce is specific to a user's experience, they can generate a literally infinite number of different pages. Finally, much of the traffic on the Internet is dynamic--chat rooms, instant messages, and now even phone conversations. To archive the Internet with absolute fidelity would require cloning not only every computer on the Internet, but also every person using every computer.

Many responsible netizens already archive themselves for selfish reasons. Archiving is a no-brainer for publication sites like the San Francisco Chronicle's The Gate, which collects the contents of the daily newspaper and connects them to a good search engine. And other sites like Deja News already assemble postings from the Internet newsgroups.

Where the Internet Archive trumps these archives, of course, is in its sheer comprehensiveness. While it isn't a replica of the Internet, it's a start. And it's not useful just to historians. Suppose your Web browser allowed you to specify not only an address but also a date. Remember that headline you saw on Wired News, but have been unable to find since? The headline was posted for only a day, and you haven't had much luck using the site's search tool to locate the piece. But using the Internet Archive to turn back the hands of time will uncover it for you. And what about your teen-age cousin's Web page, with that cute picture of her Mohawk? Cousin's mother cancelled her ISP account, and now the site is gone. But an intelligent browser could catch the "no such site" error and look it up on the archive instead, displaying the last-known version. Did your favorite politician really just flip-flop on your hot-button issue? Compare last year's campaign Web site with today's. These are just a few of the many valuable services that promise to keep the nonprofit Internet Archive richly endowed.
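The date-aware browser imagined above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of how the Internet Archive works or plans to work. The `SnapshotArchive` class, its methods, and `fetch_with_fallback` are all hypothetical names; the archive is just an in-memory dictionary, and a missing site is signaled by a `LookupError` from whatever function the browser uses to fetch live pages.

```python
from bisect import bisect_right
from datetime import date


class SnapshotArchive:
    """Hypothetical in-memory archive: dated copies of pages, keyed by URL."""

    def __init__(self):
        self._snapshots = {}  # url -> list of (date, content), kept sorted by date

    def store(self, url, when, content):
        entries = self._snapshots.setdefault(url, [])
        entries.append((when, content))
        entries.sort(key=lambda entry: entry[0])

    def as_of(self, url, when):
        """Return the most recent copy saved on or before `when`, or None."""
        entries = self._snapshots.get(url, [])
        dates = [saved_on for saved_on, _ in entries]
        index = bisect_right(dates, when)
        return entries[index - 1][1] if index else None

    def latest(self, url):
        """Return the last-known copy of the page, or None if never archived."""
        entries = self._snapshots.get(url, [])
        return entries[-1][1] if entries else None


def fetch_with_fallback(live_fetch, archive, url):
    """Try the live site first; on a 'no such site' error, use the archive.

    `live_fetch` is any callable that returns page content or raises
    LookupError when the site is gone (an assumption made for this sketch).
    """
    try:
        return live_fetch(url)
    except LookupError:
        return archive.latest(url)
```

Asking for a page "as of" a date is then a lookup for the newest snapshot not later than that date, and the vanished cousin's page is simply the latest snapshot the archive holds.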

Useful though it might be, the idea of archiving the Internet is assailed from all sides. David Berreby argued last year in Slate that exhaustive documentation of our world threatens to box us into a corner. The recent "Documenting the Digital Age" conference gathered experts from the computing, telecommunication, and archiving worlds to explore these issues. Corporate executives complained that because their archives are routinely subpoenaed by plaintiffs' attorneys, they have every incentive to shred their data instead of preserving them. Lawyers worried aloud about privacy and copyright concerns. Should you have the right to exclude your public page from the archive? (Consensus opinion: Yes.) Should we be saving usage logs, which detail every page a person sees? (Probably not.) Doesn't this whole thing violate current copyright laws left and right? (Almost certainly.) Should those laws be amended to allow such an archive? (Probably.)

Professional archivists argue that it's a waste of time to store the Internet without providing a proper historical context. Others say that having too much information about the Web at our disposal will be as bad as not having enough. They add that finding things promptly on the Web with a search engine is hard enough; using it as a historical research tool would be incredibly painful. They advocate an orderly weeding, assembling, and categorizing of digital records. Microsoft's chief technical officer (and Slate contributor), Nathan Myhrvold, whose "Save the Web" memo last year helped start the archive movement, counters that we don't know now what will be important later. Your cousin might grow up to be president, at which point her teen-age Mohawk Web site will become substantially more important than it is now. Myhrvold adds that it's better to start saving today's Internet now, even if it is badly collected and organized, rather than lose it forever.

And to think that Brewster Kahle thought he was just solving a problem by starting the Internet Archive, and not introducing lots of new ones.

Bill Barnes, Slate's founding program manager, draws and co-writes the daily comic strip Overdue.
