Preserving software for future historians: Emulators versus physical copies.

July 30 2013 8:47 AM

History.exe

How can we preserve the software of today for historians of tomorrow?

Boxes of software at the Library of Congress' Packard Campus for Audio-Visual Conservation

Photo by Dave Gibson

Visitors to Microsoft in Redmond, Wash., can avail themselves of free shuttle cars to help them make their way about the sprawling suburban campus. The cars are clean and quiet and always appear within minutes of being summoned, and the drivers always know where they are going. Or almost always: During my weeklong stay, my drivers repeatedly told me this was their first trip to Building 126, Microsoft’s corporate archives. One or two had to look it up on Bing Maps.

Nonetheless, the facility is impressive: a dedicated building with 6,600 shelf-feet of cold storage, all holdings duly packaged in the archive world’s ubiquitous soft gray Hollinger boxes, as well as earthquake safeguards and redundant off-site storage at Iron Mountain in Pennsylvania. I had come to Microsoft to do research for the book I am writing on the literary history of word processing. Microsoft Word, I reasoned, was the most widely used piece of writing software in the world, the No. 2 pencil of the digital age. Outside researchers here are scarce indeed, but I was lucky enough to get in thanks to assistance from Microsoft Research Connections, where there are people who understand why an English professor might be interested in what they do.

Over the course of a week spent sequestered in the remote archives building on the edge of the Microsoft campus, I came to appreciate exactly what it means to think about software as an artifact: not as some abstract, ephemeral essence, not even as lines of code, but as something made, something that builds up layers of tangible history through the years, something that contains stories and subplots and dramatis personae. And I started thinking anew about how we preserve software for the future: future users, future programmers, and future historians. If, hundreds of years from now, a literary scholar wanted to run Word 97, the first consumer version to implement the popular “track changes” feature, how would she find it? What machine would accommodate this ancient artifact of textual technology?

Just as early filmmakers couldn’t have predicted the level of ongoing interest in their work more than 100 years later, who can say what future generations will find important to know and preserve about the early history of software? While the notion that someone might go diving into some long-outmoded version of Word might seem improbable, knowledge of the human past turns up in all kinds of unexpected places. Historians of the analog world have long known this: Writing, after all, began as a form of accounting. Would the Sumerian scribes who incised cuneiform into wet clay have thought their peculiar angular scratchings would be of interest to a future age?


We tend to conceptualize software and communicate about it using very tangible metaphors. “Let’s fork that build.” “Do you have the patch?” “We can use these libraries.” “What’s the code base?” Software may be, as David Gelernter once wrote, stuff unlike any other, but it is still a thing, tangible and present for all of its supposed virtual ineffability. Software developers themselves have a name for the way in which software tends to accrete as layers of sedimentary history, fossils and relics of past versions and developmental dead ends: cruft, a word every bit as textured as crust or dust or fluff, all of which refer to physical rinds and remainders.

And even outside the program itself, what else is the software? Is it just the code, or is it also the shrink-wrapped artifact, complete with artwork, documentation, and “feelies,” extras like faux maps or letters that would become part of the play of a game? Is it the program or the operating system? What about the hardware? The firmware? What about controllers and other peripherals integral to the experience of a given piece of software? How to handle all the different versions and iterations of software? What about fan-generated add-ons like mods and macros? What about discussion boards and strategy guides and blogs and cheat sheets, all of which capture the lively communities around software?

More simply: What do we save, and how do we save it?

---

In the green hills west of Washington, D.C., sits the Library of Congress’ Culpeper, Va., campus, properly known as the Packard Campus for Audio-Visual Conservation. Built with support from David Woodley Packard, it’s a state-of-the-art facility whose sweeping lines and aery interiors seem more characteristic of its Silicon Valley patronage than the Virginia countryside or federal bureaucracy. The 45-acre Packard Campus is charged with the long-term safekeeping of the nation’s films, television programs, radio broadcasts, and sound recordings, which together form the largest audio and moving-image collection in the world.