House Democrats redacted documents improperly, but they're not alone.

Redacting Digital Documents Is Easy. Why Do People Keep Doing it Wrong?

Future Tense
The Citizen's Guide to the Future
June 28 2016 5:24 PM

You can't redact digital documents the same way you would paper ones.


What should have been a public relations coup turned into a minor fiasco this week when House Democrats publicly released a cache of digital documents related to the Benghazi committee’s inconclusive investigations. Though those documents were supposed to make the GOP look bad, one instead revealed compromising information about Hillary Clinton adviser Sidney Blumenthal. The Los Angeles Times found that seemingly redacted portions of a transcript featuring Blumenthal were actually available if you copied those sections from the PDF and pasted them into another document.

As Slate’s Ben Mathis-Lilley rightly points out, it’s “Embarrassing for House Democrats because they screwed up a process that can be successfully completed with a single black marker.” But it’s also a mistake that’s more common than you might think, one that has everything to do with our fundamental confusion about an increasingly digital world. In fact, it happens because would-be censors act as if they’re using black markers, despite the very different needs of electronic documents.


Here are just a few notable incidents: The Transportation Security Administration made the same error as the House Democrats when it released a screening manual in 2009, sending out a PDF in which, according to Wired, employees “merely overlaid black rectangles on the sensitive text … instead of cutting the text itself.” Among other details, those obscured sections included information about CIA protocols for handling foreign dignitaries. In 2011, a U.S. District Court’s opinion accidentally included redacted information about Apple’s business dealings, accessible by the same copy-paste trick. Though those revelations weren’t especially compromising, that same year, the British Ministry of Defense inadvertently leaked classified details about nuclear submarines that it thought it had censored, a considerably more consequential breach. Other examples abound, especially in legal filings.

This copy-paste workaround comes down to the way that PDFs package and present data. Timothy B. Lee explains that PDFs generally work through vector-based graphics, effectively stacking multiple image layers atop one another to create the total picture you see on a given page of a document. (This is why you’ll sometimes watch as the various elements of an image gradually pop into view after you load up an especially complex file.) When you’re working in this format, drawing a black square over the text with the shape tool, much as you would hide sections of a physical document with a marker, may visually obscure information, but it doesn’t actually strip it from the document. The words are still there, even if they’re temporarily hidden when you look at the file in Acrobat or some other viewer.
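The difference between covering text and removing it can be seen in a toy model of a PDF page. This is a minimal sketch that assumes an uncompressed page content stream in which each line is one drawing operation. Real PDFs usually compress their streams, escape special characters in strings, and often split a word across many text operators, so the regex "extraction" here is only a crude stand-in for what a viewer's copy-paste recovers. The names `PAGE`, `BOX`, `overlay_redact`, `true_redact`, and `extract_text` are hypothetical.

```python
import re

# A simplified, uncompressed PDF page content stream.
# BT ... ET wraps a text object; "(...) Tj" shows a string at the current position.
PAGE = (
    "BT /F1 12 Tf 72 720 Td (Public sentence.) Tj ET\n"
    "BT /F1 12 Tf 72 700 Td (Secret name) Tj ET\n"
)

# "0 0 0 rg" sets the fill color to black; "x y w h re f" fills a rectangle.
BOX = "0 0 0 rg 70 696 90 16 re f\n"


def overlay_redact(stream: str) -> str:
    """The 'black marker' approach: paint a box after (on top of) the text.

    The text-showing operators are left untouched.
    """
    return stream + BOX


def true_redact(stream: str, secret: str) -> str:
    """Delete every line that shows the secret, then paint the box.

    Line-level deletion is a simplification; real tools remove glyphs by position.
    """
    kept = [line for line in stream.splitlines(keepends=True) if secret not in line]
    return "".join(kept) + BOX


def extract_text(stream: str) -> list[str]:
    """Collect the string operands of Tj operators, ignoring anything drawn over them."""
    return re.findall(r"\((.*?)\)\s*Tj", stream)


if __name__ == "__main__":
    print(extract_text(overlay_redact(PAGE)))        # the secret is still recoverable
    print(extract_text(true_redact(PAGE, "Secret")))  # only the public text remains
```

The overlay version only appends a painting operation, so anything that reads the text operators rather than the rendered pixels still finds the hidden words. The removal version deletes the operation before drawing the box, which is the property a genuine redaction tool has to guarantee.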

This problem is hardly unknown, least of all to Adobe, which created the format in the first place. Rick Borstein, who maintains a law blog for the company, writes, “Simply covering up text and graphics with black rectangles is not redaction. It is, however, a recipe for disaster.” Wholly aware that avoiding such catastrophes is necessary, the company includes a robust redaction tool within Acrobat. Though the results look the same (you’ll end up with big black boxes over the things you’ve hidden), the tool removes the underlying information from the document.

There’s plenty of information about that tool available for those willing to dig around just a little. In a 2010 article—perfectly timed to prevent some of the most notorious redaction screw-ups, if only anyone had been paying attention—Borstein detailed some of its features, including redactions across multiple pages. As Lisa Needham writes on the blog Lawyerist, the Acrobat redaction tool can also remove metadata from a document—stripping it of information about, say, the computer on which it was written. And though the tool can clearly be hard to find, Borstein has even put together a post showing users how to track it down without digging through menus. All of this is to say that there’s little excuse for ineffective redaction.

There are, of course, plenty of other ways to lazily hide information without really eliminating it: A document of digital redaction guidelines from the District of Columbia Circuit Court lays out a handful of other worst practices that would-be secret sharers should avoid. For example, it genially explains, “Changing the text’s font to white will make it look as though the words disappear, but they don’t!” And there are, of course, other ways to leave unwanted information in a document: Microsoft Word’s track changes feature can, as an article in the Florida Bar Journal suggests, inadvertently convey incriminating details if you don’t delete past revisions and comments before sending a file along. It’s easy, however, to forget that such information is there if you tell Word to hide markup details, as many writers do while giving their work a final pass.
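Leftover revisions are easy to check for, because a .docx file is a ZIP archive of XML parts. The sketch below assumes the standard WordprocessingML layout, in which tracked deletions keep their text in `w:delText` elements inside `w:del`, and comments live in `word/comments.xml`. It lists the deleted text and comment text still stored in the file, regardless of whether Word is set to display markup. The function name `hidden_revisions` is hypothetical, and the sketch only looks at the main body and the comments part, not headers, footnotes, or other parts where revisions can also live.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace, in the {uri}tag form ElementTree uses.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def hidden_revisions(docx):
    """Return (deleted_text, comments) still stored inside a .docx file.

    `docx` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(docx) as zf:
        body = ET.fromstring(zf.read("word/document.xml"))
        # Tracked deletions keep the removed words in <w:delText> runs.
        deleted = [t.text or "" for t in body.iter(W + "delText")]

        comments = []
        if "word/comments.xml" in zf.namelist():
            root = ET.fromstring(zf.read("word/comments.xml"))
            for comment in root.iter(W + "comment"):
                # A comment's visible text is spread across <w:t> runs.
                comments.append("".join(t.text or "" for t in comment.iter(W + "t")))
    return deleted, comments
```

A non-empty result means the file still carries material a reader could recover; accepting or rejecting all changes and deleting comments in Word before sending remains the actual fix.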

These are the sorts of errors that we make when we refuse to recognize that digital documents are far more complex than their physical brethren. The House Democrats’ humiliating oversight, and other incidents like it, follow from a shared misapprehension: the belief that if it looks like paper it must also behave like paper. Think of it as a kind of aspirational skeuomorphism, a fantasy that paper’s qualities persist across the different media that imitate it.

It may be the very ease of copying from one document and pasting into another that helps us maintain this illusion. The feature is all but essential to modern computing platforms, so much so that we’re often baffled when it’s not available. As Lily Hay Newman has shown in Future Tense, however, things are rarely as simple as they seem, not least because “Copy and paste … doesn’t just magically interoperate between applications.” Comfortable with the relative ease of dropping, say, a tweet into an email, we forget how much is going on behind the scenes to make that transfer possible. We’d do well instead to remember just how remarkably complex that common feature is, and to guard our secrets accordingly.

Future Tense is a partnership of Slate, New America, and Arizona State University.