Newly Released Documents Show How Government Inflated the Definition of Metadata

Nov. 20 2013 10:45 AM



On Monday evening, the Office of the Director of National Intelligence finally made public the Foreign Intelligence Surveillance Court order approving a program to collect metadata—that is, information other than the actual contents of an email—from electronic communications. But rather than settling the debate that has raged over surveillance since the recent leaks, the document only complicates the already heated conversation about what data should be collected and by whom, pointing at larger questions of what is the medium and what is the message in this digital age—and when does one become the other?

In the order, Judge Colleen Kollar-Kotelly acknowledges she is authorizing “an exceptionally broad form of collection,” but notes that “under prior statutory language,” her ruling might have been quite different. Indeed, the order highlights the significance of changes made to legislative language defining surveillance techniques in the past decade and a half. Those changes, intended to update surveillance laws, radically altered the landscape of what is considered metadata, making possible the kind of sweeping online data collection carried out by the NSA.


Unlike content, metadata is not protected by the Fourth Amendment, per a 1979 Supreme Court ruling that deals with telephone surveillance. So determining what constitutes metadata versus content effectively sets the boundaries for what kinds of information the government can collect more casually about larger numbers of people, as opposed to what it needs to work harder, and obtain warrants, to acquire. This division was more straightforward with traditional telephone technology: There were phone numbers, dates, and times (metadata) and then there were conversations (content). This made it relatively easy for Congress to define metadata surveillance techniques as those that identify “the numbers dialed” or an “originating number.”

Kollar-Kotelly’s order hinges on the definitions for “pen register” and “trap and trace device”—surveillance mechanisms for capturing the metadata of outgoing and incoming communications, respectively. Prior to 2001, she notes, the definitions of those techniques in 18 U.S.C. §3127 were geared very specifically toward capturing telephone metadata. A pen register was defined as “a device which records or decodes electronic or other impulses which identify the numbers dialed or otherwise transmitted on the telephone line to which such device is attached,” while a trap-and-trace device was one that “captures the incoming electronic or other impulses which identify the originating number of an instrument or device from which a wire or electronic communication was transmitted.” That’s legalese for devices that could identify incoming and outgoing phone numbers.

But that changed with the 2001 Patriot Act, which updated the definitions to apply unequivocally to Internet communications and make the law “technology neutral,” so that rather than focusing on the telephone the provisions could apply to all of the different communications technologies we use today—as well as ones we haven’t even invented yet—without requiring Congress to constantly rewrite or add new laws. Today, a pen register is “a device or process which records or decodes dialing, routing, addressing, or signaling information transmitted by an instrument or facility from which a wire or electronic communication is transmitted, provided, however, that such information shall not include the contents of any communication.” This shift from defining metadata as phone numbers to conceiving of it as everything that is not “content” does much more than update the definition for the Internet age. Or, in a sense, it does less: It does not update the language so much as it completely dodges the issue of the ways in which the nature of metadata has changed over time, with the development of new technologies.

Metadata is not, and cannot be, technology neutral. The types of peripheral, addressing, and noncontent information that are generated by communications depend entirely on the type of technology being used. For instance, email metadata might include email addresses, IP addresses, subject lines, and server transfer information, data that in many cases have no clear counterpart in a telephone call. Internet communications create entirely new categories and types of metadata; they generate much more noncontent information, both in terms of quantity and variety, than the telephone. The Guardian’s excellent interactive guide to metadata, which lists the different types of metadata associated with emails, phone calls, cameras, Facebook and Twitter accounts, Google searches, and Web browsers, drives home the extent to which the information you generate depends on the devices and services you use.
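To make the email case concrete, here is a minimal sketch in Python of how a single message splits, technically, into header fields and a body. The sample message, its addresses, and the function name split_message are all invented for illustration; nothing here reflects what any surveillance program actually collects, and which of these fields legally count as metadata is precisely the open question.

```python
from email import message_from_string

# A made-up message for illustration only; the addresses and the
# relay host are hypothetical (example.org / 192.0.2.x are reserved
# for documentation).
RAW_MESSAGE = """\
Received: from mail.example.org (mail.example.org [192.0.2.10]) by mx.example.com
From: alice@example.org
To: bob@example.com
Subject: Lunch on Friday?
Date: Mon, 18 Nov 2013 09:14:58 -0500

Are you free at noon?
"""


def split_message(raw):
    """Return (headers, body) for a simple, single-part email.

    The headers carry addressing and routing information; the body
    carries what the sender wrote. This is a technical split, not a
    legal one.
    """
    msg = message_from_string(raw)
    headers = dict(msg.items())   # header name -> header value
    body = msg.get_payload()      # text after the first blank line
    return headers, body


if __name__ == "__main__":
    headers, body = split_message(RAW_MESSAGE)
    for name, value in headers.items():
        print(f"{name}: {value}")
    print("---")
    print(body.strip())
```

Even in this tiny example, the header block names the sender, the recipient, a relay server and its IP address, and a subject line that already hints at what the message says. None of these fields has a close equivalent in a list of dialed phone numbers.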

Frustratingly, the released court order redacts the “categories of information” that the government believes fall under “the umbrella term ‘meta data.’ ” The redactions appear to span multiple pages, leaving open the question of just how many different types of information may fall into the allowable “dialing, routing, addressing, or signaling information” designation.

At this point, metadata is pretty much anything that is not a communication’s contents, and since content itself is defined in fairly vague terms in U.S. law as “any information concerning the substance, purport or meaning” of a communication, it’s not difficult to understand how Kollar-Kotelly came to the conclusion that even an “exceptionally broad form of collection” was warranted under the law (though, of course, it’s harder to assess her reasoning given how much of the order has been redacted).

It’s incredible how much of the way we think about surveillance continues to be governed by telephone technology. Even President Obama sought to reassure the public about the nature of the leaked surveillance programs by telling reporters in June that “nobody is listening to your telephone calls.” When it came to old-fashioned telephone technology, there was a clear-cut distinction between metadata and content, there were data irrelevant to a communication’s meaning, and we were reasonably well aware of the kinds of metadata we generated when we made calls. But now, none of those assumptions necessarily holds true, and we’re beginning to understand the extent to which the medium—and the metadata—really are the message.

Josephine Wolff is an assistant professor of public policy and computing security at Rochester Institute of Technology and a faculty associate at the Harvard Berkman Center for Internet and Society.

