Lies, Damned Lies, and Open Data
Governments are releasing more information than ever. But now we have to face new kinds of political debates.
This article arises from Future Tense, a partnership of Slate, the New America Foundation, and Arizona State University. On Sept. 6, Future Tense joined Aventura Capital Partners in Mexico City to host a conference on the “mobile city,” exploring how to make government information freely available to the public.
We are winning—for the moment—the battle for open government data. Everywhere one looks, governments at the local, state, national, and international levels are working toward launching open data portals that share, in a structured format that companies, nonprofits, and developers can reuse, information like budgets, product recalls, factory pollution levels, and crime data. Even Saudi Arabia—a country whose records in accountability, human rights, and transparency are dubious—has an open-data portal.
For many of us who have campaigned for the right to access and reuse government information, it would be easy to pause and relish the sweet victory. We have the ammunition, so now, believe the most techno-utopian advocates, open data will fundamentally change politics—depoliticizing debates and eliminating irresponsible positions.
But that would be a mistake.
Now we must renew the much larger battle over the role of evidence in public policy. On the surface, the open data movement was about who could access and use government data. It rested on the idea that data was as much a public asset as a highway, bridge, or park and so should be made available to those who paid for its creation and curation: taxpayers. But contrary to the hopes of some advocates, improving public access to data (that is, access to the evidence upon which public policy is going to be constructed) does not magically make governments’ and politicians’ desire for control evaporate. Quite the opposite. Open data will not depoliticize debate. It will force citizens, and governments, to realize how politicized data is, and always has been.
Governments, lobbyists, and other vested interests have always tried to shape public perception to their advantage through data, hence the line about “lies, damned lies, and statistics.” Our future will be filled not with a greater consensus on how to solve problems but with new debates over what, how, and why data are collected in service of public discourse. These political fights will be painful and they will matter. A lot.
As accessibility becomes less politicized, how governments collect data will become the new political battlefield. The most relevant “open” U.S. government data set may be the census. The grand history of disputes over its seemingly benign numbers—what questions to ask, what methodologies to use, what to do about the information—is emblematic of the bickering on the horizon. The census is so contentious because the stakes are so high: Its results determine seat counts in Congress, as well as how more than $400 billion in federal and state funds are allocated. Yet the numbers have long been plagued by inaccuracies. The 1990 census failed to include an estimated 8 million immigrants and urban minorities while double-counting roughly 4 million white Americans.
If one party is able to legislate how data are collected, subsequent battles may not matter—it can, in effect, create an “official” reality that serves a broader goal. The 1990 census became so problematic and partisan it ended up in the Supreme Court. A 1999 court ruling that statistical models and sampling could not be used to reapportion congressional seats is just a taste of what is to come as Big Data’s results come to influence even more decisions.
Equally political will be debates about not just how governments collect data, but what they choose to ask. Consider the fantastic research by ViveloHoy, a local paper in Illinois' twin cities, Champaign and Urbana. It acquired five years of arrest data (via Freedom of Information requests, not open data, unfortunately) to show that African-Americans, who make up 16 percent of each community, accounted for 41 percent and 49 percent of all arrests, and for 88 percent and 91 percent of jaywalking arrests. Why so many jaywalkers? After investigating the arrest locations, the reporters noted that many African-American neighborhoods lacked sidewalks. These data transformed anecdotal stories into something measurable: a quantification of the structural inequity and its implications.
For a window into how data will be used politically, look north to Canada. In 2010 the Canadian government announced it would make the “long form” portion of the census (the equivalent of the American Community Survey) voluntary. As in the United States, the census in Canada shapes how the government allocates resources, enabling it to be leaner and smarter by targeting programs.