Lies, Damned Lies, and Open Data

Governments are releasing more information than ever. But now we have to face new kinds of political debates.

Sept. 7, 2012, 3:43 PM

A child plays in Sargassum under cloudy skies Sept. 9, 2007, in Wrightsville Beach, N.C. Data that show the North Carolina coast may change dramatically over the next century were suppressed by lawmakers.

Photo by Logan Mock-Bunting/Getty Images.

This article arises from Future Tense, a partnership of Slate, the New America Foundation, and Arizona State University. On Sept. 6, Future Tense joined Aventura Capital Partners in Mexico City to host a conference on the “mobile city,” exploring how to make government information freely available to the public.

We are winning—for the moment—the battle for open government data. Everywhere one looks, governments at the local, state, national, and international levels are working toward launching open data portals that share, in a structured format that companies, nonprofits, and developers can reuse, information like budgets, product recalls, factory pollution levels, and crime data. Even Saudi Arabia—a country whose records in accountability, human rights, and transparency are dubious—has an open-data portal.

For many of us who have campaigned for the right to access and reuse government information, it would be easy to pause and relish the sweet victory. We have the ammunition, so now, believe the most techno-utopian advocates, open data will fundamentally change politics—depoliticizing debates and eliminating irresponsible positions.

But that would be a mistake.

Now we must renew the much larger battle over the role of evidence in public policy. On the surface, the open data movement was about who could access and use government data. It rested on the idea that data was as much a public asset as a highway, bridge, or park and so should be made available to those who paid for its creation and curation: taxpayers. But contrary to the hopes of some advocates, improving public access to data—that is, access to the evidence upon which public policy is going to be constructed—does not magically cause governments’, and politicians’, desire for control to evaporate. Quite the opposite. Open data will not depoliticize debate. It will force citizens, and governments, to realize how politicized data is, and always has been.

Governments, lobbyists, and other vested interests have always tried to shape public perception to their advantage through data—hence the line about “lies, damned lies, and statistics.” Our future will not be filled with a greater consensus on how to solve problems but by new debates over what, how, and why data are collected in service of public discourse. These political fights will be painful and they will matter. A lot.

As accessibility becomes less politicized, how governments collect data will become the new political battlefield. The most relevant “open” U.S. government data set may be the census. The grand history of disputes over its seemingly benign numbers—what questions to ask, what methodologies to use, what to do about the information—is emblematic of the bickering on the horizon. The census is so contentious because the stakes are so high: Its results determine seat counts in Congress, as well as how more than $400 billion in federal and state funds are allocated. Yet the numbers have long been plagued by inaccuracies. The 1990 census failed to include an estimated 8 million immigrants and urban minorities while double-counting roughly 4 million white Americans.

If one party is able to legislate how data are collected, subsequent battles may not matter—it can, in effect, create an “official” reality that serves a broader goal. The 1990 census became so problematic and partisan it ended up in the Supreme Court. A 1999 court ruling that statistical models and sampling could not be used to reapportion congressional seats is just a taste of what is to come as Big Data’s results come to influence even more decisions.

Equally political will be debates about not just how governments collect data, but what they choose to ask. Consider the fantastic research by ViveloHoy, a local paper in Illinois’ twin cities Champaign and Urbana.* It acquired five years of arrest data—via Freedom of Information Requests, not open data, unfortunately—to show that the 16 percent of African-Americans in each community accounted for 41 percent and 49 percent of all arrests—and 88 percent and 91 percent of jaywalking arrests. Why so many jaywalkers? After investigating the arrest locations, the reporters noted that many African-American neighborhoods lacked sidewalks. These data transformed anecdotal stories into something measurable—a quantification of the structural inequity and its implications.

For a window into the types of ways that data will be used politically, look north to Canada. In 2010 the Canadian government announced it would make the “long form” portion of the census—the equivalent of the American Community Survey—voluntary. As in the United States, the Canadian census shapes its allocation of resources, enabling government to be leaner and smarter by targeting programs.

By making the survey voluntary, the government introduced a number of issues—principally the problem of selection bias—that rendered the result virtually worthless. Many suspect this was the point. As one conservative pundit noted, scrapping the long form was a fantastic way to prevent the government from helping a specific community. If you don’t know about inequity, marginalization, or structural racism, you can pretend they don’t exist. In reality, governments, ever responsive to citizens, will probably still run these programs—they’ll just be less efficient and effective. It will also make the country less productive, because the nonprofit and business sectors, which also rely on these data, will suffer too. (It is worth noting that the U.S. House of Representatives voted to eliminate the ACS earlier this year.)

Of course, if political force fails to shape how or what data are collected, it can always just prevent government from using them. This practice was commonplace during the Bush presidency, when language and facts on issues like climate were systematically suppressed, such as in the EPA’s Report on the Environment.

Even the most innocuous-sounding data can become political in this atmosphere. North Carolina’s Senate recently approved a law requiring that only “historical data” be used to predict its future sea levels. So despite data suggesting the state’s sea levels will rise 39 inches by the end of the century, the North Carolina law required officials to plan for a 12-inch increase.

As scientific data—and more problematically, government data—increasingly come into conflict with official policy or dogma, expect more of this. Having accurate stats can be a powerful tool—but it is only that. Data cannot face down ignorance or self-serving interests (such as North Carolina’s coastal property developers) on its own.

We are going to have to find ways to ensure not just the openness of data, but also its credibility and reliability. The Argentinean government has fudged its inflation statistics to such a degree that few citizens believe the state, and organizations like the Economist have abandoned republishing its official numbers. But governments will find more sophisticated, subtle approaches to massaging the data. How will we be able to tell if we don’t start these conversations about best practices now?

Making data open is an important way to ensure it serves citizens well. For example, imagine if census data weren’t open and available. Without expert, but also public, access to verify the work of the U.S. government, even worse distortions would be possible, allowing the politically powerful to manipulate the census to serve their interests. But proactive, rather than just reactive, steps will be needed to ensure that government data serve citizens, not special interests. Open data does not represent an endgame, but another step in what will likely be a never-ending struggle for rational debate and evidence-based public policy.

Correction, Sept. 9, 2012: This article originally and incorrectly referred to Champaign and Urbana as suburbs of Chicago. The twin cities are located about 150 miles away from Chicago.
