An HTML for Numbers
Is Google's Public Data Explorer the first step toward a universal data format?
The Age of Data is just around the corner, right where it has been for years. As someone who spends a lot of his time creating visualizations, I've been hoping for this day to come for a very long time. "It used to be that you would get stories by chatting to people in bars," Internet godfather (and non-journalist) Tim Berners-Lee declared last year. "But now it's also going to be about poring over data and equipping yourself with the tools to analyze it." Don't buy it? This transfixing eight-part video series from Knight Journalism fellow Geoff McGhee might change your mind. Data isn't just for nerds any more: it's beautiful, alluring, extraordinary.
It's also incredibly hard to work with. The problem with bringing data to journalism isn't convincing writers and editors that it's useful for telling stories; it's the toil required to get the numbers in a usable format. The data is already there, from federal sentencing figures and unemployment rates by county to minute-by-minute Twitter responses to the Black Eyed Peas' smoldering wreckage of a Super Bowl halftime show. The problem is that it all looks different. It is compiled by different people using different programs and represented in different formats. As a result, mashing up data isn't as simple as mashing together two balls of Silly Putty. It's more like trying to plug a bunch of American appliances into outlets in Tbilisi.
In hopes of bridging this data divide, Google is rolling out a tool called Public Data Explorer. While Data Explorer has been around for a while, it's now been extended to allow users to upload and visualize their own data sets. But that's not why Google's effort is important. If you want to make cool visualizations, IBM's Many Eyes offers more than a dozen different ways to display information. (Google currently offers four pretty standard ones.) The exciting news here is that Google is pushing for the adoption of a specific format. Users must upload their data in two files, one for all the numbers and one that describes what those numbers represent. If this feature becomes popular, it will make it a whole lot easier for people and agencies to use one another's data. It's not quite a universal format, but it's a lot closer than anything we have today.
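To make the two-file idea concrete, here is a minimal sketch in Python. It is not Google's actual upload format: the column names, the layout of the description, and the figures are all made up for illustration. It shows only the general principle that a file of bare numbers becomes usable once a second file says what each column means.

```python
import csv
import io

# Hypothetical "numbers" file: values only, with terse column names.
# The figures are invented for illustration.
DATA_CSV = (
    "region,year,unemployment_rate\n"
    "01001,2010,8.9\n"
    "01003,2010,10.0\n"
)

# Hypothetical "description" file, shown here as a dict: what each
# column represents, how to parse it, and (where relevant) its unit.
METADATA = {
    "region": {"label": "County code", "type": "string"},
    "year": {"label": "Year", "type": "integer"},
    "unemployment_rate": {
        "label": "Unemployment rate",
        "type": "float",
        "unit": "percent",
    },
}

# Map the type names used in the description to Python constructors.
CASTS = {"string": str, "integer": int, "float": float}


def load_described(data_text, metadata):
    """Parse the numbers file, typing each value as the description says."""
    rows = []
    for raw in csv.DictReader(io.StringIO(data_text)):
        rows.append(
            {col: CASTS[metadata[col]["type"]](value) for col, value in raw.items()}
        )
    return rows


rows = load_described(DATA_CSV, METADATA)
for row in rows:
    for col, value in row.items():
        unit = metadata_unit = METADATA[col].get("unit", "")
        print(f"{METADATA[col]['label']}: {value} {unit}".rstrip())
```

Because the description travels with the numbers, a program that has never seen this data set can still tell that "region" is a code to be kept as text and "unemployment_rate" is a percentage.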
The beauty of the Web—in fact, the reason the Internet can function in the first place—is that it doesn't require intensive training to publish a page in a readable format. Sure, you might have to learn a few HTML tags—or pay an 11-year-old who knows HTML—but it's a simple language that's easy to pick up. There is no equivalent for data. There are plenty of standards for making data readable by a machine, but no single format that everyone can understand and agree on.
While plenty of people have tried to develop a data standard, none of them have been named Google. A promising site called Swivel tried to become a "YouTube for data" a few years ago, but don't go looking for it now. One of Google's greatest powers is its ability to cajole Web developers into playing by the company's rules, in hopes of climbing in the rankings and generally staying in the demigod's good graces. For sure, there are well-developed languages, like XML and JSON, for organizing data in a way computers can understand. While these are great for representing data for a specific purpose, a search engine wouldn't know what to do with my code without extra information from me on what the numbers mean. This is where a standard format becomes essential.
To understand why I'm rooting for Google, consider this brief tale of woe. When I was trying to build a map of job-loss data for Slate, I started with the month-by-month, county-by-county figures from the Bureau of Labor Statistics. This data comes in huge text files with arcane codes—meaningless gibberish unless you have the software and the know-how to match those codes to the names of counties, which live in a different file. At the time, I did this in Excel with a cocktail of Byzantine macros, late nights, and emotional breakdowns.
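For the curious, the matching step itself is small once the files are in a sane shape. The sketch below is not what I ran at the time, and it does not reproduce the real Bureau of Labor Statistics layout: the codes, column names, and numbers are invented, and the files are assumed to be tab-separated with a header row.

```python
import csv
import io

# Hypothetical figures file: one row per area code per month.
FIGURES_TSV = (
    "area_code\tmonth\tjobs\n"
    "A001\t2009-01\t1200\n"
    "A002\t2009-01\t800\n"
    "A999\t2009-01\t50\n"
)

# Hypothetical lookup file: the county names live here, keyed by code.
AREAS_TSV = (
    "area_code\tarea_name\n"
    "A001\tExample County\n"
    "A002\tSample County\n"
)


def read_tsv(text):
    """Read tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def label_counties(figures_text, areas_text):
    """Attach a county name to every figures row by looking up its code."""
    names = {row["area_code"]: row["area_name"] for row in read_tsv(areas_text)}
    labeled = []
    for row in read_tsv(figures_text):
        labeled.append(
            {
                # Codes missing from the lookup file get None instead of
                # being dropped, so gaps stay visible.
                "county": names.get(row["area_code"]),
                "month": row["month"],
                "jobs": int(row["jobs"]),
            }
        )
    return labeled


labeled = label_counties(FIGURES_TSV, AREAS_TSV)
for row in labeled:
    print(row)
```

The code is the easy part. The hard part is knowing which file holds the names, which column holds the code, and what the codes mean, and that is exactly the knowledge a shared format would carry along with the data.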
Chris Wilson is a Slate contributor.