An HTML for Numbers
Is Google's Public Data Explorer the first step toward a universal data format?
I've since discovered better ways to crunch these figures, but I had to learn a lot of programming to get there. If the job data I wanted for my map had already been represented in Google's format, I would have saved days of work getting it into shape (even if I wanted to use my own software to visualize it instead of one of the four display options that Google offers).
Even more compelling is the possibility that data could join the ranks of text, images, and video in Google search results. This happens in a very basic form now. If you Google "population of Italy," you see a simple graph of population over time at the top of the results page, which you can click for more detail. This is the exact same tool that's opening up to the public today. Imagine if Google's spiders, forever crawling the Web to index its contents, could smartly identify and sort data. Let's say I publish an article on YouTube view counts that includes proprietary data I collected for the piece. If it's formatted according to Google's standards, it might show up as a little bar graph when someone searches for "YouTube views," even if my article itself isn't at the top of the results. (By the way, Public Data Explorer allows you to choose whether to share your figures. Despite the name, your data won't be public if you don't want it to be.)
The breadth and relative complexity of Google's format will become clearer over time, but it can already represent many common types of information. For example, it can account for hierarchical data: a set in which the number of jobs in King County, Wash., is represented as a subset of jobs in Washington, which is in turn a subset of jobs in the United States. (This is useful for things like treemaps.) It's also good for time-lapse data, allowing you to show change over time in animated charts and graphs. While Google's system will have to evolve to accommodate less-traditional visualizations like network diagrams, its relative simplicity is a good thing. In a format like RDF, the author needs to add a ton of extra information to the source code to help computers figure out what's what. Google, by contrast, wants most of the burden to be on its shoulders.
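To make the idea of hierarchical, time-lapse data concrete, here is a minimal sketch in Python. It is not Google's format, only an illustration of the shape of the data: the region names come from the example above, while the extra county, the `PARENT` and `JOBS` tables, the helper functions, and every number are hypothetical placeholders rather than real job figures.

```python
from collections import defaultdict

# Hypothetical hierarchy: each region names its parent (None marks the root).
PARENT = {
    "United States": None,
    "Washington": "United States",
    "King County, Wash.": "Washington",
    "Pierce County, Wash.": "Washington",
}

# Placeholder job counts (NOT real figures), keyed by (county, year).
JOBS = {
    ("King County, Wash.", 2008): 100,
    ("King County, Wash.", 2009): 90,
    ("Pierce County, Wash.", 2008): 40,
    ("Pierce County, Wash.", 2009): 35,
}


def roll_up(parent, leaf_values):
    """Add each county's value to the county itself and to every ancestor."""
    totals = defaultdict(int)
    for (region, year), value in leaf_values.items():
        node = region
        while node is not None:
            totals[(node, year)] += value
            node = parent[node]
    return dict(totals)


def time_series(totals, region):
    """Return (year, value) pairs for one region, sorted by year."""
    return sorted((year, value) for (r, year), value in totals.items() if r == region)


totals = roll_up(PARENT, JOBS)
print(time_series(totals, "Washington"))
```

Because every county points to its state and every state to the country, a treemap can read the nesting straight from `PARENT`, and an animated chart can step through the years in `time_series`. The author supplies only the leaf numbers and the parent links; the software works out the rest.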
Public Data Explorer is important because not many people care to read data in its raw form. A simple presentation tool (essentially, an HTML for numbers instead of words) might not be sexy, but it could do a lot to elevate data to the same importance as text in search results. Then, fingers crossed, mashing it all up could end up being just like mashing together a couple of balls of putty.
Chris Wilson is a Slate contributor.
Photograph by iStockphoto/Thinkstock.