So What Is Big Data Good for, Anyway?

Decoding the tech world.
Aug. 7 2014 12:00 PM

The Big Data Paradox

It’s never complete, and it’s always messy—and if it’s not, you can’t trust it.

World of Warcraft’s XT-002 Deconstructor. WoW players wrote automated tracking systems to gather data on where treasure items and monsters popped up in the game.

Courtesy of Blizzard Entertainment, Inc./Creative Commons

Big data is messy data. It’s not enough just to collect it and count it, because there is never just one way to count it. Big data certainly doesn’t mean “the end of theory,” as Wired editor Chris Anderson notoriously put it in 2008. I came down hard on big data last week while discussing the Facebook and OkCupid experiments on users and their supposed revelations about human nature. These revelations turned out to be founded on sloppy analysis. That said, big data is undeniably important and is already responsible for great gains in efficiency and knowledge. Big data is not a miracle worker, but it is changing our lives.

David Auerbach

David Auerbach is a writer and software engineer based in New York. His website is http://davidauerba.ch.

One question to ask is how big data differs from regular data, other than there just being a lot more of it. Regular data doesn’t magically become big data just because you’ve got 100 million data points instead of a thousand. While computers have made large-scale number crunching far easier and faster than it was 20 or 30 years ago, that doesn’t mean that weather reports or graphs of seismic activity suddenly qualify as big data.

Contrariwise, big data is never complete either. It’s easy to think of big data as simply including all the data, but as Rachel Schutt and Cathy O’Neil put it in their excellent and skeptical book Doing Data Science, “It’s pretty much never all.” This is not a bad thing. As Jorge Luis Borges put it in “On Exactitude in Science,” a perfect map of a country—one necessarily as big as the country itself—is perfectly useless. But you must remain aware of what’s being excluded.

Aside from sheer quantity, big data has three defining characteristics. The first is megasourcing: taking data from huge numbers of distributed sources. If these sources are people, you can call it “crowdsourcing,” but the sources don’t need to be people. Every online ranking system, from Facebook “likes” to Reddit reputation systems, is an example of megasourcing, but so is Google Maps, which aggregates data from thousands of cars and satellites and third-party data sources around the world.

Another is automation. The ability to analyze data as fast as it can be collected means that the results can be put in play automatically, without anyone having to examine the data manually. This is not just a benefit, but a necessity, as the sheer quantity of data is becoming too great for humans to analyze even with the benefit of extra time. Hence the danger of big data: that the analyses are garbage, as we saw in the case of Facebook’s mood experiment, where “not happy” and “happy” both got treated as positive mood indicators. There’s so much data that there’s not enough time to validate the results (unless there’s a public outcry).
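To see how an unexamined automated analysis can go wrong in the way described above, here is a minimal sketch of a mood classifier that simply counts words from fixed lists. The word lists and the function name `naive_mood` are hypothetical, invented for illustration; this is not the tooling Facebook used, only a toy that reproduces the same kind of mistake.

```python
import re

# Hypothetical word lists, for illustration only.
POSITIVE_WORDS = {"happy", "great", "love"}
NEGATIVE_WORDS = {"sad", "awful", "hate"}


def naive_mood(text: str) -> str:
    """Label text by counting listed words, ignoring context such as negation."""
    words = re.findall(r"[a-z']+", text.lower())
    positive = sum(word in POSITIVE_WORDS for word in words)
    negative = sum(word in NEGATIVE_WORDS for word in words)
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"


# Both sentences contain "happy", so the counter cannot tell them apart.
print(naive_mood("I am happy today"))
print(naive_mood("I am not happy today"))
```

Because the counter never looks at the word before “happy,” both sentences get the same label. Nobody would make that error reading a single post, but it is easy to miss when millions of posts are scored without anyone reading them.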

Finally, there is the issue of feedback. If an automated ad system decides you should see an ad for diapers because you recently “liked” a stroller on Facebook and bought wipes on Amazon, then further data on you—such as whether you clicked on that diaper ad—is interpreted as a consequence of the analysis that’s already been performed. Big data does not measure static or pristine systems; it puts its results back into these systems and changes their behavior. (This, naturally, makes the effects of big data that much more complicated and dependent.) “We’re witnessing the beginning of a massive, culturally saturated feedback loop,” write Schutt and O’Neil, “where our behavior changes the product and the product changes our behavior.”

A perfect example of these three features of big data comes from online multiplayer game World of Warcraft. (As usual, computer gamers got here first.) To figure out how often certain rare treasure items drop, how strong certain monsters are, and where items and monsters pop up in the game world, players wrote external, automated tracking systems like Wowhead that could be installed on their computers. Anyone who used these extensions while playing would automatically upload data of all of their encounters, pickups, and statistics to a central third-party server, which would aggregate them into a searchable database and generate stats. So if you wanted to know where to find a particular monster in WoW, you could get a breakdown of probabilities, down to the specific in-world coordinates.
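The aggregation step described above can be sketched in a few lines. This is a rough illustration under stated assumptions, not how Wowhead or any real tracker is implemented: the report format `(monster, x, y)`, the function name `spawn_probabilities`, and the monster names are all made up for the example.

```python
from collections import Counter, defaultdict


def spawn_probabilities(reports):
    """Turn many players' sighting reports into per-monster location odds.

    `reports` is an iterable of hypothetical (monster, x, y) tuples.
    """
    counts = defaultdict(Counter)
    for monster, x, y in reports:
        counts[monster][(x, y)] += 1

    table = {}
    for monster, by_location in counts.items():
        total = sum(by_location.values())
        # Share of all sightings of this monster at each coordinate.
        table[monster] = {loc: n / total for loc, n in by_location.items()}
    return table


# Made-up sightings, as if uploaded by several players.
reports = [
    ("Example Boar", 10, 20),
    ("Example Boar", 10, 20),
    ("Example Boar", 10, 20),
    ("Example Boar", 40, 55),
    ("Example Wolf", 5, 5),
]
table = spawn_probabilities(reports)
print(table["Example Boar"])
```

Note that the result is only a share of reported sightings. It reflects where players with the extension happened to go, which is one more reminder that the data is never all the data.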
