So What Is Big Data Good for, Anyway?

Decoding the tech world.
Aug. 7 2014 12:00 PM

The Big Data Paradox

It’s never complete, and it’s always messy—and if it’s not, you can’t trust it.

World of Warcraft’s XT-002 Deconstructor. WoW players wrote automated tracking systems to gather data on where treasure items and monsters popped up in the game.

Courtesy of Blizzard Entertainment, Inc./Creative Commons

Big data is messy data. It’s not enough just to collect it and count it, because there is never just one way to count it. Big data certainly doesn’t mean “the end of theory,” as Wired editor Chris Anderson notoriously put it in 2008. I came down hard on big data last week while discussing the Facebook and OkCupid experiments on users and their supposed revelations about human nature. These revelations turned out to be founded on sloppy analysis. That said, big data is undeniably important and is already responsible for great gains in efficiency and knowledge. Big data is not a miracle worker, but it is changing our lives.

David Auerbach

David Auerbach is a writer and software engineer based in New York. His website is http://davidauerba.ch.

One question to ask is how big data differs from regular data, other than there just being a lot more of it. Regular data doesn’t magically become big data just because you’ve got 100 million data points instead of a thousand. While computers have made large-scale number crunching far easier and faster than it was 20 or 30 years ago, that doesn’t mean that weather reports or graphs of seismic activity suddenly qualify as big data.

Contrariwise, big data is never complete either. It’s easy to think of big data as simply including all the data, but as Rachel Schutt and Cathy O’Neil put it in their excellent and skeptical book Doing Data Science, “It’s pretty much never all.” This is not a bad thing. As Jorge Luis Borges put it in “On Exactitude in Science,” a perfect map of a country—one necessarily as big as the country itself—is perfectly useless. But you must remain aware of what’s being excluded.

Aside from sheer quantity, there are three defining characteristics of big data. The first is megasourcing: taking data from huge numbers of distributed sources. If these sources are people, you can call it “crowdsourcing,” but the sources don’t need to be people. Every online ranking system, from Facebook “likes” to Reddit reputation systems, is an example of megasourcing, but so is Google Maps, which aggregates data from thousands of cars and satellites and third-party data sources around the world.

Another is automation. The ability to analyze data as fast as it can be collected means that the results can be put in play automatically, without anyone having to examine the data manually. This is not just a benefit, but a necessity, as the sheer quantity of data is becoming too great for humans to analyze even with the benefit of extra time. Hence the danger of big data: that the analyses are garbage, as we saw in the case of Facebook’s mood experiment, where “not happy” and “happy” both got treated as positive mood indicators. There’s so much data that there’s not enough time to validate the results (unless there’s a public outcry).
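To see how an automated analysis can go wrong in this way, here is a minimal sketch of word-count sentiment scoring. It is an illustration only, not the code used in the Facebook study: the word lists and both function names are hypothetical, and real sentiment tools are far more elaborate. The naive counter credits “not happy” with a positive word, while the second version skips a positive word that directly follows a negation.

```python
# Hypothetical word lists, chosen only for this illustration.
POSITIVE_WORDS = {"happy", "great", "love"}
NEGATIONS = {"not", "never", "no"}


def naive_positive_count(text):
    """Count positive words, ignoring context entirely."""
    words = text.lower().split()
    return sum(1 for word in words if word in POSITIVE_WORDS)


def negation_aware_positive_count(text):
    """Count positive words, skipping any that directly follow a negation."""
    words = text.lower().split()
    count = 0
    for i, word in enumerate(words):
        negated = i > 0 and words[i - 1] in NEGATIONS
        if word in POSITIVE_WORDS and not negated:
            count += 1
    return count


if __name__ == "__main__":
    for post in ["I am happy", "I am not happy"]:
        print(post, naive_positive_count(post), negation_aware_positive_count(post))
```

Even the second version is crude (it would miss “not very happy,” for instance), which is rather the point: at this scale nobody reads the individual posts to check.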

Finally, there is the issue of feedback. If an automated ad system decides you should see an ad for diapers because you recently “liked” a stroller on Facebook and bought wipes on Amazon, then further data on you—such as whether you clicked on that diaper ad—is interpreted as a consequence of the analysis that’s already been performed. Big data does not measure static or pristine systems; it puts its results back into these systems and changes their behavior. (This, naturally, makes the effects of big data that much more complicated and dependent.) “We’re witnessing the beginning of a massive, culturally saturated feedback loop,” write Schutt and O’Neil, “where our behavior changes the product and the product changes our behavior.”

A perfect example of these three features of big data comes from online multiplayer game World of Warcraft. (As usual, computer gamers got here first.) To figure out how often certain rare treasure items drop, how strong certain monsters are, and where items and monsters pop up in the game world, players wrote external, automated tracking systems like Wowhead that could be installed on their computers. Anyone who used these extensions while playing would automatically upload data of all of their encounters, pickups, and statistics to a central third-party server, which would aggregate them into a searchable database and generate stats. So if you wanted to know where to find a particular monster in WoW, you could get a breakdown of probabilities, down to the specific in-world coordinates.
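The aggregation step described above can be sketched in a few lines. This is a toy model under stated assumptions, not how Wowhead or any real tracker is built: the `DropTracker` class, its method names, and the item name are all hypothetical, and a real service would also have to handle duplicate uploads, bad clients, and game patches that change drop tables.

```python
from collections import Counter, defaultdict


class DropTracker:
    """Toy central server: pools encounter reports from many players."""

    def __init__(self):
        self.kills = Counter()             # monster -> number of reported kills
        self.drops = defaultdict(Counter)  # monster -> item -> kills that dropped it

    def report(self, monster, items):
        """Record one kill and the items it dropped (uploaded by one client)."""
        self.kills[monster] += 1
        for item in set(items):  # count each item at most once per kill
            self.drops[monster][item] += 1

    def drop_rate(self, monster, item):
        """Estimated chance that a kill drops the item, or None with no data."""
        kills = self.kills[monster]
        if kills == 0:
            return None
        return self.drops[monster][item] / kills


if __name__ == "__main__":
    tracker = DropTracker()
    # Four reports from different (imaginary) players; the item name is made up.
    tracker.report("XT-002 Deconstructor", ["Hypothetical Rare Sword"])
    tracker.report("XT-002 Deconstructor", [])
    tracker.report("XT-002 Deconstructor", [])
    tracker.report("XT-002 Deconstructor", [])
    print(tracker.drop_rate("XT-002 Deconstructor", "Hypothetical Rare Sword"))
```

The estimate is only as good as the reports behind it: it covers the players who installed the tracker, not everyone, which is the “never all” caveat in miniature.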
