OkCupid’s User Experiments Show Us the Limits of Big Data

Decoding the tech world.
July 30 2014 2:12 PM

Big Data Is Overrated

And OkCupid’s user experiments prove it.

Modern love
OkCupid announced that it conducted experiments on its users, including lying to them about how well-matched they were with potential dates.

Photo by Massonstock/Thinkstock

Today, we live in a world of data. Twenty years ago, we didn’t. Just as computing power has exponentially increased over the last 50 years, doubling every two years or so, the amount of computational data has been doubling at a similar rate. Ninety percent of all the data in human history was created in the last two years. And the advent of “big data” brings with it such scary and Orwellian doings as Facebook conducting mood experiments on its users.

David Auerbach

David Auerbach is a writer and software engineer based in New York. His website is http://davidauerba.ch.

OkCupid founder Christian Rudder jumped to Facebook’s defense on Monday, talking about how the online dating service had conducted similar experiments on its millions of users, including lying to them about how well-matched they were with potential dates. (People weren’t quite as outraged as they were with Facebook, possibly because, in the words of Gawker’s Jay Hathaway, “Online dating already feels like consenting to participate in a social experiment.”)

So however bad Facebook’s experiment was, it looks like there might be a lot more of it in our future. But unlike Facebook, which published its findings in a humorless academic paper, OkCupid treated its results with some serious skepticism, raising the question: What is big data actually good for? Does it even work?


Not all data is equal, of course. The complete works of Isaac Newton and William Shakespeare take up about as much space as a sound file of Pharrell’s “Happy.” But even if you restrict yourself to words and numbers, the great works of human civilization have now been drowned in measurements, statistics, and status updates. Songs and texts are on the order of megabytes. There are a bit more than a million megabytes in a terabyte, which is about what it would take to store the entire printed material of the Library of Congress. The total human store of information is a few billion Libraries of Congress, measured in zettabytes (a billion terabytes).
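The unit arithmetic in the paragraph above can be checked directly. The sketch below is only a units check. It assumes decimal (SI) prefixes, where each step is a factor of 1,000, alongside binary (IEC) prefixes, where each step is 1,024. The binary convention is why a terabyte holds "a bit more than a million" megabytes. The one-terabyte Library of Congress figure is the article's rough estimate, and the constant names are illustrative.

```python
# Storage units in bytes. Decimal (SI) prefixes step by 1,000;
# binary (IEC) prefixes step by 1,024.
MEGABYTE = 10**6
TERABYTE = 10**12
ZETTABYTE = 10**21

MEBIBYTE = 2**20
TEBIBYTE = 2**40

# Rough figure from the article: the printed material of the
# Library of Congress fits in about one terabyte.
LIBRARY_OF_CONGRESS = TERABYTE

print(TERABYTE // MEGABYTE)              # 1000000 decimal megabytes per terabyte
print(TEBIBYTE // MEBIBYTE)              # 1048576 binary mebibytes per tebibyte
print(ZETTABYTE // TERABYTE)             # 1000000000 terabytes per zettabyte
print(ZETTABYTE // LIBRARY_OF_CONGRESS)  # 1000000000 Libraries of Congress per zettabyte
```

Under these assumptions, one zettabyte holds about a billion Libraries of Congress, so "a few billion Libraries of Congress" works out to a few zettabytes.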

What do we do with all our big data? The answer is often “not much.” The National Security Agency stuffs all its surveillance data into its Utah data center despite not having the tools to analyze most of it. Data storage has become so cheap that it’s far easier to collect petabytes than to figure out why they’re useful. Last year, market research firm Gartner put big data and many of its technologies near the top of the “peak of inflated expectations” of its hype cycle, to be followed soon by a “trough of disillusionment.”

[Figure: Gartner’s hype cycle. Courtesy of Gartner.]

Why the trough? Because big data has yet to yield big money. For all the hype about the quantified self, the Internet of things, and data science, big data has yet to produce a true killer app. Google Flu Trends is a fascinating idea, but extrapolating flu incidence from Google searches on flu keywords has not produced reliable results. The New York Times recently published a piece by Sendhil Mullainathan wondering if search queries for “slow iPhone” might imply that Apple is intentionally slowing down older iPhones as new ones are released, but he concluded merely that big data doesn’t tell us enough to know for sure.

Big data really has only one unalloyed success on its track record, and it’s an old one: Google, specifically its Web search. (Disclosure: I used to work at Google, and my wife still does.) Way back in the last century, Google found that analyzing a large enough portion of the Web gave it the ability to deliver a) really good results for keyword searches and b) high-click-through ads for those keywords. What’s more, this didn’t require any particularly sophisticated analysis of the data. Simply examining word frequencies and the link structure of the Web was enough to produce high-quality results. (This has changed as search engine optimizers and click farms have tried to game the system, but the point stands.) As artificial intelligence kingpin Peter Norvig puts it, “Simple models and a lot of data trump more elaborate models based on less data.”
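To show how simple a link-structure model can be, here is a minimal sketch of the textbook PageRank algorithm that Google’s founders published, computed by power iteration on a made-up four-page “web.” It illustrates only the link-analysis idea mentioned above. It is not Google’s production ranking, which has changed considerably since then. The damping factor of 0.85 is the conventional textbook value, and the page names are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank computed by power iteration.

    `links` maps each page to the list of pages it links to. A page with
    no outgoing links spreads its rank evenly over every page.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page gets a baseline share from the "random jump" term.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: distribute its rank uniformly.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by every other page.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(page, round(score, 3))
```

Nothing here looks at what the pages say. The ranking falls out of counting who links to whom. Page “c,” which every other page links to, ends up on top, while “d,” which nothing links to, keeps only the baseline share.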

Many companies, including Google itself, have tried to repeat that success since then, but no one has really succeeded. Amazon is probably the No. 2 big data success because of its recommendation engine, but Amazon’s success was still not primarily dependent on big data–style analysis in the way that Google’s core business has been. Facebook has succeeded more through viral ubiquity than through big data innovation.

The recent Facebook data science experiment is telling. Moral outrage aside, the problem with Facebook’s attempt to make its users feel bad (or good) by curating their news feeds is that the manipulation was inept, because the analysis relied on the inadequate Linguistic Inquiry and Word Count (LIWC) software, which classifies text by counting words that appear on fixed lists. For example, “I don’t feel happy” and “I feel happy” both registered as “positive” updates, simply because each contains the word happy.
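The failure described above follows from the method itself. Below is a minimal sketch of dictionary-based word counting, the general approach LIWC takes. The tiny word lists and the `classify` function are hypothetical stand-ins. They are not LIWC’s actual dictionaries or code. The point is that a program that only counts words ignores negation and word order.

```python
import re

# Hypothetical miniature lexicon; LIWC's real dictionaries are far larger.
POSITIVE_WORDS = {"happy", "glad", "love", "great"}
NEGATIVE_WORDS = {"sad", "angry", "hate", "awful"}


def classify(update: str) -> str:
    """Label an update by counting lexicon hits, ignoring negation and word order."""
    words = re.findall(r"[a-z']+", update.lower())
    pos = sum(w in POSITIVE_WORDS for w in words)
    neg = sum(w in NEGATIVE_WORDS for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


for text in ["I feel happy", "I don't feel happy"]:
    print(f"{text!r}: {classify(text)}")
```

Both sentences come out “positive,” because “don’t” is on neither list and “happy” is on the positive one.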
