OkCupid’s User Experiments Show Us the Limits of Big Data

Decoding the tech world.
July 30 2014 2:12 PM

Big Data Is Overrated

And OkCupid’s user experiments prove it.

(Continued from Page 1)

So problem No. 1 is that big data is frequently bad data. For a closer look at big data’s pitfalls, there’s no better example than OkCupid, which has been both honest and irreverent about its own use of data. OkCupid was an early proponent of the “quantified self” big data paradigm, asking you to answer multiple-choice questions (and ask some of your own) in order to find your “match percentage”—your compatibility, more or less—with other users. The questions can be about anything: love, sex, work, politics, hobbies, arithmetic.

Because OkCupid’s Christian Rudder is a sensible guy, he disclaims anything but entertainment value from the stats he publishes. “OkCupid doesn’t really know what it’s doing,” he wrote Monday in a post on his OkTrends blog, setting himself apart from the pseudoscience Facebook peddled in its academic paper. Rudder offers three examples of OkCupid’s “experiments.” Two of them seem fairly innocuous: hiding all users’ pictures for a day, and selectively hiding some users’ profile text. The third, in which people were shown false match percentages, is far more manipulative, but it still doesn’t smack of “We succeeded in making users less happy!” That was people’s real problem with the Facebook experiment.

“Once the experiment was concluded,” Rudder notes, “the users were notified of the correct match percentage.” (Part of Rudder’s message seems to be “Look, OkCupid is more honest than Facebook!” With friends like OkCupid, Facebook doesn’t need ethics panels.)

Rudder’s breezy style does obscure some deeper issues. He presents some charts in order to show that looks completely trump personality and profile text, and quips, “So, your picture is worth that fabled thousand words, but your actual words are worth…almost nothing.”

[Chart: ratings. Courtesy of OkCupid]

The problem is that Rudder’s data is skewed, partly by the selection bias baked into OkCupid’s model. Not all OkCupid users rate all other OkCupid users; OkCupid controls which people you see (and rate) through its algorithms. It’s not as though people are presented with a random sample of potential matches. They’re being given a prefiltered list of more likely matches. Presented with people with whom you are already more compatible, you tend to let looks trump personality, because OkCupid has presumably already filtered out users whose personalities aren’t simpatico with yours.

I have some anecdotal evidence for this conclusion. When a single friend of mine decided to look at his least compatible matches on OkCupid, he found a bunch of white supremacists. (I wish I were kidding.) He never would have come across these people in the normal course of events, since the low match percentage would have kept them safely hidden. So yeah, words may be worth nothing, but only once you’ve weeded out the white supremacists and other nonstarters.
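The selection effect described above can be sketched with a toy simulation. Everything in it is an assumption made for illustration, not OkCupid's model or data: the `simulate` function, the uniform `looks` and `fit` scores, the equal weighting in `rating`, and the `cutoff` that stands in for the site's prefiltering are all hypothetical. The point is only that when raters see just the high-compatibility candidates, personality barely varies among them, so looks appear to explain almost everything.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=20000, cutoff=0.8, seed=0):
    """Return (corr with looks, corr with fit) for everyone and for the shown subset."""
    rng = random.Random(seed)
    # Assumption: looks and personality fit are independent scores in [0, 1).
    looks = [rng.random() for _ in range(n)]
    fit = [rng.random() for _ in range(n)]
    # Assumption: raters weigh looks and personality equally.
    rating = [0.5 * l + 0.5 * f for l, f in zip(looks, fit)]
    # Assumption: the site only shows candidates whose fit clears the cutoff.
    shown = [i for i in range(n) if fit[i] >= cutoff]

    def corrs(idx):
        r = [rating[i] for i in idx]
        return (
            pearson([looks[i] for i in idx], r),
            pearson([fit[i] for i in idx], r),
        )

    return {"everyone": corrs(range(n)), "prefiltered": corrs(shown)}


if __name__ == "__main__":
    for group, (by_looks, by_fit) in simulate().items():
        print(f"{group}: looks r={by_looks:.2f}, personality r={by_fit:.2f}")
```

In this sketch, looks and personality matter equally by construction, yet among the prefiltered candidates the ratings track looks far more closely than personality. The filter, not the raters, produces the lopsided result.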

Doing clean experiments under controlled conditions is difficult enough. In the real world, it is often near impossible, which is a fundamental problem with big data. Data is almost always contaminated. Providing rigorous evidence for Rudder’s conclusions would require far more control and experimentation than even Facebook would be able to achieve.

I don’t mean to say that OkCupid’s analytics are little more than the next pseudoscientific development after astrology and Myers-Briggs personality types. Rudder’s conclusions are suggestive (as are Facebook’s), but they don’t meet the bar for scientific rigor, as Rudder himself admits.

Rudder makes a very astute observation at the end of his post, which is that people are more likely to continue conversing with people with whom they have a high match percentage, even if that match percentage is actually false. True or false, data can dictate people’s responses, which can set up a nasty feedback loop in which false data becomes more true. This is one of the biggest sources of big data contamination: It does not work on a closed system. Instead, it puts unproven ideas into the social mediasphere—for example, that certain Google autocomplete results are racist and sexist—and those ideas get bounced around, reiterated, and reflected in the very phenomena big data is measuring. If, say, Facebook decides that users who have written a status update about the “daily grind” are really into grindcore music, or if Amazon accidentally recommends The Road to everyone who buys The Road Less Traveled, it’s easy to imagine that those companies could successfully encourage correlations that don’t exist. That’s why big data’s greatest trick has been convincing the world that it works.
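The feedback loop described above can be made concrete with a small sketch. It is a toy model under stated assumptions, not anything OkCupid or Rudder published: `expected_reply_rate`, `feedback_loop`, and the `trust` weight (how much a user's behavior follows the displayed number rather than real compatibility) are hypothetical names, and the idea that the site re-displays whatever reply rate it observes is also an assumption.

```python
def expected_reply_rate(true_match, shown_match, trust):
    """Assumed blend: behavior follows the displayed score with weight `trust`."""
    return trust * shown_match + (1 - trust) * true_match


def feedback_loop(true_match, shown_match, trust, rounds):
    """Each round, the observed reply rate becomes the next displayed score."""
    history = [shown_match]
    for _ in range(rounds):
        shown_match = expected_reply_rate(true_match, shown_match, trust)
        history.append(shown_match)
    return history


if __name__ == "__main__":
    # A pair that is really a 30 percent match but is shown 90 percent.
    for trust in (0.0, 0.7, 1.0):
        scores = feedback_loop(true_match=0.3, shown_match=0.9, trust=trust, rounds=5)
        print(trust, [round(s, 2) for s in scores])
```

With `trust` at 0, the false number is corrected after one round. With `trust` at 1, behavior simply mirrors the displayed score, so the measurements keep confirming a number that was never true. Values in between correct slowly, which is the contamination the paragraph describes.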
