OkCupid’s User Experiments Show Us the Limits of Big Data

July 30 2014 2:12 PM

Big Data Is Overrated

And OkCupid’s user experiments prove it.


So problem No. 1 is that big data is frequently bad data. For a closer look at big data’s pitfalls, there’s no better example than OkCupid, which has been both honest and irreverent about its own use of data. OkCupid was an early proponent of the “quantified self” big data paradigm, asking you to answer multiple-choice questions (and ask some of your own) in order to find your “match percentage”—your compatibility, more or less—with other users. The questions can be about anything: love, sex, work, politics, hobbies, arithmetic.

Because OkCupid’s Christian Rudder is a sensible guy, he disclaims anything but entertainment value from the stats he publishes. “OkCupid doesn’t really know what it’s doing,” he wrote Monday on his OkTrends blog, setting himself apart from the pseudoscience Facebook peddled in its academic paper. Rudder offers three examples of OkCupid’s “experiments.” Two of them seem fairly innocuous: hiding all users’ pictures for a day, and selectively hiding some users’ profile text. The third, in which people were shown false match percentages, is far more manipulative, but it still doesn’t smack of “We succeeded in making users less happy!” That was people’s real problem with the Facebook experiment.

“Once the experiment was concluded,” Rudder notes, “the users were notified of the correct match percentage.” (Part of Rudder’s message seems to be “Look, OkCupid is more honest than Facebook!” With friends like OkCupid, Facebook doesn’t need ethics panels.)


Rudder’s breezy style does obscure some deeper issues. He presents some charts in order to show that looks completely trump personality and profile text, and quips, “So, your picture is worth that fabled thousand words, but your actual words are worth…almost nothing.”

[Chart: ratings. Courtesy of OkCupid.]

The problem is that Rudder’s data is biased, partly because of the selection bias baked into OkCupid’s model. Not all OkCupid users rate all other OkCupid users; OkCupid controls which people you see (and rate) through its algorithms. People aren’t presented with a random sample of potential matches. They’re given a prefiltered list of more likely matches. Presented with people with whom you are already more compatible, you tend to let looks trump personality, but only because OkCupid has presumably already filtered out users whose personalities aren’t simpatico with yours.
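To see how prefiltering alone could produce a chart like Rudder’s, here is a toy simulation. Everything in it is an assumption made up for illustration, not OkCupid’s data or algorithm: the uniform `looks` and `personality` scores, the rule that a rating weighs both equally, and the hypothetical `cutoff` that stands in for OkCupid’s matching filter. It compares how strongly ratings track each trait among all possible pairs and among only the pairs the filter lets through.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=20000, cutoff=0.8, seed=0):
    """Return (looks_r, personality_r) for all pairs and for prefiltered pairs."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        looks = rng.random()          # hypothetical attractiveness score, 0..1
        personality = rng.random()    # hypothetical personality fit, 0..1
        # Assumption: raters care about both traits equally, plus some noise.
        rating = 0.5 * looks + 0.5 * personality + rng.gauss(0, 0.1)
        rows.append((looks, personality, rating))

    # Stand-in for the site's filter: only high-fit profiles are ever shown.
    shown = [row for row in rows if row[1] > cutoff]

    def correlations(sample):
        looks, personality, rating = zip(*sample)
        return pearson(looks, rating), pearson(personality, rating)

    return correlations(rows), correlations(shown)


everyone, prefiltered = simulate()
print("all pairs:   looks r=%.2f, personality r=%.2f" % everyone)
print("prefiltered: looks r=%.2f, personality r=%.2f" % prefiltered)
```

In this made-up world personality matters exactly as much as looks, yet within the prefiltered sample it appears to matter far less, simply because everyone who gets rated already clears the bar. The sketch says nothing about how large the effect is on the real site.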

I have some anecdotal evidence for this conclusion. When a single friend of mine decided to look at his least compatible matches on OkCupid, he found a bunch of white supremacists. (I wish I were kidding.) He never would have come across these people in the normal course of events, since the low match percentage would have kept them safely hidden. So yeah, words may be worth nothing, but only once you’ve weeded out the white supremacists and other nonstarters.

Doing clean experiments under controlled conditions is difficult enough. In the real world, it is often near impossible, which is a fundamental problem with big data. Data is almost always contaminated. Providing rigorous evidence for Rudder’s conclusions would require far more control and experimentation than even Facebook would be able to achieve.

I don’t mean to say that OkCupid’s analytics are little more than the next pseudoscientific development after astrology and Myers-Briggs personality types. Rudder’s conclusions are suggestive (as are Facebook’s), but they don’t meet the bar for scientific rigor, as Rudder himself admits.

Rudder makes a very astute observation at the end of his post, which is that people are more likely to continue conversing with people with whom they have a high match percentage, even if that match percentage is actually false. True or false, data can dictate people’s responses, which can set up a nasty feedback loop in which false data becomes more true. This is one of the biggest sources of big data contamination: It does not work on a closed system. Instead, it puts unproven ideas into the social mediasphere—for example, that certain Google autocomplete results are racist and sexist—and those ideas get bounced around, reiterated, and reflected in the very phenomena big data is measuring. If, say, Facebook decides that users who have written a status update about the “daily grind” are really into grindcore music, or if Amazon accidentally recommends The Road to everyone who buys The Road Less Traveled, it’s easy to imagine that those companies could successfully encourage correlations that don’t exist. That’s why big data’s greatest trick has been convincing the world that it works.
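The feedback loop can be sketched with another toy model. All of the numbers and names here are hypothetical: a store where buying book A tells you nothing about whether someone wants book B, a `base_rate` at which anyone buys B, and an `exposure_boost` that assumes simply being shown B nudges a few more people to buy it. The store recommends B to everyone who bought A, then measures the result.

```python
import random


def simulate_recommendations(n=50000, base_rate=0.05,
                             exposure_boost=0.05, seed=1):
    """Return B's purchase rate among A-buyers and among everyone else."""
    rng = random.Random(seed)
    bought_b = {True: 0, False: 0}
    totals = {True: 0, False: 0}
    for _ in range(n):
        bought_a = rng.random() < 0.5
        # Underlying taste for B is the same for both groups (no real link).
        # Assumption: the recommendation, shown only to A-buyers, adds a
        # small chance of buying B just from exposure.
        p_buy_b = base_rate + (exposure_boost if bought_a else 0.0)
        totals[bought_a] += 1
        if rng.random() < p_buy_b:
            bought_b[bought_a] += 1
    return bought_b[True] / totals[True], bought_b[False] / totals[False]


rate_a_buyers, rate_others = simulate_recommendations()
print("bought B after buying A: %.3f" % rate_a_buyers)
print("bought B without A:      %.3f" % rate_others)
```

The purchase log now shows A-buyers buying B more often than everyone else, which looks like evidence that the recommendation was right, even though the only link between the two books is the recommendation itself.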

David Auerbach is a writer and software engineer based in New York. His website is http://davidauerba.ch.
