How Big Data and Spam Bots Threaten Online Discussion

What's to come?
Oct. 26 2012 9:42 AM

Muzzled by the Bots

Intermediaries online are more powerful, and more subtle, than ever before.

Iranians outside France's embassy protest the publication of a cartoon about the Prophet Mohammed on Sept. 23, 2012

Photograph by Atta Kenare/AFP/Getty Images.

“Disintermediation” is often heralded as the defining feature of the digital age. Thanks to innovative new technologies, middlemen of all stripes are believed to be going the way of the dodo. Once editors, publishers, and bookstores wither, the story goes, our public life will finally be liberated from their biases, inefficiencies, and hidden agendas. To quote Amazon's Jeff Bezos, a master slayer of intermediaries if there ever was one, “even well-meaning gatekeepers slow innovation. When a platform is self-service, even the improbable ideas can get tried, because there’s no expert gatekeeper ready to say ‘that will never work!’” Even if Bezos is right, he's missing one important aspect of this story: The digitization of our public life is also giving rise to many new intermediaries that are mostly of the invisible, and possibly suspect, variety.

Consider blogging. When the first generation of bloggers got online in the late 1990s, the only intermediaries between them and the rest of the world were their hosting companies and their Internet service providers. Anyone starting a blog in 2012 is likely to end up on a commercial platform like Tumblr or WordPress, with all of their blog comments run through a third-party company like Disqus. But the intermediaries don't stop there: Disqus itself cooperates with a company called Impermium, which relies on various machine learning tools to check whether posted comments are spam. It's the proliferation, not the elimination, of intermediaries that has made blogging so widespread. The right term here is “hyperintermediation,” not “disintermediation.”

Impermium's new service goes even further: The company claims to have developed a technology to “identify not only spam and malicious links, but all kinds of harmful content—such as violence, racism, flagrant profanity, and hate speech—and allows site owners to act on it in real-time, before it reaches readers.” It says it has 300,000 websites as clients (which is not all that surprising, if it's incorporated into widely used third-party tools like Disqus). As far as intermediaries go, this sounds very impressive: a single Californian company making decisions over what counts as hate speech and profanity for some of the world's most popular sites without anyone ever examining whether its own algorithms might be biased or excessively conservative.

Impermium's model is interesting because it adds a “big data” layer to the usual process of determining what counts as spam or hate speech. It used to be that anyone who mentioned “Viagra” in a comment or blog post would be deemed a spammer and blocked immediately. Now Impermium claims that, by leveraging user data that come from its network of 300,000 participating websites, it can actually distinguish jokes about Viagra from spam about Viagra.

This might seem liberating: Adding context to the moderation decision could save legitimate jokes. In other settings, however, this marriage of big data and automated content moderation might also have a darker side, particularly in undemocratic regimes, for which a war on spam and hate speech, waged with the help of domestic spam-fighting champions, is just a pretext to suppress dissenting opinions. In their hands, solutions like Impermium's might make censorship more fine-grained and customized, eliminating the gaps that plague “dumb” systems that censor in bulk.

Bloggers in China, for example, regularly employ euphemisms and allusions to trick the censorship algorithms of the country's online platforms. A seemingly innocuous expression like “river crab” often stands in for “Internet censorship” while “vacation therapy” has been used to refer to arrests of government officials. Left uncensored—since they don't use big words like “human rights” or “democracy”—such expressions quickly become memes and trigger critical discussions about Chinese politics.

With the help of “big data,” content-moderation software can check the relative frequency with which such expressions have been used on other popular sites and investigate the actual commenters using them (who are their friends? what other articles have they commented on?) to spot suspicious euphemisms. Or it might investigate where some of the posts containing those euphemisms come from. Just imagine what kind of new censorship possibilities open up once moderation decisions can incorporate geolocational information (what some researchers already call “spatial big data”): Why not block comments, videos, or photos uploaded by anyone located in, say, Tahrir Square or some other politically explosive location?
