How Big Data and Spam Bots Threaten Online Discussion

What's to come?
Oct. 26 2012 9:42 AM

Muzzled by the Bots

Intermediaries online are more powerful, and more subtle, than ever before.

Iranians outside France's embassy protest the publication by a French satirical weekly of cartoons depicting the Prophet Mohammed, Sept. 23, 2012.

Photograph by ATTA Kenare/AFP/Getty Images.

“Disintermediation” is often heralded as the defining feature of the digital age. Thanks to innovative new technologies, middlemen of all stripes are expected to go the way of the dodo. Once editors, publishers, and bookstores wither, the story goes, our public life will finally be liberated from their biases, inefficiencies, and hidden agendas. As Amazon's Jeff Bezos, a master slayer of intermediaries if there ever was one, puts it, “even well-meaning gatekeepers slow innovation. When a platform is self-service, even the improbable ideas can get tried, because there’s no expert gatekeeper ready to say ‘that will never work!’” Even if Bezos is right, he's missing one important aspect of this story: The digitization of our public life is also giving rise to many new intermediaries, mostly of the invisible, and possibly suspect, variety.

Consider blogging. When the first generation of bloggers got online in the late 1990s, the only intermediaries between them and the rest of the world were their hosting companies and their Internet service providers. Anyone starting a blog in 2012 is likely to end up on a commercial platform like Tumblr or WordPress, with all of their blog comments run through a third-party company like Disqus. But the intermediaries don't stop there: Disqus itself cooperates with a company called Impermium, which relies on various machine-learning tools to check whether posted comments are spam. It's the proliferation, not the elimination, of intermediaries that has made blogging so widespread. The right term here is “hyperintermediation,” not “disintermediation.”

Impermium's new service goes even further: The company claims to have developed a technology to “identify not only spam and malicious links, but all kinds of harmful content—such as violence, racism, flagrant profanity, and hate speech—and allows site owners to act on it in real-time, before it reaches readers.” It says it has 300,000 websites as clients, which is not all that surprising if it is incorporated into widely used third-party tools like Disqus. As far as intermediaries go, this sounds very impressive: a single Californian company making decisions about what counts as hate speech and profanity for some of the world's most popular sites, without anyone ever examining whether its own algorithms might be biased or excessively conservative.


Impermium's model is interesting because it adds a “big data” layer to the usual process of determining what counts as spam or hate speech. It used to be that anyone who mentioned “Viagra” in a comment or blog post would be deemed a spammer and blocked immediately. Now Impermium claims that, by leveraging user data from its network of 300,000 participating websites, it can actually distinguish jokes about Viagra from spam about Viagra.
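The difference between the old keyword rule and a context-aware decision can be illustrated with a toy Python sketch. This is not Impermium's actual system, whose workings are not public; the signals used here (whether a comment contains a link, and a hypothetical count of how often the same account was flagged on other participating sites), the weights, and the threshold are all invented for illustration.

```python
import re

# Old-style rule: any mention of a blocklisted word gets the comment blocked.
BLOCKLIST = {"viagra"}


def keyword_filter(comment: str) -> bool:
    """Return True if the comment should be blocked (keyword match only)."""
    words = re.findall(r"[a-z]+", comment.lower())
    return any(word in BLOCKLIST for word in words)


def context_filter(comment: str, flags_on_other_sites: int) -> bool:
    """Return True if the comment should be blocked.

    Hypothetical context-aware rule: the keyword alone is not enough; it only
    tips the decision when other signals (a link, a bad reputation gathered
    from other sites in the network) also point toward spam. Weights and the
    threshold are illustrative assumptions.
    """
    score = 0.0
    if keyword_filter(comment):
        score += 1.0
    if re.search(r"https?://", comment):
        score += 1.0
    # Cross-site reputation signal, capped so it cannot grow without bound.
    score += min(flags_on_other_sites, 5) * 0.5
    return score >= 2.0


joke = "My uncle thinks Viagra is a vitamin."
spam = "Cheap Viagra here: http://example.com/pills"

print(keyword_filter(joke), keyword_filter(spam))  # True True
print(context_filter(joke, flags_on_other_sites=0),
      context_filter(spam, flags_on_other_sites=3))  # False True
```

Both comments trip the keyword rule, but only the spam crosses the threshold once context is added. The extra accuracy comes from pooling data about users across many sites.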

This might seem liberating: Adding context to the moderation decision could save legitimate jokes. In other settings, however, this marriage of big data and automated content moderation might have a darker side, particularly in undemocratic regimes, for whom a war on spam and hate speech, waged with the help of domestic spam-fighting champions, is just a pretext to suppress dissenting opinions. In their hands, solutions like Impermium's might make censorship more fine-grained and customized, closing the gaps that plague “dumb” systems that censor in bulk.

Bloggers in China, for example, regularly employ euphemisms and allusions to trick the censorship algorithms of the country's online platforms. A seemingly innocuous expression like “river crab” often stands in for “Internet censorship,” while “vacation therapy” has been used to refer to arrests of government officials. Because they avoid loaded words like “human rights” or “democracy,” such expressions often go uncensored, quickly become memes, and trigger critical discussions about Chinese politics.

With the help of “big data,” content-moderation software can check how frequently such expressions are used on other popular sites and investigate the commentators using them (Who are their friends? What other articles have they commented on?) to spot suspicious euphemisms. It might also examine where the posts containing those euphemisms come from. Just imagine the new censorship possibilities that open up once moderation decisions can incorporate geolocational information (what some researchers already call “spatial big data”): Why not block comments, videos, or photos uploaded by anyone located in, say, Tahrir Square or some other politically explosive location?
