What Words Do Bing and Google Ban From Autocomplete?

Aug. 2 2013 11:43 AM

Sex, Violence, and Autocomplete Algorithms

What words do Bing and Google censor from their suggestions?

There go your plans to have Google help you out with your search for boob-related things.

Fuse

Warning: This article contains explicit language.

Autocomplete is one of those modern marvels of real-time search technology that almost feels like it’s reading your mind. By analyzing what millions of other users have already searched for and clicked on, Google knows that when you start typing a query with a “d,” you’re most likely looking for a dictionary. Besides saving you keystrokes, suggestions can be serendipitous and educational, spurring alternative query ideas. In the process, our search behavior is subtly influenced by exposure to query possibilities we might not have considered on our own.

So what happens when unsavory things, perhaps naughty or even illegal, creep into those suggestions? As a society we probably don’t want to make it easier for pedophiles to find pictures of naked children or to goad the violently predisposed with new ideas for abuse. Such suggestions get blocked and filtered—censored—for their potential to influence us.

As Google writes in its autocomplete FAQ, “we exclude a narrow class of search queries related to pornography, violence, hate speech, and copyright infringement.” Bing, on the other hand, makes sure to “filter spam” as well as to “detect adult or offensive content,” according to a recent post on the Bing blog. Such human choices set the stage for broadly specifying what types of things get censored, despite Google’s claims that autocompletions are, for the most part, “algorithmically determined … without any human intervention.”


What exactly are the boundaries and editorial criteria of that censorship, and how do they differ among search engines? More importantly, what kinds of mistakes do these algorithms make in applying their editorial criteria? To answer these questions, I automatically gathered autosuggest results from hundreds of queries related to sex and violence in an effort to find those that are surprising or deviant. (See my blog for the methodological detail.) The results aren’t always pretty.

Illicit sex

Armed with a list of 110 sex-related words, gathered from the linguistic extremes of both academic linguists and that tome of slang, the Urban Dictionary, I first sought to understand which words resulted in zero suggestions (which likely means the word is blocked). In the following diagram, you can see words blocked only by Google or Bing, and by both or neither. For example, both algorithms think “prostitute” is just dandy, suggesting options for prostitute “phone numbers” or “websites.” They’re not about sexual deprivation: Bing is happy to complete searches for “masturbate” and “hand job.” Conspicuously, Bing does block query suggestions for “homosexual,” raising the question: Is there such a thing as a gay-friendly search engine? In response, a Microsoft spokesperson commented that “sometimes seemingly benign queries can lead to adult content,” and consequently are filtered from autosuggest. By that logic, it would seem that “homosexual” merely leads to “too much” adult content, causing the algorithm to flag and filter it.

[Diagram: sex-related words blocked only by Google, only by Bing, by both, or by neither]
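
The four-way grouping behind that diagram can be sketched in a few lines of Python. This is a minimal illustration, not the script actually used for the analysis: the function name `group_by_blocking`, the placeholder terms `term_x` and `term_y`, and the sample suggestion lists are all assumptions made for the example. It also takes the heuristic above at face value, treating zero suggestions as a sign that a word is blocked.

```python
from typing import Dict, List

# Maps each query word to the list of suggestions an engine returned for it.
Suggestions = Dict[str, List[str]]


def group_by_blocking(google: Suggestions, bing: Suggestions) -> Dict[str, List[str]]:
    """Group words by which engine returned zero suggestions for them."""
    groups: Dict[str, List[str]] = {
        "google_only": [],
        "bing_only": [],
        "both": [],
        "neither": [],
    }
    # Only compare words that were queried on both engines.
    for word in sorted(set(google) & set(bing)):
        google_blocked = len(google[word]) == 0
        bing_blocked = len(bing[word]) == 0
        if google_blocked and bing_blocked:
            groups["both"].append(word)
        elif google_blocked:
            groups["google_only"].append(word)
        elif bing_blocked:
            groups["bing_only"].append(word)
        else:
            groups["neither"].append(word)
    return groups


# Illustrative data only (hypothetical, not the collected results).
# "term_x" and "term_y" are placeholders standing in for real query words.
google_results: Suggestions = {
    "prostitute": ["prostitute phone numbers"],
    "dick": [],
    "term_x": ["term_x example"],
    "term_y": [],
}
bing_results: Suggestions = {
    "prostitute": ["prostitute websites"],
    "dick": ["dick's sporting goods"],
    "term_x": [],
    "term_y": [],
}

groups = group_by_blocking(google_results, bing_results)
```

A word lands in "neither" only if both engines returned at least one suggestion. An empty list can also come from a failed request, so a real crawl would want to retry before counting a word as blocked.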

Initially it would appear Google is stricter, blocking more sex-related words than Bing. But really they just have different strategies. Instead of outright blocking all suggestions for “dick” as Google does, Bing will just scrub the suggestions so you only see the clean ones, like “dick’s sporting goods.” Sometimes Bing will rewrite the query, pretending a dirty word was a typo instead. For instance, querying for “fingering” leads to wholesome dinner suggestions for “fingerling potato recipes,” and searching for “jizz” offers suggestions on “jazz,” for the musically minded searcher, of course. Both algorithms are pretty good about letting through more clinical terminology, such as “vaginas,” “nipples,” or “penises.”
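
The difference between those strategies can be read off the suggestion list itself. The sketch below is a rough heuristic rather than a description of how either engine works internally; the function name `describe_response` and its three labels are assumptions made for the example. It cannot tell a scrubbed list from an unfiltered one, since both still contain the original query.

```python
from typing import List


def describe_response(query: str, suggestions: List[str]) -> str:
    """Label how an engine responded to a query, judging only by its suggestions."""
    q = query.lower()
    if not suggestions:
        # No suggestions at all: the word is likely blocked outright.
        return "blocked"
    if not any(q in s.lower() for s in suggestions):
        # Suggestions exist but none contains the query: it was probably
        # treated as a typo and rewritten to something else.
        return "rewritten"
    # At least one suggestion contains the query (possibly a scrubbed list).
    return "suggested"


# Examples taken from the cases described above.
blocked_case = describe_response("dick", [])
scrubbed_case = describe_response("dick", ["dick's sporting goods"])
rewritten_case = describe_response("fingering", ["fingerling potato recipes"])
```

Telling a scrubbed list apart from an untouched one would require a reference list of what unfiltered suggestions look like, which the suggestions alone do not provide.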

For something like child pornography, the legal stakes get much higher. According to Ian Brown and Christopher Marsden in their book Regulating Code, “Many governments impose some censorship in their jurisdiction according to content that is illegal under national laws.” So it’s not entirely surprising that, in order to head off more direct government intervention, corporations like Google and Microsoft self-regulate by trying to scrub their autocomplete results clean of suggestions that lead to child pornography.
