What's to come?
Aug. 2 2013 11:43 AM

Sex, Violence, and Autocomplete Algorithms

What words do Bing and Google censor from their suggestions?

Photo caption: There go your plans to have Google help you out with your search for boob-related things. (Photo by Fuse)

Warning: This article contains explicit language.

Autocomplete is one of those modern marvels of real-time search technology that almost feels like it’s reading your mind. By analyzing and mining what millions of other users have already searched for and clicked on, Google knows that when you start typing a query with a “d,” you’re most likely looking for a dictionary. Besides the efficiency gain of not having to type as much, suggestions can be serendipitous and educational, spurring alternative query ideas. In the process, our search behavior is subtly influenced by exposure to query possibilities we might not have considered if left to ourselves.

So what happens when unsavory things, perhaps naughty or even illegal, creep into those suggestions? As a society we probably don’t want to make it easier for pedophiles to find pictures of naked children or to goad the violently predisposed with new ideas for abuse. Such suggestions get blocked and filtered—censored—for their potential to influence us.

As Google writes in its autocomplete FAQ, “we exclude a narrow class of search queries related to pornography, violence, hate speech, and copyright infringement.” Bing, on the other hand, makes sure to “filter spam” as well as to “detect adult or offensive content,” according to a recent post on the Bing blog. Such human choices set the stage for broadly specifying what types of things get censored, despite Google’s claims that autocompletions are, for the most part, “algorithmically determined … without any human intervention.”


What exactly are the boundaries and editorial criteria of that censorship, and how do they differ among search engines? More important, what kinds of mistakes do these algorithms make in applying their editorial criteria? To answer these questions, I automatically gathered autosuggest results from hundreds of queries related to sex and violence in an effort to find those that are surprising or deviant. (See my blog for the methodological details.) The results aren’t always pretty.
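For the curious, here is a minimal sketch of how that kind of data gathering can work. It is not the pipeline described on my blog; it assumes the commonly used, unofficial “suggest” endpoints that Google and Bing expose for browsers, which are undocumented and may change or be rate-limited at any time.

```python
# Minimal sketch (not the article's actual pipeline): fetch autocomplete
# suggestions from the unofficial suggest endpoints that Google and Bing
# expose for browsers. The URLs and parameters below are assumptions; they
# are undocumented and may change.
import requests

GOOGLE_SUGGEST = "https://suggestqueries.google.com/complete/search"
BING_SUGGEST = "https://api.bing.com/osjson.aspx"


def google_suggestions(query):
    """Return Google's autocomplete suggestions for a query string."""
    resp = requests.get(GOOGLE_SUGGEST, params={"client": "firefox", "q": query})
    resp.raise_for_status()
    # OpenSearch-style JSON response: [query, [suggestion, suggestion, ...]]
    return resp.json()[1]


def bing_suggestions(query):
    """Return Bing's autocomplete suggestions for a query string."""
    resp = requests.get(BING_SUGGEST, params={"query": query})
    resp.raise_for_status()
    return resp.json()[1]


if __name__ == "__main__":
    for word in ["dictionary", "prostitute"]:
        print(word, "->", google_suggestions(word)[:3])
```

A real crawl over hundreds of queries would also want polite rate limiting and would save the raw responses so the results can be audited later.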

Illicit sex

Armed with a list of 110 sex-related words, gathered both from academic linguists and from that tome of slang, the Urban Dictionary, I first sought to understand which words resulted in zero suggestions (which likely means the word is blocked). In the following diagram, you can see words blocked only by Google, only by Bing, by both, or by neither. For example, both algorithms think “prostitute” is just dandy, suggesting options for prostitute “phone numbers” or “websites.” Nor are they entirely prudish: Bing is happy to complete searches for “masturbate” and “hand job.” Conspicuously, Bing does block query suggestions for “homosexual,” raising the question: Is there such a thing as a gay-friendly search engine? In response, a Microsoft spokesperson commented that “sometimes seemingly benign queries can lead to adult content” and that such queries are consequently filtered from autosuggest. By that logic, it would seem that “homosexual” merely leads to “too much” adult content, causing the algorithm to flag and filter it.

[Diagram: sex-related words blocked by Google only, by Bing only, by both, or by neither.]
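The diagram’s four categories reduce to simple set logic once the suggestion lists are in hand: a word that returns zero suggestions is treated as blocked. A sketch, using hypothetical results purely for illustration:

```python
# Sketch: partition a word list into the diagram's four categories, treating an
# empty suggestion list as "blocked." The results dicts below are hypothetical
# stand-ins; in practice they would come from a fetcher like the one above.
def partition_blocked(words, google_results, bing_results):
    """Return words blocked by Google only, by Bing only, by both, or by neither."""
    blocked_google = {w for w in words if not google_results.get(w)}
    blocked_bing = {w for w in words if not bing_results.get(w)}
    return {
        "google_only": blocked_google - blocked_bing,
        "bing_only": blocked_bing - blocked_google,
        "both": blocked_google & blocked_bing,
        "neither": set(words) - blocked_google - blocked_bing,
    }


# Hypothetical example data
words = ["prostitute", "dick", "homosexual"]
google = {"prostitute": ["prostitute phone numbers"], "dick": [],
          "homosexual": ["homosexual definition"]}
bing = {"prostitute": ["prostitute websites"], "dick": ["dick's sporting goods"],
        "homosexual": []}
print(partition_blocked(words, google, bing))
```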

Initially it would appear Google is stricter, blocking more sex-related words than Bing. But really they just have different strategies. Instead of outright blocking all suggestions for “dick” as Google does, Bing will just scrub the suggestions so you only see the clean ones, like “dick’s sporting goods.” Sometimes Bing will rewrite the query, pretending a dirty word was a typo instead. For instance, querying for “fingering” leads to wholesome dinner suggestions for “fingerling potato recipes,” and searching for “jizz” offers suggestions on “jazz,” for the musically minded searcher, of course. Both algorithms are pretty good about letting through more clinical terminology, such as “vaginas,” “nipples,” or “penises.”

For something like child pornography, the legal stakes get much higher. According to Ian Brown and Christopher Marsden in their book Regulating Code, “Many governments impose some censorship in their jurisdiction according to content that is illegal under national laws.” So it’s not entirely surprising that, in order to head off more direct government intervention, corporations like Google and Microsoft self-regulate by trying to scrub their autocomplete results clean of suggestions that lead to child pornography.
