What Words Do Bing and Google Ban From Autocomplete?

Aug. 2 2013 11:43 AM

Sex, Violence, and Autocomplete Algorithms

What words do Bing and Google censor from their suggestions?

[Photo: There go your plans to have Google help you out with your search for boob-related things. Credit: Fuse]

Warning: This article contains explicit language.

Autocomplete is one of those modern marvels of real-time search technology that almost feels like it’s reading your mind. By analyzing and mining what millions of other users have already searched for and clicked on, Google knows that when you start typing a query with a “d,” you’re most likely looking for a dictionary. Besides the efficiency gains of not having to type as much, suggestions can be serendipitous and educational, spurring alternative query ideas. In the process, our search behavior is subtly influenced by exposure to query possibilities we might not have considered if left to ourselves.

So what happens when unsavory things, perhaps naughty or even illegal, creep into those suggestions? As a society we probably don’t want to make it easier for pedophiles to find pictures of naked children or to goad the violently predisposed with new ideas for abuse. Such suggestions get blocked and filtered—censored—for their potential to influence us.

As Google writes in its autocomplete FAQ, “we exclude a narrow class of search queries related to pornography, violence, hate speech, and copyright infringement.” Bing, on the other hand, makes sure to “filter spam” as well as to “detect adult or offensive content,” according to a recent post on the Bing blog. Such human choices set the stage for broadly specifying what types of things get censored, despite Google’s claims that autocompletions are, for the most part, “algorithmically determined … without any human intervention.”


What exactly are the boundaries and editorial criteria of that censorship, and how do they differ among search engines? More importantly, what kinds of mistakes do these algorithms make in applying their editorial criteria? To answer these questions, I automatically gathered autosuggest results from hundreds of queries related to sex and violence in an effort to find those that are surprising or deviant. (See my blog for the methodological detail.) The results aren’t always pretty.

Illicit sex

Armed with a list of 110 sex-related words, gathered from sources at both linguistic extremes (academic linguists on one end, that tome of slang the Urban Dictionary on the other), I first sought to understand which words resulted in zero suggestions (which likely means the word is blocked). The following diagram shows which words are blocked only by Google, only by Bing, by both, or by neither. For example, both algorithms think “prostitute” is just dandy, suggesting options for prostitute “phone numbers” or “websites.” Nor is this about sexual deprivation: Bing is happy to complete searches for “masturbate” and “hand job.” Conspicuously, Bing does block query suggestions for “homosexual,” raising the question: Is there such a thing as a gay-friendly search engine? In response, a Microsoft spokesperson commented that “sometimes seemingly benign queries can lead to adult content” and are consequently filtered from autosuggest. By that logic, it would seem that “homosexual” merely leads to “too much” adult content, causing the algorithm to flag and filter it.

[Diagram: sex-related words blocked by Google only, by Bing only, by both, or by neither]
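For readers who want to see the logic spelled out, the four-way grouping can be sketched in a few lines of Python. This is a minimal illustration, not the script used for this article: the function name `classify_blocking`, the dictionary layout, and the sample inputs are all assumptions, and the suggestion strings are placeholders rather than collected data. The only rule taken from the text is that an empty suggestion list is treated as a likely block.

```python
from typing import Dict, List


def classify_blocking(
    google: Dict[str, List[str]],
    bing: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Group words by which engine returned zero suggestions for them.

    Each input maps a query word to the list of suggestions one engine
    returned. An empty list is read as "likely blocked" (an assumption
    that mirrors the article's reasoning, not a confirmed engine rule).
    """
    groups: Dict[str, List[str]] = {
        "google_only": [],
        "bing_only": [],
        "both": [],
        "neither": [],
    }
    # Only compare words that were queried on both engines.
    for word in sorted(set(google) & set(bing)):
        google_blocked = len(google[word]) == 0
        bing_blocked = len(bing[word]) == 0
        if google_blocked and bing_blocked:
            groups["both"].append(word)
        elif google_blocked:
            groups["google_only"].append(word)
        elif bing_blocked:
            groups["bing_only"].append(word)
        else:
            groups["neither"].append(word)
    return groups


# Hypothetical inputs for illustration only (placeholder suggestions).
sample_google = {
    "prostitute": ["<placeholder suggestion>"],
    "dick": [],
    "homosexual": ["<placeholder suggestion>"],
}
sample_bing = {
    "prostitute": ["<placeholder suggestion>"],
    "dick": ["dick's sporting goods"],
    "homosexual": [],
}

result = classify_blocking(sample_google, sample_bing)
print(result)
```

An empty list is only a proxy for blocking: a rare or misspelled word could also come back with no suggestions, so a grouping like this is best read as a list of candidates to check by hand.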

Initially it would appear Google is stricter, blocking more sex-related words than Bing. But really they just have different strategies. Instead of outright blocking all suggestions for “dick” as Google does, Bing will just scrub the suggestions so you only see the clean ones, like “dick’s sporting goods.” Sometimes Bing will rewrite the query, pretending a dirty word was a typo instead. For instance, querying for “fingering” leads to wholesome dinner suggestions for “fingerling potato recipes,” and searching for “jizz” offers suggestions on “jazz,” for the musically minded searcher, of course. Both algorithms are pretty good about letting through more clinical terminology, such as “vaginas,” “nipples,” or “penises.”
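The three behaviors described here (no suggestions at all, suggestions that still contain the query, and suggestions that quietly swap in a different word) can be told apart with a rough heuristic. The sketch below is an assumption about how one might label them, not either company's logic or this article's method; `classify_strategy` and its labels are hypothetical names. Without an unfiltered baseline it cannot distinguish scrubbed suggestions from untouched ones, so both fall under "shown".

```python
from typing import List


def classify_strategy(query: str, suggestions: List[str]) -> str:
    """Label how an engine appears to have handled one query.

    "blocked":   no suggestions came back.
    "shown":     at least one suggestion still contains the query text
                 (the list may or may not have been scrubbed).
    "rewritten": suggestions came back, but none contains the query,
                 as if the engine treated it as a typo.
    """
    if not suggestions:
        return "blocked"
    needle = query.lower()
    if any(needle in suggestion.lower() for suggestion in suggestions):
        return "shown"
    return "rewritten"


# Examples taken from the article's own descriptions.
print(classify_strategy("dick", []))
print(classify_strategy("dick", ["dick's sporting goods"]))
print(classify_strategy("fingering", ["fingerling potato recipes"]))
print(classify_strategy("jizz", ["jazz"]))
```

A plain substring test is the simplest choice and also the crudest: it would count any suggestion that happens to embed the query inside a longer word as "shown", so a stricter version might compare whole words instead.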

For something like child pornography, the legal stakes get much higher. According to Ian Brown and Christopher Marsden in their book Regulating Code, “Many governments impose some censorship in their jurisdiction according to content that is illegal under national laws.” So it’s not entirely surprising that, in order to head off more direct government intervention, corporations like Google and Microsoft self-regulate by trying to scrub their autocomplete results clean of suggestions that lead to child pornography.
