Aug. 2 2013 11:43 AM

Sex, Violence, and Autocomplete Algorithms

What words do Bing and Google censor from their suggestions?


As shown in the next figure, both algorithms do get much stricter when you add “child” before the search term. Bing blocks “child nipple,” for instance. But there are some conspicuous failures as well. While you might find it merely unseemly that Google and Bing suggest completions for “prostitute,” the fact that Google also completes “child prostitute” with “images” and “movies” is far more alarming. Moreover, searching for “child genital” or “child lover” on Google or Bing, as well as “child lust” on Google, all lead to disturbing suggestions related to child pornography. Querying “child lover,” for instance, returns suggestions for “child lover pics,” “child lover guide,” and “child lover chat.” Given Google’s and Microsoft’s available technology and resources, combined with their ostensible commitment to keeping this kind of material out of their suggestions, it’s hard to believe that these types of errors simply slipped through the cracks.
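A minimal sketch of how such a prefix check could be automated appears below. It assumes Google’s unofficial suggest endpoint (suggestqueries.google.com with a client=firefox parameter), which is undocumented and may change or be rate-limited at any time; the short term list is just a stand-in, and an empty suggestion list is treated as a hint, not proof, that a query is being filtered.

import json
import time
import urllib.parse
import urllib.request

# Unofficial Google suggest endpoint (an assumption, for illustration only;
# it is not a supported API and its format may change without notice).
SUGGEST_URL = "https://suggestqueries.google.com/complete/search?client=firefox&q="

def suggestions_for(query):
    """Return the list of autocomplete suggestions for a query string."""
    url = SUGGEST_URL + urllib.parse.quote(query)
    with urllib.request.urlopen(url) as resp:
        # The response is JSON of the form ["query", ["suggestion 1", ...]].
        return json.loads(resp.read().decode("utf-8", errors="replace"))[1]

# Stand-in test terms; the point is comparing each bare term with its
# "child"-prefixed variant.
TERMS = ["prostitute", "nipple"]

for term in TERMS:
    for query in (term, "child " + term):
        result = suggestions_for(query)
        # An empty list hints, but does not prove, that the query is filtered.
        print(query, "->", result if result else "no suggestions (possibly blocked)")
        time.sleep(1)  # throttle requests to be polite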

A Google representative acknowledged that the company does sometimes miss things but said that improving the algorithm and filtering out shocking or offensive suggestions is an active, iterative process. A committee meets periodically to review complaints and suggest changes to the engineering team, which then works to tweak, tune, and bake those changes into the next version of the algorithm. With hundreds of updates per year, the algorithm is constantly changing—perhaps even by the time you read this article. A Microsoft representative reached for comment indicated that the people behind Bing are likewise continually improving their algorithmic filters and that if suggestions related to child pornography are brought to their attention, they’ll remove them.

[Diagram 2: Autocomplete suggestions on Google and Bing for queries prefixed with “child”]

Promoting violence?

Another editorial rule that Google incorporates into its autocomplete algorithm is to exclude suggestions that promote violence. To test the boundaries of that rule, I collected and analyzed autocomplete responses for a list of 348 verbs in the Random House “violent actions” word menu, which includes words like “brutalize” and “choke.” In particular, I queried using the templates “How to X” and “How can I X” in order to find instances where the algorithm was steering users toward knowledge of how to act violently.
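A sketch of what that collection step might look like follows, using the same unofficial suggest endpoint as above (again an assumption, not a documented API). The two-verb list stands in for the full 348-verb menu, and real use would want error handling and a gentler request rate.

import json
import time
import urllib.parse
import urllib.request

SUGGEST_URL = "https://suggestqueries.google.com/complete/search?client=firefox&q="

# Stand-in for the 348 verbs in the Random House "violent actions" word menu.
VERBS = ["brutalize", "choke"]
TEMPLATES = ["how to {verb}", "how can i {verb}"]

def suggestions_for(query):
    """Return autocomplete suggestions; the endpoint replies ["query", [...]]."""
    url = SUGGEST_URL + urllib.parse.quote(query)
    with urllib.request.urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8", errors="replace"))[1]

results = {}
for verb in VERBS:
    for template in TEMPLATES:
        query = template.format(verb=verb)
        results[query] = suggestions_for(query)
        time.sleep(1)  # throttle requests

for query, completions in results.items():
    print(query, "->", completions if completions else "no suggestions")

Queries that come back with no suggestions are candidates for having been blocked by the violence rule; the suggestions that do come back are the ones discussed below.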


Since autocomplete reflects what people are searching for, it’s perhaps a commentary on the content of video games that many of the suggestions for violent actions were about things like how to beat a boss in a particular game. Certain queries, like “how to molest” or “how to brutalize,” were blocked as expected, but other searches did surface suggestions about how to commit violence against people or animals.

Among the more gruesome suggestions that were not blocked: “how to dismember a human body,” “how to rape a man/child/people/woman,” and “how do I scalp a person.” Some suggestions were oriented toward animal cruelty, like “how to poison a cat” and “how to strangle a dog.” However annoyed you might be by the neighbor’s barking dog, that doesn’t make it morally permissible to strangle it—such suggestions should also be blocked.

Algorithmic governance, meet algorithmic accountability

The queries that are prohibited, like Bing’s bizarre blocking of completions for “homosexual,” are sometimes as surprising as the things not blocked, such as the various suggestions leading to child pornography or explicit violence. As we look to algorithms to enforce morality, we need to acknowledge that they, too, are imperfect. And I don’t think we can ever expect them to be—filtering algorithms will always have some margin of error where they let through things we might still find objectionable. But with some vigilance, we can hold such algorithms accountable and better understand the underlying human (and corporate) criteria that drive their moralizing.

The editorial criteria that Google and Bing embed in their algorithms tacitly reflect company values and a willingness to self-regulate in order to protect people from socially deviant suggestions. Yet this self-regulation is largely opaque, making it difficult to understand how these mostly automated systems make the decisions they do. In the absence of corporate transparency, and as more aspects of society become algorithmically driven, reverse-engineering such systems with data and code offers one potential way to systematically penetrate that opacity and recreate a semblance, albeit a low-resolution one, of how everything works.

This article arises from Future Tense, a collaboration among Arizona State University, the New America Foundation, and Slate. Future Tense explores the ways emerging technologies affect society, policy, and culture. To read more, visit the Future Tense blog and the Future Tense home page. You can also follow us on Twitter.

Dr. Nicholas Diakopoulos is a Tow Fellow at the Columbia University Graduate School of Journalism. You can find out more about his research and other projects on his website.
