Could Artificial Intelligence Debunk Twitter Rumors Before They Spread?

Dec. 14 2012 8:52 PM

Building a Better Truth Machine

Could machine-learning algorithms help debunk Twitter rumors before they spread?

Emergency workers stand in front of the Sandy Hook School in Newtown, Conn.

Photo by Spencer Platt/Getty Images

Will Oremus

Will Oremus is Slate's senior technology writer.

When something momentous is unfolding—the Arab Spring, Hurricane Sandy, Friday’s horrific elementary school shooting in Connecticut—Twitter is the world's fastest, most comprehensive, and least reliable source of breaking news. If you were on the microblogging site Friday afternoon, you were among the first to hear the death toll, watch the devastated reactions, and delve into the personal details of the man the media initially identified as a killer. But there’s also a good chance you were taken in by some of the many falsehoods that were flying, like a letter one of the young victims purportedly wrote to his mother before the shooter entered the classroom. And, of course, all of those social media pages that were making the rounds turned out to belong to innocent people, including the real suspect’s brother.

The hoaxes and mistaken identities pale in importance next to Friday’s real tragedy, of course. The victims and the apparent perpetrator were already dead, and nothing anyone wrote on social media was going to change that. But the falsehoods caused unnecessary hurt and confusion. And in ongoing events like natural disasters, the stakes of Twitter misinformation can be much higher. During Sandy, for instance, some tweets helped emergency responders figure out where to direct resources. Others provoked needless panic, such as one claiming that the Coney Island Hospital was on fire. And a few were downright dangerous, such as the one claiming that people should stop using 911 because the lines were jammed.


A redeeming feature of Twitter is the relative speed with which its users manage to sniff out and debunk the most widely circulated falsehoods. On Friday, for instance, word that the media had fingered the wrong suspect was circulating on Twitter while TV networks were still running with the false reports. The New Yorker's Sasha Frere-Jones has called the site a "self-cleaning oven." In Sandy's wake, BuzzFeed's John Herrman declared it a "truth machine." But those metaphors can mislead. Keeping Twitter clean takes time and human effort, and in the interim, the gunk can cause serious problems. The Andy Carvins, Alexis Madrigals, and Tom Phillipses can do only so much. Often the false information spreads further and faster than the correction.

For the average Twitter user, that's merely a pitfall to keep in mind. But for emergency responders and humanitarian workers who would use the site to guide their efforts, it's a potentially deadly obstacle. "Verification is one of the biggest challenges to using social media in the humanitarian space," says Patrick Meier of the Qatar Foundation's Computing Research Institute, an expert on the use of technology in crisis response.

"Before the earthquake in Haiti, if you brought up social media with my colleagues at the U.N., it was basically a nonstarter," Meier says. That antipathy is starting to thaw as Twitter proves its worth to aid workers in fast-moving events, such as the Haiti quake, during which reliable information is scarce. Still, the fear remains that reliance on Twitter will result in costly mistakes. One proposed solution: Build a better truth machine.

I spoke with Meier last week about new academic research that looks at the potential of machine-learning algorithms to automatically assess the credibility of information tweeted during a disaster. The idea is that, while it takes humans precious time and resources to vet the numerous tidbits that flood Twitter during a disaster, computers might one day be able to quickly and automatically offer at least a preliminary assessment of their credibility. Basically, Meier joked, "We're trying to scale Andy Carvin," the NPR online strategist who became a Twitter celebrity for his superhuman efforts to relay and vet breaking-news items during the Arab Spring.

A 2010 paper from Yahoo Research analyzed tweets from that year's magnitude-8.8 Chile earthquake and found that legitimate news—such as word that the Santiago airport had closed, that a supermarket in Concepcion was being looted, and that a tsunami had hit the coastal town of Iloca—propagated on Twitter differently than falsehoods, like the rumor that singer Ricardo Arjona had died or that a tsunami warning had been issued for Valparaiso. One key difference might sound obvious but is still quite useful: The false rumors were far more likely to be tweeted along with a question mark or some other indication of doubt or denial.

Building on that work, the authors of the 2010 study developed a machine-learning classifier that uses 16 features to assess the credibility of newsworthy tweets. Among the features that make information more credible:

- Tweets about it tend to be longer and include URLs.

- People tweeting it have higher follower counts.

- Tweets about it are negative rather than positive in tone.

- Tweets about it do not include question marks, exclamation marks, or first- or third-person pronouns.

Several of those findings were echoed in another recent study from researchers at India's Institute of Information Technology, who also found that credible tweets are less likely to contain swear words and significantly more likely to contain frowny emoticons than smiley ones.
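To make those signals concrete, here is a minimal Python sketch of how a few of them might be pulled from a single tweet. It is an illustration only, not the researchers' 16-feature classifier: the function name extract_features, the pronoun lists, and the emoticon checks are all assumptions made for this example, and tone (negative versus positive) is left out because it would require a sentiment model.

```python
import re

# Hypothetical word lists for this sketch; the studies' actual lists are not given.
FIRST_PERSON = {"i", "me", "my", "mine", "we", "us", "our", "ours"}
THIRD_PERSON = {"he", "him", "his", "she", "her", "hers",
                "they", "them", "their", "theirs"}
URL_PATTERN = re.compile(r"https?://\S+")


def extract_features(text, follower_count):
    """Return a dict of simple credibility signals for one tweet."""
    # Strip URLs first so a "?" inside a link is not counted as doubt.
    text_without_urls = URL_PATTERN.sub(" ", text)
    words = set(re.findall(r"[a-z']+", text_without_urls.lower()))
    return {
        "length": len(text),
        "has_url": bool(URL_PATTERN.search(text)),
        "follower_count": follower_count,
        "has_question_mark": "?" in text_without_urls,
        "has_exclamation_mark": "!" in text_without_urls,
        "has_first_person": bool(words & FIRST_PERSON),
        "has_third_person": bool(words & THIRD_PERSON),
        "has_frowny": ":(" in text_without_urls or ":-(" in text_without_urls,
        "has_smiley": ":)" in text_without_urls or ":-)" in text_without_urls,
    }


if __name__ == "__main__":
    # Made-up example tweets, loosely modeled on the cases in the article.
    print(extract_features(
        "Santiago airport closed after quake http://example.com/a?b=1", 5000))
    print(extract_features("Is it true he died?! :(", 12))
```

A real classifier would feed dictionaries like these, computed over many labeled tweets, into a trained model rather than reading any single signal as proof.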

In a new paper, to be published in the journal Internet Research next month, the authors of the Chile earthquake study—Carlos Castillo, Marcelo Mendoza, and Barbara Poblete—test out their algorithm on fresh data sets and find that it works pretty well. According to Meier, their machine-learning classifier had an AUC, or “area under the curve,” of 0.86. That means that, when presented with a random false tweet and a random true tweet, it would assess the true tweet as more credible 86 percent of the time. (An AUC of 1 is perfect; an AUC of 0.5 is no better than random chance.)
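The pairwise reading of AUC described above can be computed directly. The short Python sketch below compares every true tweet's score with every false tweet's score and reports the fraction of pairs the classifier ranks correctly, counting ties as half. The function name pairwise_auc and the scores are invented for illustration; they are not taken from the paper.

```python
def pairwise_auc(true_scores, false_scores):
    """Fraction of (true, false) pairs in which the true item scores higher.

    Ties count as half a win, which matches the usual definition of AUC.
    """
    wins = 0.0
    for t in true_scores:
        for f in false_scores:
            if t > f:
                wins += 1.0
            elif t == f:
                wins += 0.5
    return wins / (len(true_scores) * len(false_scores))


if __name__ == "__main__":
    # Hypothetical credibility scores assigned by some classifier.
    true_tweets = [0.9, 0.8, 0.4]
    false_tweets = [0.7, 0.3]
    print(pairwise_auc(true_tweets, false_tweets))
```

In this toy case the classifier orders five of the six pairs correctly, so the result is 5/6. A score of 0.86 would mean it gets roughly 86 of every 100 such pairs right.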

My guess is that a knowledgeable and experienced human Twitter user could do better than that under most circumstances. And of course, if a given algorithm became widespread, committed trolls like the Hurricane Sandy villain @ComfortablySmug could find ways to game it. Still, an algorithm has the potential to work much faster than a human, and as it improves, it could evolve into an invaluable "first opinion" for flagging news items on Twitter that might not be true. Meier said that perhaps those flagged tweets could then be posted to a platform that invites humans to vet them and eventually either verify or debunk them for good. Even that wouldn't fully prevent Twitter lies from spreading or misleading people. But it might at least make their purveyors a little less comfortable and a little less smug.

This article arises from Future Tense, a collaboration among Arizona State University, the New America Foundation, and Slate. Future Tense explores the ways emerging technologies affect society, policy, and culture.
