
Facebook Has a New Algorithm to Identify “Clickbait” Headlines—and Punish Them


Future Tense
The Citizen's Guide to the Future
Aug. 4 2016 5:17 PM

Facebook Built an Algorithm to Fight Clickbait. Will It Save You the Right Click?

Mark Zuckerberg scoffs at your pathetic attempts to mislead Facebook users.

Photo by David Ramos/Getty Images

Facebook has a fancy new weapon in its crusade against clickbait, and it works a lot like a spam filter.

Will Oremus

Will Oremus is Slate’s senior technology writer. Email him at will.oremus@slate.com or follow him on Twitter.

The social network announced Thursday that it has developed software to identify clickbait headlines by analyzing their text. The software will be incorporated into the company’s News Feed ranking system to downgrade posts whose headlines appear to be manipulative or misleading—and to punish the publications, websites, and Facebook pages that routinely purvey them. The more clickbait headlines a given page or web domain produces, the less often its links will appear in Facebook users’ News Feeds.


Here’s how the company explained the clickbait problem in a blog post announcing the change:

We’ve heard from people that they specifically want to see fewer stories with clickbait headlines or link titles. These are headlines that intentionally leave out crucial information, or mislead people, forcing people to click to find out the answer. For example: “When She Looked Under Her Couch Cushions and Saw THIS … I Was SHOCKED!”; “He Put Garlic in His Shoes Before Going to Bed And What Happens Next Is Hard to Believe”; or “The Dog Barked at the Deliveryman and His Reaction Was Priceless.”

It’s a worthy battle, and one that Facebook has been fighting for years with only partial success. Its new approach could be a valuable addition to the company’s arsenal, provided it’s implemented carefully. But it also illustrates some of the challenges the company faces in its role as arbiter of what people read on the Internet. In effect, the company is trying to automate what amounts to a subjective value judgment. And the result could be something of a blunt instrument.

In the past, Facebook has relied on users’ behavior to identify clickbait. In 2014, it announced that its News Feed software would begin limiting the reach of links that large numbers of people clicked on but then spent little time reading, which the company took as a sign that the content was not what the headline had promised. The company has told me that it also now throttles posts that many users “like” on Facebook but then come back and “unlike” after reading them.
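Facebook hasn’t disclosed how it weighs these behavioral signals. As a rough illustration only, here is one way the two signals described above—clicks followed by short reading times, and “likes” later retracted—might be combined into a single score. The function name, weights, and data are all hypothetical:

```python
def behavioral_clickbait_score(clicks, short_dwells, likes, unlikes):
    """Combine two behavioral clickbait signals into one score in [0, 1].

    short_dwells: clicks where the reader bounced back almost immediately.
    unlikes: "likes" that were retracted after the user read the story.
    The 0.7/0.3 weighting is invented for illustration.
    """
    bounce_rate = short_dwells / clicks if clicks else 0.0
    unlike_rate = unlikes / likes if likes else 0.0
    return 0.7 * bounce_rate + 0.3 * unlike_rate

# A link most readers abandon quickly, and often "unlike," scores high:
print(behavioral_clickbait_score(clicks=1000, short_dwells=850,
                                 likes=200, unlikes=90))  # 0.73
```

A score like this could then feed into ranking as a per-link penalty; the point is simply that both signals are cheap to compute from engagement logs, and neither requires reading the article itself.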

Facebook has been trying to fight clickbait for years.

Screenshot courtesy of Facebook


Those changes had some salutary effects, says Adam Mosseri, Facebook’s vice president of product management. But the company’s own user satisfaction surveys showed that clickbait remained a common complaint. “People see these kinds of headlines over and over again, and they resent it,” Mosseri told me. “They even report it as a bug sometimes.”

Now Facebook is adding a second line of defense to weed out the clickbait that the behavioral data don’t capture. The company says it assigned a team of people to read thousands of headlines and mark those that appeared to be clickbait. The labeled headlines were then used to train a type of computer program called a machine-learning classifier. Based on the training data, the classifier develops a sort of internal model of the key features that differentiate clickbait headlines from non-clickbait headlines. Its task is then to apply that model to each new headline it encounters “in the wild” in the News Feed, classifying it as clickbait or not according to its training. This is, as the company notes, very similar to how an email spam filter works.
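Facebook hasn’t published its model or features, but the spam-filter analogy is apt: the classic approach is a bag-of-words classifier trained on labeled examples. As a minimal sketch of the idea—not Facebook’s actual system—here is a tiny Naive Bayes headline classifier in pure Python, trained on a few hand-labeled headlines (all of the training data is invented):

```python
import math
import re
from collections import Counter

def tokenize(headline):
    # Lowercase and split into word tokens; a real system would also
    # use richer features (punctuation, ALL-CAPS words, ellipses, etc.).
    return re.findall(r"[a-z']+", headline.lower())

class HeadlineClassifier:
    """Tiny multinomial Naive Bayes text classifier (illustrative only)."""

    def __init__(self):
        self.word_counts = {"clickbait": Counter(), "normal": Counter()}
        self.doc_counts = {"clickbait": 0, "normal": 0}

    def train(self, headline, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(headline))

    def classify(self, headline):
        total_docs = sum(self.doc_counts.values())
        vocab = set(self.word_counts["clickbait"]) | set(self.word_counts["normal"])
        scores = {}
        for label in self.doc_counts:
            # Log prior plus per-word log likelihood with add-one smoothing.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(headline):
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab)))
            scores[label] = score
        return max(scores, key=scores.get)

# Hand-labeled training headlines (hypothetical stand-ins for the
# thousands Facebook says its raters marked).
clf = HeadlineClassifier()
clf.train("You'll Never Believe What Happened Next", "clickbait")
clf.train("What Happens Next Is Hard to Believe", "clickbait")
clf.train("This One Weird Trick Will Shock You", "clickbait")
clf.train("Senate Passes Budget Bill After Long Debate", "normal")
clf.train("Local Team Wins Championship Game", "normal")
clf.train("City Council Approves New Transit Plan", "normal")

print(clf.classify("You Won't Believe This One Trick"))  # clickbait
```

At real scale, the same design—labeled examples in, a learned word-weight model out—is what lets the filter generalize to headlines its trainers never saw, which is exactly why, as discussed below, it can also misfire on headlines that merely look like clickbait.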

But email spam filters aren’t perfect, and Facebook’s classifier faces some unique obstacles. First, clickbait is a little murkier to define than spam—you might even say it’s in the eye of the beholder. For example, the viral media site BuzzFeed is sometimes accused of purveying clickbait, but editor in chief Ben Smith has mounted a spirited defense of its headlines’ integrity. “31 Genius Hacks for Your Elementary School Art Class” might not be hard news, he says, but it does give you a very good idea of what to expect if you click. Likewise “A 5-Year-Old Girl Raised Enough Money to Take Her Father Who Has Terminal Cancer to Disney World.”

So how exactly is Facebook defining clickbait for the purposes of this filter? That’s a good question, and Facebook offered a two-part answer. A clickbait headline, the company said, meets at least one of the following criteria (emphasis Facebook’s):

  1. The headline withholds information required to understand what the content of the article is.
  2. The headline exaggerates the article to create misleading expectations for the reader.

For example, Facebook adds:

… The headline “You’ll Never Believe Who Tripped and Fell on the Red Carpet…” withholds information required to understand the article (What happened? Who tripped?). The headline “Apples Are Actually Bad for You?!” misleads the reader (apples are only bad for you if you eat too many every day).

It’s hard to argue with those particular headlines as archetypes of the clickbait genre, and if Facebook’s software proves adept at classifying this sort of junk, then few will weep at its disappearance from the News Feed. But, as you may have noticed, Facebook’s definition relies not solely on the text of the headline, but on the headline’s relationship to the content behind it. Its classifier, however, does not attempt to examine the content—it only has access to the headline. As a result, it risks mistaking the trappings of clickbait headlines for clickbait itself.


That could be a problem, because it isn’t just shady, obscure click-farm sites that have attempted to game Facebook’s algorithm with baity headlines. Even reputable sites doing serious journalism have felt compelled to juice up their headlines in recent years as Facebook’s News Feed has become perhaps the most influential arbiter of what people read online. There are also plenty of edge cases in which a grabby headline arguably suits the story it accompanies. For instance, I engaged in some baiting of my own when I adorned a 2014 Slate post with a social-media teaser that read, “You’ll Never Guess the Top-Rated Restaurant in America on Yelp. Don’t Even Try.” I imagine Facebook’s software would have little trouble tossing that one in the clickbait pile, and even its human trainers might well do the same. (So might you, dear reader!) But I chose it because the point of the post was not so much to identify the restaurant as to highlight just how obscure and unexpected Yelp’s top-rated restaurants are, and to examine the factors in the ranking system that explain why that’s the case. The feedback I got from readers was largely positive—no one seemed to wish that someone had saved them a click.

Mosseri acknowledged those nuances when I spoke with him Wednesday ahead of the announcement. The classifier, he said, is meant to work in conjunction with the behavioral data, and the company understands that it will be imperfect. That’s why its News Feed rankings will downgrade sites only in proportion to the frequency with which their posts are identified as clickbait. The result, he said, should be that the worst offenders see their Facebook audience drop off, while the occasional offenders shouldn’t see much change. The goal, Mosseri added, is not to punish sites for certain types of headlines, but to “change the incentives” that gave rise to clickbait in the first place.
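Facebook hasn’t published the formula behind this proportional penalty. As a sketch of the idea Mosseri describes—throttling a source in proportion to how often its posts are flagged, rather than punishing individual headlines—here is one hypothetical way it could work (the function name, cap, and numbers are all invented):

```python
def source_penalty(flagged_posts, total_posts, max_penalty=0.5):
    """Return a multiplier on a source's ranking score, scaled down in
    proportion to how often its posts are flagged as clickbait.
    The linear formula and the 0.5 cap are hypothetical.
    """
    if total_posts == 0:
        return 1.0
    clickbait_rate = flagged_posts / total_posts
    return 1.0 - max_penalty * clickbait_rate

# A chronic offender (80% of posts flagged) is throttled hard...
print(source_penalty(80, 100))  # 0.6
# ...while an occasional offender (2% flagged) barely changes.
print(source_penalty(2, 100))   # 0.99
```

The design choice matters: because the penalty is a smooth function of the flag rate, a handful of misclassified headlines costs a reputable publisher almost nothing, while a site built on clickbait loses much of its reach—which is how a “change the incentives” policy differs from a per-headline ban.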

The business case for Facebook’s clickbait crusade is straightforward: If users feel like their feeds are full of misleading junk, they might eventually turn away from Facebook to a more sober news source that respects their intelligence. (It’s worth noting that the change does not apply to Facebook’s ad algorithms, just the ones that govern organic News Feed content.) Mosseri readily acknowledges the business interests that underlie Facebook’s News Feed changes, but in this case he insisted that wasn’t the main goal. This change, he emphasized, is the first the company has announced since it published a statement of values for the News Feed algorithm in June. The company is making it in service of one of those values in particular—the one that prioritizes “authentic communication” over clickbait, spam, scams, and the like.

Whether you buy that or not, it’s hard to argue with the general intent of the change, which is to counteract the incentives that have turned online media into a “curiosity gap” contest. Facebook’s News Feed ranking system helped to create those distorted incentives, and it bears at least some of the responsibility for realigning them in a way that better serves the interests of readers. Ultimately, it’s in the interests of the media, too: Every site that produces genuine, original content would be better off if it didn’t have to compete with those that exist solely to trick people into viewing their ads.

But fixing the clickbait problem has already proven harder for Facebook than creating it. Now we’ll see if the latest change works as intended—or if it just reshuffles the parameters that the click farms are so adept at reverse-engineering.

Future Tense is a partnership of Slate, New America, and Arizona State University.