It’s hard to miss Google’s “in the news” box, which often sits just above the first of the organic web results on any given search page. With Wikipedia-fed informational sidebars, campaign issue blocks, and other elements, Google continues to push organic web links farther and farther down the results page, privileging these types of algorithmically curated content instead. Judging by the healthy SEO industry around helping sites get included in Google News, that little box with a list of three links to news articles has a lot of power to generate traffic.
That is why we at the computational journalism lab at the University of Maryland decided to investigate it. To get a better glimpse of how news articles make the list, we collected all the links from the “in the news” section of results for the searches of “Hillary Clinton” and “Donald Trump” between May 31 and July 9. In total, we gathered 5,604 links, collected every hour using an automated scraper running on an Amazon server in Northern Virginia. Some links point to the same stories—articles that were successful enough to be listed more than once in the news box, like the piece CNN published on June 3 about a Clinton foreign policy speech, titled “Hillary Clinton’s Evisceration of Donald Trump.” Perhaps because of the highly vivid title (and other reasons we will discuss in a minute), that article was incredibly successful, showing up 90 times between June 2 and June 6.
The links selected in the news box point to 113 different sources (though some of them belong to the same media organization, such as Fox News and Fox News Insider; we kept those separate, following the name Google gave each website). Individually, two news organizations dominate. Of the 5,604 links we collected, 2,476 (44.2 percent) come from just two sources: CNN and the New York Times.
Not all links in the Google News box are equal. We know that in traditional Google search results, to be ranked first is a great boon for traffic: About one-third of the clicks go to the No. 1 link. It’s safe to assume that in the “in the news” list, the top ranking also provides a healthy click-through rate. So what makes a link win that top spot? Across all our data we find that CNN and the New York Times also dominate the top position, with 1,211 out of 1,868 (64.8 percent) of those links. We also found that the link in the first slot always has a snippet of text, and in 99.4 percent of the cases has a photo too. There are cases when first links don’t have images, such as this example from the New York Times—but it did have a video. So, it would appear stories with some visual content are preferred by Google’s selection algorithm when considering links for the top position.
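For readers who want to check this kind of tally themselves, here is a minimal sketch of how the share held by the top sources can be computed. It is not the code behind our analysis: the function name `top_source_share` and the toy sample are hypothetical, and only the four totals at the end come from the figures reported above.

```python
from collections import Counter

def top_source_share(sources, k=2):
    """Return the k most frequent sources and the fraction of links they hold."""
    counts = Counter(sources)
    top = counts.most_common(k)
    covered = sum(n for _, n in top)
    return [name for name, _ in top], covered / len(sources)

# Hypothetical toy sample of source names, one per collected link (not our data).
sample = ["CNN", "New York Times", "CNN", "Fox News",
          "CNN", "New York Times", "Politico", "CNN"]
names, fraction = top_source_share(sample)
print(names, f"{fraction:.1%}")

# The totals reported in the article: all links, then first-slot links only.
overall_pct = round(100 * 2476 / 5604, 1)
top_slot_pct = round(100 * 1211 / 1868, 1)
print(overall_pct, top_slot_pct)
```

Note that 1,868 is exactly one-third of 5,604, which is what you would expect if every box we captured held three links.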
There are, of course, other ways a news organization can gain a higher proportion of links in the “in the news” list. Timeliness seems to be a piece of that equation. Each article has an “X hours ago” label displayed when surfaced on Google. We take that label as the article’s “freshness,” or “age,” since it indicates the time elapsed since the article was first published or last updated. Below, you can see the distribution of these freshness values across our data set.
The largest proportion of links had been published about two hours prior. This might mean that the tool is selecting more articles that were published two hours ago or that updates tend to fall into that window. That was the case with the very successful CNN article “Hillary Clinton’s Evisceration of Donald Trump.” The “in the news” list picked it up not only when it was originally published but also when it was updated, and each republication reset the clock.
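The freshness measure described above can be sketched in a few lines. This is an illustration under stated assumptions, not our scraper’s code: we assume the label reads like “45 mins ago,” “2 hours ago,” or “1 day ago,” and the names `label_to_hours` and `freshness_histogram` are hypothetical.

```python
import re
from collections import Counter

# Assumed label shapes: "<n> min(s)/minute(s)/hour(s)/day(s) ago".
_LABEL = re.compile(r"(\d+)\s*(min(?:ute)?|hour|day)s?\s+ago", re.IGNORECASE)
_MINUTES_PER_UNIT = {"min": 1, "minute": 1, "hour": 60, "day": 1440}

def label_to_hours(label):
    """Convert an age label to hours, or return None if it is not recognized."""
    match = _LABEL.search(label)
    if match is None:
        return None
    amount = int(match.group(1))
    unit = match.group(2).lower()
    return amount * _MINUTES_PER_UNIT[unit] / 60

def freshness_histogram(labels):
    """Count links per whole hour of age, skipping unrecognized labels."""
    hours = (label_to_hours(label) for label in labels)
    return Counter(int(h) for h in hours if h is not None)

# Hypothetical labels, not taken from our data set.
labels = ["2 hours ago", "2 hours ago", "45 mins ago",
          "1 day ago", "5 hours ago", "2 hours ago"]
histogram = freshness_histogram(labels)
print(sorted(histogram.items()))
```

Bucketing by whole hours is a design choice made for simplicity here; the label only reports coarse ages, so finer buckets would not add information.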
Overall, the fraction of articles at each freshness value tapers off over the course of a day, though there is a small bump again for articles with some staying power. Nineteen different articles showed up in the list even when they were already at least one day old, and most of them showed up more than once. But these might be affected by the weekly news cycle: Seven out of 10 times that an old link appeared on the “in the news” list, it was a Saturday or Sunday. It’s not that new articles don’t still show up during the weekend but that older articles are more prevalent on these days. During weekdays, only about 2 percent of articles are more than one day old, whereas during weekends that number goes up to 13 percent. One possibility is that there’s less news generated on weekend days, and therefore less competition for “in the news” slots, leading to some older articles filling in.
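The weekday-versus-weekend comparison above boils down to a simple split. The sketch below shows one way to compute it, assuming each observation is a pair of the time the link was seen and its age in hours. The function name `old_share_by_day_type`, the 24-hour cutoff, and the sample observations are all hypothetical and chosen only for illustration.

```python
from datetime import datetime

def old_share_by_day_type(observations, threshold_hours=24.0):
    """Return the fraction of old links seen on weekdays and on weekends."""
    totals = {"weekday": 0, "weekend": 0}
    old = {"weekday": 0, "weekend": 0}
    for seen_at, age_hours in observations:
        # datetime.weekday() returns 5 for Saturday and 6 for Sunday.
        kind = "weekend" if seen_at.weekday() >= 5 else "weekday"
        totals[kind] += 1
        if age_hours >= threshold_hours:
            old[kind] += 1
    return {k: (old[k] / totals[k] if totals[k] else 0.0) for k in totals}

# Hypothetical observations: (time the link was seen, age in hours).
observations = [
    (datetime(2016, 6, 4, 10), 30.0),  # Saturday, old
    (datetime(2016, 6, 4, 11), 2.0),   # Saturday, fresh
    (datetime(2016, 6, 5, 9), 26.0),   # Sunday, old
    (datetime(2016, 6, 5, 10), 1.0),   # Sunday, fresh
    (datetime(2016, 6, 6, 9), 2.0),    # Monday, fresh
    (datetime(2016, 6, 6, 10), 3.0),   # Monday, fresh
    (datetime(2016, 6, 7, 9), 25.0),   # Tuesday, old
    (datetime(2016, 6, 7, 10), 1.0),   # Tuesday, fresh
]
shares = old_share_by_day_type(observations)
print(shares)
```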
It’s not surprising that among the main sources of news articles for Google’s list we can find CNN, the New York Times, the big television networks, and the Washington Post. What is surprising is that not all of those sources are strictly news organizations. Some non-news sources are news generators: Twitter was linked to 214 times (117 from Clinton, 94 from Trump, and three from Chelsea Clinton), YouTube 58 times (including this lovely Japanese parody ad for Trump), and the FBI 24 times (all related to the investigation into Hillary Clinton’s email server).
And then there are news sites of questionable journalistic standards. In three cases the list linked to Breitbart News (with articles such as “Exclusive—Secret Service Agent Book: Raging Hillary Clinton Threw Bible at Agent’s Colleague” and “Hillary Clinton Blames Whites, Cops for Deaths of Young Black Men”). And one time it linked to the conspiracy theorist website InfoWars (“DONALD TRUMP SEX SCANDAL LEAKED”).
All in all, we can say that some links that pop up in the “in the news” list go against Google’s guidelines for what should be listed in Google News results. The guidelines say that “Publishers who strive to uphold the basic principles of good journalism will provide a much better user experience and consequently likely enjoy better ranking than those who spend their time looking for loopholes they can exploit.” What does that mean, though? Does this affect tabloids? What about a government-sponsored news outlet, such as Russia Today, that has one link listed (“WikiLeaks Will Publish ‘Enough Evidence’ to Indict Hillary Clinton, Warns Assange”)?
It’s important to note that the “in the news” list hasn’t always been called that. Until 2014, it was called “news for… ”, which made it clearer that only actual news would be included. But in November of that year, users noticed non-news links appearing in the Google News box. Now there was room for company statements, for instance, and content from Reddit and YouTube.
In fact, when a news source becomes more than a news source and also becomes a topic of news discussion, it can get more leverage in the “in the news” list. It’s what happened to the Washington Post in June. Before the week of June 12, the Washington Post wasn’t even in the Top 5 sources of links listed in the Google News box when the searches were about Hillary Clinton or Donald Trump. Then, in the week of June 12 to June 18, the newspaper had a very strong week, coming in third, behind CNN and the New York Times.
When the news was just about Donald Trump, the Washington Post even surpassed the two biggest sources of “in the news” links in that fateful week.
That week, the Washington Post had a string of very successful articles that showed up constantly in the “in the news” list. The first one was “Donald Trump Seems to Connect President Obama to Orlando Shooting,” published June 13. That article showed up 30 times on the list. Had the reporter, Jenna Johnson, cracked the Google News code? Maybe. But there was another factor in play. Trump himself catapulted that article to higher visibility when he took to Twitter to complain about it and used it as an argument to revoke the Post’s credentials to his events.
The inclusion of non-news links adds a new layer of complexity to an already unclear algorithmic selection of links. If before we didn’t exactly know what a news organization had to do to increase its share in that space, or how CNN and the New York Times can get such a big chunk, now we also don’t know what parameters might warrant the inclusion of a social media link or a press release in the “in the news” list.
Earlier this year, faced with controversy over political bias in its trending topics list, Facebook changed the way it curates the data and issued a disclosure explaining how that tool works. It was an example of how transparency can help assuage fears of bias that users might have. As an important broker of information, Google should consider doing the same, publishing its criteria for what news sources or newsmakers can be selected for “in the news” and leveling the playing field. That way, a few lucky news organizations don’t dominate a majority of the political news attention directed by the search engine—and fewer untrustworthy sites can sneak in.