Bitwise

Big Anecdata

Studies that explain systemic racism using Google searches are fascinating. Don’t trust them.

Do your Web searches prove whether you’re racist? Photo by antb/Shutterstock

Can Google diagnose racism? In a recent study published in the journal PLOS ONE, researchers claimed to have found a link between a geographic area’s black mortality rate and the frequency of Google searches for the racial epithet nigger. Since few people would say outright that they’re racist, the authors of the study suggest that Google search statistics could be a valuable metric for gauging the actual level of racist sentiment in an area—and even be an indicator of systemic racial oppression. Like many excursions into big data—the mining of gigantic, Google-size data sets for insights—this one is messy, raising more questions than it answers. But it’s also unusually powerful, because a statistical approach to racial issues confronts us with uncomfortable facts, even if it’s sometimes hard to know what to make of them.

These sorts of big-data studies are leading indicators of our era. While our culture still exalts the freedom and moral autonomy of the individual, macro-level studies, whether economic or sociological, increasingly provide us with evidence of a far more predetermined world. Google Flu Trends, which tracks Google searches for flu symptoms and flu questions by geographical area, purports to be as good a leading indicator of flu outbreaks as reports from doctors, if not better. As I wrote last year, the secrecy of Google’s methodology and the poor quality of some of its results have called Flu Trends’ success somewhat into question, but the underlying correlation between flu-related searches and cases of the flu remains. (Disclosure: I used to work for Google, and my wife still does.) Just as economic studies can show flattening pay and a growing income divide even though no single action created those circumstances, a single person can deny being racist and be difficult to challenge, yet statistics can prove that our society en masse remains racist. And so we have to ask what we can do about it. And we have to accept that the macro-world is increasingly part of our everyday micro-lives.

Racism can’t be quantified like dollars and cents, however, so when it comes to studies measuring it or other sentiments, we should be very careful in how we code and interpret the factors. In this new study, there are two correlative links that require explanation. The first is the inferred link between the searches themselves and the presence of racist attitudes. The study, led by University of Maryland epidemiologist David Chae, builds on Google search data collected by data scientist Seth Stephens-Davidowitz, which he used in a 2014 study to show a correlation between racial animus and lower presidential vote shares for Barack Obama. Stephens-Davidowitz used Google Trends (as well as some clever tricks that pulled finer-grained data from Google’s interface) to identify the search volume for the N-word by U.S. media market. (He screened out variant spellings commonly used in rap lyrics, such as nigga, though false positives may remain in the data, like a search for certain comedy albums by Richard Pryor.) What caused people to type in the epithet? Frequently, jokes: “A huge proportion of the searches I looked at were for jokes about African-Americans.” The data showed the highest volume of such searches across nearly all of Appalachia, in parts of the Deep South, and in a few isolated pockets elsewhere. Notably, the Utica, New York, region had the second-highest search volume, topped only by the Bluefield-Beckley-Oak Hill region of West Virginia. The study couldn’t say why Utica appeared to be so fond of racist jokes, but such statistical results point to potential problem spots in a way that interviewing people on the street might not.
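To make the mechanics concrete, here is a minimal sketch, in Python on invented data, of the kind of measure being described: flag the queries in each media market that contain the target term (with variant spellings excluded), then rank markets by the share of flagged queries. The market names, queries, and the placeholder term are all hypothetical; Stephens-Davidowitz worked from Google Trends’ aggregated volumes, not raw query logs.

import pandas as pd

TARGET = "epithet"  # placeholder for the actual slur; variant spellings are deliberately not matched

# Invented queries for illustration only.
queries = pd.DataFrame({
    "market": ["Utica NY", "Utica NY", "Bluefield WV", "Brooklyn NY"],
    "query":  ["epithet jokes", "weather utica", "epithet jokes", "richard pryor albums"],
})

# Flag queries containing the target term as a whole word.
queries["flagged"] = (
    queries["query"].str.lower().str.split().apply(lambda words: TARGET in words)
)

# Share of flagged queries per market, ranked highest first: a crude "area racism" score.
area_scores = queries.groupby("market")["flagged"].mean().sort_values(ascending=False)
print(area_scores)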

But while the increased presence of racially charged searches intuitively suggests racial animosity in an area, the reverse isn’t necessarily true. Stephens-Davidowitz’s data could well provide a sufficient condition for racial animus, but it is not a necessary one. Obviously, you don’t need to use racist language to be racist. This point bears on Chae, et al.’s study, which goes one step further in attempting to establish a correlation between racist Google searches and black mortality from 2004 to 2009. According to the researchers, the study suggests that “racism shapes patterns in mortality and generates racial disparities in health.”

Proportion of Google queries containing the “N-word” by designated market area, 2004–2007. c/o Chae DH, Clouston S, Hatzenbuehler ML, Kramer MR, Cooper HLF, et al. (2015) Association between an Internet-Based Measure of Area Racism and Black Mortality. PLoS ONE 10(4): e0122963. doi:10.1371/journal.pone.0122963

Chae, et al. take that further step by correlating Stephens-Davidowitz’s Google-derived “area racism” results with black mortality rates in the same areas. The study concludes that “living in an area characterized by a one standard deviation greater proportion of racist Google searches is associated with an 8.2% increase in the all-cause mortality rate among Blacks. This effect estimate amounts to over 30,000 deaths among Blacks annually nationwide.” That is a notable finding, but it is a broad correlation that lumps together some disparate data. When controlling for white mortality rates, the 8.2 percent increase drops to 3.6 percent, suggesting that area racism as the study defines it is correlated with higher mortality rates across the board. (Thus I disagree with the authors’ choice to promote the 8.2 percent number in the press release.) The study generalizes across all causes of death, but the authors do break down specific causes, finding that while area racism does correlate with an increase in blacks’ deaths from heart disease, stroke, and cancer, it does not correlate with an increase in deaths from diabetes. The study suggests that these deaths may be attributable to increased “psychosocial stress” stemming from racial discrimination. But without controlling for other variables like quality of health care, occupation, diet, and the level of social services—all of which can also reflect the racial biases of an area—it is highly speculative to identify psychosocial stress as a major determining factor. While studies have shown that experiencing racial discrimination does cause notable increases in stress levels, the large-scale impact of that stress by itself has yet to be measured.
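For readers who want to see how a figure like “8.2 percent per standard deviation” is read off a model, here is a hedged sketch on entirely synthetic data (not the study’s): a Poisson regression of area death counts on a standardized area-racism score, with population as the exposure and a stand-in control variable, where exponentiating the coefficient gives the estimated percent change in the mortality rate per standard deviation. The variable names and numbers are invented; the study’s actual model and controls differ.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n_areas = 500

racism_z = rng.normal(size=n_areas)                  # standardized area-racism score
control = rng.normal(size=n_areas)                   # stand-in control (e.g., white mortality)
population = rng.integers(5_000, 200_000, n_areas)   # hypothetical area populations

# Simulate death counts with a built-in effect of roughly 8% per standard deviation.
log_rate = np.log(0.009) + 0.079 * racism_z + 0.05 * control
deaths = rng.poisson(population * np.exp(log_rate))

# Poisson rate regression with log(population) as the offset.
X = sm.add_constant(np.column_stack([racism_z, control]))
fit = sm.GLM(deaths, X, family=sm.families.Poisson(), offset=np.log(population)).fit()

# Exponentiate the racism-score coefficient to get the percent change per SD.
pct_per_sd = (np.exp(fit.params[1]) - 1) * 100
print(f"Estimated change in mortality rate per SD of area racism: {pct_per_sd:.1f}%")

Adding or dropping a control column is all it takes to move between numbers like 8.2 percent and 3.6 percent, which is why the choice of controls matters so much here.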

As a result, the study is more suggestive than convincing. It points to a serious problem, but the generality of its variables means it can’t quite identify what that problem is. And it again raises the question of whether such Google searches are sufficiently reliable indicators. The area-racism results broadly match up with conventional wisdom—Appalachia and the Deep South have especially troubled racial histories—but that shouldn’t exonerate other areas that just happen not to look up racist jokes with racial epithets in them. In the future, I’d like to see researchers correlate mortality rates with other indicators of area racism to see whether the Google searches perform better or worse.

For now, as an exemplar of careful statistical research I would instead point to “An Analysis of the New York City Police Department’s ‘Stop-and-Frisk’ Policy in the Context of Claims of Racial Bias,” Andrew Gelman, Jeffrey Fagan, and Alex Kiss’s meticulous study that identified real racial differences in the enforcement of New York City’s stop-and-frisk policy. Because they obtained exact data on 125,000 pedestrian stops, the authors could report that blacks and Hispanics were stopped roughly twice as often as whites, even after controlling for different crime rates among populations, and thus conclude that “the police are disproportionately stopping minorities.” In contrast, Chae, et al.’s study posits a troubling correlation without pinning down a specific agent of that correlation.
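The contrast is easier to see with a toy version of the stop-and-frisk comparison: stops per group measured against a crime-rate baseline. The numbers below are invented, and the real study fits a far more careful hierarchical model across precincts with arrest data as its baseline, but the logic of “controlling for different crime rates” is the same.

# Invented counts for illustration; not the study's data.
stops = {"white": 10_000, "black": 60_000, "hispanic": 40_000}
prior_arrests = {"white": 20_000, "black": 55_000, "hispanic": 38_000}  # crude crime-rate baseline

# Stops per prior arrest, then each group's rate relative to whites.
rates = {group: stops[group] / prior_arrests[group] for group in stops}
for group, rate in rates.items():
    print(f"{group}: {rate / rates['white']:.1f}x the white stop rate, given the baseline")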

Chae, et al. write, “Using ‘big data’ and aggregating millions of Internet searches yields a high signal-to-noise ratio despite potential sources of measurement error.” This is true to a point. The results are meaningful, but the interpretation of search results is always going to be dicey, owing to the sheer lack of context surrounding a Google search. This study picks a search term so charged that its implications are comparatively fixed regardless of context, yet even here it can be taken only as a weak indicator of racism, not a strong measure. I may have driven up Brooklyn’s score just in the course of researching this article, despite having no racist intent.

For anyone who seeks to understand the world, the danger lies not in big data but in how we code that data. To draw the conclusions they do, Chae and his colleagues code “people searching for racial epithets” as “evidence of area racism.” Generalizing a typical sentiment from a group of heterogeneous sentiments shouldn’t be done haphazardly; in this case it may have been warranted, but in many others it would not be. Sociologist Richard Biernacki discussed these sorts of problems in his 2012 book Reinventing Evidence in Social Inquiry, taking apart a 1987 study by Wendy Griswold that purported to link the popularity of George Lamming’s novels with various literary characteristics mentioned in reviews of those books. Biernacki found that the coding was highly dubious, as when Griswold interpreted mentions of “ambiguity” as a positive assessment in reviews and an indicator of popularity, when in fact reviewers tended to mention it as a negative quality, a distinction Griswold never discussed. (Ambiguity, it turns out, is ambiguous.)

When sloppy coding is combined with automated sentiment analysis, studies can become totally worthless. None of the big-data analyses of Gamergate showed much of anything about harassment, positive or negative, despite attempts on both sides to spin the results. Newsweek’s Taylor Wofford claimed that a Brandwatch study of Gamergate tweets showed that Gamergate was mostly about harassment, except that Brandwatch’s classifier wasn’t able to determine whether 90 percent of the tweets were positive or negative. The study showed nothing, but Newsweek wrote it up anyway. Another study coded @ tweets that didn’t provoke responses as “harassment” regardless of their content, which means we’re all harassers now. Fancy charts and scientific dressing can too easily obscure sloppy coding, creating the sort of rushes to judgment we’re seeing in the embrace of quantitative metrics of teacher and school performance and the highly dubious field of automated essay grading.
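The Brandwatch problem is easy to put in numbers. If a classifier can’t label 90 percent of a corpus, then whatever split it reports for the remaining 10 percent pins down almost nothing about the whole. A rough, hypothetical calculation:

total = 100_000                        # hypothetical tweet corpus
unclassified = int(0.90 * total)       # the classifier can't call 90 percent of it
classified = total - unclassified

negative_labeled = int(0.60 * classified)   # suppose 60% of the labeled slice looks negative

# Bounds on the true negative share, depending entirely on the unlabeled 90 percent.
low = negative_labeled / total
high = (negative_labeled + unclassified) / total
print(f"True negative share could be anywhere from {low:.0%} to {high:.0%}")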

On the other hand, such “anecdata”—what one can call imprecise big data—can serve as a useful synecdoche. We live now in a world where individual stories, even terrible ones, can be explained away as aberrations. It’s taken an unfortunate number of high-profile incidents like the deaths of Michael Brown and Freddie Gray to suggest that law enforcement may have a serious racial bias problem. Studies like Gelman’s in particular are useful in suggesting that these incidents are not aberrations but are in fact representative of a systemic problem. Studies like Chae’s aren’t as powerful, but they do provide additional data points for arguing that racism remains present and harmful. This will probably not come as a shock to most Slate readers, but there are those to whom this case must still be made, and it’s worth making that case. The danger is only in taking these studies’ conclusions as proof rather than evidence.

This article is part of Future Tense, a collaboration among Arizona State University, New America, and Slate. Future Tense explores the ways emerging technologies affect society, policy, and culture. To read more, visit the Future Tense blog and the Future Tense home page. You can also follow us on Twitter.