Police pull over more than 50,000 drivers on a typical day in the United States, adding up to more than 20 million stops each year. Some of these stops become part of our collective consciousness. Sandra Bland was arrested while driving through Texas and then died in her jail cell three days later. Philando Castile was stopped and killed in Minnesota, his death filmed by his girlfriend. (Last week, a jury acquitted the officer involved of all charges in the shooting.)
These and other incidents have sparked a national conversation on policing, particularly on the role of racial bias in such encounters. What would have happened had Bland or Castile been white? Absent overt discrimination, we rarely find definitive answers in individual cases. But data and statistics could allow us to identify the larger patterns.
Two years ago, we assembled an interdisciplinary team of statisticians, computer scientists, and journalists at Stanford University to study possible bias in policing. The first major obstacle we faced was that police data is often inaccessible, making rigorous analysis difficult. Traffic stops—what we hoped to analyze—are the primary means through which police interact with the public, yet there was no national database of stop records.
We filed public record requests with all 50 states to obtain details of each stop carried out by state patrol officers over the last 10 years. So far, we’ve collected more than 130 million records from 31 states. The remaining 19 had a variety of responses: Some didn’t collect the data in electronic form or didn’t record the race of stopped drivers; others didn’t reply to our requests for information. Among the 31 states that did provide data, only 20 gave us information detailed enough to assess racial disparities. And in these 20 states, the data came in myriad formats, requiring thousands of hours to clean and standardize.
On Monday, we’re releasing all the data we’ve collected and cleaned, and code and tutorials to help others delve into this trove of information. We’re also releasing the preliminary results of our own analysis, using data from the 20 states with the most detailed information. (Full disclosure: This work has not yet undergone peer review, but we have released both the data and code to reproduce our analysis.) In our analysis, we focused on vehicle searches, in part because they have a clear goal—recovering contraband—which makes it easier to measure potential biases.
We found that black and Hispanic drivers were often searched on the basis of less evidence than whites. This double standard was widespread, not confined to any one state or geographic region.
To reach this conclusion, we first examined search rates. In nearly every jurisdiction, we found that stopped black and Hispanic drivers were searched more often than whites, about twice as often on average. But such statistics alone are not clear evidence of bias. If minorities also happen to carry contraband at higher rates (a hypothetical possibility, not a fact), these elevated search rates may simply reflect routine police work. Disentangling discrimination from effective policing is a challenge.
The Nobel Prize–winning economist Gary Becker proposed an elegant solution to this conundrum in the 1950s: Becker suggested one should look not at search rates but at success rates. Absent discrimination, he argued, officers should find contraband on searched minorities at the same rate as on searched whites. If searches of minorities turn up contraband at lower rates than searches of whites, for example, it indicates officers are applying a double standard and searching minorities on the basis of less evidence. If searches of whites turn up contraband at lower rates, it suggests officers are searching whites on the basis of less evidence.
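As a rough illustration of the two quantities in play, the sketch below computes search rates and search success ("hit") rates by group from a list of stop records. The record layout (race, searched, contraband_found), the function name, and the toy numbers are all made up for illustration; they are not the project's actual schema or results.

```python
from collections import Counter

def rates_by_group(stops):
    """Return {group: (search_rate, hit_rate)} for an iterable of stop records.

    Each record is a hypothetical (race, searched, contraband_found) tuple.
    search_rate = searches / stops
    hit_rate    = searches that found contraband / searches
    """
    n_stops, n_searches, n_hits = Counter(), Counter(), Counter()
    for race, searched, found in stops:
        n_stops[race] += 1
        if searched:
            n_searches[race] += 1
            if found:
                n_hits[race] += 1

    result = {}
    for race in n_stops:
        search_rate = n_searches[race] / n_stops[race]
        # The hit rate is undefined for a group that was never searched.
        hit_rate = n_hits[race] / n_searches[race] if n_searches[race] else None
        result[race] = (search_rate, hit_rate)
    return result

# Made-up toy data, NOT real figures: 100 stops per group.
toy_stops = (
    [("A", False, False)] * 90 + [("A", True, True)] * 4 + [("A", True, False)] * 6
    + [("B", False, False)] * 80 + [("B", True, True)] * 4 + [("B", True, False)] * 16
)

toy_rates = rates_by_group(toy_stops)
for group, (search_rate, hit_rate) in sorted(toy_rates.items()):
    print(f"group {group}: search rate {search_rate:.0%}, hit rate {hit_rate:.0%}")
```

In this toy data, group B is searched twice as often as group A, yet its searches succeed half as often. Under the outcome test, that lower hit rate would be read as a sign that group B is being searched on the basis of less evidence.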
Turning to the data, we found that searches of Hispanic drivers yield contraband at lower rates than searches of whites, and that searches of black drivers yield contraband at similar rates to searches of whites. Becker’s outcome test suggests Hispanic drivers are searched on the basis of less evidence than white and black drivers.
Becker’s test is a compelling measure of bias, but it’s also imperfect. As we and other researchers have found, the test can fail to detect discrimination when it is present and can indicate discrimination when it is absent.
For example, imagine that police officers stop just two types of white drivers: Some of the white drivers have a 5 percent likelihood of carrying contraband, and others have a 75 percent chance of carrying contraband. Suppose there are also just two types of black drivers: Some black drivers have a 5 percent chance of carrying contraband, and others have a 50 percent chance of carrying contraband. In this hypothetical world, police officers might reasonably search drivers who have at least a 25 percent chance of carrying something illegal—regardless of race and in an unbiased manner. In that world, the success rate of white searches is 75 percent and the success rate of black searches is 50 percent. Officers use the same standard to search each driver, and so do not discriminate, but the success rates differ. In this example, Becker’s test indicates discrimination even though there isn’t any.
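The arithmetic of this hypothetical can be checked with a few lines of code. This is only a sketch of the example above: the function name is invented, and the equal split between the two driver types in each group is an assumption (the example does not specify it, and the result does not depend on it, since only one type per group clears the threshold).

```python
def expected_hit_rate(driver_types, threshold):
    """Expected search success rate when officers search every driver
    whose chance of carrying contraband is at least `threshold`.

    driver_types: list of (probability_of_contraband, share_of_stopped_drivers).
    Returns None if no driver type meets the threshold.
    """
    searched = [(p, share) for p, share in driver_types if p >= threshold]
    total_share = sum(share for _, share in searched)
    if total_share == 0:
        return None
    # Weighted average probability of contraband among searched drivers.
    return sum(p * share for p, share in searched) / total_share

THRESHOLD = 0.25  # same standard of evidence applied to every driver

# Probabilities come from the hypothetical; the 50/50 shares are assumed.
white_types = [(0.05, 0.5), (0.75, 0.5)]
black_types = [(0.05, 0.5), (0.50, 0.5)]

white_hit_rate = expected_hit_rate(white_types, THRESHOLD)
black_hit_rate = expected_hit_rate(black_types, THRESHOLD)

print(f"white searches succeed {white_hit_rate:.0%} of the time")
print(f"black searches succeed {black_hit_rate:.0%} of the time")
```

Both groups face the identical 25 percent threshold, yet the success rates come out at 75 percent and 50 percent. The gap reflects the different mix of drivers above the threshold, not a different standard of evidence.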
Building off Becker’s ideas, we developed a more robust statistical test of discrimination based on search rates, search success rates, and the interplay between them. Our threshold test, which is forthcoming in the Annals of Applied Statistics, seeks to directly infer the standard of evidence officers require before carrying out a search—akin to the 25 percent threshold in the hypothetical scenario above.
When we applied the threshold test to the data, we found evidence of bias against both black and Hispanic motorists. The inferred thresholds for searching minorities were consistently lower than the thresholds for searching whites in jurisdictions across the country. If officers held black and Hispanic drivers to the same standard as whites, tens of thousands of searches of minorities might be avoided each year.
The patterns we find are suggestive of racial bias in policing. But as with all tests of discrimination, there is a limit to what one can conclude from such statistical analysis alone. For example, lower search thresholds for these groups could be the result of nondiscriminatory factors if officers had valid reasons to suspect more serious criminal activity when searching black and Hispanic drivers compared to whites. Such possibilities would have to be assessed independently of our analysis—our study is just one step in understanding complex police interactions.
To that end, we’re launching the Stanford Open Policing Project to facilitate further discussion of police practices. In addition to releasing the data we’ve already collected—more than 130 million state patrol records and counting—we’re starting to gather city-level data. We hope this resource helps researchers, policymakers, and journalists assess and improve police contact with the public.