June 6 2014 8:22 AM

Data-Driven Discrimination

How algorithms can help perpetuate poverty and inequality.

A bus stop. Photo by Mark Kolbe/Getty Images

This article originally appeared in the New America Foundation’s Weekly Wonk.

In 1977, the U.S. Department of Housing and Urban Development audited the real estate industry and discovered that black homeseekers were shown fewer properties (or were told they were unavailable) and treated less courteously than their white counterparts. Today, the Information Age has introduced modern discrimination problems that can be harder to trace: From search engines to recommendation platforms, systems that rely on big data could be unlocking new powers of prejudice. But how do we figure out which systems are disadvantaging vulnerable populations—and stop them?

Here’s where it gets tricky: Unlike the mustache-twirling racists of yore, who conspired to segregate and exploit particular groups, redlining in the Information Age can happen at the hands of well-meaning coders crafting exceedingly complex algorithms. One reason is that algorithms learn from one another and iterate into new forms, becoming inscrutable even to the coders responsible for creating them, so it’s harder for concerned parties to find the smoking gun of wrongdoing. (Of course, some coders or overseeing institutions are less well-meaning than others—see the examples to come.)

So, how do we even begin to unravel the puzzle of data-driven discrimination? By first examining some of its historical roots. A recent Open Technology Institute conference suggested that high-tech, data-driven systems reflect specific, historical beliefs about inequality and how to deal with it. (One of us works at OTI; OTI is part of the New America Foundation, which is a partner with Slate and Arizona State University in Future Tense.) Take welfare in the United States. In the ’70s, policymakers began floating the idea that they could slash poverty levels by getting individuals off welfare rolls. As part of that process, the government computerized welfare case management systems—which would make it easier to track who was eligible to receive benefits and who should be kicked off. Today, these case management systems are even more efficient at determining program eligibility. The upshot? Computerized systems reduce caseloads in an increasingly black-box manner. The downside? They do so blindly—kicking out recipients whether or not they’re able to get back on their feet. That’s contributing to greater inequity, not less.

That’s not all, though. Even when systems are well-designed, it can be “garbage (data) in, discrimination out.” A transportation agency may pledge to open public transit data to inspire the creation of applications like “Next Bus,” which simplify how we plan trips and save time. But poorer localities often lack the resources to produce or share transit data, meaning some neighborhoods become dead zones—places your smartphone won’t tell you to travel to or through.

Unfortunately, the implications of flawed data collection may not become apparent for years—after we have already made policy decisions about, say, our transit system. Researchers describe this issue of time as a sort of conditioning problem, one that arises from several different sources. In some cases, discriminatory conditioning happens because of the nature of the information itself. Take genetic information. In the U.S., police can collect DNA from individuals at the point of arrest. This information identifies you much the way a fingerprint does. But your DNA also links you with others: family members from generations before, relatives living today, and future generations. While it’s hard to predict how law enforcement or others might use this information in the future, the networked nature of DNA makes it a high-risk candidate for implicating an entire group, not just an individual.

In other cases, discriminatory conditioning happens because of the pervasiveness of collecting and sharing information, which makes it hard to control who knows what about you. Most Web pages embed code that calls out to third parties to load an icon, cookie, or advertisement. Try searching for a disease—say AIDS—and click on a top result. Chances are the page will include icons for other applications not connected to the health site. The resulting effect, data leakage, is difficult to avoid: A Web page must communicate information about itself (e.g., “http://www…com/HIV”) to the servers hosting those icons so that the page loads correctly. That could be devastating for those who wish to conceal health conditions from data brokers or other third parties that might access and act upon that data profile.
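The mechanism behind this kind of leakage can be sketched in a few lines. The snippet below is a simplified model of what a browser does when a page embeds a third-party resource: the request for that resource typically carries a Referer header naming the page that embedded it, so the third party learns which page you were reading. All URLs here are hypothetical.

```python
# Minimal sketch of "data leakage" via the HTTP Referer header.
# When a page embeds a third-party resource (icon, ad, tracking pixel),
# the browser's request for that resource typically includes the URL of
# the embedding page. The URLs below are made up for illustration.

from urllib.parse import urlparse

def third_party_request(page_url: str, resource_url: str) -> dict:
    """Headers a browser would typically attach when fetching an
    embedded third-party resource from the page at `page_url`."""
    return {
        "Host": urlparse(resource_url).netloc,
        "Referer": page_url,  # leaks the (possibly sensitive) page URL
    }

page = "https://www.example-health-site.com/conditions/HIV"
pixel = "https://ads.example-tracker.com/pixel.gif"

headers = third_party_request(page, pixel)
# The ad network can now read the condition name out of the Referer:
leaked_path = urlparse(headers["Referer"]).path
```

Modern sites can blunt this with a restrictive `Referrer-Policy`, but in 2014 the full URL was sent by default, which is why the leakage the authors describe was so hard to avoid.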

Or consider the case of highly networked environments, where information about what you’re doing in a particular space gets sucked up, matched and integrated with existing profiles, and analyzed in order to spit back recommendations to you. Whether at home, out shopping, or in public, few people can be invisible. Homes come outfitted with appliances that sense our everyday activities, “speak” to other appliances, and report information to a provider, like an electric utility company. While it’s presumptuous to say that retailers or utility companies are destined to abuse data, there’s a chance that information could be sold down the data supply chain to third parties with grand plans to market predatory products to low-income populations or, worse yet, use data to shape rental terms or housing opportunities. What it boils down to is a lack of meaningful control over where information travels, which makes it more troublesome to intervene if and when a problem arises in the future.

So what’s possible moving forward? Waiting is definitely not the answer. With collective and personal control, autonomy, and dignity at stake, it would be wrong to leave governments or industry to respond to problems without independent research input. A relatively simple strategy would be to ensure collaboration and coordination between social and computational research. There’s also much to be done in terms of gaining greater access to datasets that various laws otherwise impede (e.g., computer fraud and abuse, intellectual property, or trade secrets). Crowdsourcing the discovery of data-driven discrimination is another possibility: As in the HUD audits, users who are similar on all but one trait could monitor and report their experiences with a variety of automated systems.
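The audit approach borrowed from HUD can be sketched as a matched-pair comparison. In the toy example below (with made-up data and field names), each pair of testers is identical on every attribute except one trait; pairs in which the two testers received different outcomes are the candidate evidence of disparate treatment that auditors would investigate further.

```python
# Hedged sketch of a matched-pair ("paired-tester") audit, modeled on
# the HUD housing audits described in the article. The data below is
# illustrative, not drawn from any real system.

from itertools import groupby

reports = [
    # (pair_id, trait, outcome reported by that tester)
    (1, "A", "approved"),
    (1, "B", "denied"),
    (2, "A", "approved"),
    (2, "B", "approved"),
    (3, "A", "approved"),
    (3, "B", "denied"),
]

def divergent_pairs(reports):
    """Return pair IDs where testers who differed only in the trait
    received different outcomes -- the pattern worth a closer look."""
    flagged = []
    for pair_id, group in groupby(sorted(reports), key=lambda r: r[0]):
        outcomes = {trait: outcome for _, trait, outcome in group}
        if len(set(outcomes.values())) > 1:
            flagged.append(pair_id)
    return flagged

# Pairs 1 and 3 show divergent outcomes and would be flagged for review.
```

A single divergent pair proves little on its own; the crowdsourced version of this idea draws its power from aggregating many such reports across many systems.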

Trying many approaches and testing them out now may seem like an ambitious agenda, and it is. But in a period of such uncertainty—about how laws, market practices, social norms, or code can safeguard collective and personal dignity, autonomy, and rights—experimentation and iteration are critical to exposing harm or benefit. Only then will we generate stories and evidence rigorous enough to reveal discrimination when it happens.

But for now, that uncertainty can’t be resolved quickly enough as we head into an era of ever more data collection, analysis, and use. There’s a real threat that things will go badly and disproportionately burden the poorest and most marginalized among us. These twin dynamics—expanding data use and unresolved uncertainty—will only accelerate the divide. Despite the complexity of the task, the time to confront data-driven discrimination is now.

Seeta Peña Gangadharan is a senior research fellow at the New America Foundation's Open Technology Institute.

Samuel Woolley is a doctoral student in the Department of Communication at the University of Washington.
