Testing Testing

Employment tests may be racially biased—but what if they’re less biased than human beings?

March 03, 2009, 7:02 AM

Could tests for job candidates actually reduce discrimination?

The few companies taking on new hires these days may consider themselves lucky: they’re not yet bankrupt, and they have their pick among the legions of newly unemployed and overqualified. But even these fortunate few are taking greater care over whom they take on, since no one wants to be saddled with dead weight on the payroll in the midst of economic calamity. Rather than relying solely on interviewers to do the hiring, companies can choose from among a range of simple and easily administered tests to screen prospective employees for everything from arithmetic skills to personality traits like conscientiousness and extroversion. These tests do help managers pick better workers: it’s useful to have cashiers with basic math skills and salesclerks with outgoing personalities. However, they also leave employers vulnerable to discrimination lawsuits under the Equal Employment Opportunity Act, since minorities often perform poorly on the tests.

But a recent study published in the Quarterly Journal of Economics, by economists David Autor of MIT and David Scarborough of Black Hills State University, questions whether these oft-vilified tests are necessarily bad for minorities at all. (Scarborough also works for Kronos, a company that sells job testing products.) They argue that the tests—while perhaps biased—may nevertheless serve as a check on the judgment and prejudices of all-too-human interviewers. In fact, the authors find that when a large retailer started using a job screening test in 1999, the fraction of blacks and Hispanics hired didn’t change, while the quality of hires of all races increased as a result of testing.

How do employment tests—on seemingly objective criteria like math or organization skills—discriminate against minorities in the first place? In part, it’s the effect of genuine racial disparities in the quality of job applicants. On average, minorities have less education, attend lower-quality schools, and as a result end up with lower math and language abilities. Since these are the skills emphasized in pencil-and-paper testing, minority candidates may get screened out of jobs for which they would otherwise be effective employees (think, for example, of screening for mail courier positions based on a math exam). Also, people who write the tests are typically well-off white males who may unconsciously introduce cultural biases through their choice of vocabulary or social situations in the questions they devise.

Of course, the human beings who make hiring decisions may well be biased themselves. Symphony orchestras were dominated by men before the advent of blind auditions, and a résumé from someone named Lakisha—a common African-American name—is less likely to elicit a callback than one from Emily. Autor and Scarborough’s insight is that adding a test—even a racially biased one—will only aggravate the problem of discrimination if it is more biased than the average HR person. A test that favors white applicants may even reduce discrimination if its bias is less extreme than human prejudice.
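The logic of that comparison can be sketched in a few lines of Python. This is a toy illustration only, not the model Autor and Scarborough estimate: the applicant pool, the hiring cutoff, the bias penalties, and the equal weighting of interview and test are all hypothetical numbers chosen to make the arithmetic easy to follow.

```python
def blended_penalty(manager_bias, test_bias, test_weight):
    """Penalty on a minority applicant's perceived score when the manager's
    judgment and the test result are averaged (test_weight goes to the test)."""
    return (1 - test_weight) * manager_bias + test_weight * test_bias


def hire_rate(qualities, penalty, cutoff):
    """Share of applicants whose perceived score (quality minus penalty) clears the cutoff."""
    hired = sum(1 for q in qualities if q - penalty >= cutoff)
    return hired / len(qualities)


# Hypothetical pool of 100 minority applicants with true quality 0..99.
QUALITIES = list(range(100))
CUTOFF = 90  # hypothetical bar: the top 10 percent would be hired with no penalty


def minority_hire_rate(manager_bias, test_bias, test_weight):
    penalty = blended_penalty(manager_bias, test_bias, test_weight)
    return hire_rate(QUALITIES, penalty, CUTOFF)


# Interview only: the manager's (assumed) bias is the whole penalty.
interview_only = minority_hire_rate(manager_bias=4, test_bias=0, test_weight=0.0)
# A biased test, but less biased than the manager, weighted equally with the interview.
milder_test = minority_hire_rate(manager_bias=4, test_bias=2, test_weight=0.5)
# A test more biased than the manager, weighted equally with the interview.
harsher_test = minority_hire_rate(manager_bias=4, test_bias=6, test_weight=0.5)

print(interview_only, milder_test, harsher_test)
```

Under these made-up numbers, the minority hire rate rises when the test is less biased than the manager and falls when it is more biased, even though both tests are biased. The point of the sketch is only the ordering of the three rates, not their size.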

To see how the interplay between human prejudice and testing bias plays out in the real workplace, the authors examined the impact of a job test rollout by a large nationwide retail chain. (The company’s identity is kept confidential by the authors.) Starting in June 1999, the company’s stores began installing electronic Kronos-Unicru kiosks, where job applicants entered demographic information and took a short personality test (Agree or disagree: “You can be rude when you need to be”; “You hold back from talking a lot in a group”). The results were instantly tabulated, color-coded as red (rude); yellow (possibly shy); or green (outgoing and friendly); and sent to the store manager. By June 2000, all of the company’s 1,363 stores had kiosks in place. So in the year the kiosks were installed, some applicants were assessed solely on the basis of a manager’s interview while others got both human and test-based evaluation, simply because of the chance order in which stores received their kiosks.

The tests did a better job of screening out ill-mannered introverts than the interview alone—candidates hired with the help of test feedback lasted 10 percent longer than those without. Minorities did underperform whites on the test—nearly a third more black candidates and nearly 20 percent more Hispanics were red-coded relative to whites. But if the test was more biased than store managers charged with picking employees, fewer minorities should have been hired as a result of the new kiosks. As it happened, the fraction of each race hired was unchanged—10 percent of white applicants, and 7 percent each of blacks and Hispanics were hired. (Additionally, the increase in job tenure was the same 10 percent for all races who took the test. If the test had made the hiring process more biased, then the tenure gains should have been lower for white recruits, as managers picked “too many” whites from the pool of tested applicants.)

It’s also not clear that the test or store managers were biased at all: white hires lasted a third longer than black employees and also longer than Hispanics, so the higher hiring rate of whites (and their higher test scores) may well have reflected better job qualifications.

The results of the study don’t let test-makers off the hook: they still need to strive to create questions that give applicants of comparable ability equal footing. But we also shouldn’t “shoot the messenger” if a test reveals uncomfortable disparities among races. Tests are used for many high-stakes decisions in America: the SAT for college admission, the Armed Forces Qualification Test for the military, police entrance exams. The fact that minorities perform poorly on these tests shouldn’t be swept under the rug; assessing the extent of racial inequality is the first step to understanding and remedying the underlying problems.

Autor and Scarborough’s findings also imply that lawsuits targeting job testing programs might be misguided. Employers may respond by avoiding any test where minorities underperform whites. But while this could protect them from charges of discrimination, it’s not clear that it will improve the situation of minority applicants, who might be victimized by more subtle, less traceable ways of avoiding minority hires. It could be that we’re best off accepting an imperfect tool for picking employees in what may be an even more flawed and prejudiced world.