The XX Factor

Value-Added Rankings for Teachers

This NYT article about ranking teachers based on students’ test scores (what’s known as value-added ranking) is the first thing I’ve read that really lays out the tensions over what this data means. It made me a lot more nervous about releasing the information publicly right now, as a dozen news organizations, including the Times, have sued to do in a case that’s pending. The rankings sound like they have the potential to become a really useful tool, but also like they are not ready for prime time yet. If the data are released now and unfairly hurt some teachers’ reputations, could the backlash prevent the rankings from developing into what we really need?

Value-added rankings are supposed to tell us how much the quality of an individual teacher contributed to boosting the scores of his or her students. Researchers say they can tell us with confidence that a teacher who is in the bottom 10 percent year after year is doing a bad job and a teacher in the top 10 percent is doing a good job. But about teachers in the middle, they can tell us much less:

“In math, judging a teacher over three years, the average confidence interval was 34 points, meaning a city teacher who was ranked in the 63rd percentile actually had a score anywhere between the 46th and 80th percentiles, with the 63rd percentile as the most likely score. Even then, the ranking is only 95 percent certain. The result is that half of the city’s ranked teachers were statistically indistinguishable.”

Another more serious problem for individual teachers: The rankings have a high error rate, which means you need years of data to get a fairly accurate picture:

“One national study published in July by Mathematica Policy Research, conducted for the Department of Education, found that with one year of data, a teacher was likely to be misclassified 35 percent of the time. With three years of data, the error rate was 25 percent. With 10 years of data, the error rate dropped to 12 percent. The city has four years of data.”

One more problem: It turns out that the city was using a test that was too narrow and too easy. Now they’re trying to fix that, but “Daniel Koretz, a Harvard professor whose research helped persuade the state to toughen standards, said that as a result it was impossible to know whether rising scores in a classroom were due to inappropriate test preparation or gains in real learning.” The city won’t have rankings based on the higher standards until the next academic year. And wouldn’t we need several years’ worth to know anything really useful?

It would be nice to think that the city could release its data now, with a full explanation of the flaws, and everyone would calmly take it for what it’s worth and refrain from jumping to conclusions about individual teachers. But is that likely? When the Los Angeles Times released this data about L.A. teachers last August, some teachers said they got burned. Do we know enough yet to be able to tell whether that cost was worthwhile, on balance? I’m sure it depends on who you ask, but it seems to me that value-added rankings, in their current guise, involve making individual teachers bear a big risk of being misjudged in return for information that could improve the system. Isn’t that a lot to ask?