How to Fix the SAT

Give out fewer scores.

May 21, 2014, 7:21 AM

Illustration by Rob Donnelly.

Earlier this year, the College Board announced sweeping changes to the SAT. New vocab, less obscure. New essay, now optional. A repeal of the “anti-guessing” policy. A retooled and refocused math section. The board stopped just short of switching to lavender-scented paper and offering 10 p.m. to 2 a.m. testing sessions for the benefit of morning-averse 11th graders.

But the College Board omitted the one reform I’d most favor. It’s a simple change that could help undercut the behemoth test prep industry and defuse some of the anxiety surrounding the test, and it wouldn’t require altering a single question.

They just need to reduce the number of scores.

I don’t mean “give out lower scores.” I mean “give out fewer scores.” Common sense and hard data dictate that there is no meaningful distinction between a 710 and a 720, or between a 560 and a 580. The SAT should embrace reality and stop assigning different scores to virtually identical performances.

Decades ago, the SAT gave scores down to the point. George W. Bush allegedly scored a 1206. Bill O’Reilly landed a 1585. Eventually, the College Board decided that these pinpoint figures gave a deceptive sense of precision to their decidedly imprecise test. So they started delivering scores in increments of 10, a practice they continue today.

But these increments are still too small. By the College Board’s own numbers, a student’s section score will fluctuate by an average of 20 to 30 points between testing sessions. In fact, the board warns specifically against reading too much into small differences: “There must be a 60-point difference between reading, mathematics, or writing scores before more skill can be assumed in one area than another.”

Whether we’re comparing students to one another or to their own performance in another domain, gaps smaller than 60 points are likely to be meaningless. So why report them at all?
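To make that rule of thumb concrete, here is a minimal Python sketch. The function name and the choice to treat a gap of exactly 60 points as meaningful are my own assumptions for illustration; only the 60-point figure comes from the College Board’s warning quoted above.

```python
# Hypothetical helper: the name and the ">=" reading of the rule are assumptions.
# Only the 60-point threshold comes from the College Board quote.
MEANINGFUL_GAP = 60


def meaningfully_different(score_a, score_b, threshold=MEANINGFUL_GAP):
    """Return True if two section scores differ by at least `threshold` points."""
    return abs(score_a - score_b) >= threshold


# Pairs of scores mentioned in this article.
for a, b in [(710, 720), (560, 580), (560, 670)]:
    verdict = "meaningful" if meaningfully_different(a, b) else "not meaningful"
    print(f"{a} vs. {b}: {verdict}")
```

By this yardstick, a 710 and a 720 are indistinguishable, and so are a 560 and a 580. Only the 110-point gap between a 560 and a 670 clears the bar.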

In his book Proofiness, Charles Seife gives this problem of deceptive over-precision a name: disestimation. He explains that it “imbues a number with more precision than it deserves, dressing a measurement up as absolute fact instead of presenting it as the error-prone estimate that it really is.”

If you know that your bathroom scale often errs by 5 pounds, then you shouldn’t report your weight as “150.4.” You should say, “About 150.” And if this same volatile scale claims that I weigh 151.3 pounds, that doesn’t mean I’m necessarily heavier than you. The measurement isn’t that reliable. The best we can conclude is that we weigh roughly the same amount.

On the SAT, scores for each section range from 200 to 800, in 10-point increments. That yields an incredible 61 different possible scores for each of the three sections. Thus, based on just an hour’s worth of multiple-choice questions, the College Board claims to divide high school students into more than five dozen different groups according to their mathematical, reading, and writing abilities.

Compare that to a typical Advanced Placement exam. It takes three times longer than an SAT section, and it asks essay questions, graded by expert teachers. By any fair reckoning, this should supply richer data than the SAT, allowing for more fine-grained distinctions. And how many different scores does the AP assign?

Just five.

Or consider an actual high school course. When I’ve taught trigonometry, I’ve assigned daily quizzes, nightly homework, and a dozen hour-long tests. It’s a wealth of information that dwarfs what the SAT can gather on a Saturday morning. And do I give 61 different scores?

Nope. From A-plus to F, there are only 13 possible final results.

In fact, no other major standardized test—not the LSAT, MCAT, or even the ACT—draws such hairsplitting distinctions as the SAT, which divides students into an improbable 181 groups on the basis of less than four hours’ work on a sleepy weekend morning.
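The counts of 61 and 181 follow from simple arithmetic on the score scale. The short Python sketch below reproduces them; the helper name is hypothetical, and the 600-to-2400 composite range is my assumption that the three section scores are simply added together.

```python
def count_scores(low, high, step):
    """Count the reportable scores on an inclusive scale from low to high."""
    return (high - low) // step + 1


# One section: 200 to 800 in 10-point increments.
per_section = count_scores(200, 800, 10)

# Three sections added together (assumed): 600 to 2400 in 10-point increments.
composite = count_scores(600, 2400, 10)

print(per_section)  # 61
print(composite)    # 181
```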

This kind of disestimation inevitably tricks us. As Seife explains: “Nice round numbers are sending a subliminal signal … announcing that you can’t trust them very far because they’re crude approximations. Long, ultra-specific numbers send exactly the opposite message: that they come from measurements that are more trustworthy and closer to absolute truth.”

By drawing distinctions where there’s no real difference, the SAT feeds the panicked college-prep culture. Students and parents chase higher scores, thinking that they mean something. With scores fluctuating so dramatically, it makes a twisted sort of sense for students to retake the SAT three, four, or even five times, since most colleges consider a student’s best score rather than the average. It even makes sense to cut Princeton Review an $800 check for tutoring and prep classes if they can boost your score by five increments, or get you over the hump from 1290 to 1300. (Never mind that random chance will often provide a comparable boost.) Admissions officers are typically savvy enough to ignore these meaningless distinctions, but students and parents are far more likely to get suckered in by the SAT’s false precision.

The solution for the College Board is simple. Stop disestimating. Revise the gap between adjacent scores so that it actually means something.

At the extreme, the College Board could throw out the ancient 200-to-800 scale and implement a system more like its AP tests, with scores ranging from 1 to 5.

If that’s too drastic (and to be fair, by lumping together a 560 with a 670 we’d probably be throwing out useful information), the College Board could try something more conservative: scoring not in increments of 10, but in increments of 50.

This would leave untouched the familiar range of scores, while reducing the number of categories to 13—coincidentally, the same as the number of different grades one can earn in an academic class.

Would this exaggerate the difference between a 520 and a 530 (now a 500 and a 550)? Potentially, yes. But that’s inevitable with any grading system. Score an 89.7 percent in a tough math class (as one of my tutoring students did this year), and you’ll get a B-plus for the semester, even though an extra point or two on a single test would’ve boosted you to an A-minus. That’s a meaningless distinction—would an extra two points on one test have signaled that you were a better student? But every scoring system will make both meaningful and meaningless distinctions. The question is how to maximize the former and minimize the latter, and the current SAT doesn’t strike a very good balance.
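Here is a rough Python sketch of what scoring in increments of 50 might look like. It assumes each current score is rounded to the nearest multiple of 50, which is one plausible reading of the proposal rather than anything the College Board has specified, and the function name is hypothetical.

```python
def to_coarse_score(score, step=50):
    """Round a 10-point-increment score to the nearest multiple of `step`.

    Assumption: nearest-multiple rounding. Current scores are multiples
    of 10, so none falls exactly halfway between two multiples of 50.
    """
    return (score + step // 2) // step * step


print(to_coarse_score(520))  # 500
print(to_coarse_score(530))  # 550

# Every current section score, mapped onto the coarser scale.
coarse_scores = sorted({to_coarse_score(s) for s in range(200, 801, 10)})
print(len(coarse_scores))  # 13
```

The 520 and the 530 land on opposite sides of a boundary, which is exactly the kind of borderline case described above, while the full 200-to-800 range collapses into 13 categories.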

The SAT has made this change before, when it moved from one-point increments to 10-point ones. Even so, compared with assessments like AP tests and final course grades, the SAT still offers a bizarre and unjustifiable level of precision. It’s time to rethink the test again, and start giving scores that accurately reflect the SAT’s inherent level of uncertainty.
