Dec. 9, 2004, 5:46 PM

Let's Go to the Audiotape

Who nailed the election results? Automated pollsters.

Start with the sum method. Rasmussen and Gallup overlapped in four battleground states: the big three (Florida, Ohio, and Pennsylvania) plus Minnesota. In all four, Rasmussen beat Gallup. Rasmussen's average error in these states was 3.3 points compared to Gallup's 6.2. SurveyUSA overlapped with Gallup in the big three states plus Iowa. Again, the automated pollster whipped Gallup. SurveyUSA's average error was 3.5 points. Gallup's was 6.4.

Mason-Dixon fared better, but not by much. It conducted surveys in five states that Rasmussen also polled: the big three plus Michigan and Minnesota. Mason-Dixon's average error in these states was 5.5 points. Rasmussen's was 3.2. Mason-Dixon overlapped SurveyUSA in 10 states: the big three, Arkansas, Colorado, Iowa, Michigan, Missouri, Nevada, and Oregon. Mason-Dixon was off in these states by an average of 5.6 points. SurveyUSA was off by 3.3.

Zogby came closer but still couldn't beat the robo-pollsters. Rasmussen went head-to-head with Zogby in the big three, Michigan, and Minnesota. Zogby erred in these states by an average of 4.3 points. Rasmussen erred by just 3.2. SurveyUSA squared off against Zogby in the big three, Colorado, Iowa, Michigan, and Nevada. Zogby was off in these states by an average of 4.5 points. SurveyUSA was off by just 3.4.

Human pollsters argue that the sum method favors automated polls, because when respondents are asked to choose a candidate, they're more likely to punch "1" or "2" on their phones than to punch "3" for other or undecided. This drives down the number of other/undecided responses, lifting both major candidates closer to their final numbers. If one poll has Kerry winning a state 46-45 with 9 percent undecided, and Kerry actually wins 50-49, the sum method punishes that pollster for every other/undecided respondent (calculating an 8-point error) and fails to reward the pollster for nailing the spread. Instead, the sum method rewards a second pollster who recorded fewer other/undecided responses and called the state for Bush, 51-48. The second pollster outscores the first by the sum method (missing Bush's number by 2 and Kerry's by 2), despite blowing the spread by 4 points (calling a 3-point win for Bush when Kerry actually won by a point).
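For readers who want to see the arithmetic, here is a minimal sketch in Python of how the two scoring methods treat the hypothetical polls above. The figures come from the example in the preceding paragraph; the function names and the code itself are ours, not any pollster's.

    # Sum method vs. spread method, applied to the hypothetical example above.
    # "poll" and "result" are (Kerry %, Bush %) pairs.

    def sum_error(poll, result):
        # Sum method: add up each candidate's absolute polling miss.
        return abs(poll[0] - result[0]) + abs(poll[1] - result[1])

    def spread_error(poll, result):
        # Spread method: compare the predicted margin to the actual margin.
        return abs((poll[0] - poll[1]) - (result[0] - result[1]))

    actual = (50, 49)   # Kerry actually wins the state, 50-49
    poll_1 = (46, 45)   # first pollster: Kerry 46, Bush 45, 9 percent other/undecided
    poll_2 = (48, 51)   # second pollster: Bush 51, Kerry 48

    print(sum_error(poll_1, actual), spread_error(poll_1, actual))  # 8 0: nails the spread, punished anyway
    print(sum_error(poll_2, actual), spread_error(poll_2, actual))  # 4 4: blows the spread, scores better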

What happens to the pollster comparisons if we switch to the spread method? Both of the automated pollsters still beat Gallup. Head to head, SurveyUSA missed the spreads by an average of 2.3 points; Gallup missed by an average of 5.4. Rasmussen cleaned Gallup's clock, missing the spreads by an average of 1.6 points compared to Gallup's 6.2. Rasmussen also whipped Zogby, erring by 1.0 points compared to Zogby's 3.2. But the contest between SurveyUSA and Zogby was tighter: The human pollster was off by an average of 3.6 points, compared to the robo-pollster's 2.5.

Throw in Mason-Dixon, and the comparison gets even tighter. In the five states where Rasmussen overlapped with Mason-Dixon, the two pollsters essentially tied. If you compare election returns (measured to a tenth of a percent) to the most precise published poll results (measured in whole integers), each pollster missed by the exact same average: 1.42 points.

Mason-Dixon says it would be more scientific to compare whole-integer poll results to whole-integer (rounded) election returns. This method would lower Mason-Dixon's average error. We understand that error rates averaged to a tenth of a percent are tenuous when the poll numbers from which they're computed are whole integers. But we can't agree that rounding off election returns improves the situation. Alternatively, Mason-Dixon argues that if we're using election returns calculated to a tenth of a percent, the best scientific comparison would be to poll results measured to a tenth of a percent, which again would lower Mason-Dixon's average error. We agree that this would be more scientific. But Rasmussen didn't release its results to a tenth of a percent, so we can't compare the two pollsters at that level of precision. Anyway, the performances are so close, and the variation in averages depending on decimal place is so tiny when compared to the much bigger margin of error on each poll, that it's impossible to call the race between Rasmussen and Mason-Dixon one way or the other. It's a tie.
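To make the rounding question concrete, here is a small sketch in Python with made-up numbers (not drawn from any actual state or pollster) showing how the choice of convention can shift an average error by a few tenths of a point, a small amount next to a poll's sampling error of several points.

    # Hypothetical illustration of how rounding election returns changes the average error.
    polls   = [47, 52, 49]         # published poll numbers (whole integers)
    returns = [47.4, 53.1, 48.6]   # election returns, measured to a tenth of a percent

    unrounded = sum(abs(p - r) for p, r in zip(polls, returns)) / len(polls)
    rounded   = sum(abs(p - round(r)) for p, r in zip(polls, returns)) / len(polls)

    print(round(unrounded, 2))   # 0.63
    print(round(rounded, 2))     # 0.33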

The match-up between Mason-Dixon and SurveyUSA is a different story. In the 10 states where they went head to head, the human pollster prevailed. Mason-Dixon erred by an average of 1.8 points, beating SurveyUSA's 2.6. For this lonely victory over the machines, Mason-Dixon deserves the polling industry's Garry Kasparov award.

How did the robots largely beat the humans? For starters, they aren't robots. They're recordings of human voices. Pollsters who use this technology argue that the uniformity achieved by automation—every respondent hears the questions read exactly the same way—outweighs any distortions caused by people hanging up or lying to the recordings. They also argue that the interviewers who read questions and record answers in "human" polls are all too human. A human poll may bear the name of a major newspaper or television network, but the interviews are usually "outsourced" to a company you've never heard of and conducted by whoever is willing to make the phone calls—which sound a lot like telemarketing—for modest wages.

We won't settle the relative merits of the two approaches in this article or this election. But when the two major automated pollsters score either second and first (or third and tied for first, depending on how you count it) in round-robin match-ups with the three major human pollsters, it's time to broaden the experiment in automated polling and compare results to see what's working and why. Clearly, the automated pollsters are onto something, and the human pollsters who have fallen behind will have to figure out how to beat it—or join it.

Correction, Dec. 11, 2004: This article originally said that the measure by which Rasmussen and SurveyUSA beat all three human pollsters was the spread method. This was incorrect. The error calculations supplied were for the sum method. We recalculated the average error for each pollster using the spread method and determined that Mason-Dixon beat SurveyUSA. We apologize to Mason-Dixon and to indignant humans everywhere.