Are Bush and McCain Really Neck and Neck?
Matt Alsdorf
Posted Friday, Dec. 10, 1999, at 1:52 PM ET
A recent poll found that John McCain leads George W. Bush among New Hampshire Republicans by 3 percentage points, with a 5 percent margin of error. Does this mean they are really in a dead heat?
Possibly. You actually need two numbers to understand a poll. One is the margin of error--an estimate of how large a discrepancy might exist between a survey's results and the true value. (It's unlikely, for instance, that 1,000 people polled in some national survey will speak accurately for all 260 million Americans. The margin of error measures just how wrong those 1,000 people could be.) The margin of error is typically 3 to 5 percent and is almost always cited alongside the results. The other important number is the degree of confidence, which is the likelihood that the real value falls within the margin of error. Most polls use a 95 percent degree of confidence, so it's usually omitted from news reports.
Here's an example of how the two numbers work together. This week, a Reuters/WHDH-TV poll showed McCain leading Bush 35 percent to 32 percent with a 5 percent margin of error. This means that there is a 95 percent chance (there's the degree of confidence) that between 30 and 40 percent of the population supports McCain (that's McCain's 35 percent plus or minus the margin of error), and a 95 percent chance that between 27 and 37 percent supports Bush.
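The arithmetic behind those two ranges can be sketched in a few lines of Python. This is only an illustration: the helper name `interval` and the overlap calculation are assumptions added here, not part of the poll, and the figures are the ones quoted above.

```python
def interval(share_pct, margin_pct):
    """Return (low, high): a reported share plus or minus the margin of error."""
    return (share_pct - margin_pct, share_pct + margin_pct)

MARGIN = 5  # percentage points, as reported with the poll

mccain = interval(35, MARGIN)
bush = interval(32, MARGIN)

# The stretch of values that falls inside both candidates' ranges.
overlap = (max(mccain[0], bush[0]), min(mccain[1], bush[1]))

print("McCain:", mccain)
print("Bush:", bush)
print("Overlap:", overlap)
```

The two ranges share everything from 30 to 37 percent, so the individual ranges alone cannot settle who is ahead.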
So, based on this poll, how sure can we be that McCain is truly ahead of Bush? The answer is, less than 95 percent sure. The reported margin of error relates to each candidate's individual score. But the margin of error on the spread (i.e., the difference between their scores) is much larger, since it incorporates the potential variations in both scores. As a rule of thumb, you can multiply the reported margin of error by 1.7 to get the margin of error for the spread. Since 1.7 times 5 is 8.5, the Reuters/WHDH-TV survey would need to show McCain with a lead of at least 8.5 percentage points over Bush for his lead to be 95 percent certain. Since his lead is only 3 percentage points, it's being called a dead heat. But the small lead is not meaningless; although it is less than 95 percent certain that McCain leads Bush, it is still more likely than not that he is really ahead.
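Here is a rough sketch of that rule of thumb in Python. The 1.7 factor and the poll figures come from the text; the last step is an added assumption. It treats the 8.5-point margin as 1.96 standard errors of a normal distribution and reads "chance" in the same informal way the article does, so take the result as a ballpark figure only.

```python
from statistics import NormalDist

REPORTED_MARGIN = 5.0  # percentage points, from the poll
SPREAD_FACTOR = 1.7    # the rule of thumb quoted in the text
lead = 35 - 32         # McCain's lead in percentage points

# Margin of error on the spread between the two candidates.
spread_margin = SPREAD_FACTOR * REPORTED_MARGIN
lead_is_95_certain = lead >= spread_margin

# Assumption: a 95 percent margin equals 1.96 standard errors (normal model).
spread_se = spread_margin / 1.96
chance_ahead = NormalDist().cdf(lead / spread_se)

print("Margin on the spread:", spread_margin)
print("Lead clears the 95 percent bar:", lead_is_95_certain)
print("Approximate chance McCain is ahead:", round(chance_ahead, 2))
```

Under those assumptions the chance that McCain is ahead comes out at roughly three in four: better than a coin flip, but well short of 95 percent.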
(Visit the American Statistical Association's Web site for extensive information on common methods--and errors--in measuring public opinion.)
The Explainer thanks many Slate readers for suggesting this topic.
Next question?
Matt Alsdorf is a Slate editorial assistant.
The readers respond:
Question: How many more people would they have to poll to raise the certainty significantly? How much would this add to the cost?
--WC
An easy rule of thumb for finding the 95% confidence range is this: Divide 1 by the square root of the sample size. So for a sample of 400 people we would get 1/sqrt(400) = 1/20 = .05, so the 95% margin of error would be plus-or-minus 5 percent.
To double the accuracy, you would need a sample 4 times as large.
--ed johnson
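The rule of thumb in the reply above is easy to try out in Python. This is a minimal sketch: the function names are made up for illustration, and the rule itself is the approximation given in the reply, not an exact formula.

```python
import math

def rule_of_thumb_margin(sample_size):
    """Approximate 95 percent margin of error, as a fraction: 1 / sqrt(n)."""
    return 1 / math.sqrt(sample_size)

def sample_size_for(margin_pct):
    """Invert the rule: people needed for a margin given in percentage points."""
    return math.ceil((100 / margin_pct) ** 2)

for n in (400, 1600):
    print(n, "people ->", rule_of_thumb_margin(n))

# Halving the margin from 5 points to 2.5 points.
print(sample_size_for(5), "->", sample_size_for(2.5))
```

Going from a 5-point margin to a 2.5-point margin means going from 400 respondents to 1,600, which is the "4 times as large" in the reply.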
Matt Alsdorf undoubtedly knows what I am about to relate, but one of his examples seems likely to further a common misconception about polls.
Other things being equal, the accuracy of a poll depends on the sample size, not the population size. Neither the degree of confidence nor the margin of error depends on the size of the population, unless the population is very small relative to the sample. Thus, aside from one sometimes-important caveat (a case when "other things" are not equal), a poll with a sample size of 1,000 will be of approximately the same accuracy whether it is taken for the whole country or just New Hampshire. This assumes that each sample is selected properly. (The caveat is that a very diverse population may be more difficult to poll accurately than a relatively homogeneous one, especially if the pollster is selecting a stratified random sample.)
--Ken Allen
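One way to see the point about population size is the standard finite population correction from survey sampling, sketched below in Python. Several things here are assumptions made for illustration: a simple random sample, the worst case of a 50 percent share, a New Hampshire population of roughly 1.2 million, and a hypothetical town of 2,000 people. The 260 million figure is the one used in the article.

```python
import math

def margin_of_error(sample_size, population, share=0.5, z=1.96):
    """95 percent margin for a simple random sample, with the finite
    population correction sqrt((N - n) / (N - 1)) applied."""
    standard_error = math.sqrt(share * (1 - share) / sample_size)
    correction = math.sqrt((population - sample_size) / (population - 1))
    return z * standard_error * correction

SAMPLE = 1000
national = margin_of_error(SAMPLE, 260_000_000)   # figure from the article
new_hampshire = margin_of_error(SAMPLE, 1_200_000)  # rough assumption
tiny_town = margin_of_error(SAMPLE, 2_000)          # hypothetical small population

print("National:", round(national, 4))
print("New Hampshire:", round(new_hampshire, 4))
print("Town of 2,000:", round(tiny_town, 4))
```

The national and New Hampshire margins are practically identical. The correction only starts to matter when the sample is a large fraction of the population, as in the hypothetical town.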
If a poll says it has 95% confidence, that should be read as 95% confidence if everything went perfectly.
It makes a world of difference who writes the questions. Very subtle differences in phraseology can produce dramatically different responses. And sometimes the flaws in the questions are unsubtle. I remember a local radio station saying that 85% of Reagan supporters were willing to blow up the world. I was one who took part in that poll, and we had to choose between being for the other guy or being willing to blow up the world. The radio guy concluded by saying that the poll was accurate to plus or minus 4%.
The other problem is whether you got a representative sample. Remember the famous photo of a victorious President Truman holding up a newspaper headline that said he had lost? Well, that poll was off because it was a telephone poll, and the people too poor to own phones tended to be for Truman.
In their attempt to get a representative sample, the pollsters carefully choose whom they will ask. That is the opposite of randomly selecting people, but they have sophisticated techniques for making it pseudo-random. Nonetheless, they can be off.
So, when a poll claims 95% confidence of being within 5% of accurate, it is really saying that that's true only if the sample was perfect and the questions were crafted by a neutral deity.
--Charlie Clack
Explainer states:
"This week, a Reuters/WHDH-TV poll showed McCain leading Bush 35 percent to 32 percent with a 5 percent margin of error. This means that there is a 95 percent chance (there's the degree of confidence) that between 30 and 40 percent of the population supports McCain (that's McCain's 35 percent plus or minus the margin of error), and a 95 percent chance that between 27 and 37 percent supports Bush."
This interpretation is inaccurate, as any statistician will tell you. The problem is that the author equates "95% confidence" with "95% chance." A margin of error is a statistical measure of confidence, not chance. Here's the difference:
Start with the statement that McCain's support in the sample is at 35% with a margin of error of 5% (with the usual 95% confidence).
Confidence in this context means that assuming McCain's support is at 35% in the general population, the chance of the survey organization selecting a random sample that supports McCain more than 30% and less than 40% ... is 95%. This is not the same as Explainer's interpretation that there is a 95% chance that the true value of McCain's support is between 30 and 40%.
It's a small but critical difference. Confidence intervals make an assumption about the true value but do not give the "chance" of it falling within a particular range.
Explainer's error is common in medical literature and was recently called an "understandable but categorically wrong interpretation" in a June 15, 1999 article in Annals of Internal Medicine by Professor Steven Goodman of Hopkins. The article, called "The P-Value Fallacy," is a technical but excellent explanation of this problem (and can be found on www.acponline.com).
--Josh
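The definition of confidence given in the reply above can be checked with a small simulation. The sketch below fixes McCain's true support at 35 percent, draws many random samples, and counts how often the measured support lands strictly between 30 and 40 percent. The sample size of 400, the number of trials, and the random seed are all assumptions chosen for illustration; the poll's actual sample size is not given in the article.

```python
import random

def share_of_samples_within(true_share, sample_size, margin, trials, seed=0):
    """Fraction of simulated polls whose measured share lands strictly
    within `margin` of the assumed true share."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Each respondent supports the candidate with probability true_share.
        supporters = sum(rng.random() < true_share for _ in range(sample_size))
        measured = supporters / sample_size
        if abs(measured - true_share) < margin:
            hits += 1
    return hits / trials

fraction = share_of_samples_within(0.35, 400, 0.05, trials=2000)
print("Share of samples between 30 and 40 percent:", fraction)
```

The printed fraction describes how the sampling procedure behaves when the true value is known. It says nothing directly about the true value when only one sample is in hand, which is the distinction the reply draws.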
I'm struggling to remember my advanced statistical inference, but I believe Josh is a) technically correct, and b) qualitatively irrelevant. He makes the point that the following are not the same:
1) True confidence intervals, which show the [conventionally 95%] probability that McCain's measured support will be between 30% and 40%, if his true support is 35%.
2) The [conventionally 95%] chance [i.e., probability] that McCain's true support is between 30% and 40%, if his measured support is 35%.
#1 is what is exactly statistically true; #2 is what we care about. After all, it's the measured value we know and the true value we're trying to infer.
However, to defend Explainer and common usage, I believe the following to be a theorem somewhere:
When the assumed distribution of measured values around the true value is symmetric [as it is if we assume a Normal distribution, and almost always is in practice], and the measured value is "some distance from" the boundaries [0% and 100%, in this case], the differences between #1 and #2 are vanishingly small.
So, we're entitled to our small hand wave and saying that there's [approximately] a 95% chance that McCain's true support is between 30% and 40%.
For the purists, I realize that a Normal distribution cannot have boundaries, etc., but my point is about the magnitude of the inaccuracy, not million-decimal-place precision. If these conditions are grossly violated--such as if we're dealing with effects in parts per million, almost zero--then my defense of approximation doesn't work.
--Lex
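One way to make statement #2 concrete is a Bayesian calculation, sketched below in Python. It rests on assumptions that are not in the discussion above: a sample of 400 people, 140 of whom back McCain (35 percent), and a flat prior, meaning every level of true support is treated as equally plausible before the poll. Under those assumptions the true support follows a Beta distribution, and the chance that it lies between 30 and 40 percent can be found by numerical integration.

```python
import math

def beta_pdf(x, a, b):
    """Density of the Beta(a, b) distribution at x, computed via logs."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def chance_true_share_between(supporters, sample_size, low, high, steps=10_000):
    """Probability that the true share lies in [low, high], assuming a
    flat prior, using the midpoint rule on the Beta posterior."""
    a = supporters + 1                 # flat prior adds 1 to each count
    b = sample_size - supporters + 1
    width = (high - low) / steps
    total = sum(beta_pdf(low + (i + 0.5) * width, a, b) for i in range(steps))
    return total * width

chance = chance_true_share_between(140, 400, 0.30, 0.40)
print("Chance true support is between 30 and 40 percent:", round(chance, 3))
```

With these inputs the result comes out close to 95 percent, which fits the claim that the two statements nearly coincide when the measured value is well away from 0% and 100%. A different prior or a much smaller sample could pull them apart.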