Read more of Slate's coverage of Iran's June 12 election and its aftermath.
Were Iran's election numbers too good to be true? That's the question that the blog Tehran Bureau raised hours after Friday's election, when it noted a strange trend in the government's electoral data: Each time a new vote total was released, President Mahmoud Ahmadinejad had won a nearly identical percentage, around 67 percent. As more results rolled in, his tally climbed in linear lock step.
We're used to watching the lead fluctuate wildly in American elections as returns come in, particularly early in the night, so the perfect straight line on Tehran Bureau's graph suggested the numbers were faked—and ham-handedly at that. Within hours, the graph was showing up in tweets and blogs all over the world. Atlantic blogger Andrew Sullivan saw it as conclusive evidence. "They didn't even attempt to disguise the fraud," he wrote. "This graph is a red flag to Iran and the world."
This kind of statistical gumshoeing has a long history. In 1936, for example, English biologist and statistician R.A. Fisher went gunning for Gregor Mendel, whose experimental results Fisher believed had been tweaked to be more favorable to Mendel's ideas. "Fictitious data can seldom survive a careful scrutiny," Fisher wrote, "and, since most men underestimate the frequency of large deviations arising by chance, such data may be expected generally to agree more closely with expectation than genuine data would." In other words, it was precisely the beautiful agreement of experiment with theory that exposed Mendel's thumb on the scale. Only once in 15,000 times, Fisher computed, could one expect such strong conformity. (The controversy over Mendel's research practices continues to this day, with notable scientists lining up on both men's sides.)
More recently, John Darsee, a rising star in cardiology, was caught reporting an unusually consistent series of measurements. When his supervisor demanded to see the original printed-out readings, Darsee said he'd thrown them out to make room in a filing cabinet. In the end, Darsee lost his position at Harvard, and 82 of his research papers had to be junked.
So it's natural to be suspicious when you see that the vote total at each of the six official reporting times follows a linear formula almost exactly. But in fact, that's precisely what we should expect from the way the data were reported, as running totals. As more and more of the total vote was counted, it would have taken larger and larger surges by one or the other candidate to noticeably tip the proportions. Political stats whiz Nate Silver made a roughly analogous chart of the 2008 U.S. presidential election, based on the imaginary scenario in which states reported in alphabetical order, and found a linear trend just about as strong as the one reported in Iran.
A better way to assess the plausibility of the Iran data is to examine the six batches of votes separately, rather than cumulatively, as they appeared in Tehran Bureau's graph. You see a big first batch, 36 percent of the total vote, which comes in 70 percent for Ahmadinejad. Next come two smaller batches, 18 percent and 21 percent of the total vote, respectively, each of which Ahmadinejad wins with about 66 percent of the vote. The last three batches are smaller still (10 percent, 6 percent, and 8 percent of the total vote), and the incumbent takes these with 67 percent, 64 percent, and 62 percent of the vote. So Ahmadinejad's official share really is fairly consistent from batch to batch.
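To see why the running totals look so much smoother than the batches themselves, the figures quoted above can be fed through a few lines of Python. This is only a rough sketch: the batch sizes and shares are the rounded percentages given in the text, and the function name is made up for illustration.

```python
# Rounded figures from the text: each batch's size as a percent of the
# total vote, and Ahmadinejad's share within that batch.
batch_sizes = [36, 18, 21, 10, 6, 8]
batch_shares = [70, 66, 66, 67, 64, 62]


def cumulative_shares(sizes, shares):
    """Return Ahmadinejad's running share after each batch is counted."""
    counted = 0.0
    won = 0.0
    running = []
    for size, share in zip(sizes, shares):
        counted += size
        won += size * share / 100.0
        running.append(100.0 * won / counted)
    return running


running = cumulative_shares(batch_sizes, batch_shares)
overall = running[-1]

# Spread between the best and worst batch vs. spread of the running totals.
spread_batches = max(batch_shares) - min(batch_shares)
spread_running = max(running) - min(running)

for size, share, total in zip(batch_sizes, batch_shares, running):
    print(f"batch of {size:>2}%: {share}% in batch, {total:.1f}% cumulative")
```

The individual batches range over 8 points, but the running share moves by less than 3, because each new batch is averaged into everything already counted. The overall figure comes out slightly under the official 67.2 percent only because the inputs are rounded.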
But unbelievably so? Fisher used sophisticated statistical techniques to track down Mendel's fiddling, but we can get away with much less. We're simply asking the following: How much do we expect Ahmadinejad's percentage to deviate from his overall total of 67.2 percent, based on a generally diverse electorate that will vary in allegiances from place to place? The answer is given by a standard deviation, a mathematical measure that tells us roughly how far we expect any given measurement to stray from the overall average value. Here's one way to scratch out an estimate: Let's say the 27 million Iranians who voted last week are divided into 1,000 different regions with 27,000 voters each. For the sake of argument, we'll say that half of these regions are 87.2 percent for Ahmadinejad, 20 points over his overall average, while half are 47.2 percent for Ahmadinejad, 20 points below. For each region, the deviation from Ahmadinejad's overall vote share is exactly 20 percentage points, and when you average all the regions together, you get his 67.2 percent overall.
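The toy electorate is easy to check numerically. The sketch below builds the 1,000 hypothetical regions described above (the variable names are illustrative) and confirms that they average to 67.2 percent with a standard deviation of 20 points.

```python
import statistics

# The thought experiment: 500 regions at 87.2 percent and 500 at 47.2 percent.
regions = [87.2] * 500 + [47.2] * 500

# Average share across regions (all regions have the same number of voters,
# so a plain mean is the overall share).
mean_share = statistics.fmean(regions)

# Population standard deviation: every region sits exactly 20 points from
# the mean, so the result should be 20.
sd_share = statistics.pstdev(regions)

print(f"mean = {mean_share:.1f} percent, standard deviation = {sd_share:.1f} points")
```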
Now let's look at that first batch of votes, made up of 360 of our 1,000 regions (corresponding to the first real batch of 36 percent of the votes). Absent any reason to think that this particular sample is skewed compared with the overall vote, we can employ the following beautiful and simple formula: "The standard deviation of the average over N regions is the standard deviation of each region divided by the square root of N."
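Plugging in the numbers from the thought experiment, the formula gives 20 divided by the square root of 360, or roughly 1.05 percentage points. The sketch below computes that and then checks it with a quick simulation. The simulation is an assumption layered on top of the article's setup: it treats each region in a batch as an independent coin flip between the two region types, which is the condition under which the square-root rule holds exactly. The function name, trial count, and seed are arbitrary choices.

```python
import math
import random
import statistics

REGION_SD = 20.0   # spread of a single region, in percentage points
N_REGIONS = 360    # regions in the first batch (36 percent of 1,000)

# The square-root rule: SD of the average = SD of one region / sqrt(N).
predicted_sd = REGION_SD / math.sqrt(N_REGIONS)


def simulate_batch_sd(n_regions, trials=2000, seed=0):
    """Estimate the SD of a batch average by repeated random draws.

    Assumes each region is independently 87.2 or 47.2 percent for
    Ahmadinejad with equal probability.
    """
    rng = random.Random(seed)
    averages = []
    for _ in range(trials):
        batch = rng.choices([87.2, 47.2], k=n_regions)
        averages.append(statistics.fmean(batch))
    return statistics.pstdev(averages)


simulated_sd = simulate_batch_sd(N_REGIONS)

print(f"predicted by formula: {predicted_sd:.2f} points")
print(f"estimated by simulation: {simulated_sd:.2f} points")
```

The two figures should land close to each other, which is the point: even when individual regions swing 20 points either way, an average over hundreds of them is expected to wander by only about a point.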