Hunting for statistical fraud in Iran's elections.

A mathematician's guide to the news.
June 17 2009 10:46 AM

Guilt by Calculation

It takes more than an Excel sheet to prove the Iranian election was fixed.

Read more of Slate's coverage of Iran's June 12 election and its aftermath.

So the amount by which it's reasonable to expect that batch to differ from the overall average of 67.2 percent is 20 percent divided by the square root of 360, or 1.05 percent. In other words, even if we assume a wide variance in the support for Ahmadinejad in any region (20 points in either direction), a batch consisting of 36 percent of the electorate is likely to wander from the average only by somewhere in the neighborhood of 1 percentage point.
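For readers who want to check the arithmetic, here is a minimal sketch of that calculation. The function name is made up for illustration, and it assumes the batch is an average of 360 independent, equally sized regions, each with a spread of 20 points.

```python
import math

def batch_standard_error(region_sd, regions_in_batch):
    """Expected spread of a batch average, assuming the batch is a plain
    average of independent, equally sized regions (an assumption)."""
    return region_sd / math.sqrt(regions_in_batch)

# 20-point regional spread, 360 regions in the first batch
se = batch_standard_error(20.0, 360)
print(round(se, 2))  # 1.05
```

Quadrupling the number of regions in a batch only halves the expected wobble, which is why big batches look so steady.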

And Ahmadinejad's reported total of 70 percent for the first 36 percent of the vote misses his average by substantially more than that, suggesting even messier data than our scenario predicts. The same argument estimates the standard deviations of the other five batches as 1.5 percent, 1.4 percent, 2 percent, 2.6 percent, and 2.2 percent, respectively. In other words, these figures, though they may seem eerily consistent at first glance, are actually just what we would expect. That's the nature of large batches of data, governed by what's called the Law of Large Numbers: Averages of widely varying quantities can, and usually do, yield results that look almost perfectly uniform. Given enough data, the outliers tend to cancel one another out.
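The Law of Large Numbers can be seen at work in a toy simulation. This is only a sketch under stated assumptions: regional support is drawn from a bell curve centered on 67.2 with a 20-point spread (ignoring the fact that real percentages stay between 0 and 100), all regions count equally, and the function name and seed are arbitrary.

```python
import random
import statistics

def simulate_batch_averages(n_regions, trials, mean=67.2, sd=20.0, seed=0):
    """Average n_regions random regional results, repeated `trials` times.
    Toy model: independent, equally weighted, normally distributed regions."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mean, sd) for _ in range(n_regions))
        for _ in range(trials)
    ]

# 1,000 imaginary batches of 360 wildly varying regions each
averages = simulate_batch_averages(n_regions=360, trials=1000)
spread = statistics.stdev(averages)
print(f"spread of batch averages: {spread:.2f} points")
```

Individual regions in this model swing by 20 points, yet the batch averages should cluster within about a point of 67.2, in line with the square-root estimate.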

Of course, these estimates depend vitally on the arbitrary guesses about the sizes of the regions and their individual vote totals we made when setting up our estimate. But every reasonable guess I tried yielded the same result; on purely statistical grounds, the Iranian election numbers look more or less reasonable. It might be a different story if Ahmadinejad had drawn between 67.1 percent and 67.3 percent in all six batches, suggesting a standard deviation of less than 0.1 percent—or if 500 mini-batches of data, each making up 0.2 percent of the vote, were all in that 62 percent to 70 percent range. (One reason American readers may be more used to seeing wide swings in the vote totals is that our fine-grained media start reporting results when just a few percent of the votes are in.)
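To see why 500 tightly clustered mini-batches would be suspicious, here is a rough sketch. It assumes about 1,000 equal regions nationwide (inferred from the 36 percent batch being treated as 360 regions), so that a 0.2 percent mini-batch is just two regions, and it uses a bell-curve approximation. The variable names are illustrative only.

```python
import math
from statistics import NormalDist

REGION_SD = 20.0          # assumed regional spread, in points
OVERALL_AVERAGE = 67.2    # reported national share
REGIONS_PER_MINIBATCH = 2 # 0.2 percent of an assumed 1,000 regions

# Spread of a two-region average under the square-root rule
minibatch_sd = REGION_SD / math.sqrt(REGIONS_PER_MINIBATCH)
minibatch = NormalDist(mu=OVERALL_AVERAGE, sigma=minibatch_sd)

# Chance that one mini-batch lands between 62 and 70 percent
p_one = minibatch.cdf(70.0) - minibatch.cdf(62.0)

# Chance that all 500 independent mini-batches do, as a power of ten
log10_p_all = 500 * math.log10(p_one)

print(f"mini-batch spread: {minibatch_sd:.1f} points")
print(f"one mini-batch in range: {p_one:.2f}")
print(f"all 500 in range: about 10^{log10_p_all:.0f}")
```

Under these assumptions a single mini-batch lands in the range only about a fifth of the time, so all 500 doing so by chance would be essentially impossible.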

I'm not saying the election wasn't fixed; Juan Cole and Richard Sexton offer more reasons for doubting the government's numbers. On the other side, Ken Ballen and Patrick Doherty argue that their pre-election polling is consistent with a big Ahmadinejad win. Either way, the final verdict on the Iranian election won't be settled by drawing a graph. The official numbers may or may not be authentic, but they're definitely messy enough to be true.