Using Math to Examine Iran’s Election Results

June 25, 2009

Statisticians and political scientists have been having a field day with the results of the Iranian election held earlier this month. Was the election rigged? We may never know, but there is enough buried in the math to suggest that it might have been. Then again, there is also enough to suggest that everything is legitimate. Here are a few analyses that I found particularly interesting:

Clean Data

Immediately after the election, doubts were raised about the legitimacy of the data because each time a new batch of voting results was released (they come out in pieces in Iran, as they do in the United States), the percentage of votes going to President Mahmoud Ahmadinejad was the same: 67 percent. Real data is rarely that clean, and some started to wonder whether the results had been fabricated.

An analysis by University of Wisconsin math professor Jordan Ellenberg in Slate, however, delves deeper into the data to show that it was actually messier than might be expected. The results didn’t come out city by city but in large batches that combined data from several areas, which meant that each reported share for Ahmadinejad was really an average over many places. That is where the Law of Large Numbers comes in, as Ellenberg wrote:

Averages of widely varying quantities can, and usually do, yield results that look almost perfectly uniform. Given enough data, the outliers tend to cancel one another out.

Ellenberg concludes that the data is “definitely messy enough to be true.”
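To get a feel for Ellenberg's point, here is a small simulation sketch. Everything in it is invented for illustration: the number of areas, the ballots per area, the range of local support and the batch size are all assumptions, not figures from the Iranian results. It shows only the general effect, that pooling many widely varying areas into a few big batches produces batch percentages that look nearly identical.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical setup (not real election data): 3,000 local areas whose
# support for one candidate varies widely, from 30% to 95%, with each
# area reporting 1,000 ballots.
NUM_AREAS = 3000
BALLOTS_PER_AREA = 1000
BATCH_SIZE = 500
local_support = [random.uniform(0.30, 0.95) for _ in range(NUM_AREAS)]


def batch_share(areas):
    """Overall vote share when several areas are pooled into one batch."""
    votes = sum(round(p * BALLOTS_PER_AREA) for p in areas)
    return votes / (len(areas) * BALLOTS_PER_AREA)


# Release the results in 6 large batches of 500 areas each.
batches = [
    local_support[i:i + BATCH_SIZE]
    for i in range(0, NUM_AREAS, BATCH_SIZE)
]
batch_shares = [batch_share(b) for b in batches]

local_spread = statistics.pstdev(local_support)
batch_spread = statistics.pstdev(batch_shares)

print("batch shares:", [round(s, 3) for s in batch_shares])
print("spread of local shares:", round(local_spread, 3))
print("spread of batch shares:", round(batch_spread, 3))
```

The local shares are all over the map, while the batch shares should differ by only a fraction of a percentage point or so. Similar batch percentages are therefore not, by themselves, a sign that someone made the numbers up.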

Benford’s Law

Several analyses have looked at the first digits of the Iran election results to see if they comply with Benford’s Law, which states:

In lists of numbers from many (but not all) real-life sources of data, the leading digit is distributed in a specific, non-uniform way. According to this law, the first digit is 1 almost one third of the time, and larger digits occur as the leading digit with lower and lower frequency, to the point where 9 as a first digit occurs less than one time in twenty. This distribution of first digits arises logically whenever a set of values is distributed logarithmically.
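The quoted description can be made concrete with the standard formula for Benford’s Law, under which the chance that the leading digit is d equals log10(1 + 1/d). The sketch below simply tabulates that formula; the function name is my own choice for illustration.

```python
import math


def benford_probability(digit):
    """Expected share of numbers whose leading digit is `digit` (1 to 9)."""
    return math.log10(1 + 1 / digit)


# Print the expected share for each possible leading digit.
for d in range(1, 10):
    print(d, f"{benford_probability(d):.1%}")
```

The table starts at about 30 percent for the digit 1 and falls to under 5 percent for the digit 9, matching the "almost one third" and "less than one time in twenty" in the description above.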

One analysis using this tack, by Boudewijn Roukema of the Nicolaus Copernicus University in Poland, concluded that there were nearly twice as many vote counts beginning with the digit 7 for Mehdi Karroubi as would be expected under Benford’s Law. In addition, Roukema suspected that the results for Ahmadinejad, in which there were fewer 1s and more 2s and 3s than expected, were what one would likely see if someone had manipulated the results by changing the 1s at the beginning of the vote totals to 2s and 3s. Such a change would also have inflated Ahmadinejad’s totals by several million votes.

Walter Mebane, a political scientist and statistician at the University of Michigan, also used Benford’s Law in his analysis and found several irregularities in the Iran election results. But even he admits that though his results are “compatible with widespread fraud,” they are also “compatible with Ahmadinejad having actually won.”
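For readers who want to see what a first-digit check can look like in practice, here is a rough sketch. It is not necessarily the method either researcher used: I have assumed a plain chi-square comparison against the Benford frequencies, and both data sets are synthetic. The "natural" counts are random numbers spread evenly on a logarithmic scale, and the "tampered" counts are the same numbers with about half of the leading 1s crudely changed to 2s.

```python
import math
import random
from collections import Counter

random.seed(1)  # fixed seed so the sketch is repeatable


def leading_digit(n):
    """First decimal digit of a positive integer."""
    return int(str(n)[0])


def benford_chi_square(vote_counts):
    """Chi-square distance between observed first digits and Benford's Law."""
    observed = Counter(leading_digit(n) for n in vote_counts if n > 0)
    total = sum(observed.values())
    stat = 0.0
    for d in range(1, 10):
        expected = total * math.log10(1 + 1 / d)
        stat += (observed[d] - expected) ** 2 / expected
    return stat


def bump_leading_one(n):
    """Hypothetical tampering: turn a leading 1 into a 2 half the time."""
    s = str(n)
    if s[0] == "1" and random.random() < 0.5:
        return int("2" + s[1:])
    return n


# Synthetic counts between 100 and 1,000,000, spread evenly on a log scale.
natural = [int(10 ** random.uniform(2, 6)) for _ in range(5000)]
tampered = [bump_leading_one(n) for n in natural]

natural_stat = benford_chi_square(natural)
tampered_stat = benford_chi_square(tampered)

# With 9 digit categories there are 8 degrees of freedom; values above
# roughly 15.5 would be unusual (5 percent level) if Benford's Law held.
print("natural :", round(natural_stat, 1))
print("tampered:", round(tampered_stat, 1))
```

A large statistic only says the digits do not look Benford-like. As Mebane's caveat suggests, it cannot say why.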

The Last Two Digits

Two graduate students in political science at Columbia University took a third approach to the data. In an analysis that they summarized in a Washington Post op-ed, they examined the last two digits of the vote counts from 29 provinces for each of the four candidates (e.g., if someone received 14,579 votes, only the 7 and 9 were considered in the analysis).

The last two digits in election results are essentially random noise, so the distribution of digits should be fairly even: each digit should appear around 10 percent of the time. Humans, though, are poor random number generators, and when we make up numbers, we tend to choose some digits more frequently than others. In the Iran results, only 4 percent of the numbers end in the digit 5 while the digit 7 appears 17 percent of the time. Results that deviate this much would be expected in about four of every 100 elections.
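One way to probe a claim like this is by simulation. The sketch below draws 116 random last digits (29 provinces times four candidates) many times over and counts how often a fair draw looks at least as lopsided as the reported one. The lopsidedness rule is my own assumption (some digit at 17 percent or more and some digit at 4 percent or less), as are the seed and the number of trials. The authors' actual test is not described here, so the estimate need not reproduce their four-in-100 figure.

```python
import random
from collections import Counter

random.seed(2)  # fixed seed so the sketch is repeatable

NUM_COUNTS = 29 * 4   # 29 provinces x 4 candidates = 116 vote counts
TRIALS = 10_000       # number of simulated fair elections (an assumption)


def as_lopsided_as_reported(last_digits):
    """Assumed rule: some digit at >= 17% and some digit at <= 4%."""
    counts = Counter(last_digits)
    shares = [counts[d] / len(last_digits) for d in range(10)]
    return max(shares) >= 0.17 and min(shares) <= 0.04


# Each trial draws 116 last digits uniformly at random from 0 to 9.
hits = sum(
    as_lopsided_as_reported(random.choices(range(10), k=NUM_COUNTS))
    for _ in range(TRIALS)
)
estimate = hits / TRIALS

print(f"share of simulated fair elections this lopsided: {estimate:.3f}")
```

Changing the rule (for example, looking only at the digits 5 and 7 rather than any digit) would change the answer, which is one reason such figures should be read as rough guides.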

Humans also have trouble making up numbers whose digits are non-adjacent (that is, you are less likely to come up with 72 than with 23). In real results these final pairs should be random too, and about 70 percent of them should consist of non-adjacent digits. In the Iran results, however, just 62 percent do. Again, a result like this would be expected in about four of every 100 elections. But the combination of the two results would be expected in only one of every 200 elections. Improbable, perhaps, but not impossible.
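The 70 percent baseline can be checked by listing all 100 possible two-digit endings. The sketch below assumes a particular reading of "non-adjacent": the two digits are different and are not neighbors, with 0 and 9 counted as neighbors. That reading is my assumption; it happens to reproduce the 70 percent figure.

```python
from itertools import product


def is_adjacent(a, b):
    """True if two digits are neighbors, counting 0 and 9 as neighbors."""
    return (a - b) % 10 in (1, 9)


# All 100 possible (tens digit, units digit) endings, 00 through 99.
pairs = list(product(range(10), repeat=2))

# Keep the endings whose digits are neither identical nor neighbors.
non_adjacent = [
    (a, b) for a, b in pairs if a != b and not is_adjacent(a, b)
]
share = len(non_adjacent) / len(pairs)

print(len(non_adjacent), "of", len(pairs), "pairs, share =", share)
```

Of the 100 endings, 10 have identical digits and 20 have neighboring digits, leaving 70. So an ending like 72 counts as non-adjacent while 23 does not.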

Where does that leave us? We may never know whether the reported results are real. My personal favorite bit of data from all of this, though, raises questions about the election’s legitimacy without requiring any calculations. This quotation, from Abbas-Ali Kadkhodaei, a spokesman for Iran’s Guardian Council, would make almost anyone think twice:

Statistics provided by Mohsen Rezaei in which he claims more than 100% of those eligible have cast their ballot in 170 cities are not accurate; the incident has happened in only 50 cities.

(For more on the Iran election result analyses, check out Nate Silver on fivethirtyeight.com.)