Barack Obama refuses to release the names of the 2 million-plus people who have given his campaign less than $200. According to campaign officials, it would be too difficult and time-consuming to extract this information from its database. So how come we were able to do it in a couple hours? Not literally—we don't have access to the campaign's list of donors—but we created a database of similar size and format in a Web-ready file and posted it online. (You can view a sample text version of it here. The full version is 824 MB.)
But before we get into the technical details (though, if you're with the Obama campaign and want to skip ahead, please do), it's worth dwelling on why the Obama campaign is reluctant to disclose this information. The reason can't be legal: No law prevents Obama from releasing these names.
Politically, releasing the names would have several advantages. Obama has campaigned (effectively) on a platform of making government more transparent, citing his efforts to do so in Chicago and Washington as signature achievements. He has also disclosed the names of the bundlers who raise large sums for his campaign. Finally, making the list public would rebut McCain's broad and unsubstantiated claims that the list (and the huge sums of money it represents) is shot through with fraud.
Of course, releasing the information would also be politically risky, since the inevitable errors in a database this large (errors of the kind that also turned up in McCain's records, such as a contribution from "Adorable Manabat") would give McCain an opportunity to scream fraud. Then again, he sometimes does that even without evidence.
And from a purely logistical standpoint, we have a hard time believing the campaign lacks the expertise to do this. We know the information is already in a very sophisticated database—it has to be, because the Obama campaign has been manipulating the information for more than a year as it continues to raise money from these small-fry donors. It also uses the information to contact and track donors to make sure they get out and vote on Election Day.
So much for the arguments. Now for the technical details. We created a randomly generated dummy database of 50,000 donors in Excel. Each entry had a field for every piece of data normally disclosed in a typical FEC filing for donors who give more than $200: first name, last name, two address lines, city, state, ZIP code, employer, occupation, the amount of the donation, and the date it was given. (Excel 2003 maxes out at 65,536 rows, and the Obama campaign is certainly using something much more sophisticated.)
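Our Excel sheet isn't reproduced here, so the following is only a minimal Python sketch of how dummy donor data like it might be generated; it stands in for the Excel approach rather than reproducing it. The name and place pools, the field names, the $5 to $199 amount range and the 2007 start date are all hypothetical, chosen only to give each record the eleven fields listed above.

```python
import random
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical value pools, for illustration only.
FIRST_NAMES = ["Mary", "James", "Linda", "Robert", "Ana", "Wei"]
LAST_NAMES = ["Smith", "Garcia", "Johnson", "Lee", "Brown", "Patel"]
# (city, state, first three ZIP digits)
PLACES = [("Chicago", "IL", "606"), ("Denver", "CO", "802"), ("Austin", "TX", "787")]
STREETS = ["Main St", "Oak Ave", "Elm St", "Lake Shore Dr"]
EMPLOYERS = ["Self-employed", "Acme Corp.", "City of Springfield", "Retired"]
OCCUPATIONS = ["Teacher", "Engineer", "Nurse", "Retired", "Attorney"]

START = date(2007, 1, 1)  # hypothetical start of the fundraising window


@dataclass
class Donor:
    """One row of the dummy sheet: the fields an itemized FEC entry discloses."""
    first_name: str
    last_name: str
    address1: str
    address2: str
    city: str
    state: str
    zip_code: str
    employer: str
    occupation: str
    amount: int           # whole dollars
    donation_date: date


def make_donor(rng: random.Random) -> Donor:
    """Build one random donor from the pools above."""
    city, state, zip_prefix = rng.choice(PLACES)
    return Donor(
        first_name=rng.choice(FIRST_NAMES),
        last_name=rng.choice(LAST_NAMES),
        address1=f"{rng.randint(1, 9999)} {rng.choice(STREETS)}",
        # About 30% of donors get a second address line.
        address2=f"Apt {rng.randint(1, 40)}" if rng.random() < 0.3 else "",
        city=city,
        state=state,
        zip_code=f"{zip_prefix}{rng.randint(0, 99):02d}",  # always 5 digits
        employer=rng.choice(EMPLOYERS),
        occupation=rng.choice(OCCUPATIONS),
        amount=rng.randint(5, 199),  # hypothetical small-donor range
        donation_date=START + timedelta(days=rng.randint(0, 600)),
    )


def make_donors(n: int = 50_000, seed: int = 0) -> list[Donor]:
    """Generate n donors; a fixed seed makes the sheet reproducible."""
    rng = random.Random(seed)
    return [make_donor(rng) for _ in range(n)]
```

Because the generator is seeded, `make_donors()` returns the same 50,000 rows on every run, which keeps a test file reproducible.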
To create an XML file from this data that approximates the size of Obama's donor database, we wrote a short script in Visual Basic for Applications, Excel's built-in version of Visual Basic, that looped through the 50,000 pretend donors 50 times, adding each entry to the XML file, for a total of 2.5 million entries. Even on a wheezing, overworked Dell OptiPlex GX280 (2.8 GHz processor, 504 MB of RAM), this took exactly two hours. The resulting XML file was 824 MB: big, but not unheard of. Any competent developer could take this file and build a searchable application from it.
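The VBA macro itself isn't reproduced here, so this is a hedged Python sketch of the same loop: repeat a base set of rows a fixed number of times and write each record to the file as soon as it is built, so memory use doesn't grow with the output. It assumes each base row is a plain dict keyed by field name; the `<donors>`/`<donor>` tag names and the field names are hypothetical. For scale, 824 MB spread over 2.5 million entries works out to roughly 330 to 350 bytes per entry, depending on whether a megabyte is counted as 10^6 or 2^20 bytes.

```python
import io
from xml.sax.saxutils import escape

# Hypothetical tag names, one per disclosed field.
FIELDS = ("first_name", "last_name", "address1", "address2", "city", "state",
          "zip_code", "employer", "occupation", "amount", "donation_date")


def write_donor_xml(out, base_rows, repeat):
    """Write len(base_rows) * repeat <donor> elements to the text stream `out`.

    Records are written one at a time, so memory use stays flat no matter
    how large the output file gets. Returns the number of records written.
    """
    out.write('<?xml version="1.0" encoding="UTF-8"?>\n<donors>\n')
    count = 0
    for _ in range(repeat):            # e.g. 50 passes over the base sheet
        for row in base_rows:          # e.g. 50,000 dummy donors per pass
            out.write("  <donor>")
            for field in FIELDS:
                # escape() turns &, < and > into entities so a value such as
                # "Smith & Sons" can't break the markup. Missing fields become "".
                value = escape(str(row.get(field, "")))
                out.write(f"<{field}>{value}</{field}>")
            out.write("</donor>\n")
            count += 1
    out.write("</donors>\n")
    return count


def donor_xml_string(base_rows, repeat=1):
    """Convenience wrapper for small inputs: return the XML as a string."""
    buf = io.StringIO()
    write_donor_xml(buf, base_rows, repeat)
    return buf.getvalue()
```

To produce a file on disk, pass an open text file instead, e.g. `with open("donors.xml", "w", encoding="utf-8") as f: write_donor_xml(f, rows, repeat=50)`.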
Web developers would be quick to point out that an XML file this large is too bulky for an online application to parse easily. To build a searchable database like the one the McCain campaign released, the Obama campaign would probably need to take a few extra steps, loading the XML into a database system built to handle a dataset this size, such as MySQL. But simply for the purposes of releasing the raw data, a universal format like XML is sufficient.
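How the conversion would actually be done isn't specified, so here is one common approach, sketched under stated assumptions: stream the file with the standard library's `xml.etree.ElementTree.iterparse`, so the whole document never has to sit in memory at once, and insert rows in batches. SQLite (Python's built-in `sqlite3` module) is swapped in for MySQL purely so the example runs without a database server; the `donors` table, its column names, the tag names and the `find_donors` helper are all hypothetical.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

COLUMNS = ("first_name", "last_name", "address1", "address2", "city", "state",
           "zip_code", "employer", "occupation", "amount", "donation_date")

# zip_code is TEXT so leading zeros survive; amount is INTEGER so "25" is stored as 25.
SCHEMA = """CREATE TABLE IF NOT EXISTS donors (
    first_name TEXT, last_name TEXT, address1 TEXT, address2 TEXT,
    city TEXT, state TEXT, zip_code TEXT, employer TEXT, occupation TEXT,
    amount INTEGER, donation_date TEXT)"""


def load_donor_xml(source, conn, batch_size=10_000):
    """Stream <donor> elements from `source` (a path or binary file) into SQLite.

    Returns the number of rows inserted.
    """
    conn.execute(SCHEMA)
    placeholders = ", ".join(["?"] * len(COLUMNS))
    insert = f"INSERT INTO donors ({', '.join(COLUMNS)}) VALUES ({placeholders})"
    batch, total = [], 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "donor":
            continue  # field elements are read via their parent <donor>
        batch.append(tuple(elem.findtext(col, default="") for col in COLUMNS))
        elem.clear()  # drop the donor's children so memory grows only slightly
        if len(batch) >= batch_size:
            conn.executemany(insert, batch)
            total += len(batch)
            batch.clear()
    if batch:
        conn.executemany(insert, batch)
        total += len(batch)
    # Build the search index after the bulk load, which is typically faster.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_donors_last_name ON donors (last_name)")
    conn.commit()
    return total


def find_donors(conn, last_name, state=None):
    """Look up donors by last name, optionally filtered by state."""
    sql = "SELECT first_name, last_name, city, state, amount FROM donors WHERE last_name = ?"
    params = [last_name]
    if state is not None:
        sql += " AND state = ?"
        params.append(state)
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    # Tiny inline sample; a real run would pass the path to the full XML file.
    sample = (b"<donors><donor><first_name>Ann</first_name><last_name>Lee</last_name>"
              b"<state>TX</state><amount>25</amount></donor></donors>")
    conn = sqlite3.connect(":memory:")
    print(load_donor_xml(io.BytesIO(sample), conn), "rows loaded")
    print(find_donors(conn, "Lee", state="TX"))
```

With MySQL the streaming-and-batching idea carries over, but the connection setup and placeholder syntax come from whichever driver is used.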
Unsurprisingly, a campaign spokesman rejects the premise of our little experiment, saying the task the campaign faces is far more difficult than we think. The campaign's last FEC report, he notes, runs to 176,000 pages. But page count isn't the relevant metric here; the size and shape of the database are. And we're talking about something far less complex than an FEC report. Finally, since the list would be published online, it requires no printing. All we're doing is rearranging 1s and 0s.
Obama aides also deflect the question about the names of the campaign's low-dollar donors by saying that the McCain campaign has failed to report the names of more than 100,000 donors. They're right, and they illustrate the point by helpfully pointing to an online spreadsheet, which also proves our point that it's easy to put this data together in a digestible form. So how 'bout it, guys?