
The Main Revelation from the GOP Data Firm’s Leak? Political Data Is Vastly Overvalued.


Future Tense
The Citizen's Guide to the Future
June 19, 2017, 6:44 PM


Can we finally let the air out of our political-data obsession?

John Moore/Getty Images

Let’s say you had some data that you wanted to make accessible to anyone in the entire world. You might, as many people do, accomplish this by renting some server space from Amazon and posting your data there; anyone who typed in the appropriate web address could then access it. It’s easy. It’s also exactly what Deep Root Analytics apparently did after the firm was hired by the Republican National Committee last year to compile voter data for a fee of nearly $1 million. The problem, of course, is that Deep Root was never supposed to make that data public.

But it had. Last week, UpGuard cyber risk analyst Chris Vickery discovered more than 1 terabyte of the data, containing personal information about 198 million U.S. citizens, sitting completely unprotected on that Amazon server, available to anyone who had been given—or could find—its web address (which, incidentally, was the Amazon subdomain “dra-dw” for Deep Root Analytics Data Warehouse).
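To see why this kind of exposure is so trivial to exploit, consider that objects in an unprotected Amazon S3 bucket sit at a predictable web address: anyone who knows, or guesses, the bucket name can build the URL and fetch the contents with an ordinary HTTP request. A minimal sketch (the bucket name is the real "dra-dw"; the object key is hypothetical):

```python
def public_s3_url(bucket: str, key: str) -> str:
    """Build the public URL for an object in an unprotected S3 bucket.

    With no access controls set, a plain HTTP GET to this address
    returns the file to anyone on the internet.
    """
    return f"https://{bucket}.s3.amazonaws.com/{key}"


# "dra-dw" was Deep Root's actual subdomain; the key here is invented.
url = public_s3_url("dra-dw", "voters/scores.csv")
print(url)  # https://dra-dw.s3.amazonaws.com/voters/scores.csv
```

No password, no authentication token, no special tooling is involved, which is why a researcher scanning for open buckets could stumble onto a terabyte of voter data.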


The unprotected data included voter names, birthdays, addresses, voting and registration history, and phone numbers. According to UpGuard, the data also included analysis of voters’ stances on different policy issues—in one 50 GB database, voters were “scored with a decimal fraction between zero and one across forty-six columns” to represent how likely the voter was to agree with certain policy and belief statements, including clear statements like: “AmericaFirstForeignPolicy_agree,” “FinancialServicesHarmful_agree,” “FossilFuelsNeedToMoveAwayFrom,” “LowerTaxes_agree,” “PharmaCompsDoGreatDamage_agree,” “RepealObamacare_agree,” “NonReluctantDJTVoter,” and “StopIllegalImmigration_agree.”
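To make that format concrete, here is a hypothetical row in the style UpGuard describes: each modeled-opinion column holds a probability-like score between zero and one. The column names are taken from the article; the identifier and score values are invented for illustration.

```python
# Hypothetical voter record; each score estimates how likely the voter
# is to agree with the named statement (0.0 = very unlikely, 1.0 = very likely).
voter_scores = {
    "RNC_ID": "000123456",  # invented identifier
    "RepealObamacare_agree": 0.87,
    "LowerTaxes_agree": 0.91,
    "FossilFuelsNeedToMoveAwayFrom": 0.12,
}

# A campaign might flag issues where the modeled score crosses a threshold:
targets = [
    col for col, score in voter_scores.items()
    if isinstance(score, float) and score >= 0.8
]
print(targets)  # ['RepealObamacare_agree', 'LowerTaxes_agree']
```

The real data set spread such scores across 46 columns per voter, but the principle is the same: a grid of decimal fractions meant to guide targeting decisions.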

The good news (if you want to call it that) is that a lot of this data was probably already pretty easy to access (or easy to guess). The better news is that there’s no clear evidence that it was ever accessed or stolen by any unauthorized parties prior to Vickery’s discovery.

From a technical standpoint, there’s very little to be said about a case of such extraordinary, blatant negligence. There were no out-of-date security measures or sophisticated intrusion tactics at play here, because Deep Root simply put no security in place at all. (The firm did reportedly have another 24 terabytes of data stored with Amazon that were password-protected.)

From a policy perspective, it’s perhaps a little jarring—but not terribly surprising—that there will likely be no consequences whatsoever for Deep Root for failing to protect this data. It’s not illegal for a company to post a lot of personal information online. And the repository was removed from the public-facing server the evening of June 14, after Vickery reported it to unnamed “federal authorities,” and before UpGuard’s and Gizmodo’s public posts about the incident were published on Monday.


The only government body that even attempts to enforce some measure of consequences for companies that fail to protect user data is the Federal Trade Commission, which occasionally investigates firms for data practices that it believes qualify as “unfair” or “deceptive.” That’s unlikely to happen to Deep Root, both because the current political climate will discourage the FTC from choosing a conservative data analytics firm as a target, and because to classify Deep Root’s data protection practices (or lack thereof) as “unfair,” the FTC would have to be able to identify some concrete (probably financial) harm that had befallen the people whose data was made available.

It’s not entirely clear what harm, if any, could be caused by the disclosure of the information collected by Deep Root. People’s names, addresses, and birth dates can be used for identity theft, but much of that data can be collected through various innocent online means. And anyway, there’s no evidence to suggest this particular data set was ever stolen or that it might now be used maliciously.

The most striking elements of the Deep Root data set have nothing to do with security, or the fact that the data was left so utterly unprotected. What’s astonishing is that it was compiled from so many different firms and sources, and that many of those sources employed data collection methods that seem spectacularly unsophisticated. The files compiled by Deep Root include data collected from a number of other data and market research firms, including Data Trust, the Kantar Group, TargetPoint Consulting, and American Crossroads, according to Gizmodo.

With a price tag of nearly $1 million and the combined wisdom of all those different firms pooling their various data collection efforts, one might have expected the Deep Root data set to contain a wealth of insightful and important voter targeting information. And perhaps it does—it’s protected now, so we don’t know the full contents. But what we do know suggests that it was not that revelatory—for one thing, it contained an astonishing amount of seemingly completely worthless data, most notably a large cache of Reddit posts.

The obvious lesson to draw from Deep Root’s exposure is that it’s unwise to post personal data on a public-facing web server. But there is also a broader lesson about how the political world has come to see data: The episode reveals just how desperate politicians and campaigns are for it, and what a glut of worthless data is being sold and resold among them. Rather than making political data analytics look like a serious, sophisticated effort to target individual voters, Deep Root’s breach, and the glimpse behind the curtain it offered, makes these analytics companies look like a largely haphazard operation in which everyone is falling over themselves to buy data of any variety, the more the better, regardless of its value.

Future Tense is a partnership of Slate, New America, and Arizona State University.

Josephine Wolff is an assistant professor of public policy and computing security at Rochester Institute of Technology and a faculty associate at the Harvard Berkman Center for Internet and Society. Follow her on Twitter.