How will the technologies of the future help protect us against terrorism? The new book Safe: The Race To Protect Ourselves in a Newly Dangerous World examines innovative techniques for sniffing out attacks before they happen and for limiting damage if a strike does occur. In today's excerpt, in the last of a three-part series, Martha Baer, Katrina Heron, Oliver Morton, and Evan Ratliff consider a new technology that allows the government to root through citizens' private data without behaving like a police state. Tuesday's excerpt explained how to recognize potential terrorists with facial heat sensors and automated video cameras. Yesterday's selection looked at a computer chip that could be the best weapon against bioterrorism.
The security detail at a riverboat casino was surprised when, a few years ago, it observed a rather obvious scam by a cheater who seemed to be working alone. The player was slipping bets onto the roulette table just after the ball had dropped, guaranteeing himself a win. The scam is called "past posting," and without a distraction team the dealer should have easily spotted it. When security caught the move on the surveillance camera and swooped in to pick up the cheater, the dealer was suitably aghast.
That would have been that, perhaps, except that a security officer took down the cheater's information—name, address, phone number—and ran it through a piece of software called Non-Obvious Relationship Awareness, or NORA. The computer program took the cheater's information and scanned it through databases of fellow cheaters, known gambling felons, and casino employees. In a matter of seconds, the system came back with a hit. The cheater had shared a home phone number with the dealer; they had been colluding the whole time.
Jeff Jonas, the Las Vegas software entrepreneur behind NORA, thinks his software's ultimate calling may be to catch terrorists. Jonas' version of data mining is an attempt to address one of the many what-could-have-beens that followed Sept. 11, 2001. The question of whether the attacks could have been prevented is all the more haunting because of the information, now enshrined in The 9/11 Commission Report, that we know was available. There were the two hijackers listed on a State Department watch list but never tracked down. There was the July 2001 internal FBI memorandum—the famous Phoenix memo—that raised concerns about Middle Eastern men attending U.S. flight schools. There were the two Sept. 10 al-Qaida telephone intercepts that the NSA translated on Sept. 12. And there was the Aug. 16 arrest of Zacarias Moussaoui, who had been taking flying lessons in Minnesota.
The idea behind using data mining in counterterrorism is to sort through a vast sea of information and find the patterns or clues that point to a terrorist plot—no easy feat. From a political perspective, the problem is even thornier. We're caught today between a widespread fear that our privacy is at stake and an equally palpable fear that critical information may be lost in the flood of data. The larger challenge in counterterrorism is not simply to try to locate suspicious people. It is to connect the dots of a terrorist plot in advance without infringing on the privacy of innocent citizens.
It wasn't until November 2002, when the New York Times revealed the existence of a new government program called Total Information Awareness, that data mining and its connection to terrorism burst into the American consciousness. TIA was a data analysis research venture funded by the Defense Advanced Research Projects Agency that aimed to find out whether information technology could be used to uncover the digital footprints of terrorists. Critics argued that the complex web of technologies that TIA hoped to bring together were intended not just to scan data about overseas terrorists, but to illegally do the same to American citizens at home. To privacy advocates, TIA looked like the most extreme example of a post-9/11 administration spun out of control.
Once exposed to the public, Total Information Awareness was never able to salvage its credibility. In October 2003, President Bush signed the Department of Defense appropriations bill, which contained a provision banning the use of funds for TIA. The legislative counsel for the American Civil Liberties Union declared TIA's shutdown "a resounding victory for individual liberty."
But the death of TIA was not the end of data mining's application to security problems. In fact, many of the most controversial TIA projects simply switched to classified funding sources. Finding a way to scan and exchange data remains an active interest of intelligence agencies. One question, then, is whether there are technical ways to mine networked data while preserving both secrecy and privacy. Jonas thinks he has an answer, which he says came to him after he heard that the government had trouble keeping its watch-list data under wraps. He also knew—from the TIA controversy and the firestorm of criticism over airlines such as JetBlue giving passenger data to the government—that Americans are increasingly skeptical of corporations handing over their personal data to the government. What Jonas came up with is a means to anonymize information while still allowing it to be searched for links. He named it ANNA, and he says it's the answer to "how to know everything about everyone without knowing anything about anyone."
ANNA works like this: The software takes a set of data and applies a mathematical formula that converts each piece of data—a name, an address, a phone number—into an indecipherable string of characters. The name al-Midhar, for example, could be transformed into cbd034409c22929518fa494f99dc9964. This is called a one-way hash, and in the case of ANNA, the hash function serves to create an anonymous version of the information stored in the database. The same piece of data always produces the same string, but if two pieces of data differ by even a letter or a comma, the resulting hashes will be completely different. ANNA also takes the common data errors found by NORA—misspellings of names, transposed birth dates—and hashes them as well. Then it does the same for the names and other information on the watch list (which might include birth dates, addresses, or Social Security numbers). Once all the data is hashed, NORA or another system can search for matches between the strings without ever revealing the underlying data.
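The core idea can be sketched in a few lines of Python. This is only an illustration of one-way hashing in general, using SHA-256 as a stand-in; the book does not say which hash function ANNA actually uses, and the normalization step here is an assumption:

```python
import hashlib

def anonymize(value: str) -> str:
    """One-way hash of a single field (a name, an address, a phone number).

    SHA-256 is used here purely as an example of a one-way hash; the
    digest reveals nothing about the input, but the same input always
    yields the same digest, so anonymized records can still be matched.
    """
    # Normalize so trivial formatting differences don't change the hash
    # (an assumption -- a real system would normalize far more carefully).
    normalized = value.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# The same input always produces the same 64-character hex string...
print(anonymize("al-Midhar"))
# ...while even a transposed pair of letters yields a completely
# different string, which is why ANNA must also hash likely variants.
print(anonymize("al-Midhar") == anonymize("al-Mihdar"))
```

The hashing of misspellings and transpositions mentioned above would simply mean running `anonymize` over each known variant of a name and storing all of the resulting strings.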
Let's say the government is looking for a particular suspect, John Doe, and wants to find out if certain companies have any data about him. It runs a hash on "John Doe," his birth date, Social Security number, and any other information it has on him. The result is a string of letters and numbers. It then hands that string over to the companies, which have run the same hash function on all of their data. Then the company simply looks for matching strings in its database. If it finds one, it alerts the government, which then could obtain a court order to un-anonymize the data.
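The exchange described above can be sketched as follows. The record fields, the sample data, and the use of SHA-256 are all illustrative assumptions, not details of Jonas' actual system; the point is that only hashes, never raw data, cross the boundary between the two parties:

```python
import hashlib

def h(value: str) -> str:
    # Illustrative one-way hash; both sides must use the identical function.
    return hashlib.sha256(value.strip().lower().encode("utf-8")).hexdigest()

# Government side: hash the watch-list entries and share only the digests.
watch_list_hashes = {h("John Doe"), h("1970-01-01"), h("123-45-6789")}

# Company side: hash every field of every customer record the same way.
customer_records = [
    {"name": "Jane Roe", "dob": "1982-05-17"},
    {"name": "John Doe", "dob": "1970-01-01"},
]
hits = [
    record for record in customer_records
    if any(h(field) in watch_list_hashes for field in record.values())
]

# The company reports only that a match exists -- un-anonymizing the
# underlying record would require a court order for the original data.
print(len(hits))  # 1
```

Because the company never sees the government's raw watch list, and the government never sees the company's raw records, neither side learns anything about the people who don't match.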
Because a hash cannot be reversed, there is no way to use the string to discover the data it stands for—you have to have the original. That means the government could in theory hand its whole hashed watch list over to companies without revealing it, and companies could send large amounts of data to the government without violating the privacy of the subjects involved. What changes hands is simply a database full of meaningless strings.
While Jonas' solution doesn't solve every privacy and information-sharing problem, it would help provide some basic protections. When Congress killed the TIA program, however, it scuttled Jonas' talks with the TIA brass about the concepts behind ANNA. If we do decide, in the end, that data mining should be part of our national security strategy, we shouldn't expect that a computer will somehow uncover the next 9/11 on its own, though it might increase our analysts' chances of doing so. How successful their effort will be depends, in part, on the other half of the data-mining debate—how much information we want the government to access.