Last week the Federal Trade Commission and Google signed a broad privacy settlement that requires the search company to submit to "privacy audits" every two years. The agreement ended a dispute that began last year, when Google launched Buzz, the ill-fated social-messaging system built into Gmail.
Buzz was certainly a privacy boondoggle for Google—a black eye for a company that had been trying to position itself as the good guy to Facebook's bad guy. I agree with the FTC that Google should pay for the mistakes it made (the company has apologized and says it's fixed its privacy procedures to prevent another such imbroglio). And if they're done judiciously, the privacy audits may prove helpful in ensuring that Google stays on the up and up.
But that's what I worry about: Will the audits in fact be done judiciously? There's a good chance that privacy regulators—spurred by a public that doesn't really know what it wants when it comes to online privacy—may go too far, blocking Google from collecting and analyzing information about its users. That will be a terrible outcome, because while we all reflexively hate the thought of a company analyzing our digital lives, we also benefit from this practice in many ways that we don't appreciate.
I know I sound naïve, but bear with me. Yes, Google collects a lot of information about all of us. It does so on purpose, and for all sorts of reasons. Some of these reasons we don't like very much—Google, like all big Web companies, sells ads, and it can get more money for those ads when they're targeted to you. This practice pays for the Web, and it's the reason you don't pay a fee to conduct a Google search. Still, I understand why people are wary of online data collection. Too often, though, our conversations about online privacy end right here.
Broadly speaking, there are two types of data that Web companies keep on us—personally identifiable information (like your name and list of friends), and information that can't be tied to you as an individual. In our discussions about privacy, we rarely make this important distinction. While we focus on the disadvantages of companies collecting our information, we rarely look at the innovations that wouldn't be possible without our personal data. This is especially true when it comes to anonymous data—information that can't be used to identify you, but which serves as the building blocks of amazing things.
Indeed, some of Google's best and most-loved products would not be possible without our data. Take the spell checker: How does Google know you meant Rebecca Black when you typed Rebeca Blacke? Note that this is a trick that no ordinary, dictionary-based spell-checker could perform—these are proper nouns, and we're dealing with an ephemeral personality. But since Google has stored lots of other people's search requests for Black, it knows you're looking for the phenom behind "Friday." The theory behind the spell-checker can be applied more broadly. By studying words that often come together in search terms—for instance, people may either search for "los angeles murder rate" or "los angeles homicide rate"—Google can detect that two completely different words may have the same meaning. This has profound implications for the future of computing: In a very real sense, mining search queries is teaching computers how to understand language (and not just English, either). If Google were forced to forget every search query right after it served up a result, none of these things would be possible.