Digging for Googleholes

Google may be our new god, but it’s not omnipotent.

July 16, 2003, 11:39 AM

The arrival of Google five years ago served as a kind of upgrade for the entire Web. Searching for information went from a sluggish, unreliable process to something you could do with genuine confidence. If it was online somewhere, Google and its ingenious PageRank system would find what you were looking for—and more often than not, the information would arrive in Google’s top 10 results.

But the oracle—recently described as “a little bit like God” in the New York Times—is not perfect. Certain types of requests foil the Google search system or produce results that frustrate more than satisfy. These are systemic problems, not isolated ones; you can reproduce them again and again. The algorithms that Google’s search engine relies on have been brilliantly optimized for most types of information requests, but sometimes that optimization backfires. That’s when you find yourself in a Googlehole.

Googlehole No. 1: All Shopping, All the Time. If you’re searching for something that can be sold online, Google’s top results skew very heavily toward stores, and away from general information. Search for “flowers,” and more than 90 percent of the top results are online florists. If you’re doing research on tulips, or want to learn gardening tips, or basically want to know anything about flowers that doesn’t involve purchasing them online, you have to wade through a sea of florists to find what you’re looking for.

The same goes for searching for specific products: Type in the make and model of a new DVD player, and you’ll get dozens of online electronic stores in the top results, all of them eager to sell you the item. But you have to burrow through the results to find an impartial product review that doesn’t appear in an online catalog.

I suspect this emphasis is due to the convention of linking to an online store when mentioning a product, whether it’s a book, CD, or outdoor grill. In addition, a number of sites—such as DealTime—track the latest prices and availability of thousands of items at online stores, which creates even more product links in Google’s database. Because PageRank assumes that pages that attract a lot of links are more relevant than pages without links, these most-linked-to product pages bubble up to the top.
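To make the link-counting intuition concrete, here is a minimal sketch of a simplified PageRank computed by repeated iteration over a toy web. It is an illustration under stated assumptions, not Google's actual system: the page names, the six-page graph, the damping value of 0.85, and the function name `pagerank` are all hypothetical. Three pages link to a store and only one links to a review.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    This is a teaching sketch, not Google's production algorithm.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with rank spread evenly across all pages.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank evenly to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its rank evenly.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical toy web: three pages link to a store, one links to a review.
toy_web = {
    "blog_a": ["store"],
    "blog_b": ["store"],
    "price_tracker": ["store"],
    "blog_c": ["review"],
    "store": [],
    "review": [],
}

ranks = pagerank(toy_web)
ordered = sorted(ranks, key=ranks.get, reverse=True)
print(ordered[:2])
```

In this toy graph the store page ends up ranked first and the review second, simply because more pages point at the store. Nothing in the calculation knows or cares which page is more informative.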

Google is replicating one of the problems experienced by some of the big portals—sites like Lycos and Infoseek—during the boom years. They sold so much real estate on their pages to online stores and other advertisers that their results became less reliable, which gave Google its opening in the first place. Now the same thing is happening again, only it’s happening organically, without Google manipulating the integrity of its search engine.

Googlehole No. 2: Skewed Synonyms. Search for “apple” on Google, and you have to troll through a couple of pages of results before you get anything not directly related to Apple Computer, and even then it’s a page promoting a public TV show called Newton’s Apple. After that, it’s all Mac-related links until Fiona Apple’s home page. You have to sift through 50 results before you reach a link that deals with apples that grow on trees: the home page for the Washington State Apple Growers Association. To a certain extent, this probably reflects the interests of people searching as well as those linking, but is the world really that much more interested in Apple Computer than in old-fashioned apples?

At this stage in the Web’s development, people who create a lot of links—most notably the blogging community—tend to be more technologically inclined than the general population, and thus more likely to link to Apple Computer than something like the Washington State Apple Growers Association. (This process is sometimes known as “googlewashing,” where one group of prolific linkers can alter the online associations with a given word or phrase.) But there’s another factor here, which is that categories that don’t have central, well-known sites devoted to them will fare poorly when they share a keyword with other categories. Maybe there are thousands of pages that deal with apples, but only one Apple Computer or Fiona Apple home page. People interested in growing or eating apples will distribute those links more widely across those thousands of pages, while Mac or Fiona fans will consolidate around fewer pages, driving them higher in Google’s rankings.
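The dilution effect described above can be sketched with simple arithmetic. The numbers here are invented for illustration: 1,000 links about the fruit, each pointing to a different page, against 300 links about the company, all pointing to one home page. Raw inbound-link counts stand in for PageRank, which is a simplifying assumption, and the page names and the helper `inbound_counts` are hypothetical.

```python
from collections import Counter


def inbound_counts(link_targets):
    """Count how many links point at each page (a crude stand-in for PageRank)."""
    return Counter(link_targets)


# Hypothetical: 1,000 links about the fruit, spread one apiece over 1,000 pages.
fruit_links = [f"fruit_page_{i}" for i in range(1000)]

# Hypothetical: 300 links about the company, all aimed at a single home page.
company_links = ["company_home"] * 300

counts = inbound_counts(fruit_links + company_links)
top_page, top_count = counts.most_common(1)[0]
best_fruit_count = max(counts[page] for page in set(fruit_links))

print(top_page, top_count)   # company_home 300
print(best_fruit_count)      # 1
```

Even though the fruit attracts more than three times as many links overall in this made-up example, no single fruit page collects more than one of them, so the company's home page wins the keyword.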

Googlehole No. 3: Book Learning. Google is beginning to have a subtle but noticeable effect on research. More and more scholarly publications are putting up their issues in PDF format, which Google indexes as though they were traditional Web pages. But almost no one is publishing entire books online in PDF form. So, when you’re doing research online, Google is implicitly pushing you toward information stored in articles and away from information stored in books. Assuming this practice continues, and assuming that Google continues to grow in influence, we may find ourselves in a world where, if you want to get an idea into circulation, you’re better off publishing a PDF file on the Web than landing a book deal.

There’s a parallel development in Google’s treatment of Web sites that restrict access to their archives. The New York Times may be an authority in the world of opinion, but its closed archives mean that its articles rarely rank highly in Google results, if they appear at all. Search for “Augusta National,” Howell Raines’ pet obsession from this year, and not a single page from the Times site appears in the top 50 results. Uber-blogger Dave Winer bet the CEO of New York Times Digital last year that in 2007 bloggers would rank higher than the Times in Google searches. As Winer now puts it: “If you want to be in Google, you gotta be on the Web.”

You can’t really hold Google responsible for these blind spots. Each of them is just a reflection of the way the Web has been organized by the millions who have contributed to its structure. But the existence of Googleholes suggests an important caveat to the Google-as-oracle rhetoric: Google may be the closest thing going to a vision of the “group mind,” but that mind is shaped by the interests and habits of the people who create hypertext links. A group mind decides that Apple Computer is more relevant than the apples that you eat, but that group doesn’t speak for everybody.

We’re wrong to think of Google as a pure reference source. It’s closer to a collectively authored op-ed page—filled with bias, polemics, and a skewed sense of proportion—than an encyclopedia. It’s still the connected world’s most dazzling place to visit, a perfect condensation of the Web’s wider anarchy. Just don’t call it an oracle.