
Digging for Googleholes

Google may be our new god, but it's not omnipotent.

Illustration by Mark Alan Stamaty

The arrival of Google five years ago served as a kind of upgrade for the entire Web. Searching for information went from a sluggish, unreliable process to something you could do with genuine confidence. If it was online somewhere, Google and its ingenious PageRank system would find what you were looking for, and more often than not, the information would arrive in Google's top 10 results.

But the oracle—recently described as "a little bit like God" in the New York Times—is not perfect. Certain types of requests foil the Google search system or produce results that frustrate more than satisfy. These are systemic problems, not isolated ones; you can reproduce them again and again. The algorithms that Google's search engine relies on have been brilliantly optimized for most types of information requests, but sometimes that optimization backfires. That's when you find yourself in a Googlehole.

Googlehole No. 1: All Shopping, All the Time. If you're searching for something that can be sold online, Google's top results skew very heavily toward stores, and away from general information. Search for "flowers," and more than 90 percent of the top results are online florists. If you're doing research on tulips, or want to learn gardening tips, or basically want to know anything about flowers that doesn't involve purchasing them online, you have to wade through a sea of florists to find what you're looking for.

The same goes for searching for specific products: Type in the make and model of a new DVD player, and you'll get dozens of online electronic stores in the top results, all of them eager to sell you the item. But you have to burrow through the results to find an impartial product review that doesn't appear in an online catalog.

I suspect this emphasis is due to the convention of linking to an online store when mentioning a product, whether it's a book, CD, or outdoor grill. In addition, a number of sites—such as DealTime—track the latest prices and availability of thousands of items at online stores, which creates even more product links in Google's database. Because PageRank assumes that pages that attract a lot of links are more relevant than pages without links, these most-linked-to product pages bubble up to the top.
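To make that suspicion concrete, here is a minimal sketch of a link-based ranking in the spirit of PageRank. It is a toy under stated assumptions, not Google's actual system: the six page names, the link graph, the damping factor of 0.85, and the iteration count are all hypothetical values chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Toy power-iteration ranking. `links` maps each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outbound links spreads its score over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical graph: three blogs and a price tracker mention a DVD player.
# All four link to the store; only one blog also links to an impartial review.
toy_links = {
    "blog_a": ["store"],
    "blog_b": ["store"],
    "blog_c": ["store", "review"],
    "price_tracker": ["store"],
    "store": [],
    "review": [],
}

ranks = pagerank(toy_links)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{page:14s} {score:.3f}")
```

In this toy graph the store page lands at the top of the list simply because more pages point to it, which is the effect described above.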

Google is replicating one of the problems experienced by some of the big portals—sites like Lycos and Infoseek—during the boom years. They sold so much real estate on their pages to online stores and other advertisers that their results became less reliable, which gave Google its opening in the first place. Now the same thing is happening again, only it's happening organically, without Google manipulating the integrity of its search engine.

Googlehole No. 2: Skewed Synonyms. Search for "apple" on Google, and you have to troll through a couple of pages of results before you get anything not directly related to Apple Computer, and even that is a page promoting a public TV show called Newton's Apple. After that it's all Mac-related links until Fiona Apple's home page. You have to sift through 50 results before you reach a link that deals with apples that grow on trees: the home page for the Washington State Apple Growers Association. To a certain extent, this probably reflects the interest of people searching as well as those linking, but is the world really that much more interested in Apple Computer than in old-fashioned apples?

At this stage in the Web's development, people who create a lot of links—most notably the blogging community—tend to be more technologically inclined than the general population, and thus more likely to link to Apple Computer than something like the Washington State Apple Growers Association. (This process is sometimes known as "googlewashing," where one group of prolific linkers can alter the online associations with a given word or phrase.) But there's another factor here, which is that categories that don't have central, well-known sites devoted to them will fare poorly when they share a keyword with other categories. Maybe there are thousands of pages that deal with apples, but only one Apple Computer or Fiona Apple home page. People interested in growing or eating apples will distribute those links more widely across those thousands of pages, while Mac or Fiona fans will consolidate around fewer pages, driving them higher in Google's rankings.
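The consolidation effect can be shown with simple arithmetic. The sketch below uses raw inbound-link counts as a crude stand-in for a link-based ranking (an assumption made for brevity; it is not PageRank), and every number and page name in it is hypothetical.

```python
from collections import Counter

# Hypothetical numbers: 1,000 links about the fruit are spread evenly over
# 100 different pages, while 300 links about the company all point to one page.
fruit_pages = [f"fruit_page_{i}" for i in range(100)]
fruit_links = [fruit_pages[i % len(fruit_pages)] for i in range(1000)]
computer_links = ["apple_computer_home"] * 300

# Count inbound links per target page.
inbound = Counter(fruit_links + computer_links)

top_page, top_count = inbound.most_common(1)[0]
best_fruit_count = max(inbound[page] for page in fruit_pages)

print("most-linked page:", top_page, "with", top_count, "links")
print("best fruit page has", best_fruit_count, "links")
```

Even though the fruit attracts more than three times as many links in total under these made-up numbers, no single fruit page comes close to the one page that collects all of its topic's links.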

Googlehole No. 3: Book Learning. Google is beginning to have a subtle but noticeable effect on research. More and more scholarly publications are putting up their issues in PDF format, which Google indexes as though they were traditional Web pages. But almost no one is publishing entire books online in PDF form. So, when you're doing research online, Google is implicitly pushing you toward information stored in articles and away from information stored in books. Assuming this practice continues, and assuming that Google continues to grow in influence, we may find ourselves in a world where, if you want to get an idea into circulation, you're better off publishing a PDF file on the Web than landing a book deal.

There's a parallel development in Google's treatment of Web sites that restrict access to their archives. The New York Times may be an authority in the world of opinion, but its closed archives mean that its articles rarely rank highly in Google results, if they appear at all. Search for "Augusta National," Howell Raines' pet obsession from this year, and not a single page from the Times site appears in the top 50 results. Uber-blogger Dave Winer bet the CEO of the New York Times Digital last year that in 2007 bloggers will rank higher than the Times in Google searches. As Winer now puts it: "If you want to be in Google, you gotta be on the Web."

You can't really hold Google responsible for these blind spots. Each of them is just a reflection of the way the Web has been organized by the millions who have contributed to its structure. But the existence of Googleholes suggests an important caveat to the Google-as-oracle rhetoric: Google may be the closest thing going to a vision of the "group mind," but that mind is shaped by the interests and habits of the people who create hypertext links. A group mind decides that Apple Computer is more relevant than the apples that you eat, but that group doesn't speak for everybody.

We're wrong to think of Google as a pure reference source. It's closer to a collectively authored op-ed page—filled with bias, polemics, and a skewed sense of proportion—than an encyclopedia. It's still the connected world's most dazzling place to visit, a perfect condensation of the Web's wider anarchy. Just don't call it an oracle.

Steven Johnson is the author of five books, including Everything Bad Is Good For You and The Ghost Map, and co-founder of Outside.in.

Remarks from the Fray:

God helps those who help themselves. I know, it's already a bandwagon, but: google advanced search [www.google.com]. Try it, you'll like it. For instance, I just did a search for the words "apple" and "varieties" excluding the word "computer." Here [www.google.com] are the results. I can remember two occasions, from thousands, on which I couldn't find pretty much exactly what I was looking for. Sometimes takes a few tries to hit on the ideal search criteria, but dang, man, if you can output a whole article about the limitations of google, surely you can input a few different search terms. Surely?

--T_Weldon_Berger

…This "critique" of Google relies wholly on examples of incompetent Google use.
Problem #1: Shopping
The average person knows that things that can be sold online... are. So, if you want to KNOW something about a thing that's sold online, you have to be a little more specific. This [www.google.com] is what happens when you just type "flowers" into Google. This [www.google.com] is what happens when you type in "flowers biology". Suddenly, you only get the articles that discuss flowers AND biology. Not a lot of florists in that category. You can't just type a single word into Google and get the exact information you're looking for. Any more than you can tell a person to "do something" and expect them to do what you want. You have to be specific.
Problem #2: Skewed Synonyms
This seems basically like the problem before, because you've chosen a poor example. Maybe "demon" would've been better, where demon software has long since supplanted demonology. Still, let's use your example. You want to know about apples? Your [www.google.com] result yields Apple Computer. But if you type in "apple nutrition" you get this [www.google.com] result. Much better, no? Again, specificity is the key to finding what you want. Generic words will not yield the response you're looking for.
Problem #3: Book Learning
Yes, articles are spewed onto the web. But if you managed to do any research in graduate school, you'd know that articles are ALWAYS the first place to start when researching a subject you know nothing about. Why? BIBLIOGRAPHIES! If you're serious, and you want to research, you hit the articles, and then you hit their sources. It's true that an article on the web will satisfy the casual researcher, in the same way that an encyclopedia article was once sufficient to write a high school book report. But a real researcher has to consult real sources, and knows how to use the web to hunt them down.

--Geoff

Steven Johnson Responds:

Folks, obviously *obviously* you can refine searches on Google to get better results than the ones I suggested, and obviously Google can't read minds. The point is that when Google quite understandably fails to read your mind, it doesn't direct you randomly to the wrong URLs -- it directs you to certain types of URLs more often than others. It has BIAS, in other words. And when millions of people are using the engine every day, many of whom aren't inclined to refine their searches, that bias shouldn't be underestimated. Here's perhaps a better example than the Apple one. My blog alternates between being the number 1 and number 2 result when you search on the word "steven." So Google is guessing on average that more people searching for Steven are looking for me than for Spielberg, Seagal, Soderbergh, etc. I'd like to flatter myself and think this is true, but alas I think it's pretty unlikely. What Google is reflecting there is the bias of the blogging community, which benefits me, of course, but it certainly doesn't reflect the general interests of the overall population, or even the general web surfing population. No search engine is perfect, but each search engine is imperfect in specific ways. If we're going to rely on search engines increasingly to make sense of the world, we should 1) come up with more refined search techniques that get us around these imperfections, and 2) be aware of the imperfections themselves, and their biases. This essay was trying to wrestle with the latter -- but it by no means denies the importance of the former...

(7/16)