We've gotten used to searching the Web like robots. Rather than talking to the search engine as we would speak to a person, we throw it a bunch of keywords ("John Lennon shot") and expect it to know what we're looking for (Mark David Chapman). How would life be different if our search engines were more human? A new piece of software called Powerset will give you some idea. Unlike the Googles or Yahoos of the world, Powerset "reads" the sites it crawls, parsing the grammar of each sentence to work out what it means. As a result, you can express yourself conversationally. Want to know who shot John Lennon? Just ask: "Who shot John Lennon?" "Mark David Chapman," Powerset replies.
In theory, this is better than keyword searching. If you have a specific question, it's convenient to ask your browser the same way you'd ask a reference librarian. But does it work that way in practice?
Not quite yet. As of now, Powerset searches only two sites: Wikipedia and Freebase, a giant database of user-generated information. Let's use Rudy Giuliani, someone with a robust Wikipedia presence, as our first guinea pig.
I'll start at the beginning: "Where was Rudy Giuliani born?" "Brooklyn," Powerset answers in large type. This information comes from Freebase, which takes what's called a "bottom-up" approach to content. Each fact in the database is compartmentalized into a particular category—Giuliani's Freebase page lists "Brooklyn" in the slot for "Place of birth."
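To make the slot idea concrete, here is a minimal sketch in Python of how a question could be mapped onto a named slot in a structured record. It is an illustration under assumptions, not Freebase's actual schema or API: the `RECORDS` table, the `TEMPLATES` list, and the `answer` function are all hypothetical names, and the only data is the "Place of birth" entry quoted above.

```python
import re
from typing import Optional

# Hypothetical record store. The slot name and value are the ones quoted
# in the article; the layout itself is an assumption for illustration.
RECORDS = {
    "rudy giuliani": {"Place of birth": "Brooklyn"},
}

# Assumed question templates, each tied to exactly one slot.
TEMPLATES = [
    (re.compile(r"^where was (?P<name>.+) born\?$", re.IGNORECASE), "Place of birth"),
]


def answer(question: str) -> Optional[str]:
    """Return the slot value for a recognized question, or None."""
    for pattern, slot in TEMPLATES:
        match = pattern.match(question.strip())
        if match:
            # Look the person up by lowercased name, then read the slot.
            record = RECORDS.get(match.group("name").lower(), {})
            return record.get(slot)
    return None


print(answer("Where was Rudy Giuliani born?"))
```

Because every fact already sits in a labeled slot, answering this kind of question takes a lookup rather than any real understanding of language, which is why it is the easy case.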
So far, so good. But that was an easy one that Google can answer in the same number of clicks. Now for a slightly tougher question: "Who did Rudy Giuliani defeat?" The search engine returns this sentence at the top of its results: "In late 1993, David Dinkins was defeated by Rudolph Giuliani in his bid for reelection." Impressive—Powerset scores points for understanding the passive voice in the original phrase and recognizing it as an appropriate response to my active sentence. By contrast, a Google search for that specific query—"Who did Rudy Giuliani defeat"—returns nothing very useful. A more typical Google keyword search—"Rudy Giuliani defeat"—returns similarly scattered pages. Several of the top results are stories asking whether Rudy can defeat Hillary Clinton in 2008. (We don't need a search engine to know the answer to that question.) Google also retrieves a lot of random news articles that just happen to contain Giuliani's name and the word defeat.
Clicking on Rudy Giuliani's name in Powerset's search results takes us to the site's enhanced version of his Wikipedia page. Flip the switch on the "Article Outline" box in the upper right to "Show Factz," which produces a list of the article's subject-verb-object statements: "Giuliani served term," "Giuliani practiced law," "Giuliani indicted figures." The philosophy here is clear: By studying the relationships between words on a page, Powerset can unearth facts that you'd have to dig for on a traditional search engine.
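A rough sketch shows why subject-verb-object statements are useful for answering questions. The Python below is a toy under heavy assumptions, not Powerset's method: the regular expressions, the `Fact` type, and the `extract` and `answer` functions are hypothetical, the "stemming" just strips "-ed," and people are matched by surname only. It turns the passive Dinkins sentence quoted earlier into an active-voice fact, then uses that fact to answer the active-voice question.

```python
import re
from typing import List, NamedTuple, Optional

# One or more capitalized words, used as a crude stand-in for a person's name.
NAME = r"[A-Z]\w+(?: [A-Z]\w+)*"


class Fact(NamedTuple):
    subject: str
    verb: str
    obj: str


# Toy patterns (assumptions): "X was VERBed by Y" and "Y VERBed X".
PASSIVE = re.compile(rf"(?P<obj>{NAME}) was (?P<verb>\w+)ed by (?P<subj>{NAME})")
ACTIVE = re.compile(rf"(?P<subj>{NAME}) (?P<verb>\w+)ed (?P<obj>{NAME})")
QUESTION = re.compile(r"^who did (?P<subj>.+) (?P<verb>\w+)\?$", re.IGNORECASE)


def extract(sentence: str) -> Optional[Fact]:
    """Normalize a passive or active sentence to one (subject, verb, object) fact."""
    match = PASSIVE.search(sentence) or ACTIVE.search(sentence)
    if match is None:
        return None
    return Fact(match.group("subj"), match.group("verb").lower(), match.group("obj"))


def answer(facts: List[Fact], question: str) -> Optional[str]:
    """Answer "Who did X VERB?" by matching the verb and the subject's surname."""
    match = QUESTION.match(question.strip())
    if match is None:
        return None
    surname = match.group("subj").split()[-1].lower()
    verb = match.group("verb").lower()
    for fact in facts:
        if fact.verb == verb and fact.subject.split()[-1].lower() == surname:
            return fact.obj
    return None


sentence = ("In late 1993, David Dinkins was defeated by Rudolph Giuliani "
            "in his bid for reelection.")
facts = [fact for fact in [extract(sentence)] if fact is not None]
print(answer(facts, "Who did Rudy Giuliani defeat?"))  # David Dinkins
```

Even this toy has to paper over the gap between "Rudy" and "Rudolph" and between "defeat" and "defeated," which hints at how much can go wrong once the sentences stop being so tidy.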
The problem, as any ESL student will tell you, is that the English language is extremely difficult to parse. That means Powerset spits out a lot of garbage. Here are a few other "factz" that Powerset has culled on Rudy: "Giuliani patented walk," "vote votes gains," "grounds send children."
Powerset is a neat demo, but it's not yet an improvement over any of the industry's current titans, particularly when you consider that it works best when you're asking for very basic information. It can field search queries phrased as questions, but the results aren't that different from what Google turns up if you limit it to searching Wikipedia. For Powerset, Wikipedia is an ideal testing ground due to its homogeneity, breadth, and familiarity. It's not a space, though, that really requires a better search engine. (Jimmy Wales agrees.)
Powerset's limited scope is, in part, a matter of resources. The site's general manager, Scott Prevost, told me that because it takes so much time to parse the grammar of individual Web pages, it takes Powerset much longer to build its index than it takes Google or Microsoft's Live Search to crawl the Web. Only in the last few years has computing power become cheap enough to make even a modest semantic search engine feasible.
We should have sympathy for anyone who tries to improve the way humans communicate with computers. Years of trial and error have made people very skilled at constructing Google searches. Most Web users now have a stable of basic tricks (putting phrases in quotes, limiting searches to individual domains) and have learned to pick out quickly what they're looking for from a long page of results. Because we're so good at Googling, a natural-language search engine has a high bar to clear. Sure, Powerset gives me the right answer when I ask, "Who wrote The Godfather?" But so long as I can Google "godfather author" and get CNN's obituary of Mario Puzo as the first result, I'm not about to become a Google apostate.