If Roget's is becoming a relic, what lies ahead?
One of the most staggering advances in word reference is the forthcoming Historical Thesaurus of English, which will list all the words of English: modern words as well as long-gone ones, such as those used in the year 1000. The Historical Thesaurus will be a kind of companion to the famous 20-volume etymological Oxford English Dictionary. In fact, it uses the words from the OED, but, like Roget's, it is subdivided into topics, starting with the most general and then branching into ideas of greater specificity. The very patterns of word usage will document the popularity of ideas throughout history. The Historical Thesaurus will be produced in book format as well as online. Unbelievably, the project was begun pre-computer, in 1964, by a group of brave souls at the University of Glasgow. But given the number of cross-references, the fuzziness of the categories, and the massive challenge of manipulating so many entries, it's hard to imagine it being finished without computers.
As old as it is, Roget's system of classification still gets play in the research world. Some lexicographers are importing his idea structure into language databases to solve language-processing problems, like disambiguating words with more than one meaning. Others are trying to reorganize language data from the ground up, ripping out the hierarchical structure of Roget-style classification and replacing it with more realistically overlapping groups of ideas.
Of course, the entire Web can be used as a corpus, and its ever-changing nature makes it a particularly valuable one. Dictionaries and thesauri, even online versions, record words that remain constant over periods of time. But language is a roiling thing, and its dynamism is sometimes of the moment, not just the year or the century. Bloggers like Mark Peters track "nonce" words, which, by definition, have very short life spans. Speakers drop these words almost as soon as they pick them up, so few will be recorded on paper, yet they are still real words. Indeed, endlessish and crapportunity are not just the low-hanging fruit of the new world of word reference; they represent our fundamentally changed relationship with our own language. In the spirit of Roget, if not by his book, all speakers can now freely access information not only about the perfect Platonic classifications of words but also, crucially, about how those words live.
Christine Kenneally is the author of The First Word: The Search for the Origins of Language. Her writings can be found on the blog www.christinekenneally.com.