The Electronic Corpus

Jesse Sheidlower and Dennis Baron

An email conversation about the news of the day.
Oct. 21 1999 11:59 AM


Dear Dennis,


Sorry for having to cut things short yesterday.

The Future of the Book question is one that's been so thoroughly discussed, from Plato all the way to Gass' article (unless there's one even more recent), that I hardly think I could add anything. But I don't think that books are going away in the near future. Even if we had some sort of electronic book that was convenient and had a good screen, physical books, in dead-tree editions, are still superior on so many grounds that they'll be around for a while. Some technologies, such as print-on-demand, will have a real effect on the publishing world, but that's a different subject. And I say this as someone who is also perfectly comfortable with computers; I'd have a hard time getting any kind of serious writing done on paper (though I do love fountain pens and think they're not at all inconvenient), and I can comfortably revise and edit onscreen.

In the world of language studies, however, I think computers have already had a tremendous impact, and this impact will grow. The tools available for language analysis are so helpful that it's hard to imagine going back to a time when we didn't have them. The large-scale electronic corpus is indispensable for any sort of dictionary work today. Briefly, a good corpus will gather millions of words, from a wide and carefully selected range of sources, so that you can study it for any sort of patterns you want. If you're concerned with the words alone, you can look up individual words to see how they're used. (A rough idea of how this works can be had by entering a moderately uncommon word such as "nugatory" into an Internet search engine and seeing what you get.) But you can also learn about grammatical patterns that you'd never be able to judge otherwise--the ratio of "different from" to "different than" in texts, for example.
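To make the two kinds of corpus lookup a little more concrete, here is a minimal sketch in Python: a keyword-in-context (KWIC) listing for a single word, and a raw count of two competing phrases. It assumes the corpus is simply a list of plain-text strings; the function names `kwic` and `phrase_counts` and the three-sentence toy corpus are hypothetical illustrations, not any real corpus tool or data.

```python
import re
from collections import Counter


def phrase_pattern(phrase):
    """Build a case-insensitive, whole-word regex that tolerates any whitespace between words."""
    words = map(re.escape, phrase.split())
    return re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)


def kwic(texts, word, width=30):
    """Return keyword-in-context lines: each hit with `width` characters of context per side."""
    pattern = phrase_pattern(word)
    lines = []
    for text in texts:
        for m in pattern.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            # Right-align the left context so the keywords line up in a column.
            lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines


def phrase_counts(texts, phrases):
    """Count whole-word occurrences of each phrase across every text in the corpus."""
    counts = Counter()
    for phrase in phrases:
        pattern = phrase_pattern(phrase)
        for text in texts:
            counts[phrase] += len(pattern.findall(text))
    return counts


# A hypothetical toy corpus; a real one would hold millions of words from many sources.
corpus = [
    "Her answer was different from mine, though not entirely different than I expected.",
    "The result is different from the last trial.",
    "Such objections are nugatory at best.",
]

counts = phrase_counts(corpus, ["different from", "different than"])
print(counts["different from"], counts["different than"])
for line in kwic(corpus, "nugatory"):
    print(line)
```

A raw string count like this cannot tell one construction from another that happens to look the same, so in practice the hits still have to be read; the point is only that the machine finds and tallies them.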

Large-scale full-text databases are another tool that has proved supremely effective. Fred Shapiro is perhaps the most skilled researcher in this area, and he's been able to find incredibly early examples of words and phrases: "politically correct" from the 18th century, or "hopefully" in its disputed sense--"it is to be hoped"--from numerous very formal sources in the 1930s through the 1950s, showing that the usage didn't simply pop up ex nihilo in the 1960s. As more and more databases become available, we're likely to see ever more important discoveries that force us to rethink what we believed about the history and development of our language.
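The basic idea behind antedating, finding the earliest dated source that contains a word or phrase, can be sketched in a few lines. This assumes each source carries a year and its full text; the `Source` class, the `earliest_attestation` function, and the invented toy sources below are hypothetical and stand in for no actual database or research method.

```python
import re
from dataclasses import dataclass


@dataclass
class Source:
    year: int
    title: str
    text: str


def earliest_attestation(sources, phrase):
    """Return the earliest-dated source whose text contains `phrase` (whole words), or None."""
    words = map(re.escape, phrase.split())
    pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)
    hits = [s for s in sources if pattern.search(s.text)]
    return min(hits, key=lambda s: s.year) if hits else None


# Invented toy sources, for illustration only.
sources = [
    Source(1965, "Toy source C", "Hopefully the weather will clear."),
    Source(1932, "Toy source A", "Hopefully, the committee will agree."),
    Source(1950, "Toy source B", "We waited hopefully for news."),
]

first = earliest_attestation(sources, "hopefully")
print(first.year, first.title)
```

Note that the 1950 toy source uses "hopefully" in its older sense, "in a hopeful manner"; a string search matches both senses equally, so a researcher still has to read each hit to decide which sense it attests.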

You probably know more than I do about the use of computers for attribution studies. Donald Foster is the best-known scholar here, thanks to his successful exposure of Joe Klein as the author of Primary Colors and his advocacy of the awful "Funeral Elegy" as a part of Shakespeare's canon. But whatever one thinks of this particular poem, he and others have shown us new ways of thinking about old problems, and I'm sure that we'll get many new insights from the tools of this trade.

What sort of effect do you find computers have had on your students? Has there been any sort of change in their writing or reading skills that you can attribute to them? Do you make use of computers as a teaching tool in any way--having students do research on the Web, perhaps? You've been at it a long time, so you're in a good position to judge whether they're helping or hurting in the trenches.



Jesse Sheidlower is principal editor of the North American Editorial Unit of the Oxford English Dictionary and author of The F-Word. Dennis Baron teaches English at the University of Illinois and is the author of The Guide to Home Language Repair.