What Is the Longest Disambiguation Page on Wikipedia?

Future Tense
The Citizen's Guide to the Future
May 30 2014 10:47 AM

Have you ever found yourself looking up John Smith on Wikipedia, only to discover that there are 205 different John Smiths with Wikipedia pages? It’s a testament to the breadth of knowledge on Wikipedia, but it can also be kind of annoying: What if you just want to know the real deal about the English explorer John Smith’s encounter with Pocahontas?


Screenshot from Wikipedia.

I found myself in that situation recently and wondered: What is the longest disambiguation page on all of Wikipedia? John Smith has 205 entries, which seems like a lot, but maybe other generic terms have even more.

Luckily, Wikipedia provides an alphabetical list of all of its disambiguation pages, approximately 250,000 of them. I work at Rap Genius, and I modified the Rap Genius Trackback Scraper to iterate through every disambiguation page, count up the number of list items in each page’s "may refer to" section, and store the results in a database.
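For readers curious what the counting step might look like, here is a minimal sketch in Python. It is not the scraper described above, whose code isn't shown here. It assumes the page's HTML has already been downloaded and simply counts every list item in it; the names ListItemCounter and count_entries and the sample markup are made up for illustration.

```python
from html.parser import HTMLParser


class ListItemCounter(HTMLParser):
    """Counts <li> start tags in an HTML fragment (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so "LI" and "li" both match.
        if tag == "li":
            self.count += 1


def count_entries(html):
    """Return the number of list items found in the given HTML string."""
    counter = ListItemCounter()
    counter.feed(html)
    counter.close()
    return counter.count


# Made-up markup standing in for a disambiguation page body.
sample = """
<p><b>John Smith</b> may refer to:</p>
<ul>
  <li>John Smith (explorer)</li>
  <li>John Smith (another entry)</li>
</ul>
"""

print(count_entries(sample))
```

A real page also has list items in its navigation and "See also" sections, so an actual scraper would need to restrict the count to the article body. This sketch skips that step.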

Without further ado, the top 10 longest Wikipedia disambiguation pages:

St. Mary's Church is the most ambiguous term on Wikipedia, followed by Communist Party and Aliabad, which is apparently a common Persian town name. Now if only we could get one of the many Communist Parties to hold a group meeting at a St. Mary's Church in an Aliabad ...

Other tidbits:

It’s a bit surprising to see so many Persian town names at the top of the list. Closer investigation reveals that a single Wikipedia user, Carlossuarez46, seems to have contributed most of the edits to those pages.

William Smith just beats out John Smith as the most ambiguous person, by a score of 211 to 205.

The top scientific term is the species abbreviation C. elegans, with 223 "may refer to" links.

Church names are heavily represented. The longest St. [name] Church formulations are:

  1. Mary: 584
  2. John: 211
  3. Peter: 197
  4. George: 164
  5. Michael: 159

And the longest First [branch] Church formulations:

  1. Lutheran: 279
  2. Presbyterian: 230
  3. Baptist: 218
  4. Congregational: 94
  5. Church of Christ, Scientist: 70

Chart by Todd Schneider.

The distribution of disambiguation page lengths shows a heavy right skew:

  • Median length is 4 "may refer to" links
  • Mean length is 7.1
  • Most common length is 2
  • 25 percent of all disambiguation pages have a length of 2
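The summary numbers above are the kind you can compute from a plain list of page lengths. Here is a rough sketch in Python using the standard library. The lengths in it are invented for illustration and are not the real dataset, and the summarize function is a hypothetical helper.

```python
from collections import Counter
from statistics import mean, median

# Invented page lengths for illustration only; NOT the article's data.
lengths = [2, 2, 2, 3, 4, 4, 5, 6, 9, 40]


def summarize(values):
    """Return the median, mean, most common value, and that value's share."""
    mode_value, mode_count = Counter(values).most_common(1)[0]
    return {
        "median": median(values),
        "mean": mean(values),
        "mode": mode_value,
        "mode_share": mode_count / len(values),
    }


summary = summarize(lengths)
print(summary)
```

In a right-skewed distribution like this one, a few very long pages pull the mean above the median, and the median in turn sits above the most common length.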

Here’s a Google Spreadsheet with the top 1,000 longest pages, and you can download the full dataset as a .csv from GitHub.

Future Tense is a partnership of Slate, New America, and Arizona State University.

Todd Schneider is a developer at Rap Genius. The Huffington Post has called him "a Reform Jew who went to Yale University and once worked for a hedge fund."
