How Does Wikipedia Describe Your State?

May 12 2014 8:45 AM

Pennsylvania is chocolate. New Hampshire is die.

Illustration by Lisa Larson-Walker

Last month, I wrote about the fun and the pitfalls of viral maps, a feature that included 88 super-simple maps of my own creation. As a follow-up, I’m writing up short items on some of those maps, walking through how I created them and how they succumb to (and hopefully overcome) the shortfalls of viral cartography.

Here is one map that used Wikipedia as a data set.

[Map: the most disproportionately used word on each state’s Wikipedia page, with a cutoff of three]

Data source: Wikipedia articles accessed March 2014. Map by Ben Blatt/Slate.

The user-written encyclopedia, unknowingly, confirms many of our suspicions about each state. Colorado’s Wikipedia page is littered with talk of cannabis, Nevada’s article talks about divorce, and Pennsylvania (the home of Hershey’s) is labeled chocolate.

I calculated which word was used most disproportionately by looking at each word’s frequency within the state’s page compared with its frequency in the pages for all 50 states. This means that words that show up in a whole bunch of articles would be unlikely to make any state’s list. I also needed to use a cutoff to exclude words that appeared very rarely—if a word showed up once on the Texas page and zero times elsewhere, it would be hard to call its use disproportionate. In the map above, I used an arbitrary cutoff, limiting the selection to words used at least three times in other articles.
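The calculation described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the maps: the function names, the `min_other_uses` parameter, the simple letters-only tokenizer, and the toy page text are all assumptions made for the example. It compares a word's share of one state's page with its share across every page, and skips words that fall below the cutoff in the other articles.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def most_disproportionate_word(pages, state, min_other_uses=3):
    """Pick the word on `state`'s page that is most overrepresented.

    pages maps a state name to the plain text of its article.
    min_other_uses is the cutoff: a word must appear at least this
    many times in the other states' pages to be considered.
    """
    state_counts = Counter(tokenize(pages[state]))
    all_counts = Counter()
    for text in pages.values():
        all_counts.update(tokenize(text))

    state_total = sum(state_counts.values())
    all_total = sum(all_counts.values())

    best_word, best_ratio = None, 0.0
    for word, count in state_counts.items():
        if all_counts[word] - count < min_other_uses:
            continue  # too rare elsewhere to call its use disproportionate
        # Share of this page's words divided by share of all pages' words.
        ratio = (count / state_total) / (all_counts[word] / all_total)
        if ratio > best_ratio:
            best_word, best_ratio = word, ratio
    return best_word


# Toy stand-ins for the real articles (invented text, not Wikipedia data).
pages = {
    "Pennsylvania": "the state makes chocolate chocolate chocolate and the state has hills",
    "Nevada": "the state has desert and the state has chocolate shops",
    "Ohio": "the state has rivers and the state has chocolate fans",
}

print(most_disproportionate_word(pages, "Pennsylvania", min_other_uses=2))
# prints "chocolate"
```

In this toy input, "chocolate" makes up 3 of Pennsylvania's 11 words but only 5 of the 31 words overall, so it beats words like "the" that are spread evenly across the pages.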

Here’s a dirty secret of viral maps: Change that arbitrary cutoff, and you’ll get something that looks completely different. Here’s the map with a cutoff of 10.

[Map: the most disproportionately used word on each state’s Wikipedia page, with a cutoff of 10]

Data source: Wikipedia articles accessed March 2014. Map by Ben Blatt/Slate.

North Dakota is suddenly described as males instead of dancers, and New Hampshire goes from decreeing “Live Free or Die” to “Live Free or Felt.”

Why use the disproportionate method at all? Why not just look at the most common words in each article? Well, if we do that, the results are a bit less interesting.

[Map: the most common word on each state’s Wikipedia page]

Data source: Wikipedia articles accessed March 2014. Map by Ben Blatt/Slate.
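The simpler alternative, taking the single most common word on a page, can be sketched in a few lines. This is an illustrative assumption about how such a count might be done, with a hypothetical function name and an invented sentence as input; it is not the code behind the map.

```python
import re
from collections import Counter


def most_common_word(text):
    """Return the most frequent alphabetic word in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words).most_common(1)[0][0]


# Invented sample text, not taken from any Wikipedia article.
print(most_common_word("The state of the union is the state"))
# prints "the"
```

Because a raw count has no comparison against other pages, nothing stops a word that is frequent everywhere from winning, which is the weakness the disproportionate method is meant to address.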
