How Does Wikipedia Describe Your State?

May 12 2014 8:45 AM

Pennsylvania is chocolate. New Hampshire is die.

Illustration by Lisa Larson-Walker

Last month, I wrote about the fun and the pitfalls of viral maps, a feature that included 88 super-simple maps of my own creation. As a follow-up, I’m writing up short items on some of those maps, walking through how I created them and how they succumb to (and hopefully overcome) the shortfalls of viral cartography.

Here is one map that used Wikipedia as a data set.

Data source: Wikipedia articles accessed March 2014. Map by Ben Blatt/Slate.

The user-written encyclopedia unwittingly confirms many of our suspicions about each state. Colorado’s Wikipedia page is littered with talk of cannabis, Nevada’s article talks about divorce, and Pennsylvania (the home of Hershey’s) is labeled chocolate.

I calculated which word each state’s page used most disproportionately by comparing each word’s frequency on that page with its frequency across the pages for all 50 states. This means that words that show up in a whole bunch of articles are unlikely to top any state’s list. I also needed a cutoff to exclude words that appear very rarely: if a word showed up once on the Texas page and zero times elsewhere, it would be hard to call its use disproportionate. In the map above, I used an arbitrary cutoff, limiting the selection to words used at least three times in other articles.
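
The scoring described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the method actually used for these maps: the score is taken to be a word’s share of the state’s page divided by its share of all pages combined, “used at least three times in other articles” is read as at least three occurrences across the other states’ pages, and the tokenizer, the function names (tokenize, most_disproportionate_word) and the toy pages are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase the text and keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def most_disproportionate_word(pages, state, min_other_count=3):
    """Return the word on `state`'s page whose relative frequency there is
    highest compared with its relative frequency across all pages.

    pages: hypothetical dict mapping state name -> article text.
    min_other_count: the cutoff; a word must appear at least this many
    times in the *other* states' pages to be eligible.
    """
    counts = {name: Counter(tokenize(text)) for name, text in pages.items()}

    # Pool the word counts from every page, including the target state's.
    all_counts = Counter()
    for page_counts in counts.values():
        all_counts.update(page_counts)

    state_counts = counts[state]
    state_total = sum(state_counts.values())
    all_total = sum(all_counts.values())

    best_word, best_score = None, 0.0
    for word, n in state_counts.items():
        # Skip words too rare elsewhere to call their use "disproportionate".
        if all_counts[word] - n < min_other_count:
            continue
        # A ratio above 1 means the word is over-represented on this page.
        score = (n / state_total) / (all_counts[word] / all_total)
        if score > best_score:
            best_word, best_score = word, score
    return best_word


# Made-up toy pages, not real Wikipedia text.
pages = {
    "Pennsylvania": "chocolate chocolate steel steel steel river",
    "Ohio": "steel river river chocolate",
    "Texas": "steel river oil",
    "Colorado": "steel river mountain",
}

for cutoff in (1, 3, 4):
    print(cutoff, most_disproportionate_word(pages, "Pennsylvania", cutoff))
```

On these toy pages, a cutoff of 1 picks chocolate, a cutoff of 3 picks steel, and a cutoff of 4 picks river: the same kind of sensitivity to an arbitrary threshold that reshapes the real map.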

Here’s a dirty secret of viral maps: Change that arbitrary cutoff, and you’ll get something that looks completely different. Here’s the map with a cutoff of 10.

Data source: Wikipedia articles accessed March 2014. Map by Ben Blatt/Slate.

North Dakota is suddenly described as males instead of dancers, and New Hampshire goes from decreeing “Live Free or Die” to “Live Free or Felt.”

Why use the disproportionate method at all? Why not just look at the most common words in each article? Well, if we do that, the results are a bit less interesting.

Data source: Wikipedia articles accessed March 2014. Map by Ben Blatt/Slate.
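
For contrast, the plain most-common-word approach is just a raw count on each page, with no comparison against the other states. A minimal sketch, assuming a hypothetical most_common_word helper and made-up toy pages: because a word like “state” appears on every page, it wins everywhere, illustrating how raw counts favor words shared by all the articles.

```python
import re
from collections import Counter


def most_common_word(text):
    """Return the single most frequent word in `text` (raw count only)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return None
    # most_common(1) yields [(word, count)]; ties keep first-seen order.
    return Counter(words).most_common(1)[0][0]


# Made-up toy pages, not real Wikipedia text.
pages = {
    "Pennsylvania": "state state chocolate steel",
    "Nevada": "state state divorce desert",
    "Colorado": "state state cannabis mountain",
}

for name, text in pages.items():
    print(name, most_common_word(text))
```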
