Does the Web have a dialect? Study delves into regional variations in Twitter language.

TBH, Internet Dialects Are More Complicated Than They Seem

Future Tense
The Citizen's Guide to the Future
Dec. 9 2014 5:51 PM

How does the language used on social media platforms vary with geography?

Photo by Karen Bleier/AFP/Getty Images

We often talk about the Internet, and Twitter especially, as if it has its own dialect. Maybe you know the qualities I mean: abbreviations (smh, imo), a tone of ironic hyperbole (or perhaps ironic understatement—either way the common denominator is irony), a lot of in-jokes and quirky references. But is it true that there is a single, unified “netspeak”?

On one hand, it would make sense if the Internet did have a recognizable dialect. It is the kind of space—busy, noisy, fluid, built for swift and seamless communiqués—that pressurizes and evolves language. You can access expressive tools that only exist online (unless you are comfortable blurting “dancing girls emoji!” to signal cheerful solidarity in face-to-face conversation). But such a view also privileges a notion of the Web as totally discontinuous with the rest of our lives. If people from Seattle talk one way, and people from Washington, D.C., talk another way, shouldn’t those variations play out on Twitter?


New research suggests they do—that is, rather than collapsing online verbiage into undifferentiated “netspeak,” social media tends to reproduce the fault lines in regular spoken language. Physics arXiv Blog has the scoop on work by Jacob Eisenstein, a linguist at the Georgia Institute of Technology in Atlanta:

Eisenstein and co begin with a sample from the Twitter stream of 107 million geo-located messages from more than 2.7 million different user accounts in the US. They filtered from this dataset all the advertising and marketing messages and then associated each of the remaining users with one of the 200 largest metropolitan areas in the US.
They then listed the words most frequently mentioned and focused on the 2600 whose frequency changed significantly between 2009 [and] 2012. Since each appearance of a word is geo-located, this allowed the team to see how the change in usage varied from one metropolitan area to another.
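For readers who want to see the counting idea in miniature, here is a rough Python sketch of the general approach described above: tally words per metropolitan area, then compare how often a word turns up in one city versus everywhere else. It is an illustration only, not the researchers' actual method or code. The messages are invented toy data, and the names `TOY_MESSAGES`, `counts_by_metro`, and `rate_ratio` are hypothetical.

```python
from collections import Counter, defaultdict

# Toy, made-up messages as (metro_area, text) pairs. Not data from the study.
TOY_MESSAGES = [
    ("Detroit", "ikr that game was wild"),
    ("Detroit", "ikr so true"),
    ("Detroit", "heading downtown now"),
    ("New York", "suttin is off today"),
    ("New York", "ikr the train is late"),
    ("New York", "coffee first then work"),
    ("Philadelphia", "ard see you there"),
    ("Philadelphia", "traffic is bad today"),
]


def counts_by_metro(messages):
    """Return {metro: Counter of lowercased, whitespace-split tokens}."""
    counts = defaultdict(Counter)
    for metro, text in messages:
        counts[metro].update(text.lower().split())
    return counts


def rate_ratio(counts, word, metro):
    """Rate of `word` inside `metro` divided by its rate in all other metros.

    Returns None when the ratio is undefined, for example when the word
    never appears outside `metro`.
    """
    inside = counts.get(metro, Counter())
    outside = Counter()
    for other, metro_counts in counts.items():
        if other != metro:
            outside.update(metro_counts)

    inside_total = sum(inside.values())
    outside_total = sum(outside.values())
    if inside_total == 0 or outside_total == 0 or outside[word] == 0:
        return None
    return (inside[word] / inside_total) / (outside[word] / outside_total)


if __name__ == "__main__":
    counts = counts_by_metro(TOY_MESSAGES)
    print(rate_ratio(counts, "ikr", "Detroit"))
    print(rate_ratio(counts, "ard", "Philadelphia"))
```

A ratio above 1 means the word is relatively more common in that city than elsewhere. The study went further, tracking how usage changed between 2009 and 2012, which this sketch does not attempt.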

What emerged was an archipelago of e-dialects that mirrored the geographical and cultural divisions of the physical country. For example, the abbreviation ikr (“I know, right?”) occurs six times as often in Detroit as in the rest of the United States; suttin (“something”) mostly occurs in New York City; the emoticon ^-^, which denotes shyness, occurs four times more frequently in Southern California, where a large Korean community may have propelled it into the lexicon. Furthermore, ion (a contraction of “I don’t,” as in “ion even know”) flared in popularity between 2009 and 2012, but only in the Southeastern states. (I’d imagine it also enjoys some currency on Science Twitter.) Af (“as fuck”) and ctfu (“cracking the fuck up”) have flourished in their respective south-Atlantic and north-Atlantic climes. For reasons mysterious, no one appears to use the affirmative rejoinder ard (“I’m going to visit the Liberty Bell.” “Ard.”) outside of Philadelphia.

The study also uncovered overlap among the dialects of far-flung cities. This, researchers wrote, came down to the racial makeup of the tweeting region. “Examples of linguistically linked city pairs that are geographically distant but demographically similar include Washington D.C. and New Orleans (high proportions of African-Americans), Los Angeles and Miami (high proportions of Hispanics), and Boston and Seattle (relatively few minorities, compared with other large cities),” they explained. In fact, the “proportion of African-Americans” is the single biggest predictor of similar usage online.

Does all this mean that Twitter actually has no dialect? That it is a balkanized puzzle of competing norms, most with roots that lie in some demographical and geographical elsewhere? It could—and that is how Eisenstein and team interpret their data: “Rather than moving toward a single unified ‘netspeak,’ ” they write, “language evolution in computer-mediated communication replicates existing fault lines in spoken American English.” But I’m not entirely convinced. Even if “netspeak” is not the kind of dialect to mandate your precise words, it could still amount to a diffuse code for practicing language online, one bound up in an equally nebulous web sensibility. It might involve usages that allow for brevity, such as abbreviations and acronyms, and deflate seriousness or arrogance. It might favor signpost terms like this, yes, or wut; or superlatives like best, worst, or top ten. I don’t know (idk) what it might mean! But between rich studies like Eisenstein’s and our indisputable and growing reliance on the Internet to talk to each other, perhaps we’re closer than ever to finding out.  

Future Tense is a partnership of Slate, New America, and Arizona State University.