In recent years, Twitter has made possible a lot of wonderful text-mining projects, both for linguists doing rigorous research and for laypeople just having fun. The Twitter bot @haikuinyou, for example, was designed to find accidental haikus, and, perhaps more impressively, @pentametron searches for rhyming tweets in iambic pentameter and then fashions couplets out of them. Talk about found poetry!
And then there is found wordplay. I'm a big fan of @anagramatron, which discovers paired tweets that form serendipitous anagrams of each other. (Example: "Last time I do anything" ⇔ "That's it. I'm dying alone.") Now, courtesy of the lexicographer Jesse Sheidlower, comes @PangramTweets, in which each tweet contains every letter of the alphabet at least once.
Sheidlower explains the project on his site:
PangramTweets is a bot (a computer program that runs on its own) that searches Twitter for, and then retweets, pangrams—texts that contain every letter of the alphabet. A famous pangram, sometimes used as a typing test, is “The quick brown fox jumps over the lazy dog.” [...]
You may find the results interesting, or dull. I make no judgment on this. The bot is entirely automated; I do not curate the results.
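For readers curious about what such a check involves, the core test is short. What follows is only a minimal sketch in Python, not Sheidlower's actual code, which the description above does not show. The function name is_pangram is made up for illustration, and the sketch assumes that only the 26 unaccented letters of the English alphabet count.

```python
import string

def is_pangram(text):
    """Return True if text uses every letter a-z at least once.

    Illustrative only: the name and the a-z-only rule are assumptions,
    not details taken from the PangramTweets bot.
    """
    # Lowercase the text, then ask whether the alphabet is a subset
    # of the characters that actually occur in it.
    return set(string.ascii_lowercase) <= set(text.lower())

print(is_pangram("The quick brown fox jumps over the lazy dog"))  # True
print(is_pangram("Talk about found poetry!"))                     # False
```

Digits, punctuation, and emoji are simply ignored, since only the letters a through z are looked up.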
The bot originally did not discriminate against known pangrams of the "quick brown fox" variety, but, by popular demand, Sheidlower put a filter in place for that, as well as for tweets in foreign languages. He says he gets "one real pangram in every few million tweets scanned."
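How the bot's filter works is not described, so the following is a guess at one simple approach rather than a reconstruction. The sketch strips a tweet down to its letters and rejects it if a known stock pangram turns up inside. The list KNOWN_PANGRAMS and the names letters_only and is_novel_pangram are hypothetical, and the foreign-language filter is not attempted here.

```python
import string

# Hypothetical list of stock pangrams to reject; a real filter
# would presumably know many more than this one.
KNOWN_PANGRAMS = [
    "The quick brown fox jumps over the lazy dog",
]

def letters_only(text):
    """Lowercase the text and drop everything except a-z."""
    return "".join(ch for ch in text.lower() if ch in string.ascii_lowercase)

def is_novel_pangram(text, known=KNOWN_PANGRAMS):
    """True if text is a pangram that does not embed a known one.

    Assumed behavior for illustration, not the bot's documented logic.
    """
    squeezed = letters_only(text)
    if len(set(squeezed)) < 26:
        return False  # not a pangram at all
    # Comparing letters only means changes in spacing, case, or
    # punctuation cannot sneak a stock pangram past the filter.
    return not any(letters_only(k) in squeezed for k in known)

print(is_novel_pangram("Typing test: the quick brown fox jumps over the lazy dog!"))  # False
print(is_novel_pangram("Benghazi quickly warped Fox's TV jam"))                       # True
```

A substring match like this catches only verbatim quotations. A reworded stock pangram would still get through.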
It's unlikely that Vindu Goel, a technology reporter for the New York Times, was aware that this tweet of his, for instance, contained every letter of the alphabet:
It's official: Arthur Sulzberger names Dean Baquet executive editor of The New York Times, replacing Jill Abramson.— Vindu Goel (@vindugoel) May 14, 2014
Here are some others that have turned up so far:
I've just (with the help of google) realized I wrote about the wrong experiment in my 12 mark psychology question oops— s (@bricktop___) May 13, 2014
Looking for a new job is exhausting. Every one I want requires a bazillion years of experience I don't have. FML.— Ryan Stephens (@Integrity1stziB) May 16, 2014
Thanks JMM for boosting my boxing prediction confidence again. The Mayweather card did a number on a lot of boxing fans. #MarquezAlvarado— E.J.O. (@ElioOrtiz11) May 18, 2014
SHUT THE FUCK UP ABOUT THE “FRIENDZONE”. MAYBE YOU SHOULD JUST VALUE A WOMAN’S FRIENDSHIP AND QUIT EXPECTING THEM TO FUCK YOU. JESUS FUCK.— ・。。・゜☆゜・。。・ (@thedisconnectt) May 19, 2014
Maybe Joe needs to take some advice from Iceland and arrest the rich people who are stealing from the rest of us tax paying citizens. #qanda— Toby Owens (@TehMegaWiz) May 19, 2014
As I mentioned, Sheidlower is filtering out non-English tweets, but some Indonesian ones keep seeping through. Since I've done research on colloquial varieties of Indonesian, I find these especially fascinating. I was initially surprised that the Indonesian Twittersphere would be generating pangrams, considering that the letters Q, V, X, and Z appear only in loanwords. But Indonesian Twitterers are apparently using quite a lot of Anglicisms, along with many txtspk-style abbreviations of Indonesian words. An example that recently popped up:
@PutriAZSYA EXCITED BGT GRGR 1D MW K INDO. LBH EXCITED LG KLO JOIN LITTLEQUIZ @1D_CrazyLovers DAN BCA JG FFNY.PASTI LO MKIN EXCITED.CEK FAV6— winda (@windaameliasar1) May 20, 2014
The loanwords here are EXCITED, JOIN, and LITTLEQUIZ, and 1D refers to the band One Direction. Here's a key to the abbreviation-heavy Indonesian items:
BGT = banget 'very'
GRGR = gara-gara 'just because'
MW = mau 'will'
K = ke '(come) to'
INDO = Indonesia
LBH = lebih 'more'
LG = lagi '(even) more'
KLO = kalau 'if'
DAN = dan 'and'
BCA = baca 'read'
JG = juga 'also'
PASTI = pasti 'definitely'
LO = (e)lo 'you'
MKIN = makin 'more and more'
CEK = cek 'check'
So that would translate to: "@PutriAZSYA Very excited just because One Direction is coming to Indonesia. You'll be even more excited if you join LittleQuiz @1D_CrazyLovers, and also read FFNY. You'll definitely get more and more excited. Check Fav6."
It'll be interesting to see if the bot turns up a naturally occurring "pangrammatic window" (a stretch of consecutive letters that contains the entire alphabet) shorter than the current record-holder of 42 letters, which appears in Cube Route, a novel by the science-fiction and fantasy author Piers Anthony:
"We are all from Xanth," Cube said quickly. "Just visiting Phaze. We just want to find the dragon."
"… movie review of The Yards: Mark Wahlberg, Joaquin Phoenix, Charlize Theron…"
Of course, determining if a pangram is "naturally occurring"—which is to say, occurring by chance and not design—may be difficult, since it's always possible to game the system. But with half a billion tweeters tweeting, maybe someday one of them will authentically produce a 30-letter winner like "Benghazi quickly warped Fox's TV jam," which I just made up.
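Window lengths like these can be checked mechanically. Here is a minimal sketch in Python that assumes only the letters a through z are counted, with spaces and punctuation skipped. The function name shortest_pangrammatic_window is invented for illustration, and the sliding-window approach is just one reasonable way to do the search.

```python
import string

def shortest_pangrammatic_window(text):
    """Return (length, letters) for the shortest run of consecutive
    letters that contains all of a-z, or None if there is none.

    Illustrative sketch: only a-z are counted, everything else is skipped.
    """
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    counts = {}   # letter -> occurrences inside the current window
    best = None
    left = 0
    for right, ch in enumerate(letters):
        counts[ch] = counts.get(ch, 0) + 1
        # While the window holds all 26 letters, record it and then
        # shrink it from the left to look for something shorter.
        while len(counts) == 26:
            width = right - left + 1
            if best is None or width < best[0]:
                best = (width, "".join(letters[left:right + 1]))
            first = letters[left]
            counts[first] -= 1
            if counts[first] == 0:
                del counts[first]
            left += 1
    return best

cube = ('"We are all from Xanth," Cube said quickly. '
        '"Just visiting Phaze. We just want to find the dragon."')
print(shortest_pangrammatic_window(cube)[0])                                    # 42
print(shortest_pangrammatic_window("Benghazi quickly warped Fox's TV jam")[0])  # 30
```

On the Cube Route passage the window runs from "from" through the W of the second "We". The made-up Benghazi sentence is its own window, since B appears only at the very start and M only at the very end.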
If it does happen, I'll definitely get more and more excited.
A version of this post appeared on Language Log.