Suppose someone has written an online restaurant review that says, “If you like not being able to hear a word your dinner companion says, then this is the place for you!” A computer analyzing that statement might conclude the review was positive, when it was anything but.
Sarcasm is a mainstay of social media communications, and it’s particularly hard for computers to detect. So, when government researchers, for example, want to find out what people are thinking about politically divisive issues, such as gay marriage or the legalization of marijuana, their computers can only parse the literal meaning of the words they find in online forums.
Marilyn Walker, professor of computer science and computational media at the University of California, Santa Cruz, wants to change that. She’s working toward the day when a computer can distinguish between a literal comment and one that fairly drips with sarcasm.
Walker, who directs the university’s Natural Language and Dialogue Systems Lab, is collaborating with psychology professors Jean E. Fox Tree and Steve Whittaker and linguistics professor Pranav Anand on a three-year research initiative financed by the National Science Foundation. “We’re funded to develop computational models and tools that will allow a computer to better understand what people are saying in a dialogue,” Walker explains.
The sarcasm study is one form of sentiment analysis, in which computers sift through data to try to understand the emotions behind people’s words. Eventually Walker’s team hopes to write a computer program that can recognize sarcasm in social media, report a person’s position on a particular topic and identify arguments on both sides of an issue. The research could feed future applications, for both the U.S. government and the private sector, that gauge people’s attitudes and emotions as expressed in social media. Such technology could also eventually lead to personal robots that understand human language more effectively and respond appropriately.
“A lot of the tools for processing natural language data have been developed from newswriting from the Wall Street Journal. They don’t deal well with social language or dialogue,” Walker says. For example, someone opines: “I love being woken up in the morning by the garbage truck.” Humans know exactly what the writer means; computers would find it challenging.
Entertaining with Sarcasm
“In the data we collected, we found that 12 percent of the utterances are sarcastic,” says Walker. That’s because people are often trying to be entertaining on social media.
A case in point: someone commented in an online forum, “The key issue is that once children are born they are not physically dependent on a particular individual.” A respondent quipped: “Really? Well, when I have a kid, I’ll be sure to just leave it in the woods, since it can apparently care for itself.”
To build algorithms that can ferret out sarcasm, the University of California, Santa Cruz team is analyzing language patterns in user reviews, online forums and Twitter feeds. “Sarcasm is very creative and entertaining and relies on a sense of surprise,” Walker says. Sarcastic posts also tend to use long strings of adjectives. To label the data, the researchers turned to the crowdsourcing platform Amazon Mechanical Turk, asking workers to judge whether statements were sarcastic.
Walker’s team found that the computer got better at spotting sarcasm when the Twitter feeds were included. While Twitter’s short posts aren’t the same as the dialogues found in online forums, many of the same patterns occur. For example, sentences that begin with “Of course” or “I love” often signal that a sarcastic statement will follow, as in: “I love it when people talk about evolution and don’t know what they’re talking about.”
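The cue-phrase idea described above can be sketched as a simple filter. This is only a toy illustration, assuming a small hand-picked list of opening phrases drawn from the article’s examples; the team’s actual models learn many more signals from labeled data.

```python
# Toy sketch: flag utterances that open with a known sarcasm cue.
# The cue list is an illustrative assumption, not the UCSC team's feature set.
SARCASM_CUES = ("of course", "i love", "really?")

def flag_sarcasm_candidates(utterances):
    """Return the utterances that start with one of the cue phrases."""
    flagged = []
    for text in utterances:
        normalized = text.strip().lower()
        # str.startswith accepts a tuple of prefixes.
        if normalized.startswith(SARCASM_CUES):
            flagged.append(text)
    return flagged

examples = [
    "I love it when people talk about evolution and don't know what they're talking about.",
    "The restaurant opens at noon.",
    "Of course the garbage truck woke me up again.",
]

print(flag_sarcasm_candidates(examples))
```

A filter this crude would flag plenty of sincere sentences too (“I love this restaurant”), which is exactly why the researchers pair such surface cues with human-labeled training data.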
“It’s really fun because the language is really entertaining, and it’s a challenging problem,” says Walker. “We are still finding the more examples we have, the more new things we see all the time.”
And she’s not being the least bit sarcastic.