How cutting-edge text analytics can help the Obama campaign determine voters’ hopes and fears.
“Share your story,” Barack Obama’s Pennsylvania website encouraged voters just before the holidays, above a text field roomy enough for even one of the president’s own discursive answers. “Tell us why you want to be involved in this campaign,” read the instructions. “How has the work President Obama has done benefited you? Why are you once again standing for change?” In Obama’s world, this is almost a tic. His transition committee solicited “[a]n American moment: your story” on the occasion of his inauguration. The Democratic National Committee later asked people to “[s]hare your story about the economic crisis.” It’s easy to see where this approach fits into the culture of Obama’s politicking: His own career is founded on the value of personal narratives and much of his field staff takes inspiration from Marshall Ganz, the former labor tactician who famously built solidarity in his organizing sessions by asking participants to talk about their backgrounds. But might a presidential campaign have another use for tens of thousands of mini-memoirs?
That’s the central thrust of a project under way in Chicago known by the code name Dreamcatcher and led by Rayid Ghani, the man who has been named Obama’s “chief scientist.” Veterans of the 2008 campaign snicker at the new set of job titles, like Ghani’s, which have been conjured to describe roles on the re-election staff, suggesting that they sound better suited to corporate life than a political operation priding itself on a grassroots sensibility. Indeed, Ghani last held the chief-scientist title at Accenture Technology Labs, just across the Chicago River from Obama’s headquarters. It was there that he developed the expertise Obama’s campaign hopes can help them turn feel-good projects like “share your story” into a source of valuable data for sorting through the electorate.
At Accenture, Ghani mined the mountains of private data that accumulate on corporate servers to find statistical patterns that could forecast the future. In one case, he developed a system to replace health insurers’ random audits by deploying an algorithm able to anticipate which of 50,000 daily claims are most likely to require individual attention. (Up to 30 percent of an insurer’s resources can be devoted to reprocessing claims.) To help set the terms of price insurance marketed to eBay sellers, Ghani developed a model to estimate the end price of auctions based on each sale item’s unique characteristics.
Often, Ghani found himself trying to help businesses find patterns in consumer behavior so that his clients could develop different strategies for different individuals. (In the corporate world, this is known as “CRM,” for customer-relationship management.) To help grocery stores design personalized sale promotions that would maximize total revenue, Ghani needed to understand how shoppers interacted with different products in relation to one another. The typical store had 60,000 products on its shelves, and Ghani coded each into one of 551 categories (like dog food, laundry detergent, orange juice) that allowed him to develop statistical models of how people build a shopping list and manage their baskets.
Ghani’s algorithms assigned shoppers scores to rate their individual propensities for particular behaviors, like the “opportunistic index” (“how ‘savvy’ the customer is about getting better prices than the rest of the population”), and to see whether they had distinctive habits (like “pantry-loading”) when faced with a price drop. If there was a two-for-one deal on a certain brand of orange juice, Ghani’s models could predict who would double their purchase, who would keep buying the same amount, and who would switch from grapefruit for the week.
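A score like the “opportunistic index” can be pictured as a simple ratio: how often a given shopper buys on promotion, compared with shoppers in general. The sketch below is purely illustrative; the field names, numbers, and formula are invented, not Ghani’s actual model.

```python
# Toy sketch of a shopper propensity score in the spirit of the
# "opportunistic index": the share of this shopper's purchases made
# on promotion, relative to the population's average share.
# All data and the exact formula are hypothetical.

def opportunistic_index(purchases, population_rate):
    """Ratio of this shopper's deal-buying rate to the population's."""
    on_deal = sum(1 for p in purchases if p["on_promo"])
    return (on_deal / len(purchases)) / population_rate

shopper = [
    {"item": "orange juice", "on_promo": True},
    {"item": "dog food", "on_promo": True},
    {"item": "detergent", "on_promo": False},
    {"item": "orange juice", "on_promo": True},
]

# If a quarter of all purchases store-wide happen on promotion,
# this shopper buys deals at three times the typical rate.
score = opportunistic_index(shopper, population_rate=0.25)
```

A score well above 1.0 would flag the kind of price-savvy shopper the index was meant to isolate; a model like this is what lets a retailer guess in advance who will pantry-load when a price drops.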
But Ghani realized that customers didn’t see the supermarket as a collection of 551 product categories, or even 60,000 unique items. He points to the example of a 1-liter plastic jug of Tropicana Low Pulp Vitamin-D Fortified Orange Juice. To capture how that juice actually interacted with other products in a shopper’s basket, Ghani knew the product needed to be seen as more than just an item in the “orange juice” category. So he reduced it to a series of attributes—Brand: Tropicana, Pulp: low, Fortified with: Vitamin-D, Size: 1 liter, Bottle type: plastic—that could be weighed by the algorithms. Now a retailer’s models could get closer to calculating shopping decisions as customers actually made them. A sale on low-pulp Tropicana might lure people who usually purchased a pulpier juice, but would Florida’s Natural drinkers shift to a rival brand? Would a two-for-one deal get those who typically looked for their juice in a carton to stock up on plastic?
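The decomposition described above amounts to turning each product into a small record of attributes, so that any two items can be compared feature by feature rather than by category label alone. Here is a toy sketch of that representation; the similarity measure and all product details are illustrative assumptions, not Ghani’s actual method.

```python
# Toy sketch: products as attribute records instead of flat category
# codes, so a model can ask which items are close substitutes.
# The similarity function (fraction of attributes in agreement) is
# an invented stand-in for whatever weighting the real models used.

def similarity(a, b):
    """Fraction of attributes on which two products agree."""
    keys = set(a) | set(b)
    shared = sum(1 for k in keys if a.get(k) == b.get(k))
    return shared / len(keys)

tropicana = {"brand": "Tropicana", "pulp": "low",
             "fortified": "vitamin-D", "size": "1L", "container": "plastic"}
floridas_natural = {"brand": "Florida's Natural", "pulp": "low",
                    "fortified": "vitamin-D", "size": "1L",
                    "container": "plastic"}
pulpy_carton = {"brand": "Tropicana", "pulp": "high",
                "fortified": "none", "size": "1L", "container": "carton"}

# The rival low-pulp plastic jug is a closer substitute than the
# same brand's pulpy carton: the attribute view captures that.
assert similarity(tropicana, floridas_natural) > similarity(tropicana, pulpy_carton)
```

That last comparison is the point of the exercise: a category code would call both Tropicana products “orange juice,” while the attribute view shows which sale is likely to poach whose customers.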
The challenge was, in essence, semantic: teaching computers to decode complex product descriptions and isolate their essential attributes. For another client, Ghani, along with four Accenture colleagues and a Carnegie Mellon computer scientist, used a Web crawler to pull product names and descriptions from online clothes stores and built an algorithm that could assess products based on eight different attributes, including “age group,” “formality,” “price point,” and “degree of sportiness.” Once the products had been assigned values in each of those categories, they could be manipulated numerically—the same way that Ghani’s predictive models had tried to make sense of the grocery shopping list. By reducing a sneaker to its basic attributes—lightweight mesh nylon material, low-profile sole, standard lacing system—a retailer could predict sales for shoes it had never sold before by comparing them to ones it had.
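Predicting sales for a never-sold shoe from attribute comparisons can be sketched as a nearest-neighbor estimate: find the most similar items already in the sales history and average their numbers. Everything in this example, from the attributes to the sales figures to the choice of a two-neighbor average, is an assumption made for illustration.

```python
# Toy sketch of attribute-based prediction for a product with no
# sales history: estimate its sales from the most attribute-similar
# items the retailer has already sold. The neighbor count, data,
# and similarity rule are all hypothetical.

def predict_sales(new_item, history, k=2):
    """Average the sales of the k most attribute-similar past items."""
    def sim(a, b):
        keys = set(a) | set(b)
        return sum(1 for key in keys if a.get(key) == b.get(key)) / len(keys)
    ranked = sorted(history, key=lambda rec: sim(new_item, rec["attrs"]),
                    reverse=True)
    return sum(rec["sales"] for rec in ranked[:k]) / k

history = [
    {"attrs": {"material": "mesh nylon", "sole": "low profile",
               "lacing": "standard"}, "sales": 1200},
    {"attrs": {"material": "mesh nylon", "sole": "low profile",
               "lacing": "slip-on"}, "sales": 1000},
    {"attrs": {"material": "leather", "sole": "platform",
               "lacing": "standard"}, "sales": 300},
]

new_shoe = {"material": "mesh nylon", "sole": "low profile",
            "lacing": "standard"}
estimate = predict_sales(new_shoe, history)  # leans on the two mesh shoes
```

The estimate comes out near the two lightweight mesh sneakers and far from the leather platform, which is the intuition behind comparing unsold shoes to sold ones.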
Ghani’s clients in the corporate world were companies that “analyze large amounts of transactional data but are unable to systematically ‘understand’ their products,” as his team wrote. Political campaigns struggle with much the same problem. In 2008, Obama’s campaign successfully hoarded hard data available from large commercial databases, voter files, boutique lists, and an unprecedented quantity of voter interviews it regularly conducted using paid phone banks and volunteer canvassers. Obama’s analysts used the data to build sophisticated statistical models that allowed them to sort voters by their relative likelihoods of supporting Obama (and of voting at all). The algorithms could also be programmed to predict views on particular issues, and Obama’s targeters developed a few flags that predicted binary positions on discrete, sensitive topics—like whether someone was likely pro-choice or pro-life.
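The voter scoring described above, likelihood-of-support models plus binary issue flags, is commonly built from something like a logistic regression over voter-file features, with flags set by thresholding a score. The sketch below follows that common pattern; the features, weights, and cutoff are invented for illustration and are not the Obama campaign’s actual models.

```python
# Toy sketch of voter scoring: a logistic model turns voter-file
# features into a support probability, and a binary flag is derived
# by thresholding a model score. All features, weights, and the 0.5
# cutoff are hypothetical.
import math

def support_score(features, weights, bias):
    """Logistic regression: estimated probability of supporting the candidate."""
    z = bias + sum(weights[k] * v for k, v in features.items())
    return 1 / (1 + math.exp(-z))

weights = {"registered_dem": 2.0, "age_over_65": -0.3, "donated_2008": 1.5}
voter = {"registered_dem": 1, "age_over_65": 0, "donated_2008": 1}

p_support = support_score(voter, weights, bias=-1.0)
likely_supporter = p_support > 0.5  # a binary flag cut from a model score
```

Sorting a whole voter file by scores like this is what let the campaign decide who was worth a knock on the door, and the same thresholding trick yields the discrete issue flags.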
Sasha Issenberg is the author of The Victory Lab, a book about the new science of political campaigns.