Data provenance is another key facet of a transparency policy. What’s the quality of the data feeding the bot? In the example of the Quake Bot, the data is relatively clean, since it’s delivered by a government agency, the United States Geological Survey. But the Wikipedia Live Monitor operates off entirely noisy (and potentially manipulable) social signals about online editing activity.
Imagine a hacker who knows how an automated news bot consumes data to produce news content. That hacker could infiltrate the bot’s data source, pollute it, and possibly spread misinformation as that data is converted into consumable media. I wouldn’t want my hedge fund trading on bot-generated tweets.
Luckily for us, there are already some excellent examples of accountable bots out there. One bot in particular, the New York Times’ 4th Down Bot, is exemplary in its transparency. The bot uses a model built on data collected from NFL games going back to the year 2000. For every fourth down in a game, it uses that model to decide whether the coach should ideally “go for it,” “punt,” or “go for a field goal.” The bot’s creators, Brian Burke and Kevin Quealy, endowed it with an attitude and some guff, so it’s entertaining when it tweets out what it thinks the coach should do.
4th and 1 for the 49ers near the Seahawks 41. This is one place where coaches do tend to go for it without being prodded by a bot. -- NYT 4th Down Bot (@NYT4thDownBot) January 19, 2014
Burke and Quealy do a deft job of explaining how the bot defines its world. It pays attention to the yard line on the fourth down as well as how many minutes are left in the game. Those are the inputs to the algorithm. It also defines two criteria that inform its predictions: expected points and win percentage. The model’s limitations are clearly delineated—it can’t handle overtime properly, for instance. And Burke and Quealy explain the bias of the bot, too: With its data-driven bravado, it’s less conservative than the average NFL coach.
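To make the shape of such a model concrete, here is a toy sketch of a fourth-down recommender in the spirit of the 4th Down Bot. The expected-point formulas and the `recommend` helper below are fabricated placeholders for illustration; the real bot's model is fit to NFL play-by-play data going back to 2000, not to these numbers.

```python
# Toy fourth-down decision model. All expected-point values are
# invented placeholders, NOT the 4th Down Bot's actual model.

def expected_points(option, yards_to_goal):
    """Crude illustrative expected-point estimate for each option."""
    if option == "go for it":
        # Conversions are worth more (and succeed more often) near the goal line.
        return 4.0 - 0.06 * yards_to_goal
    if option == "field goal":
        # Field goals get less likely with distance; worthless from very far out.
        return max(0.0, 3.0 - 0.07 * yards_to_goal)
    if option == "punt":
        # Punting mostly trades field position; modeled here as a small constant.
        return 0.5
    raise ValueError(f"unknown option: {option}")

def recommend(yards_to_goal):
    """Pick the option with the highest (toy) expected points."""
    options = ["go for it", "field goal", "punt"]
    return max(options, key=lambda o: expected_points(o, yards_to_goal))

print(recommend(2))   # near the goal line -> "go for it"
print(recommend(60))  # deep in own territory -> "punt"
```

The point is less the numbers than the structure: a transparent bot can publish exactly this kind of mapping from inputs (field position, time remaining) to a recommendation, so readers can see where a call comes from.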
Two things that the bot could be more transparent about are its uncertainty—how sure it is in its recommendations—and its accuracy. Right now the bot essentially just says, “Here’s my prediction from the data”—there’s no real assessment of how it’s doing overall for the season. Bots need to learn to explain themselves: not just what they know, but how they know it.
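One hedged way a bot could report that kind of self-assessment: attach a win-probability estimate to each call, then score those probabilities at season's end with a Brier score (mean squared error between predicted probabilities and actual 0/1 outcomes; lower is better, and always guessing 50 percent earns 0.25). The forecast numbers below are fabricated examples, not real 4th Down Bot output.

```python
# Sketch of a bot scoring its own probability forecasts.
# The (probability, outcome) pairs are invented for illustration.

def brier_score(forecasts):
    """Mean squared error between predicted win probabilities and 0/1 outcomes."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)

# (predicted win probability if the coach follows the advice, did the team win?)
forecasts = [(0.7, 1), (0.55, 0), (0.9, 1), (0.6, 1)]
print(f"Season Brier score: {brier_score(forecasts):.3f}")  # prints 0.141
```

Publishing a running score like this would let readers judge not just what the bot predicts, but how well-calibrated those predictions have been.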
Others working on algorithmic transparency—in newsrooms, elsewhere in media, or even in government—might use this as a first-class case study. Visualize the model. Put it in context. Explain your definitions, data, heuristics, assumptions, and limitations—but also don’t forget to build trust by providing accuracy and uncertainty information.
The cliché phrase “you’re only human” is often invoked to cover for our foibles, mistakes, and misgivings as human beings, to make us feel better when we flub up. And human journalists certainly make plenty of those. Craig Silverman over at Poynter writes an entire column called Regret the Error about journalists’ mistakes.
But bots aren’t perfect, either. Every robot reporter needs an editor, and probably the binary equivalent of a journalism ethics course, too. We’d be smart to remember this as we build the next generation of our automated information platforms.
This article is part of Future Tense, a collaboration among Arizona State University, the New America Foundation, and Slate. Future Tense explores the ways emerging technologies affect society, policy, and culture. To read more, visit the Future Tense blog and the Future Tense home page. You can also follow us on Twitter.