Predictions are hard—especially about the future. It must have taken superhuman will for New York Times FiveThirtyEight blogger and columnist Nate Silver to avoid quoting Yogi Berra in the course of writing his engaging and sophisticated new book, The Signal and the Noise, especially because the line is so directly on point. The essential problem of prediction is that while forecasts are “about” the future, the data on which they’re based are generally data about the past. Modern technology makes it easy for any moderately trained person to throw a bunch of data points into a computer program and spit out a model that “explains” the data. Yet just because you can “predict” the past doesn’t mean you can predict the future. To take one example: Since World War I, no Democrat had won the White House without carrying West Virginia—until, in 2008, Obama won big nationally without coming close in coal country.
So what’s the difference between good models and bad ones?
In the course of this entertaining popularization of a subject that scares many people off, the signal of Silver’s own thesis tends to get a bit lost in the noise of storytelling. The asides and digressions are sometimes delightful, as in a chapter about the author’s brief adventures as a professional poker player, and sometimes annoying, as in some half-baked musings on the politics of climate change. But they distract from Silver’s core point: For all that modern technology has enhanced our computational abilities, there are still an awful lot of ways for predictions to go wrong thanks to bad incentives and bad methods.
Good forecasters are meticulous, open-minded, eager for more data, and rigorous in checking their ideas. You want foxes, in Isaiah Berlin’s terms, rather than hedgehogs who simply assimilate new information into a strongly held big idea. Ideologues do a poor job of making political forecasts, presumably for reasons of bias. It’s easier to make good predictions when you have large samples of solid data, as in baseball, than when forced to deal with sketchy information or small samples. Forecasts may even be deliberately biased: The National Weather Service is pretty good at short-term weather predictions, but local TV newscasts deliberately and systematically overstate the chances of rain. This “wet bias” occurs because the audience is more upset when they’re caught in an unexpected shower without their umbrella than when predicted rain fails to materialize.
But while the factoids about best practices for gambling on NBA games are amusing, the argumentative core of the book is on the drier subject of the controversy between Bayesian and frequentist approaches to statistics. Don’t run away! The distinction is easy to master and can spare you some undue panic in real life.
Silver’s crucial point here is that growing technological sophistication is threatening to bury the world in the pseudo-sophistication of 95 percent confidence intervals and r-squared values. Silver illustrates by supposing that you find another woman’s panties in your dresser drawer. Viewed in isolation, this is damning evidence pointing strongly toward infidelity on the part of your husband. The Bayesian point is that this evidence has to be weighed in light of our prior understanding of the situation. Silver estimates that if your husband is cheating, then there’s perhaps a 50 percent chance of his lover’s underpants ending up in your drawer. If he’s not cheating, then the odds are much lower—say, 5 percent.
But how common is cheating in general? Silver notes studies that show that in any given year, about 4 percent of married partners cheat. Bayes’ theorem says we need to update our old estimate (x) in light of our new evidence: the chance of finding the panties if he is cheating (y) and the chance of finding them if he is not (z). The updated probability is given by the formula
xy / (xy + z(1-x))
With x = 0.04, y = 0.5, and z = 0.05, that works out to 0.02 / 0.068. So the odds that your husband is cheating are 29 percent: in light of the damning panties, much higher than the 4 percent of all spouses, but still well below 50.
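For readers who would rather check the arithmetic than take it on faith, here is a minimal sketch of the calculation in Python. It uses the standard two-outcome form of Bayes’ theorem, in which the prior times the chance of the evidence if the hypothesis is true is divided by the total chance of seeing the evidence at all. The function and argument names are illustrative choices, not anything taken from Silver’s book; the three inputs are the figures quoted above.

```python
def posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' theorem for a yes/no hypothesis: P(hypothesis | evidence)."""
    true_and_evidence = prior * p_evidence_if_true
    false_and_evidence = (1 - prior) * p_evidence_if_false
    # Share of all "evidence seen" cases in which the hypothesis is true.
    return true_and_evidence / (true_and_evidence + false_and_evidence)

# Figures quoted from Silver's example: 4 percent base rate of cheating,
# 50 percent chance of the evidence if cheating, 5 percent if not.
chance = posterior(prior=0.04, p_evidence_if_true=0.50, p_evidence_if_false=0.05)
print(f"{chance:.0%}")  # prints 29%
```

The prior does much of the work here. As a purely hypothetical comparison, starting from a 50 percent prior instead of 4 percent, the same evidence would push the estimate to about 91 percent.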
More broadly, forecasts are hampered by “overfitting,” “the act of mistaking noise for signal.” Given a series of data points, an analyst can choose between a number of different formulae that “fit” the information on hand. Given the increasing ease with which complicated calculations can be undertaken, the temptation exists to devise elaborate models that fit the data very closely. After all, the more elaborate the model, the more impressive your abstract. But as Silver points out, this often results in over-emphasizing random fluctuations, leading to horrible predictions. In other words, we need some kind of underlying theory to guide our forecast, with the data increasing or decreasing our confidence. The level and sources of variation in the earth’s climate, for example, are so great that “there would be much reason to doubt claims about global warming were it not for their grounding in causality.” The case is persuasive in light of the scientific basis for believing in a greenhouse effect, but simply pulling temperature readings can lead to mistakes like the hype in the media (though not the scientific community) about “global cooling” in the 1970s.
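The overfitting problem is easy to reproduce on a small scale. The sketch below is not from the book: it assumes a made-up signal (y = 2x + 1) with hand-picked alternating noise of plus or minus 0.5, then compares a simple straight-line fit with an elaborate polynomial that passes through every past data point exactly. All names and numbers are invented for illustration.

```python
# Toy data (hypothetical): the true signal is y = 2x + 1, and the "noise"
# is a hand-picked alternating +/-0.5 so the result is reproducible.
xs = [0, 1, 2, 3, 4, 5]
ys = [2 * x + 1 + (0.5 if x % 2 == 0 else -0.5) for x in xs]

def fit_line(xs, ys):
    """Ordinary least-squares line; returns a prediction function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_exactly(xs, ys):
    """Polynomial of degree len(xs) - 1 through every point (Lagrange form)."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict

simple, elaborate = fit_line(xs, ys), fit_exactly(xs, ys)
truth_at_6 = 2 * 6 + 1  # 13, the underlying signal at an unseen point

print(simple(6))     # lands close to 13
print(elaborate(6))  # lands far from 13, despite a perfect fit to the past
```

The elaborate model "explains" the six past observations with zero error, yet its forecast for the very next point misses badly because it has faithfully modeled the noise. The straight line fits the past less closely and predicts the future far better.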
Silver is particularly unimpressed with the performance of forecasters in my field of economics who, he notes, “have for a long time been much too confident in their ability to predict the direction of the economy.” On some level, this is a bit unfair. Economists have a hard time making accurate forecasts in part because to do it correctly you’d need to be able to forecast political outcomes: What will Mario Draghi say? Who will win the power struggle in the Japanese parliament? Salient events like financial crises are particularly hard to predict since, if we could predict them reliably, they wouldn’t happen. Silver goes easy on scientists’ inability to predict earthquakes reliably on the grounds that the task is genuinely difficult, and economists deserve some of the same forbearance.
But the failure to apply sound Bayesian methods is also a real problem. Somewhat counterintuitively, modern-day macroeconomists know a great deal about math and rather less about the economy (it’s hard, see above). But humility ill-suits the desire to publish exciting papers and get ahead. So a high premium is placed on what amount to sophisticated data-mining techniques. You can build elaborate models showing that past recessions can be accounted for by “shocks” to technology or people’s desire to work hard. This “explains” the observed fluctuations in the business cycle in a mathematical sense, but it should be obvious that it doesn’t actually explain anything. It’s no coincidence that such methods are completely useless in producing policy-relevant forecasts, even as their fans are quite adept at continually fitting new events into the model.