At a recent tech conference in Silicon Valley, an executive asked me whether I, as a futurist, did predictive modeling. His question was familiar. Most in the tech world buy into the hype that this type of big data analysis will unlock the future—hence all the startups proposing ventures “like Moneyball for industry X”—so many assume that a professional futurist like me is either a hardcore data miner or a statistics whiz. I tried to explain to him that my strategic foresight work, which involves consulting with companies and governments to imagine a variety of future scenarios, requires a wide range of both quantified (big data) and unquantified (unseen shifts in political climate or social norms) considerations. He soon focused his attention elsewhere, presumably to someone more eager to talk data.
It’s not just Silicon Valley that has become obsessed with the idea that we can feed big data into algorithms and expect ever more neatly projected futures. Corporations are trying to use these kinds of analytics to optimize supply chains, shipping routes, and labor. Retailers are using them to forecast sales and design targeted advertising. Lenders, schools, police departments, national security analysts, and more are getting in on the game, too. The acceleration of data collection—including browsing habits, location tracking, credit card swipes, energy consumption, social media connections, salaries, biometrics, and more—has only emboldened this thinking. It fuels the belief that the more figures you input, the more precise the prediction you get, and that nearly everything can be broken down and quantified.
Yes, big data and predictive analytics can provide valuable insights. Used smartly, they’ve improved the accuracy of countless kinds of forecasts (weather, climate change, what we want to watch next, to name a few). But making sound, responsible predictions involves so much more than running the stats or having computers analyze huge record sets.
Our current obsession with data, unsurprisingly, seems to have arisen from a mix of real forecasting successes and big tech and pop culture’s fantastical depictions of it. One such portrayal that’s had outsized influence on our national analytics-obsessed psyche: the book-turned-movie Moneyball.
Moneyball, the wildly popular volume written by Michael Lewis in 2003, tells a romanticized version of how Oakland A’s General Manager Billy Beane and Assistant General Manager and economist Paul DePodesta built a playoff-bound team on a limited budget. How? By valuing statistics over instinct.
Throughout 20th-century pro baseball, the story goes, recruiters evaluated players based on subjective measures like whether a guy had a classic baseball body, could throw a good curve ball, or projected confidence. Front offices put too much emphasis on a few select data points in their decision-making—batting averages, home runs, runs batted in, to name a few.
Beane and DePodesta recognized this, so to game the system, they began to take advantage of sabermetrics—a kind of advanced statistical analysis that crunches data from player performance—to see hidden worth in undervalued players. Though the A’s were one of the poorest teams in the league, their ability to pick cheap but secretly talented athletes made them competitive with even the best-funded lineups (including the New York Yankees, who spent nearly three times the amount on their payroll in 2002), and one of the most successful MLB franchises in 2002 and 2003. The success of the “moneyball” method popularized the use of advanced analytics across professional baseball.
The Moneyball narrative became a verb with a life of its own. And it’s only become more popular as it’s converged with Silicon Valley’s promises that big data and artificial intelligence will provide transformative solutions to all sorts of problems—improving performance and making better predictions and decisions. Sabermetrician Nate Silver went on to moneyball other sports and elections with his popular blog-turned-website FiveThirtyEight. Former White House appointees John Bridgeland and Peter Orszag issued a bipartisan call to moneyball government. Jared Kushner claimed he and the Trump campaign team played moneyball with the election by “asking ourselves which states would get the best ROI.” There’s been moneyball for startups, sales, higher ed, digital media, book publishers, health care, restaurants, Hollywood movies, music, lobbyists, professors, lawyers, human resources, criminal justice, and even mindfulness.
But this moneyball-ization assumes that all information is reliable information, that algorithms are unbiased magic, and that big data can paint the big picture. The scenarios where this thinking has failed us are already all too familiar. Take the 2016 election. Most of us put our faith in the forecasting numbers, charts, maps, and needles that told us Hillary Clinton would be in the White House now. It took the dissonance we experienced late on Nov. 8 to make us consider the sources behind those predictions. Cade Metz wrote in Wired that Trump’s win “wasn’t so much a failure of the data as it was a failure of the people using the data ... a failure of the willingness to believe too blindly in data, not to see it for how flawed it really is.” As Metz points out, many of the polls relied on surveys that, among other shortcomings, weren’t keeping up with Americans’ mass move from landlines to cellphones. The best deep neural network can’t forecast an election, he wrote, unless you give it good data to make its predictions.
It’s not just sports and politics. Employers are using personality tests to make hiring and promotion decisions, lenders are using big data and credit scores to predict future financial health, health care providers and insurers are using big data techniques to estimate patient risk, and schools are using predictive analytics to forecast whether students will succeed or fail.
None of these are inherently bad uses of data, but overreliance on and uncritical trust in them can easily go wrong—and have profound consequences if they do. For example, when ProPublica investigated a widely used predictive software program that claims to calculate the risk a defendant will commit a future crime, it found that it produced predictions that were not only remarkably unreliable, but also staggeringly stacked against black defendants. What’s more, its creators didn’t publicly disclose how its algorithms determined the risk scores, so judges and defendants couldn’t easily contest its conclusions, which are used to inform legal decisions like sentencing and parole. Yet because a machine produced the calculations, the software was often touted as a tool to reduce the human biases that have historically plagued the justice system.
So what can we do to correct this dangerous obsession? Well, part of the answer, fittingly, comes from better filling out the Moneyball story.
For one, as Lewis’ myth-making book documents, Beane and DePodesta’s success didn’t come because they were the first in the industry to use data. It came because they recognized that not all data was useful data. Instead of just looking at traditional stats like players’ batting averages or home runs, they gave more weight to more predictive ones, like how often a player got on base. (How much this data really helped them win remains a subject of contention.)
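The stat in question has a standard public formula: on-base percentage credits walks and hit-by-pitches, which batting average ignores. A minimal sketch, using hypothetical stat lines (the player numbers below are invented for illustration, not real data), of why the two stats can rank the same hitters differently:

```python
def batting_average(hits, at_bats):
    """Traditional stat: hits per official at-bat."""
    return hits / at_bats

def on_base_percentage(hits, walks, hbp, at_bats, sac_flies):
    """Standard OBP formula: times on base per plate appearance,
    counting walks (BB) and hit-by-pitches (HBP)."""
    return (hits + walks + hbp) / (at_bats + walks + hbp + sac_flies)

# Two hypothetical players with identical batting averages but very
# different plate discipline -- the stats diverge once walks count.
patient = dict(hits=150, walks=90, hbp=5, at_bats=500, sac_flies=5)
free_swinger = dict(hits=150, walks=20, hbp=2, at_bats=500, sac_flies=3)

print(batting_average(150, 500))                      # 0.300 for both
print(round(on_base_percentage(**patient), 3))        # 0.408
print(round(on_base_percentage(**free_swinger), 3))   # 0.328
```

A scout looking only at batting average sees two identical hitters; the weighted stat surfaces the one who reaches base far more often, which is the kind of hidden value the A’s front office was buying cheaply.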
What’s more, many in the league had already been working for decades to improve the metrics collected and sought out in the game. In his book Big Data Baseball, sports writer Travis Sawchik explains how sabermetricians in the 1980s collected about 200,000 useful data points in a given season. That number grew to just under 1 million in 1990 and mushroomed to 20 million in 2007 with the help of the PITCHf/x tracker and other automated systems. It’s not being used blindly either. Baseball managers and scouts continue to refine how they use this torrent of information to make better decisions for their ball clubs and players. After all, there’s a lot of money on the line.
Yet another aspect of forecasting that Moneyball evangelists often miss is the enduring importance of subjective measures. “One of the great false dichotomies of baseball coverage today is ‘stats versus scouts.’ The claim that these two sides don’t or can’t get along was a key part of the mythology of the book Moneyball,” writes ESPN’s Keith Law in his book Smart Baseball. “The problem with the story is that it’s not true: scouts and analysts aren’t at odds, and just about every MLB front office now expects both departments to work together to improve their decision-making in all aspects of player acquisition, from trades to free agency to the draft.”
Like good futurists, scouts evaluate all the possible, probable, and preferable futures of a player. Yes, thanks to big data, these recruiters are better informed than ever. But it’s also their job to cross-check, challenge, weight, and look beyond the stats they’re given. How might unquantifiable characteristics like a player’s tenacity, leadership, work ethic, and temperament affect his individual performance? What about the team’s? What variables do we need to consider this season, and what do we need to take into account for seasons beyond?
Fortunately, MLB managers didn’t stop using scouts. That’s why we have players like the Houston Astros’ José Altuve, the five-time All-Star regarded as one of the best players in all of baseball. At 5-foot-6, he was cut from an earlier tryout in his native Venezuela for being too short. But at a subsequent evaluation, the recruiters looked past it. In a documentary about his career, Houston’s scouts, general manager, and coaches talk about how his work ethic made him the athlete he is today. Altuve’s not an outlier either. As an Oakland Press feature about the history of scouting chronicled, legendary Boston Red Sox recruiter Bill Lajoie selected the young David Ortiz over the statistically superior Robert Frick in 2002 after he scouted both players’ intangibles. As fans know, Ortiz became 2013’s World Series MVP and is considered a shoo-in Hall of Famer. The story also pointed to a similar choice the Detroit Tigers made in 2004 by selecting future Cy Young winner and league MVP Justin Verlander over the data-backed favorite Homer Bailey because they saw potential in the former’s tenacity.
Scouts are also useful where metrics aren’t yet available. For example, nearly one-third of the league’s players at the start of this season were international, including 93 from the Dominican Republic, 77 from Venezuela, and 23 from Cuba. But recruiters like the Pittsburgh Pirates’ director of Latin America scouting Rene Gayo don’t have access to the same numbers they would get if they were looking at a kid who played ball for Florida State University because they don’t exist. Instead of counting the Dominicans or Venezuelans (like Altuve) out, scouts rely on expertise and domain knowledge to make up for what data can’t tell them. It’s as true for things like hiring an employee or evaluating student potential as it is for drafting in sports.
Baseball managers have also been surprisingly forward-looking by making data accessible to everyone. The Pittsburgh Pirates, to take an example from Sawchik’s book, share the metrics managers and coaches use with the entire roster and make sure everyone understands how the data is collected, how it informs decisions, and how players can use it to improve individual and team performance. The Pirates know that hiring some savant statistician to run the numbers for them won’t automatically produce some sort of magical “moneyball effect.” But they do see the power of data as a tool that, when applied smartly and transparently, can help the whole team build better futures. The strategy paid off, catapulting the Pirates to the playoffs in 2013, 2014, and 2015. Imagine if that same strategy were applied to criminal justice. Instead of biased, black-boxed risk algorithms used for sentencing, courts could transparently use big data and personal knowledge to help place offenders in the most effective recidivism-reduction programs (job training, short-term housing, mental health, substance treatment, etc.).
In many ways, tech today seems to be stuck where baseball was in the early 2000s: We’re collecting a lot of data, but we have yet to learn how best to use it and how to recognize what can’t be quantified. But now, the stakes are much higher, and the field much more complex, than anything Billy Beane ever faced.
Thinking back to the Silicon Valley executive who asked me what I did as a futurist, I should have told him I thought of myself, and others in my field, not as data scientists but as baseball scouts. Yes, we use the best data we can find to inform what’s next. But we also weigh how much that information can do for us and consider the myriad unquantifiable factors that players bring, to more fully imagine our possible, probable, and preferable futures. It makes a sounder, better story for everyone, in the next season and beyond.