# Blogging the Stanford Machine Learning Class

Stanford’s Machine Learning Class: Learning How To Fix Algorithms When Everything Goes Haywire

The citizen’s guide to the future.
Nov. 22 2011 4:40 PM
FROM SLATE, NEW AMERICA, AND ASU

## Learning how to fix algorithms when everything goes haywire.

After I described last week how machines learn by accepting “training sets”—real-world data they use to construct their algorithms—I got an email from Slate patron saint R.M. “Auros” Harman calling me out on an obvious flaw in this process.* For the past few weeks, the intro machine-learning course I’m taking online via Stanford has been teaching us students to build increasingly complex models that identify handwritten numerals, to the point that our models are now more than 97 percent accurate. However, Auros has an excellent point: If you’re testing your algorithm on the same data you used to train it, how could you possibly know if it’s any good? It would be like giving students an exact copy of their homework assignments as the final exam.
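To see why that test is meaningless, here’s a toy sketch (my own illustration, not anything from the course) of the most extreme case: a “model” that simply memorizes its training examples. The example IDs and labels below are made up.

```python
# Hypothetical training set: made-up image IDs mapped to the digit each shows.
training_set = {"img_a": 7, "img_b": 3, "img_c": 9}

def memorizing_model(example):
    # Perfect on anything it has seen before; clueless on anything new.
    return training_set.get(example)

# Scored on the very data it was trained on: a flawless 100 percent.
train_accuracy = sum(
    memorizing_model(x) == label for x, label in training_set.items()
) / len(training_set)
print(train_accuracy)  # 1.0

# Scored on a held-out example it never saw: no answer at all.
print(memorizing_model("img_d"))  # None
```

The memorizer aces the homework-as-final-exam test, which is exactly why you hold back a separate test set the model never sees during training: only its accuracy on that held-out data tells you whether it learned anything general.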

Say, for example, that you were trying to teach a machine to choose your fantasy baseball lineup from day to day or week to week. You’d have plenty of data from earlier in the season about matchups between hitters and pitchers, conditions under which players get on base frequently, and so forth. You could even work in things like the temperature on the field and which umpire crew was on duty. If you did this correctly, you could probably make yourself an algorithm that, had you been using it from the beginning of the season, would have maximized your score. I’m willing to bet, however, that once you started applying it to future games, against new teams with new strategies, everything would fall apart.

There are worse things in life than losing your fantasy baseball pool. But recall how, in 2008, an error in the Google News algorithm—which uses machine learning to categorize stories by topic—prominently displayed a six-year-old article about United Airlines going bankrupt. Panicked investors dumped the stock, which lost 75 percent of its value. (Exactly whose fault this was still isn’t clear and involves a lot of technical arguments.) We trust algorithms to invest our money, plan our travel routes, and get us good information online. Which is why they ought to work.