Blogging the Stanford Machine Learning Class

Stanford’s Machine Learning Class: Learning How To Fix Algorithms When Everything Goes Haywire
What's to come?
Nov. 22 2011 4:40 PM

Learning how to fix algorithms when everything goes haywire.

After I described last week how machines learn by accepting “training sets” (real-world data they use to construct their algorithms), I got an email from Slate patron saint R.M. “Auros” Harman calling me out on an obvious flaw in this process.* For the past few weeks, the intro machine-learning course I’m taking online via Stanford has been teaching us students to build increasingly complex models that identify handwritten numerals, to the point that our models are more than 97 percent accurate. However, Auros has an excellent point: If you’re testing your algorithm on the same data you used to train it, how could you possibly know if it’s any good? It would be like giving students an exact copy of their homework assignments as the final exam.
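Here is a minimal sketch of Auros's point, not anything taken from the course materials. It assumes made-up, noisy data and a deliberately naive "memorizing" model (predict the label of the closest training example); the helper names `make_data`, `split`, `predict_nearest`, and `accuracy` are all hypothetical. The model is scored once on the data it learned from and once on data held back from training.

```python
import random

def make_data(n, rng):
    """Synthetic stand-in data: label is 1 when x > 0.5, flipped 20% of the time."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = 1 if x > 0.5 else 0
        if rng.random() < 0.2:  # label noise, i.e. quirks in the data
            label = 1 - label
        data.append((x, label))
    return data

def split(data, test_fraction, rng):
    """Shuffle a copy of the data and hold back a fraction for testing."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def predict_nearest(train, x):
    """A model that memorizes: return the label of the closest training point."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]

def accuracy(train, examples):
    """Fraction of examples whose label the memorizing model gets right."""
    correct = sum(1 for x, y in examples if predict_nearest(train, x) == y)
    return correct / len(examples)

rng = random.Random(0)  # fixed seed so the run is repeatable
data = make_data(200, rng)
train, test = split(data, 0.3, rng)

train_acc = accuracy(train, train)  # the homework handed back as the exam
test_acc = accuracy(train, test)    # questions the model has never seen

print("accuracy on training data:", train_acc)
print("accuracy on held-out data:", test_acc)
```

Scored on its own training data, the memorizer looks perfect, because every point's nearest neighbor is itself. The held-out score is the honest one, since it reflects how the model handles examples it was never shown.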

Say, for example, that you were trying to teach a machine to choose your fantasy baseball lineup from day to day or week to week. You’d have plenty of data from earlier in the season about matchups between hitters and pitchers, conditions under which players get on base frequently, and so forth. You could even work in things like the temperature on the field and which umpire crew was on duty. If you did this correctly, you could probably make yourself an algorithm that, had you been using it from the beginning of the season, would have maximized your score. I’m willing to bet, however, that once you started applying it to future games, against new teams with new strategies, everything would fall apart.

There are worse things in life than losing your fantasy baseball pool. But recall how, in 2008, an error in the Google News algorithm, which uses machine learning to categorize stories by topic, led it to prominently display a six-year-old article about United Airlines going bankrupt. Panicked investors dumped the stock, which dropped 75 percent in value. (Exactly whose fault this was still isn’t clear and involves a lot of technical arguments.) We trust algorithms to invest our money, plan our travel routes, and get us good information online. Which is why they ought to work.

Perhaps Auros and professor Andrew Ng are secretly in collusion, because this week’s class was devoted to how to fix your algorithms when they go haywire, like when a program predicts that a one-bedroom fixer-upper in Detroit should go for $2 million. For starters, you can essentially weaken your algorithm by shrinking the weights in its matrices, which might otherwise be too powerfully tuned to the quirks and irregularities in your training data. Or you can start multiplying features together: perhaps the secret to housing prices is the number of bedrooms multiplied by the square footage.
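Both fixes can be sketched in a few lines. This is an illustration under stated assumptions rather than the course's own code: the four houses are invented, the model has a single weight and no intercept, and the penalty is a simplified form of L2 regularization, for which the best weight has the closed form sum(x*y) / (sum(x*x) + lam). The names `interaction`, `fit_ridge_weight`, and `lam` are hypothetical.

```python
def interaction(bedrooms, sqft_thousands):
    """Build a new feature by multiplying two existing ones together."""
    return bedrooms * sqft_thousands

def fit_ridge_weight(xs, ys, lam):
    """Best single weight w for price = w * x, with penalty lam * w**2.

    Minimizing sum((y - w*x)**2) + lam * w**2 gives
    w = sum(x*y) / (sum(x*x) + lam), so a larger lam pulls w toward zero.
    """
    numerator = sum(x * y for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs) + lam
    return numerator / denominator

# Invented examples: (bedrooms, square feet in thousands, price in $ thousands)
houses = [(1, 0.6, 90.0), (2, 0.9, 150.0), (3, 1.5, 310.0), (4, 2.2, 560.0)]

xs = [interaction(b, s) for b, s, _ in houses]
ys = [price for _, _, price in houses]

w_plain = fit_ridge_weight(xs, ys, lam=0.0)    # no penalty
w_shrunk = fit_ridge_weight(xs, ys, lam=10.0)  # penalized, so a smaller weight

print("weight without penalty:", w_plain)
print("weight with penalty:   ", w_shrunk)
```

The penalized weight always comes out smaller here, which is the "weakening" at work. How large to make the penalty is exactly the sort of guesswork that has to be settled by checking predictions on data held back from training.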

I find it both surprising and disappointing how much guesswork goes into finding the right fit to teach a machine to model a real-life scenario. At first, it seems almost magical that an algorithm can predict housing prices without knowing the first thing about the real estate market. The same could be said of a machine-learning algorithm that predicts which movies you’ll like, based on how you’ve rated other movies. When the Netflix Prize was first announced (the idea was that the company would pay $1 million to whoever could beat the original prediction algorithm by 10 percent), I assumed the winner would be the person who had a groundbreaking insight into the vagaries of human preference. (Maybe there is a cogent explanation for why Mean Girls and The Godfather are both in my top five.) But the winners took a much more computational approach.

What we haven’t learned yet is how to disassemble the algorithm once it works and extract some meaning from it. I don’t know if this is possible, particularly with complex neural networks with a half-dozen large matrices of numbers. I hope it is. It’s enormously exciting to be able to teach machines so easily on such a broad spectrum of topics. It would be nice if they could return the favor and teach us a few things back.

Grades: Still working on the homework, late again.

Correction, Nov. 24, 2011: This article originally misspelled the name of R.M. “Auros” Harman.
