Blogging the Stanford Machine Learning Class

Stanford’s Machine Learning Class: Learning How To Fix Algorithms When Everything Goes Haywire
What's to come?
Nov. 22 2011 4:40 PM



Photo illustration by Chris Wilson. Photograph of a calculator by Spring days/Wikipedia.

After I described last week how machines learn by accepting “training sets” (real-world data they use to construct their algorithms), I got an email from Slate patron saint R.M. “Auros” Harman calling me out on an obvious flaw in this process.* For the past few weeks, the intro machine-learning course I’m taking online via Stanford has been teaching us students to build increasingly complex models that identify handwritten numerals, to the point that our models are more than 97 percent accurate. However, Auros has an excellent point: If you’re testing your algorithm on the same data you used to train it, how could you possibly know if it’s any good? It would be like giving students an exact copy of their homework assignments as the final exam.
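To make the homework-as-final-exam problem concrete, here is a minimal sketch in Python. It is not code from the course. The toy data, the 20 percent label noise, and the helper names (`make_data`, `predict_1nn`, `accuracy`) are all assumptions made up for illustration. The "model" simply memorizes its training examples and answers with the label of the closest one, so it looks flawless on the data it has already seen and noticeably worse on examples that were held back.

```python
import random

def make_data(n, rng, noise=0.2):
    """Hypothetical toy data: the label is 1 when x > 0.5, but about 20% of labels are flipped."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label  # simulate a quirk in the real-world data
        data.append((x, label))
    return data

def predict_1nn(train, x):
    """Return the label of the training point closest to x (pure memorization)."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]

def accuracy(train, examples):
    """Fraction of examples whose label the memorizing model gets right."""
    correct = sum(predict_1nn(train, x) == label for x, label in examples)
    return correct / len(examples)

rng = random.Random(0)
data = make_data(300, rng)
train, test = data[:200], data[200:]  # hold back 100 examples the model never sees

train_acc = accuracy(train, train)  # the homework reused as the final exam
test_acc = accuracy(train, test)    # an exam with questions the model has not seen

print(f"accuracy on training data: {train_acc:.2f}")
print(f"accuracy on held-out data: {test_acc:.2f}")
```

The training score here says nothing about how good the model is, since every training point is its own nearest neighbor. Only the held-out score gives an honest estimate.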

Say, for example, that you were trying to teach a machine to choose your fantasy baseball lineup from day to day or week to week. You’d have plenty of data from earlier in the season about matchups between hitters and pitchers, conditions under which players get on base frequently, and so forth. You could even work in things like the temperature on the field and which umpire crew was on duty. If you did this correctly, you could probably make yourself an algorithm that, had you been using it from the beginning of the season, would have maximized your score. I’m willing to bet, however, that once you started applying it to future games, against new teams with new strategies, everything would fall apart.

There are worse things in life than losing your fantasy baseball pool. But recall how, in 2008, an error in the Google News algorithm—which uses machine learning to categorize stories by topic—prominently displayed a six-year-old article about United Airlines going bankrupt, which led panicked investors to dump the stock, which dropped 75 percent in value. (Exactly whose fault this was still isn’t clear and involves a lot of technical arguments.) We trust algorithms to invest our money, plan our travel routes, and get us good information online. Which is why they ought to work.


Perhaps Auros and professor Andrew Ng are secretly in collusion, because this week’s class was devoted to how to fix your algorithms when they go haywire, like when a program predicts that a one-bedroom fixer-upper in Detroit should go for $2 million. For starters, you can essentially weaken your algorithm by penalizing large values in its weight matrices, a technique known as regularization, so that it is not too powerfully tuned to the quirks and irregularities in your training data. Or you can start multiplying features together: perhaps the secret to housing prices is the number of bedrooms multiplied by the square footage.
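Both fixes fit in a few lines of Python. What follows is only a sketch under stated assumptions, not the course's code. The four listings are made-up numbers, `interaction` and `fit_weight` are hypothetical helper names, and the penalty is assumed to be the common squared (L2) kind, with a strength `lam` that you would normally have to tune by trial and error. The model has a single weight and no intercept, so the best weight can be written down directly: minimizing sum((w*x - y)^2) + lam * w^2 gives w = sum(x*y) / (sum(x^2) + lam).

```python
# Made-up toy listings: (bedrooms, square feet in thousands, price in $ thousands).
houses = [(1, 0.6, 70), (2, 0.9, 150), (3, 1.5, 340), (4, 2.2, 610)]

def interaction(bedrooms, sqft):
    """New feature formed by multiplying two existing ones."""
    return bedrooms * sqft

def fit_weight(xs, ys, lam):
    """One-weight fit with an assumed L2 penalty: minimizes sum((w*x - y)^2) + lam * w^2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)

xs = [interaction(b, s) for b, s, _ in houses]
ys = [price for _, _, price in houses]

w_plain = fit_weight(xs, ys, lam=0.0)         # no penalty: fit the training data as closely as possible
w_regularized = fit_weight(xs, ys, lam=10.0)  # penalty pulls the weight toward zero

print(f"weight without penalty: {w_plain:.2f}")
print(f"weight with penalty:    {w_regularized:.2f}")
```

A larger `lam` always shrinks the weight further, which is the sense in which the penalty weakens the model. Whether that helps on new data is something only a held-out set can tell you.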

I find it both surprising and disappointing how much guesswork goes into finding the right fit to teach a machine to model a real-life scenario. At first, it seems almost magical that an algorithm can predict housing prices without knowing the first thing about the real estate market. The same could be said of a machine-learning algorithm that predicts which movies you’ll like, based on how you’ve rated other movies. When the Netflix Prize was first announced (the idea was that the company would pay $1 million to whoever could beat the original prediction algorithm by 10 percent), I assumed the winner would be the person who had a groundbreaking insight into the vagaries of human preference. (Maybe there is a cogent explanation for why Mean Girls and The Godfather are both in my top five.) But the winners took a much more computational approach.

What we haven’t learned yet is how to disassemble the algorithm once it works and extract some meaning from it. I don’t know if this is possible, particularly with complex neural networks with a half-dozen large matrices of numbers. I hope it is. It’s enormously exciting to be able to teach machines so easily on such a broad spectrum of topics. It would be nice if they could return the favor and teach us a few things back.

Grades: Still working on the homework, late again.

Correction, Nov. 24, 2011: This article originally misspelled the name of R.M. “Auros” Harman.

Chris Wilson is a Slate contributor.
