Blogging the Stanford Machine Learning Class

## What Richard Scarry and computer science have in common.

A few months ago, Stanford University announced that it would make three of its most popular computer science classes available online: one on artificial intelligence, one on machine learning, and one on database software. When I heard the news, I did two very foolish things: I signed up for the advanced track of the second one, and I promised my editor, Torie, that I'd write an entry after each class.

My original intention was to make a machine learn to blog so that I wouldn't have to do this for very long. But we actually tried that recently at **Slate**, and it proved to be more difficult than you might imagine.

This is not going to be a technical series. Machine learning is an enormously important area of research, and something I think anyone can understand without getting mixed up in “batch gradient descent algorithms” and “double-variable linear regressions.” Among the various working definitions of machine learning, which like most disciplines has no agreed-upon description, is this: “A field of study that gives computers the ability to learn without being explicitly programmed.” I might put it like this: “Programming computers to understand patterns of behavior better than their human overlords.” The go-to example is Google or Bing search results: All the information that users transmit by clicking on certain links or phrasing their queries in certain ways goes toward improving the quality of the results they get back.

Stanford is doing a great service by offering this class online, and part of my goal is to see how feasible this massively distributed learning format is. So far, I haven’t had too many problems. The video lectures mainly consist of slides that the professor, Andrew Ng, marks up as he talks, using some sort of magic stylus whose color he can change at will. As for the homework, I may post some of the math and computer code as addenda for anyone who cares to look, but otherwise you can put the graph paper away. (Programming note: I'm going to post a week after each class, so as not to violate the honor policy. Not that I would advise anyone to copy my work.) If I pass, I believe I get a certificate.

I'll admit I'm nervous. I gave up math and computer science in college because it was the one thing harder than writing, most of the time. (I've since started coding again for **Slate Labs**, so we'll see how far that gets me.) At least I have this going for me: If I drop out, what is Andrew Ng gonna do to me?

The great joy of learning new concepts in math and science is that they transport you into a simplified world, in which only a few things govern our lives and all problems can be solved—basically, a Richard Scarry book for the matriculating crowd. When a car of mass *m* rolls down a hill of incline *x*, then hits a brick wall *y* feet later, no one has to worry about insurance premium adjustments or whether you were supposed to be driving your dad's car in the first place. In the first few lectures in this class, we often returned to a data set of houses and how much they should sell for based on square footage. Crime rates, quality of the local schools, proximity to a Whole Foods—we ignored these factors completely.

Of course, not *everything* is perfect in our nerdy Eden. The housing data are a little messy—while bigger houses tend to sell for more, the data points are scattered across the graph, and it looks like some people got better deals than others. So we need to make an algorithm—in this case, a straight line called a linear regression—to model housing prices. I’m going to violate my conviction not to muddle this with graphs and equations, just so we’re on the same page.

This probably sounds more like high-school statistics than machine learning. I think we might have even touched on this in Mrs. Burke's Algebra II class at Charlottesville High School, though probably with an assist from our TI-83 graphing calculators. In fact, the best way to find the ideal line is to teach a computer to gradually home in on the perfect fit. This is what’s called a “supervised” machine-learning problem, one where we know what we want the computer to find.

Which means, delightfully, that there is such a thing as an “unsupervised” learning problem, in which computers are set loose to figure out the problem and solve it themselves. I find this tremendously exciting, and think it's a pity that machine-learning specialists aren't better marketers. Those weenies in the artificial intelligence class shouldn't have all the sex appeal of computer programming to themselves. I therefore propose to Professor Ng that we rename this class “computer prescience” or “clairvoyant coding.” Because that's the coolest thing machine learning can accomplish, in my experience (of one week): It can make machines that solve problems we didn't know we had.

* * *

So far, the math in this class has ranged from third-grade arithmetic to multi-variable calculus and linear algebra. This is fortunate for me, because the latter two were the last two math classes I took in college, and I also graduated from the third grade. (Professor Ng reassures us that, if some of the math is over our heads for now, we’ll still be able to do the homework.)

With these housing data, we're going to draw a line (a simple “model” of housing prices) that predicts the price of a house for any given square footage. Our goal is to draw a line that comes as close to every real data point as possible. There's a simple way to figure out how well you did: For each data point, meaning one real house with a certain square footage that sold for a certain price, you look at how far your model's prediction landed from the real value. Do that for every house and, after some squaring, adding, dividing, and so forth, you get a single number that represents the accuracy of your model. The lower the number the better, since it means a smaller difference between theory and reality.
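For anyone who does want the graph paper, here is a rough sketch of that scoring step in Python. It is my own illustration, not the course's code: the function names, the three houses, and their prices are all invented, and I have assumed the "dividing" means averaging the squared misses over the number of houses (some treatments divide by twice that number instead).

```python
def predict(square_feet, intercept, slope):
    """Price the line predicts for a house of the given size."""
    return intercept + slope * square_feet


def cost(houses, intercept, slope):
    """Average squared gap between predicted and real prices.

    `houses` is a list of (square_feet, price) pairs. Lower is better.
    Dividing by the number of houses is an assumption for illustration.
    """
    total = 0.0
    for square_feet, price in houses:
        error = predict(square_feet, intercept, slope) - price
        total += error ** 2  # squaring keeps over- and under-shoots from canceling
    return total / len(houses)


# Invented data: three houses that happen to sit exactly on one line.
houses = [(1000, 200000), (1500, 300000), (2000, 400000)]

perfect_line = cost(houses, intercept=0, slope=200)
sloppy_line = cost(houses, intercept=0, slope=150)
```

With this made-up data, the line at $200 per square foot hits every house exactly, so its score is zero, while the $150 line scores worse. Real housing data are scattered, so no line would ever get all the way down to zero.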

Here's where machine learning enters the picture: We could draw lines all day, laboriously calculate their accuracy, and probably never arrive at the best possible answer. But we can teach a computer to do it very easily. Next week, I’ll introduce how you do that, once I know. But I can already tell you that I have decided to call it, unofficially, the “Lazy Hiker Principle.”

* * *

**Grades for Week 1:**

Quiz 1: Introductory principles: 5/5!

Quiz 2: Linear regression with one variable: 4.75/5. (My fault, misread an exponent.)

Quiz 3: Linear Algebra: 5/5.

I suspect they get harder. For now, screenshots of the first and third grades have already been mailed to my mother, for the fridge.

Confused? Have something to add (or subtract)? Email me.