Stanford machine learning class week 5: creating an automated journalist.

# Blogging the Stanford Machine Learning Class

Could I Use the Stanford Machine Learning Class To Create a Robot Journalist?

The citizen’s guide to the future.
Nov. 16 2011 3:42 PM
FROM SLATE, NEW AMERICA, AND ASU


## Could I start using this stuff to create an automated journalist?

It occurred to me halfway through this week’s machine learning lectures that I could actually use the stuff we’re learning for something other than the homework assignments. I was reminded of a poster hanging in my seventh-grade math class titled “When will I ever need to know this stuff?” with a long list of professions and how they made use of trigonometry, algebra, fractions, and so forth—all of which was irrelevant to me at the time since I was headed to play for the Philadelphia Phillies.

Since I got stuck in journalism instead, I’ve been thinking about how machine learning might be of use to the field. We’re still on the subject of neural networks, which use a tremendous series of matrices and logarithms to simulate the way your brain learns. All of this would have been extremely useful when my friend Farhad Manjoo and I were trying to write a program called Robottke for Slate that would predict which Web pages popular blogger Jason Kottke would link to in a given day.

All the machine-learning algorithms we’ve covered in class so far have essentially worked the same way: First, you identify a set of real-world data that you’d like to “learn,” that is, simulate in a model. The operating example we’ve been using is housing prices, in which you have, say, 5,000 examples of houses with information like square footage, median income of the neighborhood, number of bathrooms, and so forth, as well as how much they sold for. You can make each example as detailed as you like, with hundreds or even thousands of data points per house. This is known as your “training set”: the information you’ll feed to your algorithm so that it can slowly adjust its many parts and, in that sense, “learn” the quirks of the housing market. At the end, you’ll be able to test its quality by running all these real-world examples through it and seeing how well it does at getting the right answer, or an answer close to the right one. If it does well, you hope you can trust it to predict prices for homes not yet on the market.
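
To make that recipe concrete, here is a minimal sketch in Python, assuming NumPy is installed. Everything in it is made up for illustration: the 5,000 houses are randomly generated, the pricing rule behind them is invented, and the function names (`make_training_set`, `fit_linear_model`, `predict`) are hypothetical. It uses plain linear regression trained by gradient descent, the simplest model that fits the description, rather than a neural network.

```python
import numpy as np


def make_training_set(n_houses=5000, seed=0):
    """Generate a made-up training set: features X and sale prices y."""
    rng = np.random.default_rng(seed)
    sqft = rng.uniform(500, 4000, n_houses)          # square footage
    income = rng.uniform(30_000, 150_000, n_houses)  # neighborhood median income
    baths = rng.integers(1, 5, n_houses).astype(float)  # 1 to 4 bathrooms
    X = np.column_stack([sqft, income, baths])
    # Invented pricing rule plus random noise, purely for illustration.
    noise = rng.normal(0, 20_000, n_houses)
    y = 120 * sqft + 1.5 * income + 10_000 * baths + noise
    return X, y


def fit_linear_model(X, y, learning_rate=0.1, n_steps=2000):
    """'Learn' by repeatedly nudging the weights to shrink the squared error."""
    mean, std = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mean) / std                          # put features on a common scale
    Xb = np.column_stack([np.ones(len(Xs)), Xs])   # extra column for the intercept
    w = np.zeros(Xb.shape[1])
    for _ in range(n_steps):
        error = Xb @ w - y                   # how far off each prediction is
        gradient = Xb.T @ error / len(y)     # direction that increases the error
        w -= learning_rate * gradient        # step the other way
    return w, mean, std


def predict(X, w, mean, std):
    """Price houses with the learned weights, using the training-set scaling."""
    Xs = (X - mean) / std
    return w[0] + Xs @ w[1:]


X, y = make_training_set()
w, mean, std = fit_linear_model(X, y)

# Test quality by running the training examples back through the model.
train_error = np.mean(np.abs(predict(X, w, mean, std) - y))
print(f"Mean absolute error on the training set: ${train_error:,.0f}")

# Predict the price of a house that was not in the training set.
new_house = np.array([[2000.0, 80_000.0, 2.0]])
print(f"Predicted price: ${predict(new_house, w, mean, std)[0]:,.0f}")
```

The loop in `fit_linear_model` is the "slowly adjust its many parts" step: each pass measures how wrong the current predictions are and shifts the weights slightly to reduce that error. Checking the error on the same houses used for training, as done here, mirrors the description above; holding some examples back for testing would give a more honest estimate of how the model handles homes it has never seen.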