At Kaggle, It’s a Disadvantage To Know Too Much

Stories from New Scientist.
Dec. 8 2012 7:45 AM

Specialist Knowledge Is Useless and Unhelpful

When data prediction is a game, the experts lose out.

Airplanes at an airport.

iStockphoto/Thinkstock.

Jeremy Howard founded email company FastMail and the Optimal Decisions Group, which helps insurance companies set premiums. He is now president and chief scientist of Kaggle, which has turned data prediction into sport.

Peter Aldhous: Kaggle has been described as "an online marketplace for brains." Tell me about it.
Jeremy Howard: It's a website that hosts competitions for data prediction. We've run a whole bunch of amazing competitions. One asked competitors to develop algorithms to mark students' essays. One that finished recently challenged competitors to develop a gesture-learning system for the Microsoft Kinect. The idea was to show the controller a gesture just once, and the algorithm would recognize it in future. Another competition predicted the biological properties of small molecules being screened as potential drugs.

PA: How exactly do these competitions work?
JH: They rely on techniques like data mining and machine learning to predict future trends from current data. Companies, governments, and researchers present data sets and problems, and offer prize money for the best solutions. Anyone can enter: We have nearly 64,000 registered users. We've discovered that creative data scientists can solve problems in every field better than experts in those fields can.

PA: These competitions deal with very specialized subjects. Do experts enter?
JH: Oh yes. Every time a new competition comes out, the experts say: "We've built a whole industry around this. We know the answers." And after a couple of weeks, they get blown out of the water.

PA: So who does well in the competitions?
JH: People who can just see what the data is actually telling them without being distracted by industry assumptions or specialist knowledge. Jason Tigg, who runs a pretty big hedge fund in London, has done well again and again. So has Xavier Conort, who runs a predictive analytics consultancy in Singapore.

PA: You were once on the leader board yourself. How did you get involved?
JH: It was a long and strange path. I majored in philosophy in Australia, worked in management consultancy for eight years, and then in 1999 I founded two start-ups—one an email company, the other helping insurers optimize risks and profits. By 2010, I had sold them both. I started learning Chinese and building amplifiers and speakers because I hadn't made anything with my hands. I travelled. But it wasn't intellectually challenging enough. Then, at a meeting of statistics users in Melbourne, somebody told me about Kaggle. I thought: "That looks intimidating and really interesting."

PA: How did your first competition go?
JH: I set my expectations low: My goal was to not come last. But I actually won it. It was on forecasting tourist arrivals and departures at different destinations. By the time I went to the next statistics meeting I had won two out of the three competitions I entered. Anthony Goldbloom, the founder of Kaggle, was there. He said: "You're not Jeremy Howard, are you? We've never had anybody win two out of three competitions before."

PA: How did you become Kaggle's chief scientist?
JH: I offered to become an angel investor. But I just couldn't keep my hands off the business. I told Anthony that the site was running slowly and rewrote all the code from scratch. Then Anthony and I spent three months in America last year, trying to raise money. That was where things got really serious, because we raised $11 million. I had to move to San Francisco and commit to doing this full-time.

PA: Do you still compete?
JH: I am allowed to compete, but I can't win prizes. In practice, I've been too busy.

PA: What explains Kaggle's success in solving problems in predictive analytics?
JH: The competitive aspect is important. The more people who take part in these competitions, the better they get at predictive modeling. There is no other place in the world I'm aware of, outside professional sport, where you get such raw, harsh, unfettered feedback about how well you're doing. It's clear what's working and what's not. It's a kind of evolutionary process, accelerating the survival of the fittest, and we're watching it happen right in front of us. More and more, our top competitors are also teaming up with each other.

PA: Which statistical methods work best?
JH: One that crops up again and again is called the random forest. This takes multiple small random samples of the data and makes a "decision tree" for each one, which branches according to the questions asked about the data. Each tree, by itself, has little predictive power. But take an "average" of all of them and you end up with a powerful model. It's a totally black-box, brainless approach. You don't have to think—it just works.
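
To make the idea concrete, here is a minimal sketch of the averaging Howard describes, written in plain Python. It is a deliberate simplification and not how Kaggle competitors or any particular library implement random forests: each "tree" asks only one question (a single threshold on one randomly chosen feature), whereas a real random forest grows deeper trees. The function names (`fit_stump`, `fit_forest`, `predict`) and the synthetic data are assumptions made for illustration.

```python
import random


def average(values):
    values = list(values)
    return sum(values) / len(values)


def fit_stump(rows, targets, feature):
    """Fit a one-question tree: 'is row[feature] <= threshold?'."""
    distinct = sorted({row[feature] for row in rows})
    best = None
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [t for row, t in zip(rows, targets) if row[feature] <= threshold]
        right = [t for row, t in zip(rows, targets) if row[feature] > threshold]
        left_mean, right_mean = average(left), average(right)
        # Squared error if each side predicts its own mean.
        error = (sum((t - left_mean) ** 2 for t in left)
                 + sum((t - right_mean) ** 2 for t in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        # Every value was identical, so fall back to the sample mean.
        overall = average(targets)
        return feature, 0.0, overall, overall
    return feature, best[1], best[2], best[3]


def fit_forest(rows, targets, n_trees=100, seed=0):
    """Fit many weak trees, each on its own random resample of the data."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    forest = []
    for _ in range(n_trees):
        # Random sample of the rows, drawn with replacement.
        picks = [rng.randrange(len(rows)) for _ in range(len(rows))]
        sample_rows = [rows[i] for i in picks]
        sample_targets = [targets[i] for i in picks]
        feature = rng.randrange(n_features)
        forest.append(fit_stump(sample_rows, sample_targets, feature))
    return forest


def predict(forest, row):
    """Average the answers of all the trees."""
    return average(
        left_mean if row[feature] <= threshold else right_mean
        for feature, threshold, left_mean, right_mean in forest
    )


if __name__ == "__main__":
    # Synthetic data, invented for this sketch: the target is the sum of two inputs.
    data_rng = random.Random(1)
    rows = [(data_rng.random(), data_rng.random()) for _ in range(100)]
    targets = [a + b for a, b in rows]
    forest = fit_forest(rows, targets)
    print(predict(forest, (0.9, 0.9)), predict(forest, (0.1, 0.1)))
```

For squared error, the averaged prediction can never do worse than the typical individual tree, which is the sense in which many weak trees add up to a stronger model.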

PA: What separates the winners from the also-rans?
JH: The difference between the good participants and the bad is the information they feed to the algorithms. You have to decide what to abstract from the data. Winners of Kaggle competitions tend to be curious and creative people. They come up with a dozen totally new ways to think about the problem. The nice thing about algorithms like the random forest is that you can chuck as many crazy ideas at them as you like, and the algorithms figure out which ones work.
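
As a rough illustration of letting the algorithm sort good ideas from bad ones, the sketch below scores several candidate features by how much a single tree split on each one reduces squared error, which is the kind of comparison a decision tree makes when it chooses a question to ask. Everything here is an assumption made for illustration: the data is synthetic, the column names (`a`, `b`, `a_times_b`, `noise`) are invented, and `split_gain` and `rank_features` are hypothetical helpers, not part of any library.

```python
import random


def sum_squared_error(targets):
    if not targets:
        return 0.0
    center = sum(targets) / len(targets)
    return sum((t - center) ** 2 for t in targets)


def split_gain(values, targets):
    """Largest drop in squared error from one threshold split on `values`."""
    baseline = sum_squared_error(targets)
    distinct = sorted(set(values))
    best = 0.0
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [t for v, t in zip(values, targets) if v <= threshold]
        right = [t for v, t in zip(values, targets) if v > threshold]
        gain = baseline - sum_squared_error(left) - sum_squared_error(right)
        best = max(best, gain)
    return best


def rank_features(candidates, targets):
    """Return feature names ordered from most to least useful split."""
    gains = {name: split_gain(column, targets) for name, column in candidates.items()}
    return sorted(gains, key=gains.get, reverse=True)


# Synthetic example: the target is the product of two raw columns.
rng = random.Random(0)
a = [rng.random() for _ in range(200)]
b = [rng.random() for _ in range(200)]
targets = [x * y for x, y in zip(a, b)]

candidates = {
    "a": a,                                          # raw column
    "b": b,                                          # raw column
    "a_times_b": [x * y for x, y in zip(a, b)],      # an engineered "idea"
    "noise": [rng.random() for _ in range(200)],     # an idea that carries no signal
}

ranking = rank_features(candidates, targets)
print(ranking)
```

In this toy setup the engineered column ranks first because it matches the target exactly. On real data the ranking would only be a hint, and it would need checking on held-out rows.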

PA: That sounds very different from the traditional approach to building predictive models. How have experts reacted?
JH: The messages are uncomfortable for a lot of people. It's controversial because we're telling them: "Your decades of specialist knowledge are not only useless, they're actually unhelpful; your sophisticated techniques are worse than generic methods." It's difficult for people who are used to that old type of science. They spend so much time discussing whether an idea makes sense. They check the visualizations and noodle over it. That is all actively unhelpful.

PA: Is there any role for expert knowledge?
JH: Some kinds of experts are required early on, when you're trying to work out what problem you're trying to solve. The expertise you need is strategy expertise in answering these questions.

PA: Can you see any downsides to the data-driven, black-box approach that dominates on Kaggle?
JH: Some people take the view that you don't end up with a richer understanding of the problem. But that's just not true: The algorithms tell you what's important and what's not. You might ask why those things are important, but I think that's less interesting. You end up with a predictive model that works. There's not too much to argue about there.

This article originally appeared in New Scientist.
