In Artificial Intelligence Breakthrough, Google Computers Teach Themselves To Spot Cats on YouTube

The Citizen's Guide to the Future
June 27 2012 9:34 AM

Google's computers quickly concluded that cats' faces were among the more important features to be able to recognize when watching YouTube.

Photo by Timothy A. Clary/AFP/Getty Images

Working at the secretive Google X lab, researchers from Google and Stanford connected 1,000 computers, turned them loose on 10 million YouTube stills for three days, and watched as they learned to identify cat faces.

Will Oremus

Will Oremus is Slate's senior technology writer.

The research, thus summarized, is good for a laugh. “Perhaps this is not precisely what Turing had in mind,” wrote The Atlantic’s Alexis Madrigal. Sure it was, countered The Cato Institute’s Julian Sanchez: Google was training its computers to pass the “Purring Test.”


But what’s most fascinating about the study is that the researchers didn’t actually tell the computers to look for cat faces. The machines started doing that on their own.

The paper’s actual title, you see, has nothing to do with felines, or YouTube for that matter. It’s called “Building High-level Features Using Large Scale Unsupervised Learning,” and what it’s really about is the ability of computer networks to learn what’s meaningful in images—without humans’ help.

When an untutored computer looks at an image, all it sees are thousands of pixels of various colors. With practice and supervision, it can be trained to home in on certain features—say, those that tend to indicate the presence of a human face in a photo—and reliably identify them when they appear. But such training typically requires images that are labeled, so that the computer can tell whether it guessed right or wrong and refine its concept of a human face accordingly. That’s called supervised learning.
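The guess-check-refine loop described above can be sketched in a few lines. This is a hypothetical toy illustration, not the paper's method: a simple perceptron-style classifier whose weights are nudged toward hand-labeled examples. The data and weights here are made up for demonstration.

```python
# Toy supervised learning: each "image" is a short list of pixel values,
# and each comes with a label (1 = face, 0 = not a face).
# The labels are what make this *supervised* learning.
labeled_data = [
    ([0.9, 0.8, 0.1], 1),  # face-like pixel pattern
    ([0.1, 0.2, 0.9], 0),  # non-face pattern
    ([0.8, 0.9, 0.2], 1),
    ([0.2, 0.1, 0.8], 0),
]

weights = [0.0, 0.0, 0.0]
learning_rate = 0.5

for _ in range(100):  # repeated passes over the labeled examples
    for pixels, label in labeled_data:
        score = sum(w * p for w, p in zip(weights, pixels))
        prediction = 1 if score > 0.5 else 0
        error = label - prediction  # supervision: compare guess to label
        # Nudge each weight so the next guess lands closer to the label.
        weights = [w + learning_rate * error * p
                   for w, p in zip(weights, pixels)]

# After training, a new face-like pattern scores above the threshold.
new_image = [0.85, 0.9, 0.15]
print(1 if sum(w * p for w, p in zip(weights, new_image)) > 0.5 else 0)
```

The key point for the article's distinction: the `error = label - prediction` line is only possible because a human supplied the label.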

The problem is that most data in the real world doesn’t come in such neat categories. So in this study, the YouTube stills were unlabeled, and the computers weren’t told what they were supposed to be looking for. They had to teach themselves what parts of any given photo might be relevant based solely on patterns in the data. That’s called unsupervised learning.
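The contrast can be made concrete with a far simpler unsupervised technique than the paper's deep network: k-means clustering. In this hypothetical sketch, the algorithm receives the same kind of data as before but with no labels at all, and groups the points purely by the patterns it finds.

```python
# Toy unsupervised learning: unlabeled "images." Nothing tells the
# algorithm which points belong together; it must discover that itself.
unlabeled_data = [
    [0.9, 0.8, 0.1],
    [0.1, 0.2, 0.9],
    [0.8, 0.9, 0.2],
    [0.2, 0.1, 0.8],
]

# Start from two guessed cluster centers.
centers = [[0.5, 0.5, 0.0], [0.0, 0.5, 0.5]]

def nearest(point):
    """Index of the closest center (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, center))
             for center in centers]
    return dists.index(min(dists))

for _ in range(10):  # alternate: assign points, then re-center
    groups = [[], []]
    for point in unlabeled_data:
        groups[nearest(point)].append(point)
    centers = [[sum(vals) / len(g) for vals in zip(*g)]
               for g in groups if g]

# Similar pixel patterns end up in the same cluster, even though
# no label ever told the algorithm what the groups mean.
print(nearest([0.85, 0.9, 0.15]), nearest([0.15, 0.1, 0.85]))
```

Google's system worked on a vastly larger scale and learned far richer features, but the principle is the same: structure emerges from the data alone.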

They were to develop these concepts using artificial neural networks—a system of distributed information processing analogous to that of the human brain. The goal was to see if Google’s computers could mimic some of the functionality of humans’ visual cortex, which has evolved to be expert at recognizing the patterns that matter most to us (such as faces and facial expressions).
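A neural network of this kind is built from layers of simple units, each taking a weighted combination of its inputs and passing the squashed result upward. Here is a minimal hypothetical forward pass, vastly smaller than Google's network; the weights are invented for illustration (in a real network they are learned).

```python
import math

def sigmoid(x):
    # Squash a unit's combined input into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weight_rows):
    """One layer: each unit computes a weighted sum of all inputs."""
    return [sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)))
            for unit_weights in weight_rows]

# Three pixel inputs -> two hidden units -> one output unit.
pixels = [0.9, 0.8, 0.1]
hidden_weights = [[1.0, 1.0, -1.0],   # hypothetical weights; a trained
                  [-1.0, -1.0, 1.0]]  # network arrives at its own
output_weights = [[2.0, -2.0]]

hidden = layer(pixels, hidden_weights)
output = layer(hidden, output_weights)
print(output)
```

Stacking many such layers is what lets higher units respond to abstract features (edges, then parts, then whole faces) rather than raw pixels.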

In fact, Google’s machines did home in on human faces as one of the more relevant features in the data set. They also developed the concepts of cat faces and human bodies—not because they were instructed to, but merely because the arrangement of pixels in image after image suggested that those features might be in some way important.

Google engineering ace Jeff Dean, who helped oversee the project, tells me he was surprised by how well the network accomplished this. In past unsupervised learning tests, machines have managed to attach importance to lower-level features like the edges of an object, but not more abstract features like faces (or cats).

It might seem surprising that this type of pattern recognition should be so difficult. After all, a three-year-old can do it. But for one thing, the neural networks in a three-year-old’s brain contain far more connections than even Google’s massive set-up. (“How Many Computers to Identify a Cat? 16,000” was the New York Times’ headline. When I spoke with Dean, he politely pointed out that it was only 1,000 computers, with a combined 16,000 cores, but either way it’s a lot.)

Secondly, humans by age three are already equipped with specialized tools for recognizing faces. Part of the point of the experiment was to study how such tools might develop in infants’ brains in the absence of feedback or supervision.

For all their successes, it’s worth noting that Google’s computers also fell far short of humans in several respects. After unsupervised learning followed by a period of supervised training, they picked out human faces with 82 percent accuracy. But their accuracy on a broad range of features that humans consider relevant was a far more humble 16.7 percent. 

Meanwhile, Dean notes that the computers “learned” a slew of concepts that have little meaning to humans. For instance, they became intrigued by “tool-like objects oriented at 30 degrees,” including spatulas and needle-nose pliers.

For Dean, the big takeaway was not that computers have achieved human-like visual processing skills, but that it’s possible that they will someday in the not-too-distant future. Why does he think that’s the case? Because Google’s experiment shows that having more processing power and more data makes a difference—and as time passes, we’ll only have more of both.

Future Tense is a partnership of Slate, New America, and Arizona State University.
