Computer Learns to Distinguish Paris From London (But Can’t Tell U.S. Cities Apart)

Aug 08, 2012 12:00 PM

Machine-learning software can sift through Google Street View images to pick out cities’ unique visual signatures, such as the cast-iron balconies and distinctive street signs of Paris (above) and the columned entryways and stone balconies of London (below).

Courtesy of Carnegie Mellon University

What makes Paris look like Paris? For most people it’s a certain, well, je ne sais quoi. But for a machine-learning program developed by researchers at Carnegie Mellon, it’s third-floor cast-iron balcony railings, window balustrades, the shape of the streetlamps, and the molding on the wooden doors.

The software analyzed 40,000 Google Street View images from 12 major cities to pick out the visual signatures that most clearly distinguish each one. London buildings, it noticed, have a lot of neoclassical entryways with columns on either side. Like Paris, they often have balcony railings, but they’re usually made of stone and tend to be on the second floor.

For San Francisco, the software zoomed in on bay windows, and for New York, fire escapes. But in general, it struggled to tell U.S. cities apart, resorting to road tunnels and the makes of cars on the street to try to make distinctions. “This might be explained by the relative lack of stylistic coherence and uniqueness in American cities (with its melting pot of styles and influences), as well as the supreme reign of the automobile on American streets,” the researchers wrote.

From a machine-learning perspective, the hard problem was sifting through vast amounts of visual data that tell you little about location, such as trees, concrete, and sky, to find the small patches that are “geo-informative.” That required learning the concept of, say, a balcony well enough to match one city’s balconies to another’s while still picking up the subtle differences between them. It took 150 processors running overnight to complete the analysis.
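To make the idea of a “geo-informative” patch concrete, here is a toy illustration of one simple way to score patches: for each candidate patch from Paris, look at its nearest neighbours among patches from all cities and measure what fraction of them also come from Paris. Generic patches (sky, trees, pavement) have neighbours from everywhere and score low; distinctive ones (a Parisian balcony railing) mostly match other Parisian patches and score high. This is a sketch under stated assumptions, not the researchers’ actual pipeline: the 2-D “feature vectors,” the function names `knn_purity` and `rank_geo_informative`, and the neighbour count `k` are all hypothetical stand-ins for real image descriptors.

```python
import math

def knn_purity(candidate, pool, k):
    """Fraction of the k nearest entries in `pool` labelled 'paris'.

    `pool` is a list of (feature_vector, label) pairs. The feature vectors
    here are hypothetical toy stand-ins for real image-patch descriptors.
    """
    nearest = sorted(pool, key=lambda item: math.dist(candidate, item[0]))[:k]
    return sum(1 for _, label in nearest if label == "paris") / k

def rank_geo_informative(paris_patches, other_patches, k=5, top=3):
    """Return the `top` Paris patches as (score, index) pairs, best first."""
    # Paris patches come first, so pool index i matches paris_patches[i].
    pool = [(p, "paris") for p in paris_patches] + [(p, "other") for p in other_patches]
    scored = []
    for i, cand in enumerate(paris_patches):
        # Exclude the candidate itself from its own neighbour list.
        neighbours = [item for j, item in enumerate(pool) if j != i]
        scored.append((knn_purity(cand, neighbours, k), i))
    scored.sort(reverse=True)
    return scored[:top]

# Hypothetical toy data: patches 0-3 are "distinctive" Paris features clustered
# near (5, 5); patches 4-6 are generic scenery that looks like the other cities'.
paris_patches = [(5.0, 5.1), (5.2, 4.9), (4.9, 5.0), (5.1, 5.2),
                 (0.1, 0.0), (0.0, 0.2), (-0.1, 0.1)]
other_patches = [(0.0, 0.0), (0.2, 0.1), (-0.2, 0.0),
                 (0.1, -0.1), (0.0, -0.2), (0.3, 0.2)]

for score, idx in rank_geo_informative(paris_patches, other_patches, k=3, top=3):
    print(f"patch {idx}: purity {score:.2f}")
```

On this toy data the three top-ranked patches all come from the distinctive cluster, while the generic patches score lower because their neighbours are mostly non-Paris. A real system would also have to cope with millions of patches and far richer descriptors, which is where the heavy computation comes in.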

By the time it was finished, the program had come up with a set of features that could correctly identify Paris 83 percent of the time and Prague 92 percent of the time. And in a Web study, humans who had been to Paris were able to use the features that the computer had picked out to tell Paris from Prague 79 percent of the time. That’s far better than the 58 percent they managed when shown random Google Street View images from the two cities.

This suggests that people and computers use many of the same visual cues to tell cities apart. It’s just that we probably do it subconsciously.

You can try your hand at it yourself with the study team’s Paris-NonParis test.

The research team, which was led by Carnegie Mellon computer science professor Alexei Efros and included members from INRIA/Ecole Normale Supérieure, will present its findings at a conference on Thursday in Los Angeles. The full paper is available online.