Future Tense

AI Can Spot the Building Numbers in Google Maps Images

Google funded extensive research to automate the process of identifying building numbers from Google Maps images.

Photo by rjs1322

Google collects and organizes an almost unfathomable amount of data every day. It harnesses the information to provide a bunch of services and make money. But with that much intel swirling around, there’s bound to be stuff that Google doesn’t know it knows. Except the company does know it doesn’t know everything it knows, and it’s working hard to identify and claim the unstructured data that exists within its servers.

The most recent effort involves research that has succeeded in automating the process of finding building numbers in Google Street View images. This information allows Google to correlate geolocation data with real addresses, which can be especially crucial in places where house or building numbers do not ascend or descend in an easily recognizable pattern.

Humans could theoretically be tasked with the job, because people can identify building numbers in images with 98 percent accuracy. But the task of finding tens of millions of building numbers in hundreds of millions of Street View panoramas would require a lot of soul-sucking time from even a large army of humans.

Instead of recruiting for this awful task, a team of Google researchers worked to automate the process using an artificial neural network that allows for pattern recognition and independent, experiential learning on interconnected processors (mimicking an animal nervous system). The researchers “trained” the system over six days using the publicly available “Street View House Numbers” data set, which includes 200,000 building numbers. As the 11-layer neural network ran through the images, it learned the relevant patterns, taking the numbers in as a whole instead of analyzing them one digit at a time.

When the researchers trained the neural network on 95 percent of Street View data, the system was able to spot almost 100 million actual address numbers with accuracy comparable to a human’s (98 percent). Ian Goodfellow, a member of the Google research team, told MIT Technology Review that this result represented “unprecedented success” and suggested that the strategy might be applicable to similar problems like text transcription or speech recognition.

To make all of this possible, the team programmed the neural network to assume that no building number is longer than five digits, and most aren’t. The system was also given images that had already been cropped so that the number took up one-third or more of the total image width.
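
To see why a five-digit cap helps, here is a minimal Python sketch of how a system might read a number "as a whole." It assumes, purely for illustration, that the network outputs one set of probabilities for the number's length (zero to five digits) and one set for the digit at each of five positions. The function name `decode_number`, the helper `peaked`, and the sample probabilities are all hypothetical. This is not Google's code, and the real system may well work differently.

```python
import math

MAX_DIGITS = 5  # the cap described in the article


def safe_log(p):
    """Natural log that returns -inf for zero probability instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def decode_number(length_probs, digit_probs):
    """Pick the most likely whole number under the assumed output format.

    length_probs: MAX_DIGITS + 1 probabilities, for lengths 0 through 5.
    digit_probs:  MAX_DIGITS lists of 10 probabilities (digits 0-9),
                  one list per position, left to right.
    Returns (number_as_string, log_probability).
    """
    if len(length_probs) != MAX_DIGITS + 1 or len(digit_probs) != MAX_DIGITS:
        raise ValueError("expected 6 length probabilities and 5 digit positions")

    best_text, best_score = "", float("-inf")
    for length in range(MAX_DIGITS + 1):
        # Score = log P(length) + sum of log P(best digit) for each used position.
        score = safe_log(length_probs[length])
        digits = []
        for position in range(length):
            probs = digit_probs[position]
            best_digit = max(range(10), key=lambda d: probs[d])
            score += safe_log(probs[best_digit])
            digits.append(str(best_digit))
        if score > best_score:
            best_text, best_score = "".join(digits), score
    return best_text, best_score


def peaked(digit, p=0.9):
    """Hypothetical helper: a distribution that strongly favors one digit."""
    rest = (1 - p) / 9
    return [p if d == digit else rest for d in range(10)]


# Made-up outputs for an image showing "427": length 3 is most likely,
# and the last two positions are unused, so they are left uniform.
length_probs = [0.0, 0.05, 0.1, 0.8, 0.05, 0.0]
digit_probs = [peaked(4), peaked(2), peaked(7), [0.1] * 10, [0.1] * 10]

text, score = decode_number(length_probs, digit_probs)
print(text)  # prints 427
```

Because the length is capped, the sketch only ever has to compare six candidate readings. A phone number or cab ID with more digits simply has no slot in this output format.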

The most successful part of the research appears to be speed, the weakest area for humans. For example, Goodfellow says that it would take an hour for the system to note all of the street numbers in France. That’s pretty legit. One drawback, though, is that the technique doesn’t seem to be scalable for collecting other unstructured data in Street View images, like phone numbers on signs or ID numbers on cabs. These strings of numbers might be longer than five digits, and therefore outside the scope of what the neural network can accomplish.

It’s easy to see how this type of unstructured data, which we currently view as nonthreatening because it was previously so difficult to capture, could become a topic of concern. It could allow companies like Google, or essentially anyone, to obtain a deeper level of correlation and tracking than has ever been available before. For now, the times when Street View cameras are trained on a particular scene or place are still infrequent, and the people or vehicles they capture are still fairly random. Still, as Street View’s functionality grows, it definitely feels less exciting and more unnerving to find your house, or even yourself, in its images.