Car Spotters! The Neural Net Can ID Cars on Google Street View 50 Times Faster Than You Can – And It Can Analyze the Demographics of a Neighborhood in Seconds By Its Cars

I’ve been wondering about this for a few years now: will machines be trained to spot and identify cars faster and more accurately than humans? Thanks to a link to an article sent to me by CC reader Carlton C., it’s confirmed. A system built on convolutional neural networks (“CNNs”) has been trained by scientists to identify cars on Google Street View in 0.2 seconds each, compared with some 10 seconds for human car spotters — who of course had to be hired first to label the training images.

And now it’s possible to infer a neighborhood’s demographics and political leanings from just that — its cars — more quickly and cheaply than with existing survey methods. As in: a preponderance of trucks means the neighborhood leans Republican, whereas a preponderance of sedans implies Democrats. More Hondas and Toyotas means Asian residents. Chrysler, Buick, and Oldsmobile are positively associated with African American neighborhoods. Pickup trucks, Volkswagens, and Aston Martins are indicative of mostly Caucasian neighborhoods.

Surprised yet? Not me. But can it identify a 1959 Mercury?

Here’s the essence of the process:

We demonstrate that, by deploying a machine vision framework based on deep learning—specifically, Convolutional Neural Networks (CNN)—it is possible to not only recognize vehicles in a complex street scene but also to reliably determine a wide range of vehicle characteristics, including make, model, and year. Whereas many challenging tasks in machine vision (such as photo tagging) are easy for humans, the fine-grained object recognition task we perform here is one that few people could accomplish for even a handful of images. Differences between cars can be imperceptible to an untrained person; for instance, some car models can have subtle changes in tail lights (e.g., 2007 Honda Accord vs. 2008 Honda Accord) or grilles (e.g., 2001 Ford F-150 Supercrew LL vs. 2011 Ford F-150 Supercrew SVT). Nevertheless, our system is able to classify automobiles into one of 2,657 categories, taking 0.2 s per vehicle image to do so. While it classified the automobiles in 50 million images in 2 wk, a human expert, assuming 10 s per image, would take more than 15 years to perform the same task. Using the classified motor vehicles in each neighborhood, we infer a wide range of demographic statistics, socioeconomic attributes, and political preferences of its residents.
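The quoted numbers are easy to sanity-check. A quick back-of-the-envelope calculation — using only the figures from the abstract above — confirms the “more than 15 years” claim, and suggests (my inference, not the paper’s) that the two-week wall-clock time implies several classifiers running in parallel:

```python
# Sanity-check the throughput numbers quoted in the abstract.
IMAGES = 50_000_000
HUMAN_S_PER_IMAGE = 10    # expert car spotter, per the paper
CNN_S_PER_IMAGE = 0.2     # the system's per-image classification time

SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY

human_years = IMAGES * HUMAN_S_PER_IMAGE / SECONDS_PER_YEAR
cnn_days = IMAGES * CNN_S_PER_IMAGE / SECONDS_PER_DAY

print(f"human expert: {human_years:.1f} years")    # ~15.9 years
print(f"CNN, one worker: {cnn_days:.0f} days")     # ~116 days
# The paper reports ~2 weeks of wall-clock time, which would take
# roughly 116 / 14 ≈ 8 workers in parallel (my inference).
```

So the “50 times faster” in the headline is the per-image ratio (10 s vs. 0.2 s); the two-week total comes from running the classifier at scale.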

In the first step of our analysis, we collected 50 million Google Street View images from 3,068 zip codes and 39,286 voting precincts spanning 200 US cities (Fig. 1). Using these images and annotated photos of cars, our object recognition algorithm [a “Deformable Part Model” (DPM) (11)] learned to automatically localize motor vehicles on the street (12) (see Materials and Methods). This model took advantage of a gold-standard dataset we generated by asking humans (both laypeople, recruited using Amazon Mechanical Turk, and car experts recruited through Craigslist) to identify cars in Google Street View scenes.
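The paragraph above describes a two-stage pipeline: a Deformable Part Model localizes the cars in each street scene, then the CNN classifies each crop, and the per-car labels are tallied up per neighborhood. Here’s a minimal sketch of that data flow — the detector and classifier are stand-in stubs with canned outputs (hypothetical function names, not the paper’s code), just to show how the pieces connect:

```python
from collections import Counter
from typing import Dict, List

# Hypothetical stand-ins for the paper's two stages: a DPM that
# localizes cars in a street scene, and a CNN that classifies each
# crop into one of 2,657 make/model/year categories. Both are
# stubbed with canned outputs to illustrate the flow of data.

def detect_vehicles(scene: Dict) -> List[int]:
    """Stage 1 (DPM in the paper): return bounding-box ids for cars."""
    return scene["boxes"]            # stub: pretend the detector found these

def classify_vehicle(scene: Dict, box: int) -> str:
    """Stage 2 (CNN in the paper): label one cropped car."""
    return scene["labels"][box]      # stub: canned make/model/year label

def tally_neighborhood(scenes: List[Dict]) -> Counter:
    """Aggregate per-car labels into a neighborhood-level tally —
    the input that the demographic-inference step consumes."""
    tally = Counter()
    for scene in scenes:
        for box in detect_vehicles(scene):
            tally[classify_vehicle(scene, box)] += 1
    return tally

# Toy "street scenes" for one zip code:
scenes = [
    {"boxes": [0, 1], "labels": {0: "2007 Honda Accord", 1: "Ford F-150"}},
    {"boxes": [0],    "labels": {0: "Ford F-150"}},
]
print(tally_neighborhood(scenes).most_common(1))
# [('Ford F-150', 2)]
```

The gold-standard human annotations mentioned above play the role of the canned labels here: they’re what the real detector and classifier were trained against.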

Recruited from Craigslist? Why not CC?

I’m impressed, but 2,657 categories of automobiles does leave out some. I’d be curious to see just what that list comprises. And how well this machine does in some really CC-rich neighborhoods.

Here’s the full article