Machine learning (ML) and computer vision are advancing at such a rapid pace that many are likening their progress to the Cambrian explosion, the sudden emergence of complex animals from simple organisms hundreds of millions of years ago. But I suspect there is another parallel to ML: one of the catalysts of the Cambrian explosion was the evolution of eyes, and in my opinion, ‘computer vision’ is similarly poised for a breakout.
Put simply, machines seeing things and understanding them is at the heart of what we mean when we talk about “artificial intelligence,” because the majority of the data we process as humans is visual; sight is our most powerful sense. This is reflected in all facets of our culture (e.g. the famous lip-reading scene in 2001: A Space Odyssey) as well as our technology. In some ways, for example, the real breakthrough of the smartphone revolution was the mobile, connected camera.
In the spirit of the visual, I’ve organized my thoughts around pictures that illustrate our current progress, and what the future may hold.
Like other segments of machine learning, computer vision has been advancing by leaps and bounds. For example, in only six years the error rate for object identification (how often a system misidentifies the objects in an image) has dropped from about 25% to just 2%, which is better than people manage. A.I. is finally allowing computers to understand the ‘meaning’ of images, like recognizing that a photo of the Eiffel Tower and a painting of it reference the same object.
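For readers who want to see the arithmetic behind a figure like that, here is a minimal Python sketch. It assumes the simplest possible definition of error rate (the share of images given the wrong label); the function names and the toy labels are hypothetical illustrations, not real benchmark data, and the 25% and 2% inputs are simply the figures quoted above.

```python
def error_rate(predicted, actual):
    """Fraction of images whose predicted label differs from the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must be the same length")
    if not actual:
        raise ValueError("need at least one labeled image")
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)


def relative_reduction(old_rate, new_rate):
    """Share of the original errors that have been eliminated."""
    return (old_rate - new_rate) / old_rate


# Hypothetical toy labels for four images; one of the four is misidentified.
actual = ["cat", "dog", "car", "tree"]
predicted = ["cat", "dog", "car", "dog"]
print(error_rate(predicted, actual))  # 0.25

# Going from about 25% to about 2% removes roughly 92% of the mistakes.
print(round(relative_reduction(0.25, 0.02), 2))  # 0.92
```

Seen this way, the drop is less a matter of 23 percentage points than of eliminating more than nine out of every ten mistakes.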
Silicon Valley is understandably attentive: Pinterest just hired a Head of Computer Vision, Amazon is adding a camera to the Echo, Microsoft has been lauded by analysts for its image recognition efforts, startups like Clarifai have very impressive pedigrees, and Google recently released AutoML Vision, which makes custom ML models for image recognition much faster and easier to create.
Google in particular has been a leader in computer vision for years: simply by crawling the Web, it can amass a vast trove of data on which to sharpen its ML teeth. In fact, I suspect the subject of computer vision crystallized for many people after Google published its neural net “dreams” in 2015.
In my field, content management, computer vision’s implications are so far-reaching that we’re having to reexamine some very fundamental concepts, such as what constitutes a document. Computers can currently extract keywords and tags from an image and process pictures as readily as text documents or spreadsheets. Soon, we’ll be able to extract and process data within large raw video files easily and automatically. New avenues for content automation will arise, such as in fraud detection and contract analysis. Even today, businesses are using vision intelligence to automate both the search for trademark violations and the action taken against them.
Other fields are realizing similar potential: doctors want to use computer vision to automate diagnoses, such as reading scans for eye disease. Duke University is cataloging every power plant in the world by applying computer vision to satellite imagery. But the most daunting test of vision intelligence to date, in my opinion, is the spectrum of decisions facing a self-driving car. An automated vehicle must 1) recognize street signs (at high speeds), 2) derive their meaning, and 3) understand whether that meaning applies to it, or to other cars. There’s no end in sight (pun intended) to the possibilities.
Can you see the car? One day, it will see you!
Yet, while computer vision holds great promise, it also necessitates thoughtful caretaking. This article’s second image, of Google’s neural dreams, shows what happens when you give a picture of the sky to an algorithm designed to find animals. As with other areas of ML, biases, whether unintentionally transferred from human programmers or arising from the program’s own design, must be closely scrutinized. Seeing patterns where there are none is a problem that is unfortunately limited to neither machine nor man.
I’ll end on a personal note: the other day, I was cataloguing pictures of my children from a long time ago, back when they were so young they were hard to tell apart save for their clothing, and Google Images’ facial recognition identified them more quickly than I did. Computer vision is rapidly revealing its enormous potential, and given the intimacy with which we will one day invite it into our lives, we must be cognizant of the amazing, but unpredictable, changes ahead.