- User creativity
- Social connection
- Overall efficiency
- Safety
Creativity
We often assume that the goals of commercial computer vision are to help people do more things, or to do them faster and more effectively. But in some cases, organizations are more interested in inspiring customers or sparking their imagination. For example, at a practical level, Shutterstock helps people find the right images in its enormous database, but it also suggests particular kinds of edits based on what’s in the photo that a customer selects. This reflects the fact that people want to do different things with, say, portraits than they do with landscapes. So while we could say there’s a use case around image search, it’s better to focus on what searches are for, since knowing the content of an image lets Shutterstock be more specific about what it helps users do. An adjacent use case leverages user behavior to uncover intent and inspire creativity. For example, people who post a lot of kitchen pictures to Pinterest may be planning a remodel. In fact, Lowe’s has been exploring how to take a person’s whole page of pins, match it against the Lowe’s catalog to find similar objects, and then assemble all these pieces together visually. The business intent is to sell a whole new kitchen, but its success depends on how well Lowe’s helps users dream and turn those dreams into reality.
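One common way to match pinned images against a catalog is to compare image embeddings by cosine similarity. The sketch below is a minimal illustration under that assumption, not a description of how Lowe’s actually does it. The embedding vectors, the ids, and the function names are all hypothetical; in practice the vectors would come from a vision model.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as matching nothing
    return dot / (norm_a * norm_b)


def best_catalog_matches(pin_embeddings, catalog_embeddings):
    """For each pinned image, return the id of the most similar catalog item."""
    matches = {}
    for pin_id, pin_vec in pin_embeddings.items():
        matches[pin_id] = max(
            catalog_embeddings,
            key=lambda item_id: cosine_similarity(
                pin_vec, catalog_embeddings[item_id]
            ),
        )
    return matches


# Toy 3-dimensional vectors, invented purely for illustration.
pins = {
    "pin_farmhouse_sink": [0.9, 0.1, 0.0],
    "pin_pendant_light": [0.0, 0.2, 0.9],
}
catalog = {
    "sku_sink": [1.0, 0.0, 0.1],
    "sku_light": [0.1, 0.1, 1.0],
    "sku_faucet": [0.5, 0.5, 0.5],
}

matches = best_catalog_matches(pins, catalog)
for pin_id, item_id in matches.items():
    print(pin_id, "->", item_id)
```

With a real catalog, the brute-force comparison would likely give way to an approximate nearest-neighbor index, but the matching idea stays the same.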
Connection
Humans are social animals, so a lot of projects that seem to be about search and retrieval are really about how we relate to others. This is most obvious with Facebook’s facial recognition, which finds and suggests photos that have friends and family in them. Meanwhile, Apple uses vision models to help you search your photos for, say, a dog. Even though you haven’t annotated any of the images as you might on Facebook, the model will help find your favorite pooches in your albums. Since sharing is at the heart of why people are looking for a photo, you want to make sharing a prominent and easy part of the product design. You can also learn, over time, the qualities of the photos that users individually and collectively tend to share. That is, you can build in feedback loops that make user actions part of the training data so that your systems get smarter over time. In this vein, Trulia logs how long people look at various photos of homes for sale in its app and uses this to infer what a user likes, as well as to understand long-range themes across users, down to the level of which enamels and finishes correlate best with attention and house sales. This is a particularly good spot to be in: individual user behaviors enrich data at a higher, more general level, over a longer stretch of time. Monitoring what’s inside photographs also lets companies identify products in social media. While relatively simple text analytics can tell you how people are mentioning your products, vision models can tell you how often your products appear on Instagram or any other platform. Understanding a person’s style in clothing and cars can also help target them for related products. You can also match people with places: if you’re visiting a new city, where do people who look like you go? As you can imagine, a lot of computer vision applications require careful ethical reflection about their effects on privacy and social segregation.
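The dwell-time feedback loop described above can be sketched as a simple aggregation. This is a minimal illustration, not Trulia’s pipeline: it assumes each logged view event records how many seconds a photo was on screen, plus tags that a vision model has assigned to that photo. The event fields, tag names, and function name are all hypothetical.

```python
from collections import defaultdict


def mean_dwell_by_tag(view_events):
    """Average viewing time, in seconds, for each photo tag across all events."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for event in view_events:
        for tag in event["tags"]:
            totals[tag] += event["seconds"]
            counts[tag] += 1
    return {tag: totals[tag] / counts[tag] for tag in totals}


# Invented events for illustration; the tags stand in for model output.
events = [
    {"user": "u1", "photo": "p1", "tags": ["granite", "kitchen"], "seconds": 8.0},
    {"user": "u2", "photo": "p1", "tags": ["granite", "kitchen"], "seconds": 4.0},
    {"user": "u1", "photo": "p2", "tags": ["laminate", "kitchen"], "seconds": 2.0},
]

dwell = mean_dwell_by_tag(events)
for tag, seconds in sorted(dwell.items()):
    print(f"{tag}: {seconds:.2f}s")
```

Filtering the events to a single user before aggregating gives the per-user view of preferences; aggregating over everyone gives the cross-user themes.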
Efficiency
The easiest business applications for computer vision come from helping people be more efficient: instead of fixing problems outright, the model just helps people know where to look. For example, in content moderation, you can train a model to find offensive images without necessarily having to traumatize a bunch of human content reviewers with the more disturbing things that get posted on websites. Efficiency is also behind a lot of medical and healthcare applications of computer vision. The goal isn’t to replace diagnosticians but to direct their efforts. Instead of having experts look at hundreds of, say, radiology scans where nothing is out of the ordinary, show them only the images the model finds problematic or ones where the algorithm isn’t very confident. Aerial and satellite photos offer similar efficiencies. If there are water rationing provisions in place but you see a bunch of emerald green lawns around mansions in Beverly Hills, you can be confident there are violations. It’s a lot easier for a machine to churn through acres upon acres of satellite photos than for a human to do that, or to go door-to-door, and models like the one created by OmniEarth can even distinguish between a pond and a pool in satellite photos. Aerial and satellite photos have a variety of uses; perhaps the most wide-reaching concern deforestation and urbanization. The MIT Media Lab has worked to identify safer and less safe parts of cities, as well as to understand what makes cities thrive. Likewise, using aerial photos to detect the logging roads that precede logging in the rain forest makes enforcement far more efficient.
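The idea of showing experts only the scans that a model flags, or is unsure about, can be sketched as a small triage rule. This is a minimal illustration that assumes the model outputs a single probability that a scan is abnormal. The thresholds, scan ids, and function name are hypothetical, and real cutoffs would need clinical validation.

```python
def triage(scores, abnormal_threshold=0.5, uncertainty_margin=0.2):
    """Pick the scans a human expert should review.

    scores maps scan id -> model probability that the scan is abnormal.
    A scan is routed for review when the model flags it as abnormal, or
    when its score is close enough to the threshold to count as uncertain.
    """
    review = []
    for scan_id, p_abnormal in scores.items():
        if p_abnormal >= abnormal_threshold:
            review.append((scan_id, "flagged"))
        elif abs(p_abnormal - abnormal_threshold) < uncertainty_margin:
            review.append((scan_id, "uncertain"))
        # Otherwise the model is confident the scan is ordinary; skip it.
    return review


# Invented scores for illustration.
scores = {"scan_a": 0.02, "scan_b": 0.91, "scan_c": 0.40, "scan_d": 0.05}

queue = triage(scores)
for scan_id, reason in queue:
    print(scan_id, reason)
```

Here two of the four scans reach the expert. Widening `uncertainty_margin` trades reviewer time for a lower chance of missing something the model was unsure about.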