Augmented reality and virtual reality (AR/VR) are two promising technologies that businesses are watching closely. In fact, nearly three-quarters of industry leaders expect these immersive technologies to be mainstream within the next five years, and Goldman Sachs projects that the AR/VR industry will be worth $95 billion by 2025. Running in parallel are AI and machine learning, which are not only quickly becoming mainstream but are now considered mission-critical for modern business.
Only recently has the tech world started to unlock the benefits behind combining AR/VR with AI. Pairing the two has the power to drive innovation, new customer experiences, and new ways of interacting with our world. But without high-quality data, this partnership won’t succeed.
AI and AR/VR: A Perfect Fit?
First, let’s define what we mean by AR/VR:
Augmented reality – A blend of the physical and digital environments; technology that overlays digital data on the physical world using a fusion of sensor data from cameras, accelerometers, and other sources. Pokémon Go is a popular example.
Virtual reality – A computer-generated 3D simulation that enables the person to interact with a fully digital environment.
The AR/VR field has traditionally leveraged techniques like computer vision (not AI-powered) to advance innovation. But many businesses are discovering that these technologies and AI have a deep, complementary connection. AI excels at many actions that are beneficial to AR/VR: it can track objects, create detailed models of the 3D world, understand what features are in these models, and make judgments about them.
Deep learning models in AI are particularly useful here, as they can identify vertical and horizontal planes; track an object’s movements and position; and estimate object depths, among other AR/VR synchronicities. Deep learning models can, in other words, help an AR/VR system interpret complex environments. An auto mechanic could, theoretically, use an AI-powered AR system to view a vehicle’s engine and be told by the system which parts need to be fixed and how.
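As an illustrative sketch of one of these capabilities, identifying a horizontal plane (a floor or tabletop) can be framed as a RANSAC-style search over a 3D point cloud. Everything below, from the function name to the toy data, is a hypothetical minimal example, not how any particular AR framework implements plane detection:

```python
import random

def fit_horizontal_plane(points, threshold=0.02, iterations=200, seed=42):
    """RANSAC-style estimate of the dominant horizontal plane's height
    from (x, y, z) points, where y is the vertical axis.

    Repeatedly hypothesize a plane at the height of a random point and
    keep the hypothesis that explains the most points (inliers).
    """
    rng = random.Random(seed)
    best_height, best_count = None, 0
    for _ in range(iterations):
        candidate = rng.choice(points)[1]
        inliers = [p[1] for p in points if abs(p[1] - candidate) < threshold]
        if len(inliers) > best_count:
            best_count = len(inliers)
            best_height = sum(inliers) / len(inliers)  # refine with inlier mean
    return best_height, best_count

# Toy point cloud: 80 "floor" points near y = 0 plus 20 points of clutter.
rng = random.Random(7)
cloud = [(rng.uniform(-1, 1), rng.gauss(0.0, 0.005), rng.uniform(-1, 1))
         for _ in range(80)]
cloud += [(rng.uniform(-1, 1), rng.uniform(0.2, 1.5), rng.uniform(-1, 1))
          for _ in range(20)]

height, inliers = fit_horizontal_plane(cloud)
print(f"estimated floor height: {height:.3f} with {inliers} inliers")
```

Production systems use learned depth and geometry models rather than a hand-rolled loop like this, but the underlying idea, finding the plane that explains the most sensor points, is the same.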
As a result of these complementary characteristics, AI is starting to replace traditional computer vision methods in AR/VR, with a number of industry leaders projecting that AI will help drive immersive technology adoption in consumer and business segments. Specifically, AI can enhance AR/VR experiences by enabling more realistic models and by giving people greater ability to interact with the scenes.
This powerful partnership of AR/VR and AI is due in part to advances in deep learning that apply to 3D model building, increased availability of data and data storage options like the cloud, and increasing levels of computing power. Regardless of the reasons, the integration is expected to provide exciting opportunities across many industries.
How Companies Are Already Using AI and AR/VR
AI enhances AR/VR technologies in numerous ways: through improving the quality of the content, advancing and personalizing the user experience, and fostering more efficient interaction between the user and the technology. It’s for these reasons that many startups and tech companies are already making use of AI-powered immersive technologies. Here are a few exciting examples to watch for:
Using image recognition deep learning technologies, AI with AR can help engineers handle maintenance issues in aviation by pinpointing which components of an airplane need repairs and providing detailed instructions on how to make them.
Many applications of AI-powered AR/VR exist in retail. These include, for example:
- Pop-up coupons that could appear in a digital environment while a shopper is navigating the aisles of a store.
- Virtual showrooms that display products customized to the shopper’s interests or needs.
- Virtual fitting rooms that enable a customer to try on clothes in the comfort of their home.
- AR that shows the customer pieces of furniture placed inside their own home.
After the creative industries, retail likely has the most to gain from the AR/VR field.
AI-powered VR can guide members of the military through simulated hazardous environments, with the goal of reducing error rates when they face the real thing.
Smart glasses could eventually be the norm for eyewear for all of us. These could give us useful information on the people we meet. For example, if we run into a coworker, our glasses could identify what position in the company that coworker holds.
AR/VR could be coming to a virtual meeting near you. Possible applications include providing an immersive virtual experience, where the user feels like they're in the office with their fellow coworkers rather than at home at their computer. AI could add camera tracking (much like Facebook Portal offers) so the focus is always brought to whoever is speaking at the moment.
Security could leverage AI-powered VR for identity detection and for flagging images of suspicious people.
Gaming is perhaps the first example people think of when it comes to AR/VR, especially with the Pokémon Go craze that spread throughout many parts of the world a few years ago. Indeed, the strongest demand for AR/VR technologies comes from the creative industries, from video games to live events and video entertainment. AI could help create increasingly realistic gaming experiences as well as offer more opportunities for the gamer to interact with the digital environment.
In many of the above use cases, startups and tech companies are already hard at work in implementation, so these are more reality than fiction. However, it may take several years before AI and AR/VR combinations are truly ubiquitous in our lives.
How Data Powers AI and AR/VR
Building an AI-powered AR/VR system requires vast amounts of data, which makes data collection and annotation critical steps in the process of building these types of technologies. Data may be collected from sensors (like smartphone cameras, for instance), product images, social networks, and many other sources. Depending on the use case, the data can include image, video, audio, and text, all of which need to be labeled with key features for the model to recognize, making these systems very complex projects. For example, here are several types of data annotation that are common to AI and AR/VR projects:
Image and Video
Object detection: Model learns to identify objects in an image and their positions. This can trigger hit boxes and colliders that enable the user to interact with the environment.
Classification: Model learns to classify target objects in an image, which then triggers a label of that image to display.
Segmentation: Usually done at the pixel level, the model learns to segment target objects in an image.
Audio recognition: Model processes audio, such as speech, and interprets accordingly. Certain keywords may trigger AR/VR effects, such as in a gaming environment.
Text recognition and translation: Model learns to detect and read text in an image, which is then translated to the appropriate language. AR technology can then overlay the translated text into the physical world.
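To make the detection case above concrete, here is a minimal, hypothetical annotation record (the field names are illustrative, loosely COCO-style, and not tied to any particular tool), along with a sketch of how a labeled box could become the tappable hit box mentioned above:

```python
# A minimal, COCO-inspired annotation record (field names are illustrative,
# not a spec): one labeled image with a detected object and its class.
annotation = {
    "image_id": 1042,
    "category": "chair",
    "bbox": [320, 180, 128, 96],  # x, y, width, height in pixels
    "segmentation": [[320, 180, 448, 180, 448, 276, 320, 276]],  # polygon
}

def bbox_to_hitbox(bbox):
    """Convert an [x, y, w, h] detection box into corner form
    (x_min, y_min, x_max, y_max) that an AR engine could use as a
    2D hit box for tap or gaze interaction."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)

def hit_test(hitbox, point):
    """Return True if a user's tap lands inside the hit box."""
    x_min, y_min, x_max, y_max = hitbox
    px, py = point
    return x_min <= px <= x_max and y_min <= py <= y_max

box = bbox_to_hitbox(annotation["bbox"])
print(box, hit_test(box, (400, 200)))  # a tap inside the chair's box
```

Real AR engines resolve interactions in 3D with colliders rather than 2D rectangles, but the pipeline is the same: labeled training data teaches the model where objects are, and those detections become interactive regions.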
In the above examples, you can get a sense of how AI and AR/VR technologies integrate to provide an interactive experience for the user. The more data collected, the more realistic the environment can be. This holds for the quality of the data as well: high-quality data will produce environments of equally high quality. Additionally, more data, especially about the user themselves, can create a more personalized environment for that user.
Often, AR/VR data consists of personally identifiable information (PII) used to create custom environments and interactions. PII may include geolocation data, biometrics, purchase histories, and other sensitive details. Data security is critical when building these applications to ensure customer information is protected and confidential. Working with PII means having strict security protocols in place that achieve the highest levels of compliance for the region and type of data.
Using a Data Provider to Get Ahead
Virtual worlds are complex and building them isn’t a simple task. Many companies seek help from a third-party data provider to gain a competitive advantage in the immersive space. A data provider can offer a huge uplift in gathering relevant data for both the AI and AR/VR models. The right data provider will likewise have tools and processes for annotating that data with accuracy, to ensure the resulting environments are as realistic as possible.
Working with a data provider, you can set up scalable data pipelines that help you continuously improve your models with new, labeled data. Model improvement will correlate directly with an enhanced user experience. As the real world changes, so should your virtual models, and a data provider will help you monitor your system for regular retraining.
With AI anticipated to be the engine that drives the AR/VR industry forward in the coming years, acquiring the right data and annotating it with accuracy should be considered the fuel that powers the engine. Given the complexity of this task, leveraging the right data partner can give you a competitive edge. Given the fast-paced nature of both AI and immersive technologies, this may be an essential step on your AI journey.
Expert Insight from Don Blaine – Senior Solutions Engineer at Appen
Fundamentally, what’s needed for an exceptional AR/VR application is the ability to comprehend the environment as well as how users will interact within that environment.
Comprehending the environment: VR vs. AR
With VR, the environment is created from scratch digitally, which means that each component of the environment can be specifically identified and interacted with programmatically, based on the environment’s defined methods. The benefit here is that, right from the start, the environment and everything within it is interchangeable without having to obtain any additional data. Additionally, because VR environments are created from scratch rather than taken from a physical world, there is a far greater degree of specification that can go into what that environment is. Capturing physical environments that meet specific requirements can often be quite tedious compared to simulating those environments in a virtual space.
With AR, the environment is a physical area, such as a street, a shopping aisle, or whatever area you are currently looking at. Because this environment is taken directly from the physical world, it can be far denser than a VR environment and must be captured using one or more sensors that provide data in formats such as LiDAR, radar, video, audio, and images, often in combination with one another.
Once data has been captured from the environment, we need to know what's what within that environment. This typically involves creating an ML model for detection, classification, segmentation, or recognition of the components within that data that are relevant to the application at hand. For example, if we have a video of a road with cars driving on it, we may want a model that, given a video, identifies the boundaries of each car within each frame. Another example may be identifying words written on a menu for automatic translation. In both cases, we first need to create human-labeled training data, which involves individuals detecting the relevant areas in the data, segmenting those areas from the full data, and then classifying them. The end result is that, as in VR, we have an environment where some of the elements can be identified and interacted with programmatically, which is needed if we want users to interact with the environment.
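One common way to measure how closely a human-drawn box and a model's predicted box agree, for instance on the car-boundary task just described, is intersection-over-union (IoU). This is a generic sketch, not specific to any annotation platform:

```python
def iou(a, b):
    """Intersection-over-union of two boxes in (x_min, y_min, x_max, y_max)
    form. Often used to score how well a predicted object boundary matches
    the human-labeled one: 1.0 is a perfect match, 0.0 means no overlap."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# Human label vs. a slightly shifted model prediction for one car.
human = (100, 100, 200, 160)
model = (110, 105, 210, 165)
print(round(iou(human, model), 3))  # → 0.702
```

In practice, a detection is often counted as correct only when its IoU with a ground-truth label exceeds a threshold (0.5 is a common choice), which is also a useful way to audit the consistency of human annotators.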
Interacting with the environment
Every application is unique in how it asks users to interact with an environment. In some cases it's with a smartphone, in others with smart glasses, and in others with dedicated AR/VR devices. In each case, the actions a user performs must be captured by a device sensor and then processed and classified into what that action means in the AR/VR environment. An example would be snapping your fingers in front of your device camera: your application would need to process the video, capture and identify your hand, and then detect that the action you are performing is 'snapping your fingers'. To do this, you would need to create a model using human-annotated data that identifies hands within a video, as well as a model that classifies what certain hand positions mean.
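As a toy illustration of that classification step, the sketch below applies a hand-written rule to hypothetical fingertip coordinates. The landmark format and threshold are invented for the example; a real system would learn this mapping from human-annotated video rather than use a fixed rule:

```python
import math

def classify_hand_pose(landmarks, touch_threshold=0.05):
    """Toy, rule-based stand-in for a learned pose classifier. Given
    normalized fingertip coordinates (a format assumed for this example,
    not taken from any specific SDK), flag a snap candidate when the thumb
    and middle fingertips touch. A production system would learn this
    mapping from human-annotated video instead of hand-written rules."""
    gap = math.dist(landmarks["thumb_tip"], landmarks["middle_tip"])
    return "snap_candidate" if gap < touch_threshold else "open_hand"

# Hypothetical landmarks: fingertips together vs. an open hand.
snap_pose = {"thumb_tip": (0.51, 0.40), "middle_tip": (0.53, 0.41)}
open_pose = {"thumb_tip": (0.30, 0.40), "middle_tip": (0.60, 0.20)}
print(classify_hand_pose(snap_pose), classify_hand_pose(open_pose))
```

A single frame like this can only flag a candidate pose; detecting an actual snap would also require tracking the motion across frames, which is exactly where the human-annotated video data comes in.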
How to start
The best way to start is to fully define what data you can capture programmatically and what data you will need a model to process for you. Once you know what models you'll need to create, the next step is to connect with a data provider like Appen to collect training data suitable for that use case. Gathering high-quality training data can be as tedious as building the model itself, and a model is only as good as the data it's built on, so it's best to define what data you are looking for as objectively as possible to avoid ambiguity.
Ask yourself: how would you tell people to judge whether someone is 'snapping their fingers'? Would it just be contact between the thumb and middle finger? What if no sound was created? What if they snapped with their thumb and ring finger or thumb and index finger? A model is only as good as the data it was built on, which is why starting your AR/VR journey with a partner like Appen to test and iterate potential solutions for your project is the best way to start.
What We Can Do For You
Appen collects and labels images, text, speech, audio, and video used to build and continuously improve the world’s most innovative and complex artificial intelligence systems. With over 25 years of expertise in more than 235 languages, a global crowd of over 1 million skilled contractors, and the industry’s most advanced AI-assisted data annotation platform, Appen solutions provide the quality, security, and speed required by leaders in technology, automotive, financial services, retail, manufacturing, and governments worldwide.
We can help your organization with data collection, data annotation, as well as model retraining and improvement in the post-production phase. Machine learning assistance is built into our industry-leading annotation tools to save you time, effort, and money—accelerating the ROI on your AR/VR or other AI initiatives.