Artificial intelligence (AI) is already transforming business, driving down costs, maximizing revenue, and enhancing customer experience. And many organizations are taking notice: The AI market is expected to grow to $390.9 billion by 2025, and industries within the space show a similar trend—automotive AI, for example, is expected to grow by 35% year over year, and manufacturing AI will likely increase by $7.22 billion by 2023. We see organizations accelerating their adoption of AI projects as well, with Gartner reporting that the average company had four AI projects in place in 2019 and is expected to have 35 by 2022.
Even with this immense growth, challenges in deploying AI remain. According to top industry analysts, most (about 80%) of AI projects stall at the pilot or proof-of-concept phase, never reaching production. In many cases, this is due to a lack of high-quality data. Ethics and responsible AI continue to be obstacles for many companies, which often lack the resources or internal talent to build unbiased models at a time when AI is making increasingly impactful decisions. Companies also face an uphill battle with scaling and automation; while tech leaders are keen to apply DevOps principles to AI, they still struggle with architecting a solution for automating end-to-end machine learning (ML) pipelines.
Developing the right tools and strategies upfront will help overcome these challenges, giving businesses the confidence to deploy and the potential to scale.
Techniques and Tooling to Train, Deploy, and Tune ML Models
If there’s one key takeaway for deploying AI with confidence, it’s this: it’s all in the data. You know you need high-quality training data to launch effective models. So defining your data strategy upfront, including what your data pipeline will look like, will be crucial to success. To illustrate, let’s walk through a healthy ML pipeline:
Collect and Annotate Data
Many data scientists and machine learning engineers say that about 80% of their time is spent wrangling data. That's a heavy lift, but a model can't work without training data. The model build process, then, starts with collecting and labeling training data.
You’ll want to start with a clear strategy for data collection. Think about the use cases you’re targeting and ensure your datasets represent each of them. Have a clear plan for collecting diverse datasets. For example, if you’re building AI for a self-driving car, you’ll likely want data representing different geographies, weather, and times of the day.
Next, you'll want to implement your data annotation process, which, in most cases, requires a diverse crowd of human annotators. The more accurate your labels, the more precise your model's predictions will ultimately be. Diverse perspectives will also enable you to cover a broader selection of use cases and edge cases.
At the data collection and annotation phase, it’s critical to have the right plan for tooling in place. Be sure to integrate quality assurance checks into your processes as well. Given that this step takes up most of the time spent on an AI project, it’s especially helpful to work with a data partner in this area.
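One common quality-assurance check at this stage is aggregating labels from multiple annotators and flagging items with low agreement for expert review. As a rough illustration (the function name, labels, and agreement threshold here are our own, not from any particular annotation platform):

```python
from collections import Counter

def consensus_label(labels, min_agreement=0.7):
    """Aggregate labels from multiple annotators by majority vote.

    Returns (label, agreement) when agreement meets the threshold,
    otherwise (None, agreement) so the item can be routed to QA review.
    """
    counts = Counter(labels)
    label, votes = counts.most_common(1)[0]
    agreement = votes / len(labels)
    if agreement >= min_agreement:
        return label, agreement
    return None, agreement  # low agreement: send to expert review

# Example: three annotators label the same item
print(consensus_label(["weed", "weed", "weed"]))  # unanimous: accepted
print(consensus_label(["weed", "weed", "crop"]))  # 2/3 agreement: flagged
```

Checks like this catch ambiguous items early, before they silently degrade the training set.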
Train and Validate Model
When your training data is ready, train your model using that data. Most ML models leverage supervised learning, which means you'll need humans to provide ground truth labels and monitor the model's output, checking that it makes accurate predictions. This phase is critical, but it's a lighter lift. If the model isn't working at this stage, go back and confirm your training data is truly the data you need. Optimize with a focus on the business value this model is supposed to deliver.
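The core of this step is holding out part of the labeled data and checking the model's predictions against ground truth. A minimal sketch of that holdout validation loop, with a toy threshold "model" standing in for a real one (all names and data here are hypothetical):

```python
import random

def train_validate_split(data, val_fraction=0.2, seed=42):
    """Shuffle labeled examples and split them into train and validation sets."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

def accuracy(model, examples):
    """Fraction of examples where the model's prediction matches ground truth."""
    correct = sum(1 for features, label in examples if model(features) == label)
    return correct / len(examples)

# Toy labeled data: (feature, label) pairs, with a simple threshold model
data = [(x, x > 5) for x in range(100)]
train, val = train_validate_split(data)
model = lambda x: x > 5
print(f"validation accuracy: {accuracy(model, val):.2f}")
```

If validation accuracy stalls below your target here, the fix is usually upstream, in the training data, rather than in the model itself.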
Deploy with Confidence and Tune Model
Once your model reaches the desired accuracy levels, you're ready to launch. Post-deployment, the model will start to encounter real-world data. Continue to evaluate the model's output; if it produces incorrect predictions, loop those examples back through the training and validation phases. It's helpful to keep a human in the loop to manually check the model's accuracy and provide corrected feedback in the case of low-confidence predictions or errors.
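The human-in-the-loop pattern often comes down to a confidence gate: predictions above a threshold flow through automatically, and the rest are queued for a reviewer whose corrected labels feed back into retraining. A minimal sketch (the threshold and routing labels are illustrative assumptions):

```python
def route_prediction(label, confidence, threshold=0.85):
    """Accept high-confidence predictions automatically; send the rest
    to a human reviewer for correction and feedback into training."""
    if confidence >= threshold:
        return ("auto", label)
    return ("human_review", label)

# Hypothetical prediction queue: (predicted label, model confidence)
queue = [("cat", 0.97), ("dog", 0.55), ("cat", 0.91)]
review_batch = [p for p in (route_prediction(l, c) for l, c in queue)
                if p[0] == "human_review"]
print(review_batch)  # the low-confidence items annotators will correct
```

Tuning the threshold trades automation rate against review cost, so it's worth revisiting as the model improves.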
Remember to tune your model regularly after deployment. According to McKinsey, 33% of live AI deployments require “critical” monthly data updates to maintain accuracy thresholds as market conditions change. In our State of AI 2020, we found that 75% of organizations said that they must update their AI models at least quarterly. Regardless, every model should be continuously monitored for data drift to ensure it doesn’t become less effective over time or even obsolete.
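Drift monitoring usually means comparing the distribution of live inputs or scores against the training-time baseline. One common metric is the Population Stability Index (PSI), where values above roughly 0.2 are often treated as a drift signal; a self-contained sketch (the binning scheme and threshold convention are our assumptions, not a standard API):

```python
import math

def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference (training-time)
    distribution and live production data; higher means more drift."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0

    def hist(values):
        counts = [0] * bins
        for v in values:
            i = min(int((v - lo) / width), bins - 1)
            counts[i] += 1
        # eps avoids log(0) for empty bins
        return [c / len(values) + eps for c in counts]

    e, a = hist(expected), hist(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

train_scores = [i / 100 for i in range(100)]        # uniform reference
live_scores  = [0.8 + i / 500 for i in range(100)]  # shifted live data
print(f"PSI: {psi(train_scores, live_scores):.2f}")
```

Wiring a check like this into a scheduled job is a lightweight way to trigger the quarterly (or monthly) retraining the survey numbers above suggest.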
Real-world Success Stories
Companies are leveraging AI for numerous fascinating endeavors. The following examples demonstrate, in particular, the importance of having an integrated data pipeline to deploy with confidence.
In 2017, John Deere acquired Blue River Technologies, and together they're poised to revolutionize pesticide use. Their AI models use drones and computer vision algorithms to identify weeds on farms, enabling pesticide to be sprayed only on the weeds rather than across an entire field. Spending on pesticides has been around $20 billion per year, and these efforts are expected to cut pesticide costs by up to 90%.
The methodology behind this AI project is precise image segmentation. This method requires labeling data at the pixel level to determine which component of an image is weed versus crop. As one might imagine, the annotation process is complex and involved. It requires both a comprehensive tooling interface and human labelers with deep expertise in segmentation.
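Pixel-level labels are typically evaluated with intersection-over-union (IoU): how much a predicted mask overlaps the ground-truth mask for a given class. A toy sketch using flat pixel lists (the class encoding and function name are illustrative, not from the project described):

```python
def mask_iou(pred, truth, cls):
    """Intersection-over-union for one class between a predicted and a
    ground-truth pixel mask (flat lists of class labels, one per pixel)."""
    inter = sum(1 for p, t in zip(pred, truth) if p == cls and t == cls)
    union = sum(1 for p, t in zip(pred, truth) if p == cls or t == cls)
    return inter / union if union else 1.0

# Tiny 8-pixel "image": 0 = crop, 1 = weed
truth = [0, 0, 1, 1, 0, 1, 0, 0]
pred  = [0, 0, 1, 1, 0, 0, 0, 0]  # one weed pixel missed
print(f"weed IoU: {mask_iou(pred, truth, 1):.2f}")
```

The same metric works for auditing annotator masks against a gold standard, which is one way segmentation QA is commonly done.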
The manufacturing industry is using AI to automate logistics and supply chains. Nokia, for example, uses machine learning to alert an assembly operator when quality deviates—specifically, when there are inconsistencies in the production process. AI may also monitor and track packages as part of a smart factory monitoring system, reducing lead time and preventing overstocking, or it can monitor throughput and downtime, highly impactful factors from a cost perspective.
Data pipelines in manufacturing are highly dependent on the type of manufacturing and nature of the supply chain. Data may need to be collected from various machines and sensors and aggregated to create human-readable analytics.
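At its simplest, that aggregation step means pulling readings from a machine's sensors and flagging values that deviate from the running baseline, much like the quality-deviation alerts described above. A minimal sketch using a z-score check (the function, threshold, and data are our own illustrative assumptions, not Nokia's system):

```python
from statistics import mean, stdev

def flag_deviations(readings, z_threshold=2.0):
    """Return indices of sensor readings that deviate from the batch
    baseline by more than z_threshold standard deviations."""
    mu, sigma = mean(readings), stdev(readings)
    return [i for i, r in enumerate(readings)
            if sigma and abs(r - mu) / sigma > z_threshold]

# Hypothetical torque readings from one assembly station
readings = [10.0] * 9 + [15.0]
print(flag_deviations(readings))  # index of the out-of-spec reading
```

Production systems would aggregate across many machines and use rolling windows, but the shape of the computation is the same.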
There are many automotive AI trends worth highlighting, including automation and safety, voice assistance, and personalization, among others. Self-driving cars are perhaps receiving the most fanfare, as these have the power to most dramatically change our daily lives.
When we look at the ML pipelines involved in building an AI-powered, fully automated vehicle, they grow increasingly complicated. Vast amounts of sensor data (from cameras, LIDAR, and RADAR, for example) are required to effectively train an algorithm. Tooling enhancements are essential in this space to build safe, efficient AI.
It's not limited to autonomous vehicles, though. Nissan uses machine learning to improve test drive conversion, increasing the number of test drive experiences by over 900% by continuously analyzing over 1,000 data points.
What these examples demonstrate is that many AI use cases require complex data pipelines to support accurate models. Getting the data strategy right up front is likely the difference between success and failure for cases like these.
ML Pipelines Help Deploy with Confidence and Realize Business Value
Developing an automated, integrated, and scalable data and model pipeline will help you increase your delivery speed and confidence in your models. There are many critical steps to ensuring your model is successful, but one of the biggest is ensuring your training data is accurate.
With the amount of time spent on data collection and annotation, and the need for regular retraining and optimization, even the biggest AI leaders turn to data partners. Data partners can reduce the amount of time your team spends on this part of the model building process, as well as help QA your model to ensure it remains accurate as it scales. Learn more about how Appen can help you deploy with confidence and support your data collection and annotation needs.