AI Model Maintenance: A Guide to Managing a Model Post-Production
Managing Models Beyond the Initial Deployment
You’ve built your artificial intelligence model, trained it, tested it, tuned it, deployed it to production – you’re even receiving great feedback from customers! It’s time to move on to your next challenge, right?
The reality is that it’s only the beginning: the lifecycle of a machine learning model continues long after deployment.
The final but continuous phase of ML development is model monitoring and maintenance. As with any piece of machinery, an AI model requires regular tuning and updating to meet performance expectations. Skipping this essential step may cause model accuracy to diminish over time. Recognizing from the get-go that this phase exists allows you to plan for your team’s and infrastructure’s long-term needs. With teams and infrastructure in place to retrain your model continuously, your model is set up for success beyond the initial launch.
Model Maintenance 101: Regular Retraining
Post-deployment, you need to monitor your model to ensure it continues to perform as expected. Model drift is a phenomenon you should both expect and be prepared to mitigate through regular retraining.
What’s Model Drift?
AI models are trained using historical data. If a model ran in a static environment on static data, its performance would stay at its current level indefinitely. But models rarely run in static environments; instead, they face ever-changing environments and variables. Over time, these changes degrade model performance, because the model was never trained on the new patterns it now encounters and has little predictive power for that unfamiliar data. This performance decline is called model drift.
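To make drift concrete, here is a toy simulation under stated assumptions: a one-feature binary classifier trained on synthetic data where the true decision boundary sits at 5.0, then scored on data from a changed environment where the boundary has moved to 7.0. The helpers `make_data`, `fit_threshold`, and `accuracy` are hypothetical illustrations, not part of any library; the point is only that a frozen model loses accuracy when the world it models moves.

```python
import random


def make_data(n, boundary, seed):
    """Synthetic (x, label) pairs: label is 1 when x lies above `boundary`."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        data.append((x, int(x > boundary)))
    return data


def accuracy(threshold, data):
    """Fraction of examples the threshold model classifies correctly."""
    correct = sum(int(x > threshold) == y for x, y in data)
    return correct / len(data)


def fit_threshold(data):
    """'Train' a one-parameter model: the cutoff that maximizes training accuracy."""
    best_t, best_acc = None, -1.0
    for t, _ in sorted(data):
        acc = accuracy(t, data)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


# Train on historical data, where the true boundary sits at 5.0.
model = fit_threshold(make_data(200, boundary=5.0, seed=0))

# Score on fresh data from the same environment, then on data from a
# changed environment where the boundary has moved to 7.0.
acc_before = accuracy(model, make_data(1000, boundary=5.0, seed=1))
acc_after = accuracy(model, make_data(1000, boundary=7.0, seed=2))
print(f"before drift: {acc_before:.2f}, after drift: {acc_after:.2f}")
```

Nothing about the model changed between the two evaluations; only the data-generating environment did, which is exactly the situation retraining is meant to address.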
Let’s walk through an example. Imagine an AI model that powers a shopping website’s search engine and performs at a desired accuracy threshold. Back in 2017, fidget spinner toys became an overnight sensation. The search engine had never encountered a fidget spinner before, so when a customer searched “fidget spinner,” the model couldn’t recognize the query and returned inaccurate results, driving its overall accuracy below the desired threshold. In this instance, a change in external variables caused model drift that hurt search relevance.
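One cheap signal for the fidget spinner scenario is the share of live search tokens the model never saw during training. The sketch below assumes a simple lowercase whitespace tokenizer and a vocabulary built from training queries; `build_vocabulary`, `oov_rate`, and `top_unknown_terms` are hypothetical names, and a production search stack would tokenize and normalize far more carefully.

```python
from collections import Counter


def tokenize(query):
    """Lowercase whitespace tokenizer; a real search stack would do far more."""
    return query.lower().split()


def build_vocabulary(training_queries):
    """Every token the model saw during training."""
    vocab = set()
    for query in training_queries:
        vocab.update(tokenize(query))
    return vocab


def oov_rate(queries, vocab):
    """Share of live query tokens that never appeared in training."""
    tokens = [t for q in queries for t in tokenize(q)]
    if not tokens:
        return 0.0
    return sum(t not in vocab for t in tokens) / len(tokens)


def top_unknown_terms(queries, vocab, k=5):
    """The most frequent unseen tokens: candidates for new labeled data."""
    counts = Counter(t for q in queries for t in tokenize(q) if t not in vocab)
    return [term for term, _ in counts.most_common(k)]


vocab = build_vocabulary(["red running shoes", "wireless headphones", "kids toys"])
live = ["kids toys", "fidget spinner", "blue fidget spinner"]
print(oov_rate(live, vocab))  # 5 of 7 live tokens are unseen
print(top_unknown_terms(live, vocab, k=2))
```

A rising out-of-vocabulary rate, together with the list of terms driving it, points the team at what new data to collect before accuracy visibly drops.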
Why Do I Need to Perform Model Maintenance?
To solve the fidget spinner problem above, the engineering team would have to retrain the model with plenty of new data on fidget spinners, including labeled images of fidget spinners and the keywords customers use to search for them. After this update, the model’s performance should, in theory, return to the desired threshold.
The fidget spinner example is a fairly clear-cut illustration of how external circumstances can change, but it still underscores how shifts in a model’s environment affect its performance. In many cases, external changes are more subtle and therefore more difficult to detect. Nonetheless, monitoring and retraining your model regularly is the only way to ensure it continues to perform as intended.
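Subtle changes often show up first in the distribution of a model’s inputs, before labels are available to measure accuracy. One widely used statistic for this is the Population Stability Index (PSI), which compares the binned proportions of a feature in training data (expected, e) with production data (actual, a) as the sum of (a − e) · ln(a / e). The sketch below is a simplified version under assumptions: equal-width bins over the combined range (many implementations bin on quantiles of the reference sample instead) and synthetic Gaussian data. The often-quoted cutoffs of 0.1 and 0.25 are rules of thumb, not standards.

```python
import math
import random


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a reference sample (training) and a live sample (production).

    Uses equal-width bins over the combined range for simplicity; many
    implementations bin on quantiles of the reference sample instead.
    """
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0  # guard against a zero-width range

    def proportions(values):
        counts = [0] * bins
        for v in values:
            counts[min(int((v - lo) / width), bins - 1)] += 1
        # Floor empty bins at eps so the log term stays finite.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))


rng = random.Random(0)
training = [rng.gauss(0.0, 1.0) for _ in range(5000)]
same_env = [rng.gauss(0.0, 1.0) for _ in range(5000)]
shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]

print(population_stability_index(training, same_env))  # small: same distribution
print(population_stability_index(training, shifted))   # much larger: mean moved
```

Because PSI needs only inputs, not ground-truth labels, it can flag a quiet shift well before accuracy metrics catch up.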
How Do I Know When to Retrain My Model?
The retraining rate depends on three factors: the model, the data, and the business problem the model is trying to solve. Some models rely on data that changes rapidly. For example, as we learn more about COVID-19 and new case data comes in, AI models tracking the virus need to be updated continuously. Other models operate in less dynamic environments and may only need retraining every few months.
There are generally two main approaches to retraining, each with its advantages and disadvantages:
Time-based: Retrain your model at a regular interval, regardless of how it’s performing. This approach requires a clear understanding of how frequently data and variables change in your model’s environment. If the intervals are spaced too far apart, model performance will decline in between retraining runs.
Continuous: Monitor key performance indicators (such as accuracy thresholds and bias metrics) to determine when retraining is needed. This approach relies on a comprehensive panel of measurements that can detect where model drift is likely to occur. Using incorrect or vague measurements defeats the purpose of the method.
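Both triggers can be expressed as a single check that a scheduler runs periodically. The sketch below is illustrative only: `RetrainPolicy`, its fields, and the example thresholds (30 days, 90% accuracy, a drift score of 0.25) are hypothetical choices, and the drift score stands in for whatever statistic your monitoring produces.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RetrainPolicy:
    """Hypothetical policy combining a time-based and a metric-based trigger."""

    interval: timedelta   # time-based: retrain at least this often
    min_accuracy: float   # continuous: retrain if accuracy falls below this
    max_drift: float      # continuous: retrain if a drift score exceeds this

    def reasons_to_retrain(self, last_trained, now, accuracy, drift):
        """Return the triggers that fired; an empty list means no retraining."""
        reasons = []
        if now - last_trained >= self.interval:
            reasons.append("scheduled interval elapsed")
        if accuracy < self.min_accuracy:
            reasons.append(f"accuracy {accuracy:.2f} below {self.min_accuracy:.2f}")
        if drift > self.max_drift:
            reasons.append(f"drift {drift:.2f} above {self.max_drift:.2f}")
        return reasons


policy = RetrainPolicy(interval=timedelta(days=30), min_accuracy=0.90, max_drift=0.25)
last = datetime(2024, 1, 1)
print(policy.reasons_to_retrain(last, datetime(2024, 1, 10), accuracy=0.85, drift=0.05))
```

Returning the list of reasons, rather than a bare boolean, gives the humans reviewing a retraining run a record of why it was started.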
The approach you take will depend on your understanding of your use case and data, and your team may choose to combine the two techniques. Fortunately, there are more tools than ever for monitoring model performance, including technology that can immediately detect biases in data. Whichever approach you choose, include a human in the loop to regularly check that the model performs as expected.
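A simple way to put a human in the loop is to route every low-confidence prediction, plus a small random sample of the rest, to reviewers, and then track how often their labels agree with the model. In the sketch below, the record fields (`id`, `prediction`, `confidence`, `human_label`) and the default rates are assumptions for illustration, not a prescribed schema.

```python
import random


def select_for_review(predictions, sample_rate=0.02, confidence_floor=0.6, seed=None):
    """Queue every low-confidence prediction, plus a random sample of the rest,
    for human review."""
    rng = random.Random(seed)
    return [
        p for p in predictions
        if p["confidence"] < confidence_floor or rng.random() < sample_rate
    ]


def agreement_rate(reviewed):
    """Share of reviewed predictions where the human label matched the model."""
    if not reviewed:
        return None
    matches = sum(r["prediction"] == r["human_label"] for r in reviewed)
    return matches / len(reviewed)


preds = [
    {"id": 1, "prediction": "toy", "confidence": 0.95},
    {"id": 2, "prediction": "shoe", "confidence": 0.40},
    {"id": 3, "prediction": "toy", "confidence": 0.88},
]
# With random sampling turned off, only the low-confidence prediction is queued.
queue = select_for_review(preds, sample_rate=0.0)
print([p["id"] for p in queue])
```

The random sample matters as much as the low-confidence slice: a model can be confidently wrong, and only unbiased spot checks will reveal that.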
Preparing Your Organization for Post-Production
Understanding the importance of model retraining is only the first step toward managing the post-deployment stage of model development. Before reaching that step in the ML lifecycle, work to create the team structures and processes that best support retraining pipelines. What this means in practice:
Invest in an MLOps team. MLOps is an emerging field in AI development for a reason: MLOps is to AI what DevOps is to software development. It is a collaboration between data scientists and production teams, a departure from the typically siloed relationship between data scientists (who build the model) and engineers (who maintain it in production).
If you’re not ready for an MLOps team, make sure your data scientists and engineers communicate openly and collaborate. Data scientists know how they built and trained the model, and that expertise can inform engineering decisions about maintaining the model in production.
Incorporate feedback from customers quickly. Whether your MLOps team or your engineering team manages the model in production, they need to communicate with the teams closest to the customer (such as customer success). Customers may identify model errors or areas of underperformance that were missed; the more efficiently customer success can pass this feedback on to the production team, the sooner the model can be retrained.
Obtain key stakeholder buy-in. Members of your organization may not understand that building ML models isn’t a one-and-done process. Educate critical stakeholders (executives in particular) on the importance of continuous model maintenance and retraining before launch to ensure enough time, money, and team members are invested in the post-deployment lifecycle stage.
Build a retraining pipeline. Map out a retraining strategy before launching your AI model, and make sure you have the right tools and infrastructure in place to retrain efficiently. Include human-in-the-loop review in your pipeline for ground truth monitoring.
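A retraining pipeline can be as small as one guarded step: train a challenger on historical plus newly labeled data, compare it with the current production model (the champion) on a holdout set, and promote it only if it wins and a human reviewer signs off. The sketch below assumes this champion/challenger pattern; `RetrainingPipeline`, `train_threshold`, and `threshold_accuracy` are hypothetical, and the toy threshold model stands in for whatever training and evaluation code your team actually uses.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Example = Tuple[float, int]  # (feature, label) for the toy model below


@dataclass
class RetrainingPipeline:
    """Hypothetical champion/challenger retraining step with a human approval gate."""

    train: Callable[[List[Example]], float]
    evaluate: Callable[[float, List[Example]], float]
    approve: Callable[[float, float], bool]  # human-in-the-loop sign-off
    min_gain: float = 0.0

    def run(self, champion, history, new_data, holdout):
        """Retrain on history + new data and promote only if the challenger wins."""
        challenger = self.train(history + new_data)
        champ_score = self.evaluate(champion, holdout)
        chall_score = self.evaluate(challenger, holdout)
        if chall_score - champ_score < self.min_gain:
            return champion, "kept champion: challenger did not improve enough"
        if not self.approve(champ_score, chall_score):
            return champion, "kept champion: reviewer rejected the challenger"
        return challenger, "promoted challenger"


def train_threshold(data):
    """Toy model: midpoint between the largest negative and smallest positive x."""
    negatives = [x for x, y in data if y == 0]
    positives = [x for x, y in data if y == 1]
    return (max(negatives) + min(positives)) / 2


def threshold_accuracy(threshold, data):
    """Fraction of examples where predicting 1 for x > threshold is correct."""
    return sum(int(x > threshold) == y for x, y in data) / len(data)


pipeline = RetrainingPipeline(
    train=train_threshold,
    evaluate=threshold_accuracy,
    approve=lambda old, new: True,  # stand-in for a real reviewer decision
)
history = [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)]
new_data = [(6.0, 0), (6.5, 0)]  # freshly labeled examples gathered after drift
holdout = [(3.0, 0), (6.2, 0), (6.8, 0), (7.5, 1), (8.0, 1)]
model, status = pipeline.run(5.0, history, new_data, holdout)
print(model, status)
```

Keeping the champion as the default outcome means a bad retraining run can never silently replace a working model; promotion always requires both a measurable gain and a human decision.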
After you’ve launched your AI model, your work may just be beginning. But by understanding the importance of retraining and investing in the right model maintenance resources, you’ll set your AI ventures up for continued success.
Watch the Launching AI Roundtable Series: Life After Deploying AI on-demand to hear experts shed light on how to be prepared for taking the next step with AI deployments.