"Data drift monitoring seems to be a thing. Should I keep an eye on it if I can evaluate the model quality directly?"
Not always, but you might want to!
If you can evaluate the model quality fast, you might just stop there. For example, if you make content recommendations inside an app, you immediately know if the user clicks on the suggested reads. The feedback loop is almost instant, and you can track a metric like accuracy to monitor the model performance.
But even if you can calculate the model quality metric, monitoring data and prediction drift can often be useful. Let's consider a few examples where it makes sense to track the distributions of the model inputs and outputs.
1. You need to wait for the labels
You don't always get real-world feedback quickly.
Sometimes you can calculate the model quality metric, but only with a lag. Imagine that you forecast sales volumes or predict the probability of credit default. It can take a week, a month, or a year until you learn whether the model predictions are accurate.
If the use case is not too critical and the error cost is low, you can probably take your time. Otherwise, it might be useful to evaluate the model performance before the true labels arrive.
If you have nothing else, data and prediction drift provide an indirect signal of the model quality. If the data distributions are stable, the model will probably do a good job. If things change, it's time to investigate and intervene. You can take steps to combat drift as needed.
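As a rough illustration of such an indirect signal, here is a minimal sketch that compares a reference sample of one numeric feature (or of the model predictions) with a current sample using the Population Stability Index. The choice of PSI, the helper name `psi`, the 10 quantile bins, and the synthetic data are all assumptions made for this example. The commonly quoted 0.1 and 0.2 thresholds are a rule of thumb, not something the approach prescribes.

```python
import math
import random


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index between two numeric samples (hypothetical helper)."""
    # Bin edges are reference quantiles, so each bin holds roughly equal reference mass.
    ref_sorted = sorted(reference)
    edges = [ref_sorted[int(len(ref_sorted) * i / n_bins)] for i in range(1, n_bins)]

    def shares(values):
        counts = [0] * n_bins
        for v in values:
            # The bin index is the number of edges the value has reached.
            counts[sum(v >= e for e in edges)] += 1
        # Floor empty bins at eps to keep the logarithm defined.
        return [max(c / len(values), eps) for c in counts]

    ref_share, cur_share = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_share, cur_share))


# Synthetic data: one stable batch and one batch with a shifted mean.
rng = random.Random(42)
reference = [rng.gauss(0, 1) for _ in range(5000)]
stable = [rng.gauss(0, 1) for _ in range(5000)]
shifted = [rng.gauss(1, 1) for _ in range(5000)]

print(f"PSI, stable batch:  {psi(reference, stable):.3f}")
print(f"PSI, shifted batch: {psi(reference, shifted):.3f}")
```

A low score on the inputs and on the predictions does not prove the model is accurate. It only says that the model still sees the kind of data it was built for.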
2. You want to get an early warning
Even if you get the labels quickly, keeping tabs on the data drift can provide an extra signal.
Model performance doesn't always change overnight. The quality might be trending down but stay within the expected range. If you only look at the direct performance metric, it will not give you enough reason to intervene.
In this tutorial, we showed how the statistical data drift appears during the first week of the model application, while the model quality metric still looks reasonable. In cases like this, you might look at both the data drift and the model performance.
Data drift monitoring can provide additional information about the model quality trend. The goal is to catch the decay early and learn about the possible change even before the performance takes a hit.
Tracking drift as an additional metric can, of course, increase the number of false-positive alerts. However, if you operate in a high-risk environment where model mistakes can have serious consequences, you might err on the side of caution. It's better to deal with an extra alert than to be late to react.
3. The model performance is volatile
The model quality metric is not always the best reflection of the performance.
Model quality might fluctuate for a variety of reasons. This is common, for example, when the model relies on many weak factors.
Take grocery demand forecasting. One day the prediction quality is slightly worse because the weather is rainy, and people decide to stay home. Another day, there is a promotion nearby, and people rush to all the stores, including yours.
In cases like this, you might have constant noise in the data. The forecasting quality can be 2% better one day and 3% worse the next. You would probably know this in advance by simply plotting the model performance by day or by hour.
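If you prefer a number to a plot, one simple way to describe the usual noise is a band around the historical daily metric. The sketch below assumes a mean plus or minus three standard deviations rule. The helper names and the daily error values are hypothetical.

```python
from statistics import mean, stdev


def expected_range(history, k=3.0):
    """Band of 'usual' values: mean +/- k standard deviations of past daily metrics."""
    m, s = mean(history), stdev(history)
    return m - k * s, m + k * s


def outside_range(value, history, k=3.0):
    """True if today's metric falls outside the usual band."""
    low, high = expected_range(history, k)
    return value < low or value > high


# Hypothetical daily forecast error (MAPE, %) from a period considered normal.
daily_mape = [10.2, 9.8, 10.5, 9.6, 10.1, 10.4, 9.9, 10.0, 10.3, 9.7]

low, high = expected_range(daily_mape)
print(f"usual range: {low:.2f} to {high:.2f}")
print("10.6 is unusual:", outside_range(10.6, daily_mape))
print("12.0 is unusual:", outside_range(12.0, daily_mape))
```

A band like this tells you how wide the everyday noise is. It does not tell you why a value moved.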
It might be hard to distinguish between the usual quality fluctuation and the emerging model decay. The latter might happen when you have a new segment of shoppers appearing, for example.
Input data drift monitoring can tell more about the model stability than the quality metric. When the model performance fluctuates, you might want to keep an eye on both metrics to detect and interpret the actual model drift underneath.
For example, you might decide to dismiss an alert about model quality drop if no data drift is detected and only react if both alerts appear simultaneously. Or you can even give priority to the data drift alert.
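The two policies above can be written down as a small rule. This is only a sketch: the function name `should_react` and the `drift_has_priority` flag are made up for illustration, and a real setup would likely add severity levels or cooldowns.

```python
def should_react(quality_alert: bool, drift_alert: bool, drift_has_priority: bool = False) -> bool:
    """Decide whether to act on a pair of monitoring alerts."""
    if drift_has_priority:
        # Policy 2: the drift alert alone is enough to react.
        return drift_alert
    # Policy 1: react only when both alerts fire at the same time.
    return quality_alert and drift_alert


# A quality drop without data drift is treated as usual fluctuation.
print(should_react(quality_alert=True, drift_alert=False))
# Both alerts together call for an investigation.
print(should_react(quality_alert=True, drift_alert=True))
# With drift priority, drift alone triggers a reaction.
print(should_react(quality_alert=False, drift_alert=True, drift_has_priority=True))
```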
4. You have different data segments
Extending the previous point: here is a particular example of known model instability.
Sometimes you have segments in the data defined by the value of a particular categorical variable. This can be region, lead source, customer type, etc. The model performance varies for these segments. For example, the quality of content recommendations for logged-in users with a long interaction history is much better than for new visitors.
Depending on the inbound marketing campaigns, you can see more users of the first type or more of the second. This leads to fluctuation in the aggregate daily model performance.
There is, however, a difference between "the model performance is down because we have a lot of new users today" and "the model performance is down even though the user population is stable." The former is not a reason to worry while the latter probably is.
Monitoring a single model performance metric can be less informative in cases like this. The model accuracy will change following the change in the input data distributions. You need a way to keep tabs on the data, too. Data drift monitoring can provide it.
Data drift monitoring can help distinguish model underperformance from expected variations. You can track the distributions of the features that define the known segments in the data. It would make the interpretation of the model performance issues easier.
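To make the distinction concrete, here is a small worked example. Aggregate quality is the share-weighted average of per-segment quality, so a change in it can be split into a part caused by the segment mix and a part caused by quality changes inside the segments. The segment names, shares, and accuracy values below are invented for illustration.

```python
def aggregate(shares, quality):
    """Aggregate quality as the share-weighted average over segments."""
    return sum(shares[s] * quality[s] for s in shares)


def decompose(ref_shares, ref_quality, cur_shares, cur_quality):
    """Split the change in aggregate quality into (mix effect, within-segment effect)."""
    mix = sum((cur_shares[s] - ref_shares[s]) * ref_quality[s] for s in ref_shares)
    within = sum(cur_shares[s] * (cur_quality[s] - ref_quality[s]) for s in ref_shares)
    return mix, within


# Hypothetical reference period: mostly returning users, who get better recommendations.
ref_shares = {"returning": 0.7, "new": 0.3}
ref_quality = {"returning": 0.8, "new": 0.5}

# Day A: a campaign brings many new users; per-segment quality is unchanged.
campaign = decompose(ref_shares, ref_quality, {"returning": 0.4, "new": 0.6}, ref_quality)

# Day B: the user mix is stable, but quality drops inside both segments.
decay = decompose(ref_shares, ref_quality, ref_shares, {"returning": 0.7, "new": 0.4})

print("campaign day (mix, within):", [round(x, 3) for x in campaign])
print("decay day    (mix, within):", [round(x, 3) for x in decay])
```

Both days show a similar drop in the aggregate number. Only the second one comes from the model doing worse on the same kind of users. Note that the within-segment part requires labels. The mix part needs only the segment shares, which is exactly what drift monitoring on the segment-defining feature gives you.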
5. You want to be ready for debugging
Even if you know the actual model performance immediately, what will you do if it ever goes down?
After noticing a quality drop, the next step is to try and find the reason. Looking at the input data changes is an essential part of it. You might as well prepare for it in advance.
In this case, you might not need to bring drift detection into the monitoring layer. The alerting and notification rules can still rely on model performance metrics. But in addition, you can automatically generate the data drift reports or dashboards if you detect a quality drop.
They can include, for example, the distribution plots for the top model features. Including such visualizations can help start the investigation and figure out the root cause of the decay. Otherwise, the same effort will be needed during the ad hoc analysis of the model logs.
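A minimal sketch of this "report on demand" setup could look as follows. It assumes a two-sample Kolmogorov-Smirnov statistic as the per-feature drift score and an accuracy threshold as the trigger. The function names, feature names, and data are hypothetical, and plotting is left out to keep the example short.

```python
import random
from bisect import bisect_right


def ks_statistic(reference, current):
    """Largest gap between the empirical CDFs of two samples."""
    ref, cur = sorted(reference), sorted(current)
    return max(
        abs(bisect_right(ref, v) / len(ref) - bisect_right(cur, v) / len(cur))
        for v in ref + cur
    )


def drift_report(reference, current, top_n=3):
    """Rank features by drift score, largest shift first."""
    scores = {name: ks_statistic(reference[name], current[name]) for name in reference}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_n]


def on_quality_check(accuracy, threshold, reference, current):
    """Alerting relies on quality only; the drift report is built when quality drops."""
    if accuracy >= threshold:
        return None
    return drift_report(reference, current)


# Synthetic feature logs: only session_length changes between the two periods.
rng = random.Random(0)
reference = {
    "age": [rng.gauss(40, 10) for _ in range(1000)],
    "session_length": [rng.gauss(5, 2) for _ in range(1000)],
}
current = {
    "age": [rng.gauss(40, 10) for _ in range(1000)],
    "session_length": [rng.gauss(8, 2) for _ in range(1000)],
}

report = on_quality_check(accuracy=0.70, threshold=0.80, reference=reference, current=current)
for feature, score in report:
    print(f"{feature}: {score:.2f}")
```

The ranked list gives the investigation a starting point. It points at what changed in the inputs, not necessarily at what caused the quality drop.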
Tracking data and prediction drift is not a must in ML monitoring. However, it often makes sense. Here are a few cases where you might find drift monitoring helpful:
- Drift in the input data and predictions can be a proxy for model quality. It is a valuable signal if the ground truth feedback is delayed or hard to measure.
- Data drift can warn about expected decay even before the model quality goes down. You might want to track it as an additional metric if the cost of model error is high.
- Data drift can be a better measure of the model stability. If the model performance is volatile, you can rely on drift checks to detect real change.
- Data drift analysis can help distinguish between the actual model decay and a quality fluctuation due to the relative size of known data segments. If you have segments with varying performance, feature drift monitoring is one way to track them.
- Data drift reports can help with the root cause analysis when you notice a performance drop. Having the tracking set in advance can provide a complete picture of the model health for the inevitable debugging.