TL;DR: This is a new type of blog post where we answer specific questions. In this one, we explain the difference between outlier and drift detection. You might need both to monitor your models in production.
We spend a lot of time answering questions about ML in production, monitoring, and system design. Some users write to us directly. Some questions come up in our Discord community chats. Others come up again and again when we speak at meetups.
We decided to write some things down and start a new section on our blog: machine learning Q&A! Expect visual explainers, both high-level and deep dives.
If you want to ask your own question, join our community server!
Here is the first one.
What is the difference between outlier and data drift detection?
When monitoring ML models in production, we can apply different techniques.
Data drift and outlier detection are among those. Both are useful when we do not have ground truth labels yet. The data is then the only thing to look at.
There are various statistical approaches to detect either (an interesting discussion by itself!), but there is also a principal difference between the two.
Focus: whole dataset VS individual objects
When we talk about drift detection, we look at the "global" data distributions in the whole dataset. We want to know if they have shifted significantly compared to a past period or to the data the model was trained on.
We run the drift analysis to test whether such a shift has happened.
Data drift might look like a gradual shift in object properties. In this simplified example, we would observe the distribution shift for the features "color" and "position" while "shape" and "size" remain consistent.
Assuming all features are quite important, that sounds like something we'd have to react to! For example, by retraining the model so that it learns the new pattern.
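To make the per-feature comparison concrete, here is a minimal sketch. It assumes each object is a dict of categorical features (the "color" and "shape" values below are made up) and uses total variation distance, the share of probability mass that moved between two category distributions, as a simple stand-in for a proper statistical test. The 0.2 threshold is a hypothetical choice, not a recommended default.

```python
from collections import Counter

# Hypothetical threshold: flag a feature if more than 20% of the mass moved.
DRIFT_THRESHOLD = 0.2

def category_shares(values):
    """Share of each category in a list of values."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}

def total_variation_distance(reference, current):
    """Half the sum of absolute share differences: 0 = identical, 1 = disjoint."""
    ref_shares = category_shares(reference)
    cur_shares = category_shares(current)
    categories = set(ref_shares) | set(cur_shares)
    return 0.5 * sum(abs(ref_shares.get(c, 0.0) - cur_shares.get(c, 0.0))
                     for c in categories)

def drifted_features(reference_rows, current_rows, threshold=DRIFT_THRESHOLD):
    """Compare each feature's distribution between two lists of dict rows."""
    report = {}
    for feature in reference_rows[0].keys():
        ref_values = [row[feature] for row in reference_rows]
        cur_values = [row[feature] for row in current_rows]
        report[feature] = total_variation_distance(ref_values, cur_values) > threshold
    return report

# Illustrative data: "color" shifts from mostly blue to mostly red, "shape" stays the same.
reference = [{"color": "blue" if i < 8 else "red",
              "shape": "circle" if i % 2 == 0 else "square"} for i in range(10)]
current = [{"color": "blue" if i < 2 else "red",
            "shape": "circle" if i % 2 == 0 else "square"} for i in range(10)]

print(drifted_features(reference, current))  # {'color': True, 'shape': False}
```

Each feature is compared as a whole distribution, so the result says nothing about any single object, only that the mix of colors has changed.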
When we search for outliers, we want to find individual "unusual" or "different" objects in our input data. These objects might appear from time to time in both training and production data. We call them anomalies, or outliers, hinting that this is something we expect to see rarely.
In this example, some of the objects would stand out since their "shape," "position," "size," or "color" is very different from the rest.
We might want to detect them and send them for manual review, for example.
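Here is a minimal sketch of such a per-object check. It assumes a single numeric feature ("size") and uses a simple robust z-score (the distance from the training median, measured in median absolute deviations) instead of a dedicated outlier detection algorithm. The training values, incoming objects, and the 3.5 cutoff are illustrative assumptions. The point is that each object is scored on its own, and flagged ones go to a review queue.

```python
import statistics

THRESHOLD = 3.5  # a commonly used cutoff for this modified z-score

def fit_robust_scale(training_values):
    """Learn the center (median) and spread (MAD) of a feature from training data."""
    median = statistics.median(training_values)
    mad = statistics.median([abs(v - median) for v in training_values])
    return median, mad

def robust_z_score(value, median, mad):
    """Distance from the training median in MAD units (0.6745 rescales MAD for normal data)."""
    if mad == 0:
        return 0.0 if value == median else float("inf")
    return 0.6745 * (value - median) / mad

# Hypothetical training data and incoming production objects.
training_sizes = [10, 11, 9, 10, 12, 10, 11, 9, 10, 11]
incoming = [{"id": 1, "size": 10}, {"id": 2, "size": 11}, {"id": 3, "size": 40}]

median, mad = fit_robust_scale(training_sizes)

# Each object is judged individually; unusual ones are queued for manual review.
review_queue = [obj["id"] for obj in incoming
                if abs(robust_z_score(obj["size"], median, mad)) > THRESHOLD]
print(review_queue)  # [3]
```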
Data drift and outliers can exist independently. The whole dataset might drift without outliers. An individual outlier might easily appear without data drift.
When it comes to monitoring, we might want to keep an eye on both. We would have different expectations and workflows for each of them.
Decision: can I trust the model VS can I trust the prediction?
When we monitor for drift, our goal is to decide whether we can trust that the model still performs as expected. The assumption is that if the data distributions are similar to what the model was trained on, it should do well overall.
If the distributions have changed, the whole system might need an update.
If we suspect drift, we take action at the level of the model: retrain it using the new data, rebuild it entirely, or apply some business logic to all the model outputs.
When we monitor for outliers, our goal is to decide if we can trust that the model performs well on the particular input. The assumption is that if the specific object is "too far" from everything the model knows, it probably won't make a very good prediction this time.
If we find outliers, we take action at the level of the individual object: ask a human expert to make a decision instead of the model, or apply some business logic to this particular output.
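The object-level decision can be sketched as a small routing function. Everything here is hypothetical: `is_outlier`, `model_predict`, and `fallback` stand for whatever detector, model, and human-review or business-rule step a real system would plug in.

```python
from typing import Any, Callable, Dict

Obj = Dict[str, Any]

def decide(obj: Obj,
           is_outlier: Callable[[Obj], bool],
           model_predict: Callable[[Obj], Any],
           fallback: Callable[[Obj], Any]) -> Dict[str, Any]:
    """Route a single input: use the model unless this object looks unusual."""
    if is_outlier(obj):
        # Object-level action: the model is not trusted for this one input.
        return {"source": "fallback", "value": fallback(obj)}
    return {"source": "model", "value": model_predict(obj)}

# Hypothetical stand-ins for a real detector, model, and review process.
def size_is_unusual(obj: Obj) -> bool:
    return obj["size"] > 30

def toy_model(obj: Obj) -> str:
    return "approve"

def send_to_expert(obj: Obj) -> str:
    return "pending_review"

print(decide({"size": 10}, size_is_unusual, toy_model, send_to_expert))
# {'source': 'model', 'value': 'approve'}
print(decide({"size": 40}, size_is_unusual, toy_model, send_to_expert))
# {'source': 'fallback', 'value': 'pending_review'}
```

The model itself stays unchanged. Only the single flagged output is replaced, which is the key contrast with the model-level reaction to drift.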
These differences also affect the way we design the checks applied in each case.
Test design: robustness VS sensitivity
We want the drift detector to be robust to outliers.
It should not raise alarms if there is a single broken input, for example. It should react only when we see "enough" changes altogether.
This can often be achieved by picking the right statistical test to compare the data distributions of individual features, e.g., the Kolmogorov–Smirnov, Anderson–Darling, or Chi-squared test. Which test to pick in which case still involves many nuances. We are working on selecting reasonable defaults for the Evidently open-source library to address this.
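The Kolmogorov–Smirnov test shows why a distribution-level check is naturally robust to a single broken input. The sketch below computes the two-sample statistic (the largest gap between two empirical CDFs) in plain Python and compares it with the standard large-sample critical value. The data is made up, and this is not how Evidently chooses or configures its tests. In practice, `scipy.stats.ks_2samp` returns the statistic together with a p-value.

```python
import bisect
import math

def ecdf(sorted_sample, x):
    """Empirical CDF: share of sample values that are <= x."""
    return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)

def ks_statistic(reference, current):
    """Two-sample Kolmogorov–Smirnov statistic: largest gap between the two empirical CDFs."""
    ref, cur = sorted(reference), sorted(current)
    # Both ECDFs are step functions that only change at observed values,
    # so checking those points is enough to find the maximum gap.
    return max(abs(ecdf(ref, x) - ecdf(cur, x)) for x in ref + cur)

def ks_critical_value(n, m, alpha=0.05):
    """Large-sample critical value c(alpha) * sqrt((n + m) / (n * m))."""
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)
    return c_alpha * math.sqrt((n + m) / (n * m))

def has_drifted(reference, current, alpha=0.05):
    """Reject 'same distribution' when the statistic exceeds the critical value."""
    return ks_statistic(reference, current) > ks_critical_value(len(reference), len(current), alpha)

# Hypothetical numeric feature, e.g. "position".
reference = list(range(100))
glitch = list(range(99)) + [10**6]      # one broken input, everything else unchanged
shifted = [x + 30 for x in range(100)]  # the whole distribution moved

print(round(ks_statistic(reference, glitch), 3), has_drifted(reference, glitch))    # 0.01 False
print(round(ks_statistic(reference, shifted), 3), has_drifted(reference, shifted))  # 0.3 True
```

With 100 values per sample, one corrupted input can move each empirical CDF by at most 1/100. It barely changes the statistic, while a shift of the whole distribution clears the critical value (about 0.19 here) easily.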
On the other hand, we want the outlier detector to be sensitive enough.
It should raise alarms when individual objects look "strange," even before changes accumulate and reach a critical mass. We will likely opt for a different method here, such as the isolation forest algorithm or a one-class SVM.
Outlier detection is often important when the cost of a single model mistake is high. We will likely tolerate some false positive alerts to perform extra spot-checking.
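For an outlier detector, sensitivity largely comes down to where we set the alert cutoff on the anomaly score. The sketch assumes we already have anomaly scores from some detector (for example, an isolation forest) where higher means more unusual. The numbers are invented. It picks the cutoff so that a chosen share of normal reference objects would be flagged, which is roughly the false-positive rate we agree to tolerate. scikit-learn's `IsolationForest` exposes a `contamination` parameter that plays a similar role.

```python
def threshold_for_alert_rate(reference_scores, alert_rate):
    """Cutoff such that about `alert_rate` of reference (normal) objects score above it."""
    ranked = sorted(reference_scores, reverse=True)
    k = min(int(alert_rate * len(ranked)), len(ranked) - 1)
    return ranked[k]

def flag(scores, threshold):
    """Indices of objects whose anomaly score exceeds the cutoff."""
    return [i for i, s in enumerate(scores) if s > threshold]

# Hypothetical anomaly scores: higher = more unusual.
reference_scores = [0.1, 0.2, 0.15, 0.3, 0.25, 0.12, 0.18, 0.22, 0.4, 0.35]
production_scores = [0.2, 0.33, 0.38, 0.9]

conservative = threshold_for_alert_rate(reference_scores, 0.1)  # fewer false alarms
sensitive = threshold_for_alert_rate(reference_scores, 0.3)     # more spot-checks

print(flag(production_scores, conservative))  # [2, 3]
print(flag(production_scores, sensitive))     # [1, 2, 3]
```

Raising the tolerated alert rate from 10% to 30% lowers the cutoff and flags one more production object, at the cost of more false alarms on normal data.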
Test application: one, both, or neither
Do we always need both tests? Not really.
Sometimes we might only look at the data drift. For example, we would use the test to decide when to start labeling new data to update the model.
If the cost of an individual model error is not too high, we can skip outlier detection: we might be fine with predictions that are occasionally off due to outliers.
Sometimes we would only check for the outliers. For example, we would use outlier detection to decide when to rely on the rule-based response instead of the model prediction.
However, we do not always need drift detection either. We usually check for it as a proxy for model quality, for example, when we have to wait for the ground truth labels. But if the labels arrive fast, we can calculate the true model quality instead, such as accuracy or mean error. The drift check would then be an extra.
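When labels arrive quickly, these quality metrics are simple to compute directly. Here is a minimal sketch with made-up labels and predictions. It uses mean absolute error as one common form of "mean error" for regression.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the labels (classification)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between labels and predictions (regression)."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical labeled production data.
print(accuracy(["cat", "dog", "dog", "cat"], ["cat", "dog", "cat", "cat"]))  # 0.75
print(mean_absolute_error([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]))                 # about 0.833
```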
Both drift and outlier detection can be helpful to monitor the production performance of machine learning models.
We don't always need them, but we might choose to run one or both checks. Each has its specifics.
Data drift detection helps determine when the overall distributions of the input data have changed. We design this test to be robust to outliers so that it alerts only on meaningful shifts. We would typically react to drift by retraining or updating the model.
Outlier detection helps detect individual unusual data inputs. We design this test to be sensitive enough to catch a single deviating input. We would typically react to outliers by applying some business logic or by manually processing the individual object to make a decision.
Have a question about production machine learning? Join our Discord community and ask in the #ds-ml-questions channel.
Get started with open-source ML monitoring
Evaluate, test, and monitor ML models in production with Evidently. From tabular data to NLP and LLM. Built for data scientists and ML engineers.