Evidently 0.1.4: Analyze Target and Prediction Drift in Machine Learning Models

Last updated:

January 23, 2025

Published:

December 29, 2020

Our second report is released! Now, you can use Evidently to explore the changes in your target function and model predictions.

What is it?

The Data Drift report preset we released earlier helps you detect the change in model features. Similarly, you can look at how your model predictions and target evolve.

When you know the ground truth (i.e. actual labels or values), the Target Drift report helps you explore the changes in the target function and understand how to adapt. If the ground truth is not available, you can use this new report to detect model decay in advance by evaluating prediction drift.

In short, it is a quick way to answer two questions:

Does the model target behave similarly to the past period?
Do my model predictions still look the same?

If anything changes, you can further explore:

What exactly has changed?
Which features are likely to contribute to the shift?
Are there any new data segments to pay attention to?

You are reading a blog about an early Evidently release. This functionality has since been improved, and additional Report types are available. You can browse available reports and check out the current documentation for details.

How does it work?

Once again, you need to prepare two Pandas DataFrames. The first one is your Reference dataset. Usually, this is the data you use to train your model. The second is the most recent Current data.

These data frames can include only the predicted values, only actual values, or both.
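As a minimal sketch of this setup, here is one way to prepare the two data frames from a single log of model inputs and outputs. The column names (`date`, `feature_1`, `target`, `prediction`) and the cutoff date are hypothetical and only serve the illustration.

```python
import pandas as pd

# Toy data standing in for model logs; all column names are hypothetical.
data = pd.DataFrame(
    {
        "date": pd.date_range("2020-01-01", periods=10, freq="D"),
        "feature_1": [0.1, 0.4, 0.3, 0.8, 0.5, 0.9, 1.2, 1.1, 1.5, 1.4],
        "target": [10, 12, 11, 15, 13, 16, 19, 18, 22, 21],
        "prediction": [11, 12, 12, 14, 13, 17, 18, 18, 21, 22],
    }
)

# Reference: the older period (for example, the training data).
# Current: the most recent period to compare against it.
cutoff = pd.Timestamp("2020-01-06")
reference = data[data["date"] < cutoff].reset_index(drop=True)
current = data[data["date"] >= cutoff].reset_index(drop=True)

# If the ground truth has not arrived yet, keep only the predictions.
current_without_target = current.drop(columns=["target"])
```

Both frames share the same columns, so the same comparison can be run on the target, the prediction, or both.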

Based on what you predict, you can apply the report to:

Numerical targets. It can be a regression, a probabilistic classification or a ranking task.
Categorical targets. For example, a binary or a multi-class classification.

Evidently will automatically determine the target type and apply the appropriate statistical test.

Simply call the Target Drift preset for your data, and all plots and insights will be served!

Numerical target drift

For Numerical Target Drift, your report will first show the comparison of target distributions:

Comparison of distributions. Target drift is detected.

If no drift is detected, most likely you are good to go. Nothing major is going on.

In this example, the tool has detected distribution drift (based on the two-sample Kolmogorov-Smirnov test at a 0.95 confidence level, i.e. a 0.05 significance level).
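To make the test concrete, here is a minimal sketch of the same kind of check using `scipy.stats.ks_2samp`. It is not the code Evidently runs internally. The helper name `numerical_drift_detected` and the synthetic samples are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp


def numerical_drift_detected(reference, current, alpha=0.05):
    """Two-sample Kolmogorov-Smirnov test; drift is flagged when p < alpha."""
    statistic, p_value = ks_2samp(reference, current)
    return bool(p_value < alpha), float(p_value)


# Synthetic targets: the "current" sample has its mean shifted by one unit.
rng = np.random.default_rng(seed=0)
reference_target = rng.normal(loc=0.0, scale=1.0, size=500)
shifted_target = rng.normal(loc=1.0, scale=1.0, size=500)

drifted, p_value = numerical_drift_detected(reference_target, shifted_target)

# Sanity check: a sample compared with itself should not be flagged.
same_drifted, same_p = numerical_drift_detected(reference_target, reference_target)
```

The `alpha=0.05` default mirrors the 0.95 confidence level mentioned above. A stricter threshold would flag fewer shifts.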

When this happens, you'd want to see the main changes quickly. The tool does just that. It plots the target values from the Reference and Current datasets for visual comparison, against the index or DateTime if available.

Next, the report shows the change in correlations between individual features and the target. By default, it will use the Pearson correlation coefficient. Even though not all relationships are linear, it can be helpful. A significant shift in a correlation coefficient can point to the source of change.
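The idea behind this comparison can be sketched with pandas. This is an illustration, not the report's own implementation. The helper `correlation_change`, the feature names `x1` and `x2`, and the synthetic data are all hypothetical.

```python
import numpy as np
import pandas as pd


def correlation_change(reference, current, target="target"):
    """Pearson correlation of each feature with the target, per dataset."""
    features = [col for col in reference.columns if col != target]
    ref_corr = reference[features].corrwith(reference[target])
    cur_corr = current[features].corrwith(current[target])
    return pd.DataFrame(
        {"reference": ref_corr, "current": cur_corr, "change": cur_corr - ref_corr}
    )


rng = np.random.default_rng(seed=0)
n = 200

# In the reference period the target follows x1 ...
reference = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
reference["target"] = 2 * reference["x1"] + rng.normal(scale=0.1, size=n)

# ... while in the current period it follows x2 instead.
current = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
current["target"] = 2 * current["x2"] + rng.normal(scale=0.1, size=n)

result = correlation_change(reference, current)
print(result)
```

In this toy case, the features with the largest absolute value in the `change` column are the first candidates to inspect.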

Correlations between features and target. Target values over time.

To dig deeper, it gives an overview of the behavior of each feature. This helps understand if they can explain the target shift.

List of features and their relation to the target.

If you have a reasonable number of features, you might want to go through them one by one.

In other cases, you might look only at:

your most important features;
features whose correlation with the target changed significantly;
features where data drift was detected.

This way, you can visually discover new data segments in specific features, and see how they associate with the target values.

For example, in a Boston house pricing dataset, you can see a new segment with values of TAX above 600 but a low value of the target (house price).
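Once you spot such a segment on the plot, you can confirm it with a quick numeric check. The sketch below uses made-up toy numbers, not the actual Boston dataset. The helper `segment_summary` and the column name `target` are hypothetical.

```python
import pandas as pd


def segment_summary(df, mask, target="target"):
    """Share of rows in a segment and the mean target inside and outside it."""
    return {
        "share": float(mask.mean()),
        "mean_target_in_segment": float(df.loc[mask, target].mean()),
        "mean_target_outside": float(df.loc[~mask, target].mean()),
    }


# Toy values for illustration only.
reference = pd.DataFrame(
    {"TAX": [300, 350, 400, 420, 500], "target": [24, 22, 21, 23, 20]}
)
current = pd.DataFrame(
    {"TAX": [310, 400, 650, 660, 700], "target": [23, 21, 9, 8, 10]}
)

# The segment of interest: TAX above 600.
ref_summary = segment_summary(reference, reference["TAX"] > 600)
cur_summary = segment_summary(current, current["TAX"] > 600)
```

A segment that is absent from the reference data (share of 0) but makes up a noticeable part of the current data, with a very different mean target, is worth a closer look.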

When you notice the change, you can interpret why it is happening, using your domain knowledge. Sometimes, there is an actual real-world change due to data and concept drift.

In other cases, what looks like a "new segment" can be a result of data quality issues, like measurement errors.

Categorical target drift

For Categorical Target Drift, the report looks slightly different.

First, it visually compares the target distributions and performs the test to detect drift. Since the target is categorical, for smaller samples it will use the chi-squared test.
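As a rough illustration of such a test, the sketch below compares class counts in two samples with `scipy.stats.chi2_contingency`. It is not Evidently's internal code. The helper `categorical_drift_detected` and the class counts are assumptions chosen for the example.

```python
import pandas as pd
from scipy.stats import chi2_contingency


def categorical_drift_detected(reference, current, alpha=0.05):
    """Chi-squared test on the class counts of two samples."""
    labels = pd.concat([reference, current], ignore_index=True).rename("target")
    dataset = pd.Series(
        ["reference"] * len(reference) + ["current"] * len(current), name="dataset"
    )
    # Rows: dataset (reference / current); columns: target classes.
    table = pd.crosstab(dataset, labels)
    result = chi2_contingency(table.to_numpy())
    p_value = float(result[1])  # the second element is the p-value
    return bool(p_value < alpha), p_value


# Three classes; class 0 dominates the reference data, class 2 the current data.
reference_target = pd.Series([0] * 50 + [1] * 30 + [2] * 20)
current_target = pd.Series([0] * 10 + [1] * 20 + [2] * 70)

drifted, p_value = categorical_drift_detected(reference_target, current_target)

# Sanity check: identical class counts should not be flagged.
same_drifted, same_p = categorical_drift_detected(reference_target, reference_target)
```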

For a classification problem with three classes, it can look like this.

This is an extreme case of target shift. In the Reference dataset, the most frequent target class is 0. In the Current dataset, the most frequent target class is 2.

When this drift is detected, you can again dig deeper into the behavior of the individual features. In this example, we took the well-known Iris dataset, so the feature list is rather short.

The visualizations show the feature values we tend to observe for each target class, and how they change over time.

In this case, you can quickly see that we are dealing with a classic example of data drift. In the Current data, we face a new input distribution, different from the Reference dataset.

The "old" classes we already know behave the same way. But a new prevalent class comes with a new feature space. For several features, we see new values visibly aligned with the label "2". In the image above, these are the observations with a petal width of more than 2 cm.

Training the model on this new dataset will likely help solve the issue.

When should I use this report?

Here are our suggestions on when to use it. It works best combined with our Data Drift report.

1. Before model retraining.

Before feeding fresh data into the model, you might want to verify whether it even makes sense.

When nothing changes, most likely an update will not improve the performance.

When things change too much, blind retraining might not be enough.

If you observe specific changes in the data or target function, you might want to:

add or drop certain features;
perform additional feature engineering;
introduce new business rules or make ensemble models to account for new segments.

The report will help you decide if certain features or segments need attention.

2. When you are debugging the model decay.

If you already observe a drop in performance, this report can help you with debugging. It quickly shows you what has changed so that you know how to explain and address it.

3. When you are flying blind, and no ground truth is available.

Not having immediate feedback is no reason to ignore your model. You can run this report every time you generate batch predictions or otherwise schedule some checks.

By observing what exactly changes, you can anticipate data and concept drift. In some cases, you might decide to pause your model, for example, if you suspect a significant change or face a data quality issue.