Our second report is released! Now, you can use Evidently to explore the changes in your target function and model predictions.
What is it?
The Data Drift report we released earlier helps you detect the change in model features. Similarly, you can look at how your model predictions and target evolve.
When you know the ground truth (i.e. actual labels or values), the report helps you explore the change and understand how to adapt. If it is not available, you can use this new report to detect model decay in advance.
In short, it is a quick way to answer two questions:
- Does the model target behave similarly to the past period?
- Do my model predictions still look the same?
If anything changes, you can further explore:
- What exactly has changed?
- Which features are likely to contribute to the shift?
- Are there any new data segments to pay attention to?
How does it work?
Once again, you need to prepare two Pandas DataFrames. The first one is your Reference dataset. Usually, this is the data you use to train your model. The second is the most recent Production data.
These data frames can include only the predicted values, only actual values, or both.
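As a minimal sketch of what such input could look like, here are two small DataFrames. The column names `feature_a`, `target` and `prediction` are assumptions made for illustration, not names the tool requires, and the values are made up.

```python
import pandas as pd

# Reference data: here both actual values and model predictions are known.
# All column names and numbers are hypothetical.
reference = pd.DataFrame({
    "feature_a": [0.1, 0.4, 0.35, 0.8],
    "target": [10.0, 12.5, 11.0, 15.2],      # actual values
    "prediction": [9.8, 12.9, 11.4, 14.7],   # model output
})

# Production data: ground truth has not arrived yet, only predictions exist.
production = pd.DataFrame({
    "feature_a": [0.2, 0.5, 0.9],
    "prediction": [10.3, 13.1, 16.0],
})

# Columns that can be compared between the two periods.
shared = [c for c in ("target", "prediction") if c in reference and c in production]
print(shared)
```

In this toy case only the predictions can be compared, which corresponds to the "no ground truth yet" scenario.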
Based on what you predict, you can choose between two report types:
- NumTargetDrift works for any problem statement where you predict a numerical target. It can be a regression, a probabilistic classification or a ranking task.
- CatTargetDrift applies to cases where your target is categorical. For example, a binary or a multi-class classification.
Select the report type, and all plots and insights will be served!
Numerical target drift
For Numerical Target Drift, your report will first show the comparison of target distributions:
If no drift is detected, most likely you are good to go. Nothing major is going on.
In this example, the tool has detected a drift (based on the two-sample Kolmogorov-Smirnov test at a 0.95 confidence level).
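To make the test concrete, here is a rough sketch of a two-sample Kolmogorov-Smirnov check using SciPy. It is not the report's internal code: the helper name `numerical_drift_detected` and the synthetic data are assumptions, and the 0.95 confidence level is expressed as a significance threshold of 0.05.

```python
import numpy as np
from scipy.stats import ks_2samp

# Synthetic targets for illustration only: the production mean has shifted.
rng = np.random.default_rng(0)
reference_target = rng.normal(loc=20.0, scale=5.0, size=500)
production_target = rng.normal(loc=26.0, scale=5.0, size=500)

def numerical_drift_detected(reference, production, alpha=0.05):
    """Return (drift flag, p-value) from a two-sample KS test.

    Drift is flagged when the p-value falls below alpha
    (alpha=0.05 corresponds to a 0.95 confidence level).
    """
    result = ks_2samp(reference, production)
    return bool(result.pvalue < alpha), float(result.pvalue)

drifted, p_value = numerical_drift_detected(reference_target, production_target)
print(drifted, p_value)
```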
When this happens, you'd want to see the main changes quickly. The tool does just that. It plots the target values from the Reference and Production datasets for visual comparison, against the index or DateTime, if available.
Next, the report shows the change in correlations between individual features and the target. We use the Pearson correlation coefficient. Even though not all relationships are linear, it can be helpful. A significant shift in a correlation coefficient can point to the source of change.
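The same comparison can be approximated by hand with pandas. The sketch below is an assumption about how one might compute it, not the report's own implementation. The helper `correlation_shift`, the column names and the synthetic data are all hypothetical.

```python
import numpy as np
import pandas as pd

def correlation_shift(reference, production, features, target="target"):
    """Pearson correlation of each feature with the target in both datasets,
    sorted by the absolute change between them."""
    ref_corr = reference[features].corrwith(reference[target])
    prod_corr = production[features].corrwith(production[target])
    out = pd.DataFrame({"reference": ref_corr, "production": prod_corr})
    out["abs_change"] = (out["production"] - out["reference"]).abs()
    return out.sort_values("abs_change", ascending=False)

# Synthetic example: the relationship between x2 and the target flips sign.
rng = np.random.default_rng(1)
n = 300
ref = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
ref["target"] = ref["x1"] + ref["x2"] + rng.normal(scale=0.1, size=n)
prod = pd.DataFrame({"x1": rng.normal(size=n), "x2": rng.normal(size=n)})
prod["target"] = prod["x1"] - prod["x2"] + rng.normal(scale=0.1, size=n)

shift = correlation_shift(ref, prod, ["x1", "x2"])
print(shift)
```

Features at the top of the resulting table are the first candidates to inspect.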
To dig deeper, we give an overview of the behavior of each feature. This helps us understand if they can explain the target shift.
If you have a reasonable number of features, you might want to go through them one by one.
In other cases, you might look only at:
- your most important features;
- features whose correlation with the target changed significantly;
- features where data drift was detected.
This way, you can visually discover new data segments in specific features, and see how they associate with the target values.
For example, in the Boston house pricing dataset, we can see a new segment with TAX values above 600 but low values of the target (house price).
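Once a segment like this is spotted on a plot, a quick numeric check can confirm it. The sketch below is one possible way to do that. The helper `segment_summary` is hypothetical, and the numbers are invented for illustration and are not taken from the Boston dataset.

```python
import pandas as pd

def segment_summary(df, feature, threshold, target="target"):
    """Size and mean target of the rows where `feature` exceeds `threshold`."""
    segment = df[df[feature] > threshold]
    return {
        "rows": len(segment),
        "share": len(segment) / len(df),
        "target_mean": segment[target].mean(),  # NaN if the segment is empty
    }

# Made-up toy data, using the TAX feature name from the example above.
reference = pd.DataFrame({"TAX": [300, 400, 450, 500],
                          "target": [24.0, 30.0, 22.0, 27.0]})
production = pd.DataFrame({"TAX": [350, 650, 700, 680],
                           "target": [25.0, 12.0, 10.0, 14.0]})

ref_seg = segment_summary(reference, "TAX", 600)
prod_seg = segment_summary(production, "TAX", 600)
print(ref_seg)
print(prod_seg)
```

A segment that is empty in Reference but sizeable in Production is worth a closer look.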
When we notice the change, we can interpret why it is happening, using our domain knowledge. Sometimes, there is an actual real-world change due to data and concept drift.
In other cases, what looks like a "new segment" can be a result of data quality issues, like measurement errors.
Categorical target drift
For Categorical Target Drift, the report looks slightly different.
First, we visually compare the target distributions and perform the test to detect drift. Since the target is categorical, we use the chi-squared test.
For a classification problem with three classes, it can look like this.
This is an extreme case of target shift. In our Reference dataset, we most frequently observe Target class = 0. In the Production dataset, the most frequent is Target class = 2.
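A chi-squared check on class frequencies can be sketched with SciPy as follows. This uses a contingency table of class counts, which is an assumption on our side: the report's exact computation may differ. The helper name and the class counts are made up to mimic the shift described above.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical labels: class 0 dominates Reference, class 2 dominates Production.
reference_target = pd.Series([0] * 50 + [1] * 30 + [2] * 20)
production_target = pd.Series([0] * 15 + [1] * 25 + [2] * 60)

def categorical_drift_detected(reference, production, alpha=0.05):
    """Return (drift flag, p-value) from a chi-squared test on class counts."""
    counts = pd.DataFrame({
        "reference": reference.value_counts(),
        "production": production.value_counts(),
    }).fillna(0)  # a class missing from one dataset gets a zero count
    _, p_value, _, _ = chi2_contingency(counts.to_numpy())
    return bool(p_value < alpha), float(p_value)

drifted, p_value = categorical_drift_detected(reference_target, production_target)
print(drifted, p_value)
```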
When this drift is detected, we can again dig deeper into the behavior of the individual features. In this example, we took the famous Iris dataset, so the feature list is rather short.
The visualisations show which feature values we tend to observe for each target class, and how this changes over time.
In this case, we can quickly understand that we are dealing with a classic example of data drift. In Production, we face a new input distribution, different from the Reference dataset.
The "old" classes we already know behave the same way. But a new prevalent class comes with a new feature space. For several features, we see new values visibly aligned with the label "2". In the image above, these are the observations with a petal width of more than 2 cm.
Training the model on this new dataset will likely help solve the issue.
When should I use this report?
Here are our suggestions on when to use it. It works best in combination with our Data Drift report.
1. Before model retraining.
Before feeding fresh data into the model, you might want to verify whether it even makes sense.
When nothing changes, most likely an update will not improve the performance.
When things change too much, blind retraining might not be enough.
If you observe specific changes in the data or target function, you might want to:
- add or drop certain features;
- perform additional feature engineering;
- introduce new business rules or make ensemble models to account for new segments.
The report will help you decide if certain features or segments need attention.
2. When you are debugging the model decay.
If you already observe a drop in performance, this report can help you with debugging. It quickly shows you what has changed so that you know how to explain and address it.
3. When you are flying blind, and no ground truth is available.
Not having immediate feedback is no reason to ignore your model. You can run this report every time you generate batch predictions or otherwise schedule some checks.
By observing what exactly changes, you can anticipate data and concept drift. In some cases, you might decide to pause your model. For example, if you suspect a significant change or face a data quality issue.
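A scheduled check of this kind could look roughly like the sketch below, which compares each batch of predictions with the reference predictions. It is an illustration under assumptions, not the tool's API: the helper `flag_drifting_batches`, the batch names and the evenly spaced toy scores are all hypothetical.

```python
import numpy as np
from scipy.stats import ks_2samp

def flag_drifting_batches(reference_pred, batches, alpha=0.05):
    """Return the names of batches whose predictions differ from the
    reference predictions according to a two-sample KS test."""
    flagged = []
    for name, preds in batches.items():
        p_value = ks_2samp(reference_pred, preds).pvalue
        if p_value < alpha:
            flagged.append(name)
    return flagged

# Toy, evenly spaced scores so that the outcome is easy to reason about.
reference_pred = np.linspace(0.0, 1.0, 1000)
batches = {
    "batch_1": np.linspace(0.0, 1.0, 200),  # same spread as the reference
    "batch_2": np.linspace(0.5, 1.0, 200),  # scores pushed towards high values
}

flagged = flag_drifting_batches(reference_pred, batches)
print(flagged)
```

A flagged batch is a prompt to investigate, not proof of decay. Running many checks in a row also raises the chance of a false alarm, so the threshold may need tuning.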
How can I try it?
Go to GitHub, and explore the tool in action using sample notebooks. We updated our demo notebooks for the Iris dataset (Jupyter notebook, Colab), California housing dataset (Jupyter notebook, Colab), and breast cancer dataset (Jupyter notebook, Colab).
If you like it, spread the word.
If you have any questions, contact us at firstname.lastname@example.org. This is an early release, so send any bug reports that way, or open an issue on GitHub.