We are excited to announce our first release. You can now use the Evidently open-source Python package to estimate and explore data drift for machine learning models.
It helps you quickly understand: did my data change, and if yes, where?
How does it work?
As an interactive report right in the Jupyter notebook.
You need to prepare two datasets. One is the reference: you will use it as a baseline for comparison. Pick data you consider a good example, where your model performed reliably. It can be your training or validation data, or production data from some past period.
The second dataset is the most recent (current) production data you want to evaluate.
Import your data as a Pandas DataFrame. You can have two DataFrames, or a single one where you explicitly select which rows belong to the reference, and which to the production data.
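Here is a minimal sketch of the single-DataFrame option, using plain Pandas. The column names, dates, and cutoff are made up for illustration; use whatever rule defines your own baseline period.

```python
import pandas as pd

# Hypothetical production log: one row per day, Jan 1 to Feb 29, 2020.
data = pd.DataFrame({
    "date": pd.date_range("2020-01-01", periods=60, freq="D"),
    "feature_a": range(60),
})

# Explicitly select which rows belong to the reference and which to production.
cutoff = pd.Timestamp("2020-02-01")
reference = data[data["date"] < cutoff]   # January: the baseline period
current = data[data["date"] >= cutoff]    # February: the data to evaluate

# Both parts must share the same columns so features can be compared one to one.
assert list(reference.columns) == list(current.columns)
print(len(reference), len(current))
```

If your data already lives in two separate DataFrames, you can skip the split. Just make sure the column names match.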
Then, you can use Evidently to generate an interactive report like this:
We show the drifting features first. Using different statistical tests and metrics, Evidently makes a drift/no drift decision for each feature individually.
You might want to explore them all or look into your key drivers.
By clicking on each feature, you can explore the values mapped in a plot. The green area covers one standard deviation from the mean, as seen in the reference dataset.
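To make the green area concrete, here is a rough NumPy sketch of the idea. It is not the code behind the report: the synthetic data, the `share_outside` helper, and the use of the population standard deviation are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
# Synthetic feature values: the current data is shifted upward on purpose.
reference_values = rng.normal(loc=10.0, scale=2.0, size=1_000)
current_values = rng.normal(loc=13.0, scale=2.0, size=1_000)

# The band covers one standard deviation around the reference mean.
ref_mean = reference_values.mean()
ref_std = reference_values.std()
lower, upper = ref_mean - ref_std, ref_mean + ref_std

def share_outside(values, lower, upper):
    """Fraction of values that fall outside the [lower, upper] band."""
    values = np.asarray(values)
    return float(np.mean((values < lower) | (values > upper)))

ref_share = share_outside(reference_values, lower, upper)
cur_share = share_outside(current_values, lower, upper)
print(f"band: [{lower:.2f}, {upper:.2f}]")
print(f"outside the band: reference {ref_share:.0%}, current {cur_share:.0%}")
```

When many more current values land outside the band than reference values do, that feature deserves a closer look.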
Or, you can zoom in on the distributions to understand what has changed:
You are reading a blog about the first Evidently release. The tool has evolved since then! It supports various ML monitoring metrics and architectures. Check out the current documentation.
Why is it important?
We wrote a whole blog about Data and Concept Drift. In short, things change, and this can break your models. Detecting this is key to maintaining good performance.
If there are data quality issues, Evidently will also pick this up. When your data goes missing or features break, this usually shows in data distributions. We will soon add more fun reports to explore features and analyze data quality. But this one can already serve as a proxy.
What is cool about it?
We implemented different statistical tests and drift detection methods, so you don't need to think them through. We know these are cumbersome to write, and easy to get wrong. Solved.
By default, for small datasets Evidently uses a two-sample Kolmogorov-Smirnov test for numerical features and the chi-squared test for categorical features, both at a 0.95 confidence level. To detect drift in larger datasets, it uses other metrics like Wasserstein distance. You can also select any of the other methods available in the library.
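If you are curious what such a per-feature check looks like, here is a simplified sketch built on SciPy. It is not Evidently's internal code. The function names are ours, and we assume that a 0.95 confidence level means flagging drift when the p-value drops below 0.05.

```python
import numpy as np
import pandas as pd
from scipy import stats

ALPHA = 0.05  # assumed threshold: 1 - 0.95 confidence level

def numerical_drift(reference, current, alpha=ALPHA):
    """Two-sample Kolmogorov-Smirnov test: drift if p-value < alpha."""
    result = stats.ks_2samp(reference, current)
    return result.pvalue, bool(result.pvalue < alpha)

def categorical_drift(reference, current, alpha=ALPHA):
    """Chi-squared test on the category counts of both datasets."""
    counts = pd.DataFrame({
        "reference": pd.Series(reference).value_counts(),
        "current": pd.Series(current).value_counts(),
    }).fillna(0)  # a category missing from one dataset counts as zero
    _, p_value, _, _ = stats.chi2_contingency(counts.T.to_numpy())
    return p_value, bool(p_value < alpha)

# Synthetic example: both features are shifted on purpose.
rng = np.random.default_rng(seed=42)
ref_num = rng.normal(0.0, 1.0, size=500)
cur_num = rng.normal(1.0, 1.0, size=500)
ref_cat = ["a"] * 300 + ["b"] * 200
cur_cat = ["a"] * 100 + ["b"] * 400

num_p, num_drift = numerical_drift(ref_num, cur_num)
cat_p, cat_drift = categorical_drift(ref_cat, cur_cat)
print(f"numerical: p={num_p:.3g}, drift={num_drift}")
print(f"categorical: p={cat_p:.3g}, drift={cat_drift}")
```

Each feature gets its own decision here, which mirrors the drift/no drift verdict per feature in the report. Keep in mind that with many features, some will cross the threshold by chance alone.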
The visuals are helpful, and would otherwise take considerable time to code in Plotly or Matplotlib. Here, each feature gets an interactive plot you can explore to understand its behavior.
What's more, you can share this report around as an HTML file. If you ever had a back-and-forth exchange of screenshots with another department, you will like this one:
Finally, it is dead simple to install and use. No new tool to learn, no service to maintain. Just open your notebook and try it out!
You can also export the results as JSON or a Python dictionary to integrate easily with your prediction pipelines.
When should I use it?
Of course, when your model is in production. But also before.
Here are a few ideas on how you can use the data drift tool:
- Support your model maintenance. Understand when it is time to retrain your model, or which features to drop when they are too volatile.
- Before acting on model predictions. Validate that your input data is from the same distribution, and you are not feeding anything outrageously different into your model.
- When debugging model decay. If the model quality dropped, use the tool to explore where the change comes from.
- In A/B tests or trial use. Detect training-serving skew and get better context to interpret test results.
- Before deployment. Understand drift in the offline environment. Explore past shifts in the data to define your future retraining needs and monitoring strategy.
- To find useful features when building a model. Get creative: you can also use the tool to compare feature distributions in your positive and negative class. This will quickly surface the best discriminants.
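The last idea can be sketched outside the report too. The snippet below ranks features by how differently they are distributed in the positive and the negative class, using the Kolmogorov-Smirnov statistic as a rough score. The `rank_discriminants` helper and the toy data are hypothetical, and the sketch assumes numerical features and a binary 0/1 target.

```python
import numpy as np
import pandas as pd
from scipy import stats

def rank_discriminants(df, target):
    """Rank numerical features by the KS statistic between the two classes."""
    positive = df[df[target] == 1]
    negative = df[df[target] == 0]
    scores = {
        column: stats.ks_2samp(positive[column], negative[column]).statistic
        for column in df.columns
        if column != target
    }
    # A larger statistic means the class distributions are further apart.
    return pd.Series(scores).sort_values(ascending=False)

# Toy data: "signal" depends on the label, "noise" does not.
rng = np.random.default_rng(seed=7)
label = rng.integers(0, 2, size=400)
toy = pd.DataFrame({
    "target": label,
    "signal": label * 2.0 + rng.normal(size=400),
    "noise": rng.normal(size=400),
})

ranking = rank_discriminants(toy, "target")
print(ranking)
```

Treat the ranking as a quick first look: it scores each feature on its own and will miss features that only matter in combination.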
How can I try it?
Want to stay in the loop?
Sign up to the User newsletter to get updates on new features, integrations and code tutorials. No spam, just good old release notes.
If you have any questions or thoughts, write to us at firstname.lastname@example.org. This is an early release, so send any bugs that way, or open an issue on GitHub.