TL;DR: Meet the new Data Quality report in the Evidently open-source Python library! You can use it to explore your dataset and track feature statistics and behavior changes. It is available both as an HTML dashboard and a JSON profile.
What is it?
We are happy to announce a new addition to the Evidently open-source Python library: an interactive report on Data Quality.
The Data Quality report helps explore the dataset and feature behavior and track and debug data quality when the model is in production.
You can generate the report for a single dataset: for example, during exploratory data analysis.
It will quickly answer questions like:
- How many features of each type do we have?
- How many features are (mostly) missing or constant?
- How is each of the features distributed?
- Which features are strongly correlated?
You can also generate the report for two datasets and contrast the properties of each feature and the whole dataset side by side.
It will then help you answer the comparison questions:
- Are two datasets similar?
- If something has changed, what exactly?
You can use the comparison feature to understand different segments in your data. For example, you can contrast data from one geographic region against another. You can also use it to compare older and newer data batches: for example, when evaluating different model runs.
The report is available in two formats:
- A visual HTML dashboard. You can spin it up in a Jupyter notebook or Colab or export it as a separate HTML file.
- A JSON profile. It provides a snapshot of metrics you can log or use elsewhere, e.g., to integrate a data quality check as a step in an Airflow DAG.
You are reading a blog about an early Evidently release. This functionality has since been improved and simplified. You can read more about migrating to a single Reports object instead of Dashboards and JSON profiles and check out the current documentation for details.
How is it different from the data drift report?
One might ask, how is it different from the Data Drift report in Evidently?
The data drift report performs statistical tests to detect changes in the feature distributions between the two datasets. It helps visualize distributions but does not go into further detail on feature behavior.
The data quality report looks at the descriptive statistics and helps visualize relationships in the data. Unlike the data drift report, the data quality report can also work for a single dataset.
If you are looking to evaluate the data changes for your production model, you might use both reports together as they complement each other.
How does it work?
To generate the report, Evidently needs one or two datasets. If you are working in a notebook, prepare them as pandas DataFrames. If you are using the command-line interface, prepare them as .csv files.
If you use two datasets, the first is the "Reference," and the second is the "Current." You can also prepare a single dataset and explicitly specify which rows belong to each part to perform the comparison.
Once you import Evidently and its components, you can spin up your report with just a couple of lines of code:
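Here is a minimal sketch of what that might look like, using the Dashboard API from this release. The file names are placeholders; substitute your own data.

```python
import pandas as pd

from evidently.dashboard import Dashboard
from evidently.dashboard.tabs import DataQualityTab

# Load the reference and current data (file names are placeholders)
reference = pd.read_csv("reference.csv")
current = pd.read_csv("current.csv")

# Build the report with the Data Quality tab and calculate the metrics
dashboard = Dashboard(tabs=[DataQualityTab()])
dashboard.calculate(reference, current)

# Show inline in Jupyter/Colab, or export as a standalone HTML file
dashboard.show()
dashboard.save("data_quality_report.html")
```

To profile a single dataset instead, pass only the reference DataFrame to `calculate`.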
You might need to specify a column mapping to make sure all features are processed correctly. Otherwise, Evidently derives the feature types automatically from the pandas data types.
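For example, a column mapping sketch using the ColumnMapping class from this release might look like the following. All column names here are hypothetical: replace them with your own.

```python
import pandas as pd

from evidently.dashboard import Dashboard
from evidently.dashboard.tabs import DataQualityTab
from evidently.pipeline.column_mapping import ColumnMapping

reference = pd.read_csv("reference.csv")  # placeholder file names
current = pd.read_csv("current.csv")

# Explicitly declare the role and type of each column
# (all column names below are hypothetical)
column_mapping = ColumnMapping(
    target="churn",
    numerical_features=["age", "monthly_charges"],
    categorical_features=["region", "plan"],
    datetime_features=["signup_date"],
)

dashboard = Dashboard(tabs=[DataQualityTab()])
dashboard.calculate(reference, current, column_mapping=column_mapping)
dashboard.save("data_quality_report.html")
```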
Pro tip: if you have a lot of data, you might want to apply some sampling strategy or generate the report only for some of the features first.
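For instance, a simple random-sampling sketch with pandas; the dataset, sample size, and column names are illustrative:

```python
import pandas as pd

# A hypothetical large dataset
big_df = pd.DataFrame({"feature_a": range(1_000_000), "feature_b": 0})

# Take a random sample of rows and keep only the columns of interest
sample_df = big_df.sample(n=10_000, random_state=42)[["feature_a"]]
```

You would then pass the sampled frame to Evidently instead of the full dataset.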
Let's have a look at what's inside!
The first table gives a quick overview of the complete dataset (or two!). You can immediately spot things like a high share of missing values or constant features.
What's cool here: note the "almost missing" and "almost constant" rows. Real-world datasets often contain such issues, and detecting them early helps sort out features that would be hard to rely on.
Next, you will see a statistical overview and a set of visualizations for each feature. They include descriptive statistics, feature distribution visuals, distribution of the feature in time, and distribution of the feature in relation to the target.
What's cool here:
- For each feature type (numerical, categorical, and datetime), the report generates a set of custom visualizations. They highlight what is most relevant for a given feature type.
- If you are performing a comparison, it also helps detect changes quickly. For example, notice the number of new categories for a categorical feature.
- The visualization of the feature's relationship with the target helps build intuition on how useful the feature is or detect a target leak.
What's more, each plot is interactive! Evidently uses Plotly on the back end, and you can zoom in and out as needed, or switch between logarithmic and linear scale for a feature distribution, for example.
For example, here is how the summary widget for a numerical feature might look:
Here is the numerical feature distribution in time that highlights the values that belong to the reference and current distribution:
The feature-by-target view helps explore the relationship between the feature and the target, and how it changes between the two datasets. Here is an example for a categorical feature:
The report also generates a table summary of pairwise feature correlations and correlation heat maps.
What's cool here:
- It explicitly lists the top-5 most correlated numerical and categorical features.
- If you perform a comparison, it lists the features where correlation has changed between the reference and current datasets.
This way, you can quickly grasp the properties of your dataset and select the features that need a closer look (or should be excluded from the modeling).
And of course, the visuals:
You can check out the complete documentation for more details and examples.
Can I modify the report?
Of course! All Evidently reports can be customized.
You can mix and match the existing widgets however you like or even add a custom widget. Here is the detailed documentation on customization options.
Pro tip: check out our recent release with the text widget. You can use it to annotate the report and highlight some of the findings: for example, if you want to store the report as documentation or share it with other team members.
What about JSON profiles?
Business as usual! You can get the report output as a JSON summary.
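Generating the profile follows the same pattern as the dashboard. Here is a sketch using the Profile API from this release; the file names are placeholders.

```python
import pandas as pd

from evidently.model_profile import Profile
from evidently.model_profile.sections import DataQualityProfileSection

reference = pd.read_csv("reference.csv")  # placeholder file names
current = pd.read_csv("current.csv")

# Build the profile with the Data Quality section and calculate the metrics
profile = Profile(sections=[DataQualityProfileSection()])
profile.calculate(reference, current)

# profile.json() returns the metrics snapshot as a JSON string
data_quality_json = profile.json()
```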
You can use it however you like. For example, you can generate and log the data quality snapshot for each model run and save it for future evaluation. You can also build a conditional workflow around it: maybe generate an alert or a visual report, for example, if you get a high number of new categorical values for a given feature.
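As a sketch of such a conditional workflow: the snippet below uses only the standard library and a hypothetical, simplified profile structure (the actual JSON layout may differ, so check the documentation), raising an alert when a categorical feature gains too many new values.

```python
import json

# A hypothetical, simplified extract of a data quality JSON profile;
# treat the field names and nesting here as assumptions, not the real schema.
profile_json = json.dumps({
    "data_quality": {
        "metrics": {
            "region": {"feature_type": "cat", "new_in_current_values_count": 7},
            "plan": {"feature_type": "cat", "new_in_current_values_count": 0},
        }
    }
})

NEW_CATEGORIES_THRESHOLD = 5

features = json.loads(profile_json)["data_quality"]["metrics"]

# Collect categorical features with too many previously unseen values
alerts = [
    name
    for name, stats in features.items()
    if stats["feature_type"] == "cat"
    and stats["new_in_current_values_count"] > NEW_CATEGORIES_THRESHOLD
]

if alerts:
    print(f"Data quality alert: new categories detected in {alerts}")
```

The same pattern extends to any metric in the profile: pick a threshold, parse the snapshot, and trigger an alert or a full visual report when the condition fires.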
When to use the report
Here are a few ideas on how to use the Data Quality report:
- Exploratory data analysis. This report provides many insights that can help even before you build a model. Use it to understand the data and decide if it is good enough to start.
- Feature selection. You can use the report to quickly sort out the empty and constant (or almost empty and constant) features. You can automate this selection process using the JSON profile, for example, to repeat whenever you retrain the model.
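A sketch of such an automated filter, assuming a hypothetical per-feature summary parsed from the JSON profile (the field names are illustrative, not the actual profile schema):

```python
# A hypothetical per-feature summary, e.g. parsed from a JSON profile
# (field names are illustrative, not the actual schema)
feature_stats = {
    "age": {"missing_share": 0.01, "most_common_value_share": 0.12},
    "coupon_code": {"missing_share": 0.97, "most_common_value_share": 0.02},
    "country": {"missing_share": 0.00, "most_common_value_share": 0.99},
}

MISSING_LIMIT = 0.95   # drop features that are (almost) empty
CONSTANT_LIMIT = 0.95  # drop features that are (almost) constant

# Keep only the features that pass both checks
selected = [
    name
    for name, stats in feature_stats.items()
    if stats["missing_share"] < MISSING_LIMIT
    and stats["most_common_value_share"] < CONSTANT_LIMIT
]
```

Running this filter before each retraining keeps near-empty and near-constant features out of the model automatically.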
- Dataset comparison. You often want to contrast datasets during modeling, testing, or production evaluation. For example, you might use it to compare the train and validation splits or consecutive data batches. Use the dashboard for visual comparison and JSON profiles for automated evaluation.
- Data profiling. You can log a JSON profile of the data used in each model run for future evaluation or analysis.
- Rule-based data quality monitoring. You can build a conditional workflow if you detect a change in your data properties, e.g., an increase in constant values. In this case, you can rely on Evidently to calculate the metrics and then define the logic around it. (Here is an example of how to use Evidently with Airflow).
- Data documentation. This report can document your data properties for future model governance. For example, you can use it to describe the data used in model training.
- Data drift debugging. If you detect data or target drift for your production model, you usually need to drill down into the feature changes. The data quality report can provide additional details for each drifting feature.
- Production model debugging. If you are directly monitoring the model quality and notice a decay, you can spin up this report to dig into the details of the data changes.
How can I try it?
Want to stay in the loop?
Sign up to the User newsletter to get updates on new features, integrations and code tutorials. No spam, just good old release notes.
For any questions, contact us via email@example.com. This is an early release, so let us know about any bugs! You can also open an issue on GitHub.