We are excited to announce our latest release!
TL;DR: You can use Evidently together with Prometheus and Grafana to set up live monitoring dashboards. We created an integration example for Data Drift monitoring. You can easily configure it for use with your existing ML service.
Why do I need this?
To monitor the performance of production machine learning models in real-time.
We hope there is no need to explain why you should monitor ML in the first place! But the question of "how exactly to do it" often remains open.
Many operational teams already use Grafana to track the performance of their software systems. We made it easy to monitor the health of machine learning models in a familiar interface and with Evidently on the backend.
With this integration, you get live dashboards for your models and can build alerting workflows.
We started with a Data Drift example, but it will work in the same way for all other Evidently Reports.
What is cool about it?
First, it is a fully open-source ML monitoring solution ready for production.
We use Evidently to collect and calculate metrics, Prometheus to store them, and Grafana to display and alert. Everything is open-source.
We also made a very convenient example to start with. Stitching different tools together is not always easy. So we packed it all in Docker containers to make sure everything works smoothly. We provide a docker-compose file that launches a complete setup out of the box.
Just build the example locally and try it out. Here are all the instructions on how to build an Evidently dashboard in Grafana.
You can take it, connect it to your production ML service instead of our demo source, and host the service wherever you like.
Second, we implemented all the necessary monitoring logic inside it.
If you have used Evidently, you know it provides nicely combined metric sets, for example, for data drift, prediction drift, or regression model performance.
However, to build a real-time monitoring system, you need an extra layer of logic—and code—to define how the service should calculate these metrics on top of a production data stream.
How many objects do we need to get to calculate a new metric value? How frequently should the service send requests? Is our reference dataset static or moving?
We implemented this monitoring service component in Evidently. It abstracts all this complexity into a single configuration.
You can quickly edit it to set up the logging process for your model metrics.
There are still decisions for you to make. Every model is custom, and we cannot guess, e.g., the exact window size you need.
But we want to make it easy to set up monitoring without writing boilerplate code for something like the moving windows.
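To give a feel for the kind of boilerplate this saves, here is a minimal sketch of moving-window logic in plain Python. It is not the Evidently monitoring service itself: `MonitoringConfig`, `WindowedMonitor`, and `mean_shift` are hypothetical names, and the two settings (window size and calculation period) are assumptions chosen to mirror the questions above. The reference dataset is assumed to be static.

```python
from collections import deque
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Optional, Sequence


@dataclass
class MonitoringConfig:
    """Hypothetical settings, not the actual Evidently configuration schema."""
    window_size: int = 3         # observations kept in the current window
    calculation_period: int = 2  # recompute after this many new observations


class WindowedMonitor:
    """Collects observations and recomputes a metric over a moving window."""

    def __init__(
        self,
        reference: Sequence[float],
        metric: Callable[[List[float], List[float]], float],
        config: MonitoringConfig,
    ) -> None:
        self.reference = list(reference)  # static reference data
        self.metric = metric
        self.config = config
        self.window = deque(maxlen=config.window_size)  # drops the oldest item
        self.new_since_last_run = 0

    def add(self, value: float) -> Optional[float]:
        """Store one observation; return a metric value when one is due."""
        self.window.append(value)
        self.new_since_last_run += 1
        window_full = len(self.window) == self.config.window_size
        if window_full and self.new_since_last_run >= self.config.calculation_period:
            self.new_since_last_run = 0
            return self.metric(self.reference, list(self.window))
        return None  # not enough data yet


def mean_shift(reference: List[float], current: List[float]) -> float:
    """Toy metric for illustration: absolute difference of the means."""
    return abs(mean(current) - mean(reference))


monitor = WindowedMonitor([1.0, 2.0, 3.0], mean_shift, MonitoringConfig())
results = [monitor.add(v) for v in [1.0, 2.0, 3.0, 10.0, 11.0]]
```

The monitor stays silent until the window fills up, then emits a new value every `calculation_period` observations. A moving reference would need extra logic on top of this.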
Third, we directly "translate" Evidently reports into the Grafana dashboards.
If you log the metrics to Prometheus, it does not instantly create a well-designed dashboard. You have to choose and configure plots from the Grafana interface for each metric. Things can get messy on occasion.
Not with our examples!
We pre-built a Grafana dashboard for you. The relevant metrics are grouped, and all the plots are well-organized and easy to read.
Right now, it is available for the Data Drift report only, but we will do the same for other Evidently reports soon.
How does the integration work?
Here is a high-level overview of the architecture:
The Evidently service first reads the model logs. In the demo example, the input features and predictions are stored in a .csv file. In production use, you can replace the data source with your actual model logs or have your service send a POST request to the system.
The Evidently monitoring service follows the defined configuration. It receives the input data from a production ML service, and once enough new observations are collected, it calculates the metrics you need. It uses the Analyzers from the core Evidently library to define the way tests and metrics are estimated.
The Evidently service then exposes a Prometheus endpoint. Prometheus will check for the new metrics from time to time and log them to the database.
Prometheus is then used as a data source for Grafana. That is a tried-and-tested combination. Grafana provides a convenient interface and a way to set up alerting workflows.
If you take our example as a starting point, it comes with a configured Grafana dashboard for the Data Drift. You can then adjust it to your liking!
Why are Prometheus and Grafana alone not enough?
Both are great tools that many pick to create their home-grown ML monitoring solution. They provide a convenient base to build upon. Prometheus provides time series storage and a query language. Grafana provides the visualization and alerting functionality.
But there are a few more steps in between the production model server and a neat monitoring dashboard.
If you were to implement an ML monitoring system from scratch, you would need to:
- Define and choose the metrics and statistical tests. How exactly do I monitor drift? Is it a KS test? Or an Anderson-Darling? Is it the same one for numerical and categorical features?
- Implement the metrics. If you do it by hand, you might spend quite some time constructing a giant PromQL query for a statistical test. It can be hard to maintain or edit and share between different models.
- Build the monitoring logic and instrument your service. Traditional software monitoring doesn't have to deal with things like moving windows or using external baseline references. You'd need to code this logic in a custom way to set up the logging.
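To make the "implement the metrics" step concrete, here is a sketch of what a hand-rolled drift check for a single numerical feature could look like: a two-sample Kolmogorov-Smirnov test in plain Python. This is an illustration, not the Evidently implementation. The p-value uses the standard asymptotic approximation with a small-sample correction, so treat it as approximate, and the function name `ks_two_sample` is ours.

```python
import math
from typing import Sequence, Tuple


def ks_two_sample(reference: Sequence[float], current: Sequence[float]) -> Tuple[float, float]:
    """Return (KS statistic, approximate p-value) for two samples."""
    ref, cur = sorted(reference), sorted(current)
    n, m = len(ref), len(cur)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")

    # Walk both sorted samples and track the largest gap between the two
    # empirical distribution functions. Ties are consumed together.
    i = j = 0
    statistic = 0.0
    while i < n and j < m:
        x = min(ref[i], cur[j])
        while i < n and ref[i] == x:
            i += 1
        while j < m and cur[j] == x:
            j += 1
        statistic = max(statistic, abs(i / n - j / m))

    # Asymptotic Kolmogorov distribution with a small-sample correction.
    effective_n = n * m / (n + m)
    lam = (math.sqrt(effective_n) + 0.12 + 0.11 / math.sqrt(effective_n)) * statistic
    if lam < 0.2:
        return statistic, 1.0  # the series below is numerically 1 here
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, 101))
    return statistic, min(max(2.0 * series, 0.0), 1.0)


same_stat, same_p = ks_two_sample([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0])
shift_stat, shift_p = ks_two_sample(list(range(50)), list(range(100, 150)))
```

Even this small function only covers one test for numerical features. Categorical features, other tests, and the windowing logic would all need their own code.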
In our integration, Evidently abstracts all this monitoring logic and provides the metrics in a Prometheus-friendly format.
You can think of it as a metrics calculation layer that you can easily modify. It provides great defaults thanks to a pre-built set of metrics and gives a convenient route to include custom ones.
As we are actively developing the tool, this Grafana integration will inherit new Evidently reports, different metrics, and statistical tests that we expect to add in the future.
Using a standard library on the backend makes it easier to maintain, control, and unify your monitoring across different models. It will scale as the number of deployments grows.
How can I see an example?
All the instructions on how to launch and change the example are on GitHub.
How can I use it for my production service?
If you want to take the Data Drift example and use it with your production model, here is what you need to do.
First, copy our example from GitHub.
Second, change the data source. If you have an operational ML service, make it send the production data through a POST request to Evidently. If you store your predictions in a database, you can set the Evidently monitoring service to read them from there or from a file.
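As a rough sketch of the POST option, the snippet below builds a JSON request with the standard library. The URL, the `/iterate` path, and the payload shape (a list of rows with feature values and a prediction) are placeholders we assume for illustration. Check the example on GitHub for the endpoint and format the monitoring service actually expects.

```python
import json
import urllib.request
from typing import Any, Dict, List

# Hypothetical address: replace with the endpoint of your monitoring service.
SERVICE_URL = "http://localhost:8000/iterate"


def build_request(rows: List[Dict[str, Any]], url: str = SERVICE_URL) -> urllib.request.Request:
    """Build a JSON POST request carrying a batch of logged rows."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_rows(rows: List[Dict[str, Any]], url: str = SERVICE_URL, timeout: float = 5.0) -> int:
    """Send the rows and return the HTTP status code."""
    with urllib.request.urlopen(build_request(rows, url), timeout=timeout) as response:
        return response.status


# Assumed payload shape: one dict per prediction made by the ML service.
request = build_request([{"age": 42, "income": 55000, "prediction": 0.7}])
```

Calling `send_rows(...)` from your prediction code, ideally in the background so it does not slow down responses, would forward each batch to the monitoring service.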
Third, edit the monitoring configuration to fit your use case. For example, specify which features are numerical or categorical, define the size of the monitoring window or frequency of requests.
Fourth, and optionally, edit the metrics and the Grafana dashboard.
The sample Evidently Grafana dashboard uses the default statistical tests from the Data Drift report and shows:
- Overall dataset drift, defined by the number of drifting features and the confidence level
- Share of drifted features
- Number of drifted features
- Total number of features
- P-values of the statistical test for individual features
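The dashboard numbers listed above relate to each other in a simple way, which the sketch below spells out. It assumes a feature counts as drifted when its p-value falls below `1 - confidence`, and that the dataset counts as drifted when the share of drifted features reaches a threshold. The default values (0.95 and 0.5) and the function name `summarize_drift` are our assumptions for illustration, not a statement of the exact Evidently defaults.

```python
from typing import Dict, Mapping


def summarize_drift(
    p_values: Mapping[str, float],
    confidence: float = 0.95,  # assumed confidence level
    drift_share: float = 0.5,  # assumed share of drifted features that flags the dataset
) -> Dict[str, object]:
    """Aggregate per-feature p-values into dataset-level drift numbers."""
    threshold = 1.0 - confidence
    drifted = [name for name, p in p_values.items() if p < threshold]
    n_features = len(p_values)
    share = len(drifted) / n_features if n_features else 0.0
    return {
        "n_features": n_features,
        "n_drifted_features": len(drifted),
        "share_drifted_features": share,
        "dataset_drift": share >= drift_share,
        "drifted_features": drifted,
    }


# Made-up p-values for three features.
summary = summarize_drift({"age": 0.01, "income": 0.40, "city": 0.03})
stable = summarize_drift({"age": 0.60, "income": 0.40, "city": 0.30})
```

Each value in the returned dictionary maps naturally to one Grafana panel or one alert condition.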
The sample dashboard does not include everything we have in our own Evidently dashboards.
For example, we did not add the distribution plots. Our reasoning is that you probably don't need to look at them all the time for monitoring purposes. What you want is to get an alert when drift is detected.
If something is wrong and you switch to debugging mode, you can spin up a separate Evidently report to investigate.
That said, you might still prefer to add the distribution histograms directly to the Grafana dashboard, or maybe pick a different statistical test.
You can do that with a few more lines of code.
In some cases, you can simply choose the metric we already calculate in Evidently and add it to the monitoring service. In others, you would need to implement a custom metric and then pass it ahead. Then, you can add an extra Grafana plot to display it.
More detailed instructions are on the Evidently GitHub page.
What about batch models?
Nothing stops you from using Grafana dashboards for batch models. You simply won't update them as frequently.
However, you might prefer to pick a different tool in this case. For starters, you can simply schedule Evidently HTML reports or JSON summaries using a cron job or a tool like Airflow. We have a handy integration there, too.
You can also log the Evidently metrics with MLflow, which is well suited to batch workflows.
Are you using other tools? We welcome community-driven integrations and examples. Join our Discord to share which tools you are using!
We are working in three directions.
First, we will "translate" other Evidently reports into the Grafana dashboards. Expect the Prediction Drift and Regression Model Performance next.
Then, we are implementing more robust logic for the monitoring service. For example, we want to enable moving reference and cover other similar scenarios.
Last but not least, the Evidently library itself is in active development. We will be adding new metrics and statistical tests, meaning more options to choose from, whether you prefer to use them in our interface or add them to the Grafana dashboard.
Do you like the integration? Give us a star on GitHub, and send some feedback our way.
If you have more questions and want to discuss the roadmap, join our Discord community to chat about it!
Want to stay in the loop?
Sign up to the User newsletter to get updates on new features, integrations and code tutorials. No spam, just good old release notes.
For any questions, contact us via firstname.lastname@example.org. This is an early release, so let us know of any bugs! You can also open an issue on GitHub.