
Evidently 0.4: an open-source ML monitoring dashboard to track all your models

July 21, 2023

TL;DR

We are releasing a new feature: Evidently user interface for ML monitoring. 

  • You can now self-host a dashboard to monitor all your ML models in production. It’s open-source! 
  • The new functionality sits on top of existing Evidently Reports and Test Suites. It uses the same API, metrics, and tests – but allows bringing them all to one place. 
  • You can use it to monitor data quality, data drift, and model quality for models built on tabular or text data, including LLMs. 

Want to start immediately? Here is the Getting Started tutorial.

For more details, read on.

Evidently at a glance

Our mission since the first release remains the same: help run ML in production reliably. Evidently gives visibility into how ML models behave in the wild and helps detect and debug issues, from data drift to segments of low performance.  

All with an open-source approach: extensible, open, and built together with the community. 

We are very humbled to see that Evidently remains the most popular tool in its ML monitoring category. It has been downloaded over 3M times and has over 3700 GitHub stars.

Up until now, Evidently had two main components: Reports and Test Suites. 

Reports help compute and visualize 100+ metrics in data quality, drift, and model performance. You can use them to explore, debug, create ML model cards, and run report-based batch monitoring. There are handy presets to make neat visuals appear with just a couple of lines of code.
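
For example, here is a minimal sketch of a preset-based Report. It assumes two pandas DataFrames, reference_df and current_df, with the same schema; both names are placeholders.

from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# Compare the current production data against a reference dataset
report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference_df, current_data=current_df)

# Render the interactive visuals, or export them for later review
report.save_html("data_drift_report.html")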

Evidently Reports

Test suites perform structured data and ML model quality checks. They verify conditions and show which of them pass or fail. It’s very easy to start – Evidently can automatically infer test conditions from the reference dataset. You can then design a continuous testing framework for your production pipelines, data, and models. 
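
As an illustration, a minimal Test Suite might look like the sketch below. The DataStabilityTestPreset infers its test conditions from reference_df (again, a placeholder DataFrame).

from evidently.test_suite import TestSuite
from evidently.test_preset import DataStabilityTestPreset

# Test conditions are inferred automatically from the reference dataset
tests = TestSuite(tests=[DataStabilityTestPreset()])
tests.run(reference_data=reference_df, current_data=current_df)

# Each test reports pass or fail; export the results for review
tests.save_html("data_stability_tests.html")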

Evidently Test Suites

With these two, you can flexibly implement ML monitoring. Here are some frequent patterns we saw users adopt:

  • Batch monitoring jobs using an orchestrator like Airflow (see the sketch after this list). Some users simply store the HTML or JSON reports generated on a cadence; some add apps on top, like Streamlit.
  • On-demand live reports, for example, by connecting Evidently with FastAPI.
  • Pushing Evidently metrics to external visualization layers like Grafana or other BI tools.  
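
For the first pattern, a batch job might look roughly like this. The function name, file path, and DataFrames are illustrative, not part of the library.

from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

def daily_drift_check(reference_df, current_df, out_path):
    # Called from an orchestrator task, e.g. once per day
    report = Report(metrics=[DataDriftPreset()])
    report.run(reference_data=reference_df, current_data=current_df)
    report.save_html(out_path)   # store the HTML report on a cadence...
    return report.json()         # ...or push the JSON to another tool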

However, there was no native way to track Evidently metrics and test results over time. Nor was it possible to see all the models at once. Each time, you got a standalone snapshot that captured the current state of affairs. You had to design the rest of the workflow. 

We are thrilled to present a new Evidently feature that finally solves it – a brand-new visualization layer for ML monitoring. You can now run Evidently as a web application and bring all your models to one place.

New ML monitoring app 

This centralized monitoring dashboard is the new, third component in Evidently.

You can use it to trace how ML models perform over time and bring all types of checks together. It is open-source, self-hostable, and tightly connected to the rest of Evidently.

Evidently ML Monitoring

You can pick from 100+ metrics in Evidently and visualize them over time. Here are some ideas for dashboards you can design.

  • Data and prediction drift. You can monitor how your ML model works before you get ground truth labels: by keeping tabs on the data and prediction drift, prediction distribution, and number of model calls. 
  • Data quality. You can also keep track of various feature statistics and data quality. For example, you can monitor if the incoming data contains nulls or duplicates and if features stay within range.
  • Model quality. Once you get the ground truth labels or actuals, you can evaluate and track model performance and the behavior of your target variable.
  • NLP and LLM models. If you work with text-based models, we have you covered, too! You can track properties of the text data, such as text length, sentiment, or mentions of specific words.

Here is an example: you can launch it as a demo project in the Getting Started tutorial in just a minute!

Evidently ML monitoring dashboard example

How it works

Let's take a quick look at the new functionality. For details, head to the Getting Started tutorial and the new section in the docs.

1. Design your monitoring

First, you must define the monitoring workflow: what to evaluate and when. Depending on the model deployment scenario, you can collect the metrics at different stages of your production pipeline. You can log them in real-time, asynchronously, or both.

For example, you might run a batch prediction model daily. Every time you generate the predictions, you can capture data quality metrics and evaluate prediction drift. 

Once you get the labels that arrive with a delay, you can compute the true model quality and log it to update the performance dashboard.

Evidently ML Monitoring workflow

2. Capture snapshots

You need the Evidently Python package to collect the information about your models and data. Install it from PyPI or Conda.

If you already use Evidently, don't worry! This new component does not break existing workflows but expands them. You can continue with the usual Evidently Reports and Test Suites. But you can now also use them to capture data for Evidently ML Monitoring. To do that, you will need to add an extra line of code to export the output in a new format.

If you are new to Evidently, worry not! Bottom line: you can define what to log with a couple of lines of Python code.

from evidently.report import Report
from evidently.metric_preset import DataQualityPreset

# tags is a list of strings you define, for example: tags = ["model_a", "prod"]
my_snapshot = Report(
    metrics=[DataQualityPreset()],
    metadata={"type": "data quality"},
    tags=tags,
)

This way, you can capture a "snapshot." Snapshots are rich JSON objects with data and model summaries. Think of a snapshot as a single log entry in the Evidently universe! You can collect as much or as little data as you want: go granular and pick individual metrics, or use presets to capture pre-defined summaries.  

You must put the snapshots in object storage. That's it! Now you have a data source that Evidently UI can work with. 
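
Continuing the sketch above, running the report and saving its output as a snapshot could look like this. The DataFrames and file path are placeholders (in practice, you would upload the file to your storage), and you can equally add the snapshot to a workspace via the API shown in the next step.

# Run the report defined above on the latest batch of data
my_snapshot.run(reference_data=reference_df, current_data=current_df)

# Save it as a snapshot (instead of save_html / save_json)
my_snapshot.save("snapshots/2023-07-21.json")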

3. Get a dashboard

Here comes the visualization layer! You can launch the Evidently user interface over the captured logs. You must also create a corresponding workspace and project for your model. 
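
Here is one way to set this up with the Python API, based on the workspace objects used in the Getting Started tutorial. The project name is a placeholder, and the "workspace" folder matches the path passed to the CLI command below.

from evidently.ui.workspace import Workspace

# Create (or open) a local workspace folder and a project inside it
ws = Workspace.create("workspace")
project = ws.create_project("My model")

# Add a computed snapshot to the project
ws.add_report(project.id, my_snapshot)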

To launch the Evidently service, run this:

evidently ui --workspace ./workspace --port 8080

Evidently will parse the data from the snapshots and visualize the metrics. If you previously worked with Evidently, think of this new functionality as a way to automatically extract any data point from multiple Reports or Test Suites and show how it changes over time. 

Data drift monitoring dashboard

To be fair, the monitoring dashboards do not exactly appear out of the box – yet. Since Evidently Reports and Test Suites capture many things at once, you must explicitly pick which of those metrics to visualize. 
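
For example, a panel that plots the share of drifted columns over time might be configured like this. It assumes your snapshots include a dataset drift metric (e.g., captured via DataDriftPreset); the panel classes follow the Getting Started tutorial, so treat the exact arguments as an assumption and check the docs for your version.

from evidently.ui.dashboards import DashboardPanelPlot, PanelValue, PlotType, ReportFilter

# Add a line plot panel to the project dashboard
project.dashboard.add_panel(
    DashboardPanelPlot(
        title="Share of drifted columns",
        filter=ReportFilter(metadata_values={}, tag_values=[]),
        values=[
            PanelValue(
                metric_id="DatasetDriftMetric",
                field_path="share_of_drifted_columns",
                legend="drift share",
            ),
        ],
        plot_type=PlotType.LINE,
    )
)
project.save()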

You can also explore and access all individual Reports and Test Suites for previous periods – for example, your weekly or daily snapshots. We even added a way for you to download the existing Reports and Test Suites to share them with others.

Download reports in HTML or JSON

And, you can easily create multiple projects for different models you have! 

What is cool about it?

Evidently now has 3 components to cover the complete ML observability workflow. 

You can use Evidently even before you deploy the model to run ad hoc checks and explorations. In production, you can start with lightweight monitoring using Reports or Test Suites. Once ready, switch to the new ML monitoring setup with a live dashboard. When something goes wrong, quickly jump back to the familiar notebook environment to debug. All within one tool! 

You can use components together or individually and increase complexity as needed.

Evidently for the complete ML lifecycle

Consistent evaluations. From validation to production, you keep the same metric definitions. For each model, you count nulls, drift, and accuracy the same way. And there is only one API to learn!  

Simplicity first. Ease of use and a dead-simple API have always been our top priority. Complicating things is easy – making them simple is hard. Evidently has many built-in presets, and we’ll expand the ML monitoring functionality with the same approach. Monitoring might not yet be perfectly smooth (this is the very first release!), but we aim for a frictionless experience where you can customize everything, yet it’s possible to be very hands-off.

Open-source. Evidently monitoring UI is available under Apache 2.0 just like the rest of Evidently. You can run it in your environment without sending the data elsewhere. There are no limits to the number of models or datasets you can run it on.

Q & A 

How can I try it?

“Just show me all the source code!” -> Head to GitHub

“I want a step-by-step explanation” -> The Getting Started tutorial is here.

Why can’t I just use [another dashboard tool]? 

You still can! Export the metrics however you like. The Evidently Monitoring UI is meant for those who want a coherent experience. You get a complete ML observability stack with less effort by using Evidently for all the steps: production logging, visualization, and debugging. The components play well together. No need for glue code and jumping between tools. 

What if I have a real-time model? 

You can either run monitoring jobs over your model prediction logs to compute Evidently snapshots or log directly from your ML prediction service. We’ll be adding more integration examples soon. Stay tuned! 

Can I add a custom metric? 

Yes! You can add custom metrics the same way as you do in the Evidently Reports and Test Suites. As long as the metric gets inside the Evidently snapshot, you can visualize it on the ML monitoring dashboard.  

How can I migrate from existing Reports or Test Suites? 

You need to change the export format: instead of the usual HTML or JSON, use the .save method to generate an Evidently snapshot. You also need to configure a project and workspace. Head to the Getting Started tutorial for details!  
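
In code, the change is small; the file names below are placeholders:

# Before: static exports for ad hoc review
report.save_html("report.html")

# Now: save a snapshot that the monitoring UI can read
report.save("snapshot.json")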

Can you run it for me? 

Not yet, but it will be possible! We are working on Evidently Cloud: a hosted version of the tool for those who do not want to spend time implementing and maintaining the monitoring dashboard on their own. Sign up for the waitlist to be the first to know! 

Support Evidently
Do you like this new release? Star Evidently on GitHub to contribute back! This helps us continue creating free, open-source tools for the community. 

⭐️ Star on GitHub ⟶

What’s next?

This is the very first release of the Evidently Monitoring UI. We’ll continue expanding the functionality. Here is what comes next.

  • More example dashboards. For text data and different model types.
  • Example integrations, including with a real-time ML service. 
  • Better UI experience. Right now, you can only generate the dashboards as code. This is great for technical teams and CI/CD, but we also want to allow adding and deleting panels directly from the user interface.
  • Pre-built tabs. Right now, you have to define each dashboard component. We want to add a few predefined plot combinations.
  • Alerting integrations. Currently, the alerting workflow is external. We want to add in-built functionality to allow sending notifications.

Which features should we add, and which examples should we prioritize? Jump on the Discord community to share or open a GitHub issue.
