Are you looking to build an end-to-end ML monitoring pipeline? Or have you already started using Evidently and now want to add alerting to it?
One option is to use the Evidently open-source Python library together with AWS Simple Email Service (SES). This tutorial explains how to implement Evidently checks as part of an ML pipeline and send email notifications based on a defined condition.
Code example: if you prefer to head straight to the code, open this example folder on GitHub.
This blog and code example were contributed by an Evidently user. Huge thanks to Marcello Victorino for sharing his experience with the community!
If you are new to Evidently, AWS SES, and ML monitoring, here is a quick recap.
The lifecycle of an ML solution extends well beyond “being deployed to production.” A whole new discipline of MLOps emerged to help streamline ML processes across all stages. Monitoring is an essential part: you must ensure that an ML solution, once deemed satisfactory after training, continues to perform as expected.
Many things can go wrong in production. For example, you can face data quality issues, like missing or incorrect values fed into your model. Input data can drift due to changes in the environment. Ultimately, both can lead to model quality decay.
You can introduce batch checks as part of a prediction pipeline to detect issues in time. For example, whenever you get a new batch of data, you can compare it against the previous batch or validation data to see if the data distributions remain similar.
In many instances, you also want to proactively notify stakeholders so that they can further investigate what is going on and decide on an appropriate response.
The Evidently open-source Python library helps evaluate, test, and monitor ML models from validation to production.
Evidently has the Test Suite functionality that comes with many pre-built checks that you can package together and run as part of a pipeline. For example, you can combine various tests on data schema, feature ranges, and distribution drift.
To perform the comparison, you must pass one or two DataFrames and specify the tests you want to run. After completing a comparison, Evidently will return a summary of results, showing which tests passed and which failed.
You can visualize the results in a Jupyter notebook, save them as an HTML file, or export them as JSON or a Python dictionary.
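As a rough sketch of that workflow, the function below runs a data drift Test Suite on two pandas DataFrames. It assumes an Evidently release that still exposes the `TestSuite` API under `evidently.test_suite` (newer releases reorganized the package). The function name `run_drift_suite` is ours, not part of the library.

```python
def run_drift_suite(reference, current):
    """Run a data drift Test Suite on two pandas DataFrames.

    Assumption: the installed Evidently version ships
    `evidently.test_suite.TestSuite`. The imports live inside the
    function so this module can be loaded without Evidently installed.
    """
    from evidently.test_preset import DataDriftTestPreset
    from evidently.test_suite import TestSuite

    suite = TestSuite(tests=[DataDriftTestPreset()])
    suite.run(reference_data=reference, current_data=current)
    return suite


# Usage (requires evidently and pandas):
# suite = run_drift_suite(reference_df, current_df)
# suite.save_html("drift_tests.html")  # standalone HTML file
# results = suite.as_dict()            # Python dictionary
# results_json = suite.json()          # JSON string
```

The returned object can then be exported in whichever format the next pipeline step needs.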
AWS SES is a cloud email service provider.
You can integrate it into any application to bulk send emails. It supports a variety of deployments, including dedicated, shared, or owned IP addresses. You can check sender statistics and a deliverability dashboard.
This is a paid tool, but it also includes a free tier.
How the integration works
Out of the box, Evidently provides many checks to evaluate data quality, drift, and model performance. It has a simple API and pre-built visualizations. This way, Evidently solves the detection of ML and data issues and helps with troubleshooting.
However, since it is a Python library, Evidently does not provide alerting. It returns the test output as JSON, a Python dictionary, or HTML. Then, it is up to the developer to decide how to embed it in the workflow.
To close the loop, you might want to notify the team as soon as an issue is detected. One way to do that is to add an email notification that sends out the report.
This is what we will show in this tutorial.
In this tutorial, you will learn how to send email notifications using AWS SES with HTML reports on data and ML checks generated by Evidently.
Prerequisites:
- You have basic knowledge of Python.
- You went through the Evidently Get Started Tutorial and know how to run Test Suites.
- You are familiar with Amazon Web Services and have an AWS account.
Here is what we will cover:
- How to create simple checks using Evidently to evaluate data drift in the ML pipeline.
- How to use AWS SES to send an email alert to a list of recipients based on the defined condition.
The idea is to send email alerts on potential ML model issues to data scientists or machine learning engineers. This way, they can proactively investigate and debug the situation and decide on the appropriate response. For example, they can retrain the model on new data to account for the detected drift.
This tutorial is primarily focused on solving the alerting part. If you want more examples of creating Test Suites and implementing them as part of the pipeline, you can check the Evidently documentation.
To see each step in detail, you can follow the example repository.
1. Design the data or ML monitoring check
First, you need to design the checks you want to run.
To start, you can use some of the pre-built Evidently test presets and default test conditions. In this case, you can simply pass so-called “reference” data. This can be the data used during model validation, model training, or some past period. Evidently will learn the shape of the reference data and automatically generate checks using different heuristics.
In the GitHub repository, we included a Jupyter Notebook showing how to perform feature distribution drift checks on a toy dataset. In this case, we perform data drift checks using the Population Stability Index (PSI) as the drift detection method.
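To make the metric concrete, here is a minimal standard-library sketch of PSI for one numeric feature. It bins both samples on the reference range and sums (current share - reference share) × ln(current share / reference share) over the bins. The bin count, the `eps` floor, and the function name `psi` are illustrative assumptions; this is not Evidently's implementation.

```python
import math


def psi(reference, current, n_bins=10, eps=1e-4):
    """Population Stability Index between two numeric samples."""
    lo, hi = min(reference), max(reference)
    # Equal-width bins over the reference range; fall back to 1.0
    # when all reference values are identical.
    width = (hi - lo) / n_bins or 1.0

    def shares(values):
        counts = [0] * n_bins
        for v in values:
            # Clamp values outside the reference range into the edge bins.
            idx = min(max(int((v - lo) / width), 0), n_bins - 1)
            counts[idx] += 1
        # Floor each share at eps so the logarithm is always defined.
        return [max(c / len(values), eps) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))
```

Identical samples give a PSI of zero, and the value grows as the current distribution moves away from the reference.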
You can customize the Test Suite contents to include any other checks on data quality, drift, or model quality.
In our example, we also define a separate helper function, get_evidently_html, to save any Evidently Test/Report as a fully rendered HTML file.
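The repository contains the actual helper. As a hedged illustration of the idea only, the hypothetical function below renders any object that exposes `save_html(path)` (as Evidently Test Suites and Reports do in the versions we assume) and returns the HTML as bytes, ready to upload or attach.

```python
import tempfile
from pathlib import Path


def render_html_bytes(evidently_object):
    """Render a Test Suite or Report to HTML and return the content as bytes.

    Hypothetical helper: it relies only on the object having a
    `save_html(path)` method.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        html_path = Path(tmp_dir) / "report.html"
        evidently_object.save_html(str(html_path))
        return html_path.read_bytes()
```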
2. Set up AWS SES
Now, you need to set up the AWS SES service. You can explore the requirements for setting up and using SES in the AWS Documentation.
You must create an AWS account and have a verified email address as SENDER.
AWS has a generous free tier which, as of the day of publication, includes up to 1,000 emails per month. You can check their pricing.
Note: It is also possible to use Amazon SNS (Simple Notification Service) to send push-based communication. It is a Pub-Sub service that allows sending notifications through various channels, including email. However, it has fewer customization options: you cannot format the email, use markdown or HTML tags, or send attachments. It also requires the additional step of setting up an SNS topic. For this reason, we decided to keep the focus on the SES example since it is simpler to use and customize.
3. Define alerting condition
Next, you need to set the conditions defining when to get the email alert.
In our example, we specify that if any tests in the test suite fail, we want to know about it!
We get the Evidently test results as a Python dictionary and then extract the information about the failed tests:
If any of the checks fail, we initiate the alert.
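A minimal sketch of such a condition follows. It assumes the dictionary has a `tests` list whose entries carry `name` and `status` keys, with `"FAIL"` marking a failed test. Check this layout against the output of your Evidently version. The function names are ours.

```python
def get_failed_tests(results, alert_statuses=("FAIL",)):
    """Return the test entries whose status should trigger an alert.

    Assumption: `results` looks like {"tests": [{"name": ..., "status": ...}]}.
    """
    return [
        test
        for test in results.get("tests", [])
        if test.get("status") in alert_statuses
    ]


def should_alert(results):
    """True when at least one test failed."""
    return len(get_failed_tests(results)) > 0
```

Passing `alert_statuses=("FAIL", "ERROR")` would also alert on tests that could not be computed, if that suits your pipeline.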
Note: there are various ways to orchestrate the Evidently checks. You can execute a Python script, plug it as a step into any data pipeline or use a framework like Metaflow. For example, you can use Metaflow to trigger a MonitoringFlow daily after a new PredictionFlow is executed. This makes it easy to retrieve the new DataFrame for comparison: the two are stored as artifacts in the TrainingFlow vs. PredictionFlow. Additionally, it is possible to use the Metaflow-HTML plugin to store the rendered Evidently HTML report (see this Integration example for further details).
You can implement similar logic using other tools, for example, by orchestrating your monitoring jobs using a tool like Airflow (integration example) and capturing the metrics with MLFlow (integration example).
This example is generic: we simply assume that you perform a check periodically, for example, once per day, using your preferred method.
4. Send the email
Now, when any checks fail, we want to send an email about it! There are a few options for how we can implement email notifications. You can browse all the examples in the repository.
4.1 Basic: email with the link
The simplest option is to send an email with a link to a location, accessible to the recipient, where the report is stored.
For example, we can store the Evidently HTML output as an artifact in an S3 bucket and then include a link to this location. You can also store the report in any other shared storage folder.
Note: As an example, if you use a tool like Metaflow, you can store the artifact in S3. The Metaflow UI makes it easy to access/retrieve the artifacts for a given Flow execution.
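One possible way to produce such a link is sketched below. It uploads the rendered HTML with the boto3 S3 client and returns a time-limited presigned URL. Using a presigned URL is our assumption, not something the example repository prescribes, and the bucket, key, and function name are placeholders. The client is passed in, so you would create it yourself with `boto3.client("s3")`.

```python
def upload_report(s3_client, html_bytes, bucket, key, expires_in=86400):
    """Upload an HTML report to S3 and return a presigned download link.

    `s3_client` is expected to be a boto3 S3 client; `expires_in` is the
    link lifetime in seconds.
    """
    s3_client.put_object(
        Bucket=bucket,
        Key=key,
        Body=html_bytes,
        ContentType="text/html",
    )
    return s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )
```

If recipients already have access to the bucket, a plain S3 or console URL works as well and avoids link expiry.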
To format the basic email, we created a send_email_basic function. This example builds upon the AWS documentation code example that shows how to send a formatted email.
You can find the exact script for sending this simple email in the tools folder in the repository. It specifies the email with HTML format and content that includes the link.
Here is how we call this function when an alert is triggered to send the email to a defined recipient list:
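The exact call lives in the repository. As an illustrative stand-in, the sketch below sends a link-only alert through the boto3 SES `send_email` operation. The function name `send_link_email`, the subject line, and the message wording are our assumptions. The client is injected, so in real use you would pass `boto3.client("ses", region_name=...)` and a sender address verified in SES.

```python
import html


def send_link_email(ses_client, sender, recipients, report_url, n_failed):
    """Send a short HTML email that links to the stored report."""
    subject = f"ML monitoring alert: {n_failed} test(s) failed"
    safe_url = html.escape(report_url, quote=True)
    body_text = f"{n_failed} test(s) failed. Full report: {report_url}"
    body_html = (
        "<html><body>"
        f"<p>{n_failed} test(s) failed in the latest monitoring run.</p>"
        f'<p><a href="{safe_url}">Open the full report</a></p>'
        "</body></html>"
    )
    return ses_client.send_email(
        Source=sender,
        Destination={"ToAddresses": list(recipients)},
        Message={
            "Subject": {"Data": subject, "Charset": "UTF-8"},
            "Body": {
                # Plain-text part for clients that do not render HTML.
                "Text": {"Data": body_text, "Charset": "UTF-8"},
                "Html": {"Data": body_html, "Charset": "UTF-8"},
            },
        },
    )
```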
Here is how the result looks:
To see which specific tests failed, you need to follow the link.
However, it might be convenient to attach the report directly to the email. Enter the second option!
4.2 Attachment: email with the HTML report
In this example, we build upon the “Send raw email” feature (see the related AWS documentation).
We will use the MIME standard, which allows email messages to contain one or more attachments. You can find the exact script in the tools folder in the repository.
In this case, we need to prepare the HTML file to become the attachment to the email.
There are a couple of limitations to consider:
- The attachment must be provided as binary data so that `base64` encoding can be applied. The helper function get_evidently_html abstracts this away.
- The attachment should be no larger than 10 MB. We can overcome this by either compressing the HTML file or sampling the data before generating the HTML report.
Here is how we call this function when an alert is triggered to send the email with the attachment:
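As a hedged sketch of this option (not the repository script), the code below builds a multipart MIME message with the standard library and sends it through the boto3 SES `send_raw_email` operation. The function names, the default file name, and the decision to check the size of the whole encoded message against the 10 MB figure are our assumptions.

```python
from email.mime.application import MIMEApplication
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

MAX_MESSAGE_BYTES = 10 * 1024 * 1024  # the 10 MB limit mentioned above


def build_alert_message(sender, recipients, subject, body_html, html_bytes,
                        filename="evidently_report.html"):
    """Build a MIME message with an HTML body and the report attached."""
    msg = MIMEMultipart("mixed")
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.attach(MIMEText(body_html, "html", "utf-8"))
    # MIMEApplication base64-encodes the binary payload by default.
    attachment = MIMEApplication(html_bytes)
    attachment.add_header("Content-Disposition", "attachment", filename=filename)
    msg.attach(attachment)
    return msg


def send_raw_alert(ses_client, msg, sender, recipients,
                   max_bytes=MAX_MESSAGE_BYTES):
    """Send a prepared MIME message, refusing messages above the size limit."""
    raw = msg.as_string()
    if len(raw.encode("utf-8")) > max_bytes:
        raise ValueError("Message too large: compress the report or sample the data")
    return ses_client.send_raw_email(
        Source=sender,
        Destinations=list(recipients),
        RawMessage={"Data": raw},
    )
```

Note that base64 encoding inflates the attachment by roughly a third, so a report somewhat under 10 MB on disk can still exceed the limit once encoded.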
Here is the result. Note that in addition to the link, there is also a file attached to the email:
This email serves the purpose but looks fairly simple. We can make it much nicer by adding a template.
4.3 Formatted email: add a template
In this example, we use the same approach as above but:
- Add some CSS styling that is specifically compatible with email. We recommend using an existing open-source email theme (here is the template we used).
- Use Jinja for templating (substituting string placeholders), combining the default message, HTML tags, and dynamic values.
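To illustrate the templating step, here is a small sketch using the `jinja2` package. The template is a bare placeholder of our own, far simpler than a real email theme, and the variable names are hypothetical. In practice you would load the themed HTML file instead of an inline string.

```python
from jinja2 import Environment

# Placeholder template; a real email theme would carry inline CSS
# tested against common email clients.
EMAIL_TEMPLATE = """\
<html>
  <body style="font-family: Arial, sans-serif;">
    <h2>{{ title }}</h2>
    <p>{{ n_failed }} of {{ n_total }} tests failed.</p>
    <ul>
    {% for name in failed_names %}
      <li>{{ name }}</li>
    {% endfor %}
    </ul>
    <a href="{{ report_url }}">Open the report</a>
  </body>
</html>
"""


def render_alert_email(title, failed_names, n_total, report_url):
    """Fill the template with dynamic values and return the HTML body."""
    env = Environment(autoescape=True)  # escape values inserted into HTML
    template = env.from_string(EMAIL_TEMPLATE)
    return template.render(
        title=title,
        failed_names=failed_names,
        n_failed=len(failed_names),
        n_total=n_total,
        report_url=report_url,
    )
```

The rendered string can be used as the HTML body in either of the sending approaches described above.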
Now, the resulting email looks much nicer!
The button leads to the same location where the HTML report is accessible.
Following the same logic, you can adapt the example to your use case:
- Define a different composition of Tests to run
- Customize the email contents
- Customize the email design