In this code tutorial, you will learn how to create interactive visual ML model cards to document your models and data using Evidently, an open-source Python library.
Code example: if you prefer to head straight to the code, open this example Jupyter notebook.
An ML model card is a short document that provides key information about a machine learning model. It serves as a concise snapshot, explaining the purpose of the model, its target audience, evaluation metrics, and other details.
The goal of an ML model card is to improve collaboration and increase transparency. It serves as a single source of truth about a production ML model.
As a good practice, you can create a model card for every production ML model. You can publish and share this document with data scientists and ML engineers who might use or maintain the model, as well as with product managers and business stakeholders.
[fs-toc-omit]Want to learn more about ML monitoring?
Sign up for our Open-source ML observability course. Designed for data scientists and ML engineers. Yes, it's free!
Save my seat ⟶
In this tutorial, you will learn how to create a model card for your ML model using Evidently, an open-source Python library to evaluate, test and monitor ML models.
You will work with a toy dataset and go through the following steps:
- Review the ML model card contents
- Populate the ML model card following the template
- Customize the ML model card
You will generate the ML model card using Jupyter notebook or Colab and export it as an HTML file.
The tutorial requires minimal Python knowledge. The most crucial part is identifying the content you will include in the card: the tool will do the rest.
✏️ Design the model card
First, let’s understand what to include in the model card. This depends on your use case, audience, and goals.
The Model Cards for Model Reporting paper that introduced the concept of Model Cards also suggested a possible structure. This includes the following components:
- Model Details (date, version, type, model owner, etc.)
- Intended Use (primary uses, users, and out-of-scope applications)
- Factors (e.g., relevant demographic groups, technical attributes, etc.)
- Evaluation data
- Training data
- Quantitative Analyses
- Ethical Considerations
- Caveats and Recommendations
You can take this structure as a guideline and adapt it based on the type of model, risks, and who you expect to work with the model card. Ideally, a model card should strike a balance: be approachable for product and business stakeholders while correctly capturing crucial technical information.
If you want to get more inspiration, here are a few useful links:
- The original paper: Model Cards for Model Reporting.
- A case study of how Wayflyer creates ML model cards.
- Example model cards and templates by Salesforce, Wikimedia, NVIDIA, OpenAI, and Stable Diffusion.
🔢 Prepare the demo data
Say you want to generate a model card for a classification model intended for internal use in a company. We prepared a simple template, loosely following the original paper and examples seen in the industry. Let’s take a look at it!
To use the template, you need to import and prepare toy data.
Follow the example notebook to recreate the steps.
First, install Evidently. Use the Python package manager to install it in your environment. If you are working in Colab, run !pip install. In a Jupyter notebook, you should also install nbextension. Check out the installation instructions for your environment.
You must also import a few other libraries and Evidently components that you will use in the tutorial: Report and Metrics.
Next, prepare the data. In the demo, we used the Breast Cancer Wisconsin Diagnostic Dataset, available in sklearn. You must split the dataset into two parts: one to train a simple model and another to evaluate it.
Let’s take a look at a sample of the data. This structured tabular dataset is suitable for a binary probabilistic classification task.
Run the steps in the notebook to train a simple model. For this exercise, the details of the model training are not relevant. You only need to arrive at a point where you have two datasets, each containing model features, predictions, and true labels.
You need two datasets so that you can document both the training and the evaluation data in the model card.
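As a sketch, the data preparation and training steps might look like this. The random forest and the split parameters are arbitrary choices, not part of the official notebook:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load the Breast Cancer Wisconsin Diagnostic dataset as a DataFrame;
# the true label column is already named "target".
df = load_breast_cancer(as_frame=True).frame

# Split into a training part and an evaluation part.
train_df, eval_df = train_test_split(df, test_size=0.3, random_state=42)
train_df, eval_df = train_df.copy(), eval_df.copy()
features = [c for c in df.columns if c != "target"]

# The model itself is not the point here: any simple classifier will do.
model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(train_df[features], train_df["target"])

# Attach the predicted probability of the positive class, so that each
# dataset contains features, predictions, and true labels.
train_df["prediction"] = model.predict_proba(train_df[features])[:, 1]
eval_df["prediction"] = model.predict_proba(eval_df[features])[:, 1]
```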
Column mapping. In this example, our dataset has a simple structure (e.g., the “target” is called “target”), and you can directly proceed to analysis. In other cases, you might need a ColumnMapping object to help Evidently process the input data correctly.
🧪 Run the template
Now, let’s run the template to understand how it works! At this step, you will generate the “dummy” model card without populating or modifying the contents.
The model card includes both text fields and visualizations. We added the following text fields to the template:
- Model details
- Training dataset
- Model evaluation
For now, each text field contains a prompt that you will fill in later:
We also chose several visual plots and metrics:
- Distribution of classes predicted by the model
- Statistics of the training dataset, such as the number of rows, nulls, etc.
- Classification quality metrics, including precision, recall, ROC AUC, etc.
- Confusion matrix
Under the hood, Evidently has dozens of pre-built plots and metrics, so we selected a few that might go well together.
To generate the card, you need to create a new Report object and then list all the Metrics and Comments you want to include:
This will generate the template model card directly in Colab or Jupyter notebook. Here is what the training section of the model card looks like:
You can also export it as an HTML file and open it in the browser.
🚀 Create your card
Now, let’s populate the text fields and explore how to adjust the model card to your needs.
Let’s start with comments. To add your text to the model card, copy and edit the existing text comments with prompts. Use markdown to structure the text.
Modify the selection of plots. You can choose the plots available in the Evidently library. You can browse the list of all Metrics or example notebooks (they include pre-rendered Colabs that you can quickly go through to select suitable visualizations). You can also implement a custom Metric, though this will require some knowledge of Plotly.
Here are some examples of changes we made:
- Added a plot to show dataset correlations.
- Added plots to show the stats for specific features we might want to highlight ("mean radius," "mean symmetry"). This is an arbitrary selection: in practice, you might want to highlight the most important model features, for example.
- Added a couple of new plots related to the quality of probabilistic classification: the distribution of predicted probabilities and the ROC curve.
- Added a table showing alternative classification decision thresholds and a comment about possibly modifying the threshold. Consider including similar notes for business stakeholders when annotating findings or metrics.
Here is the resulting code to generate an updated model card:
🛠 Adapt it to your data
Do you want to create a similar model card for your data? Here are the general steps you should follow:
- Prepare the dataset or two datasets that contain model input features, predictions, and true labels. If you only want to document the data, you can pass the input features and not generate the metrics related to model quality.
- Create a column mapping object to map the data structure to the Evidently format.
- Design the contents of the model card using the custom text comment fields and available Evidently Metrics. You can use the list of all Metrics or the example notebook to select what to include. There are corresponding metrics for both Classification and Regression models.
- Generate and save the Reports as HTML files to document your ML model cards.
What else can you do?
Generating visual Reports is only one of the features of Evidently. You can also use it to:
- Log the results of calculations and dataset metrics. You can export them as JSON or a Python dictionary.
- Create Test Suites that perform structured checks on data quality, data drift, and model quality. To explore this functionality, start with the Getting Started tutorial.
- Implement monitoring for production ML models. For example, you can integrate it into your prediction pipelines using tools like Airflow or Prefect, or host dashboards.
Did you enjoy the blog? Star Evidently on GitHub to contribute back! This helps us continue creating free, open-source tools and content for the community.
⭐️ Star on GitHub ⟶