

A simple way to create ML Model Cards in Python

Last updated: April 9, 2025

Published: June 15, 2023



In this code tutorial, you will learn how to create interactive visual ML model cards to document your models and data using Evidently, an open-source Python library.

Code example: if you prefer to head straight to the code, open this example Jupyter notebook.

⚠️ Disclaimer:
This example uses the Evidently API as available in version 0.6.7 or lower. Please ensure you are using the correct version when running this example. For updated and new examples, visit our documentation.

Introduction

An ML model card is a short document that provides key information about a machine learning model. It serves as a concise snapshot, explaining the purpose of the model, its target audience, evaluation metrics, and other details.

The goal of an ML model card is to improve collaboration and increase transparency. It serves as a single source of truth about a production ML model.

As a good practice, you can create a model card for every production ML model. You can publish and share this document with data scientists and ML engineers who might use or maintain the model, as well as with product managers and business stakeholders.


Tutorial scope

In this tutorial, you will learn how to create a model card for your ML model using Evidently, an open-source Python library to evaluate, test and monitor ML models.

You will work with a toy dataset and go through the following steps:

Review the ML model card contents
Populate the ML model card following the template
Customize the ML model card

You will generate the ML model card in a Jupyter notebook or Colab and export it as an HTML file.

The tutorial requires minimal Python knowledge. The most crucial part is identifying the content you will include in the card: the tool will do the rest.

Step-by-step guide

✏️ Design the model card

First, let’s understand what to include in the model card. This depends on your use case, audience, and goals.

The paper Model Cards for Model Reporting, which introduced the concept of model cards, also suggested a possible structure with the following components:

Model Details (date, version, type, model owner, etc.)
Intended Use (primary uses, users, and out-of-scope applications)
Factors (e.g., relevant demographic groups, technical attributes, etc.)
Metrics (e.g., model performance measures and decision thresholds)
Evaluation Data
Training Data
Quantitative Analyses
Ethical Considerations
Caveats and Recommendations

You can take this structure as a guideline and adapt it based on the type of model, risks, and who you expect to work with the model card. Ideally, a model card should strike a balance: be approachable for product and business stakeholders while correctly capturing crucial technical information.

If you want to get more inspiration, here are a few useful links:

The original paper: Model Cards for Model Reporting.
A case study of how Wayflyer creates ML model cards.
Example model cards and templates by Salesforce, Wikimedia, NVIDIA, OpenAI, and Stable Diffusion.

🔢 Prepare the demo data

Say you want to generate a model card for a classification model intended for internal use in a company. We prepared a simple template, loosely following the original paper and examples seen in the industry. Let’s take a look at it!

To use the template, you need to import and prepare toy data.

Follow the example notebook to recreate the steps.


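First, install Evidently in your environment with pip. Because this example targets version 0.6.7 or lower, pin the version: pip install "evidently<=0.6.7". If you are working in Colab, run the same command with an exclamation mark in a notebook cell: !pip install "evidently<=0.6.7".

You also need to import a few other libraries and the Evidently components used in this tutorial: Report and Metrics.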

import pandas as pd
import numpy as np

from sklearn import datasets, ensemble

from evidently.report import Report
from evidently.metrics import *
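Since the code in this tutorial relies on the Evidently API from version 0.6.7 or lower, a quick version check before running anything can save some debugging. The sketch below uses only the standard library (importlib.metadata); the helper names parse_release and is_supported are hypothetical and not part of Evidently.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# The disclaimer states this example targets Evidently 0.6.7 or lower.
MAX_SUPPORTED = (0, 6, 7)


def parse_release(version_string: str) -> tuple:
    """Extract (major, minor, patch) from strings such as '0.6.7' or '0.6'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version_string)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version_string!r}")
    # A missing patch number (e.g. '0.6') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def is_supported(version_string: str) -> bool:
    """True if the release is at or below the last version this tutorial targets."""
    return parse_release(version_string) <= MAX_SUPPORTED


# Look up the installed distribution, if any.
try:
    installed = version("evidently")
except PackageNotFoundError:
    installed = None

if installed is None:
    print("Evidently is not installed; run: pip install 'evidently<=0.6.7'")
elif not is_supported(installed):
    print(f"Evidently {installed} found; this tutorial's code targets 0.6.7 or lower.")
else:
    print(f"Evidently {installed} should match the API used in this tutorial.")
```

Comparing integer tuples avoids the classic string-comparison pitfall where "0.10.0" would sort before "0.6.7".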

Next, prepare the data. The demo uses the Breast Cancer Wisconsin Diagnostic Dataset, available in scikit-learn. Split the dataset into two parts: one to train a simple model and another to evaluate it.

Let’s take a look at a sample of the data. This structured tabular dataset is suitable for a binary probabilistic classification task.

Run the steps in the notebook to train a simple model. For this exercise, the details of the model training are not relevant. You only need to arrive at a point where you have two datasets, each containing model features, predictions, and true labels.

You need two datasets to document both the training and the evaluation data in the model card.
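The notebook’s training code is not reproduced in this text. One possible version is sketched below, reusing the dataset names bcancer_ref and bcancer_cur that the report code later expects. The random forest (suggested by the ensemble import), the stratified 50/50 split, and storing the probability of class 1 in a column named "prediction" are assumptions, not necessarily what the notebook does.

```python
from sklearn import datasets, ensemble
from sklearn.model_selection import train_test_split

# Load the Breast Cancer Wisconsin (Diagnostic) data as one DataFrame:
# 30 numeric features plus a binary "target" column (0 = malignant, 1 = benign).
bcancer = datasets.load_breast_cancer(as_frame=True).frame
feature_names = [col for col in bcancer.columns if col != "target"]

# Assumed split: the reference part trains the model, the current part evaluates it.
bcancer_ref, bcancer_cur = train_test_split(
    bcancer, test_size=0.5, random_state=0, stratify=bcancer["target"]
)
bcancer_ref = bcancer_ref.reset_index(drop=True)
bcancer_cur = bcancer_cur.reset_index(drop=True)

# A small random forest; any probabilistic classifier would do for this exercise.
model = ensemble.RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(bcancer_ref[feature_names], bcancer_ref["target"])

# Store the predicted probability of class 1 in a "prediction" column in both datasets,
# so each one holds features, predictions, and true labels.
bcancer_ref["prediction"] = model.predict_proba(bcancer_ref[feature_names])[:, 1]
bcancer_cur["prediction"] = model.predict_proba(bcancer_cur[feature_names])[:, 1]

print(bcancer_ref.shape, bcancer_cur.shape)
```

Scoring the training split as well as the evaluation split is what lets the card show prediction statistics for both datasets; expect near-perfect scores on the training part, since the model has seen those rows.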

Column mapping. In this example, our dataset has a simple structure (e.g., the “target” is called “target”), and you can directly proceed to analysis. In other cases, you might need a ColumnMapping object to help Evidently process the input data correctly (see the docs).
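If your columns use other names, the text points to ColumnMapping. An alternative that needs only pandas is to rename your columns to the defaults before running the report. The sketch below assumes, based on our understanding of the ColumnMapping defaults in the 0.6.x-and-earlier API, that Evidently looks for "target" and "prediction"; the helper functions and the sample DataFrame are hypothetical.

```python
import pandas as pd

# Default names we assume Evidently uses when no ColumnMapping is passed.
DEFAULT_COLUMNS = ("target", "prediction")


def missing_default_columns(df: pd.DataFrame) -> list:
    """Return the default column names that are absent from df."""
    return [name for name in DEFAULT_COLUMNS if name not in df.columns]


def rename_to_defaults(df: pd.DataFrame, target_col: str, prediction_col: str) -> pd.DataFrame:
    """Return a copy of df with custom label/prediction columns renamed to the defaults."""
    return df.rename(columns={target_col: "target", prediction_col: "prediction"})


# Hypothetical scored data that uses custom column names.
scored = pd.DataFrame({"mean radius": [14.1, 20.6], "label": [1, 0], "score": [0.92, 0.15]})
print(missing_default_columns(scored))  # ['target', 'prediction']

fixed = rename_to_defaults(scored, target_col="label", prediction_col="score")
print(missing_default_columns(fixed))  # []
```

Renaming keeps the report code unchanged; passing a ColumnMapping instead keeps your original column names, so pick whichever fits your pipeline.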

🧪 Run the template

Now, let’s run the template to understand how it works! At this step, you will generate the “dummy” model card without populating or modifying the contents.

The model card includes both text fields and visualizations. We added four text fields to the template:

Model details
Training dataset
Model evaluation
Considerations

For now, each text field contains a prompt that you will fill in later.
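The report code below refers to four variables (model_details, training_dataset, model_evaluation, considerations) that hold the text of these fields. The notebook’s exact wording is not shown here, so the prompt text in this sketch is hypothetical; it only illustrates that each field is a plain Python string written in markdown.

```python
# Hypothetical prompt text for the four comment fields (the notebook's wording may differ).
# Each string is markdown that the Comment element displays as formatted text.
model_details = """
# Model Details
## Description
* **Model name**: [name and version]
* **Model owner**: [team or person responsible]
* **Model type**: [e.g., binary probabilistic classifier]
* **Date**: [release date]
## Intended use
[Primary use cases, intended users, and out-of-scope applications.]
"""

training_dataset = """
# Training dataset
[Describe the data source, collection period, size, and known limitations.]
"""

model_evaluation = """
# Model evaluation
[Describe the evaluation dataset and summarize model quality on it.]
"""

considerations = """
# Considerations
[Ethical considerations, caveats, and recommendations for use.]
"""
```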

We also chose several visual plots and metrics:

Distribution of classes predicted by the model
Statistics of the training dataset, such as the number of rows, nulls, etc.
Classification quality metrics, including precision, recall, ROC AUC, etc.
Confusion matrix

Evidently has dozens of pre-built plots and metrics; we selected a few that work well together.

To generate the card, you need to create a new Report object and then list all the Metrics and Comments you want to include:

model_card = Report(metrics=[
   Comment(model_details),
   ClassificationClassBalance(),
   Comment(training_dataset),
   DatasetSummaryMetric(),
   Comment(model_evaluation),
   ClassificationQualityMetric(),
   ClassificationConfusionMatrix(),
   Comment(considerations),
])

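# reference_data: the dataset used to train the model; current_data: the evaluation dataset
model_card.run(current_data=bcancer_cur, reference_data=bcancer_ref)
# In Jupyter or Colab, displaying the object renders the report inline
model_card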

This generates the template model card directly in Colab or a Jupyter notebook. Here is how the training section of the model card looks.

You can also export it as an HTML file and open it in a browser.

model_card.save_html("model_card.html")

🚀 Create your card

Now, let’s populate the text fields and explore how to adjust the model card to your needs.

Let’s start with comments. To add your own text to the model card, copy the existing text comments and replace the prompts with your content. Use markdown to structure the text, for example, with headers and bulleted lists.
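As an illustration, here is what an edited model_details comment might look like for the demo model. The dataset name, the binary probabilistic classification task, and the internal-use audience come from this tutorial; the version, owner, and out-of-scope note are hypothetical placeholders to replace with your own details.

```python
# An edited model_details comment. Version, owner, and the out-of-scope note are
# hypothetical placeholders; the dataset and task come from this tutorial.
model_details = """
# Model Details

## Description
* **Model name**: Breast cancer classifier (demo)
* **Version**: 1.0 (hypothetical)
* **Owner**: Data Science team (hypothetical)
* **Model type**: Binary probabilistic classifier

## Intended use
* Internal use in the company, to illustrate model documentation.
* Trained and evaluated on the Breast Cancer Wisconsin Diagnostic Dataset from scikit-learn.
* **Out of scope**: medical or diagnostic decision-making.
"""
```

Bold labels and short bullets keep the card readable for business stakeholders while still recording the technical facts.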

Modify the selection of plots. You can choose the plots available in the Evidently library. You can browse the docs for the list of all Metrics or example notebooks (they include pre-rendered Colabs that you can quickly go through to select suitable visualizations). You can also implement a custom Metric, though this will require some knowledge of Plotly.

Here are some examples of changes we made:

Added a plot to show dataset correlations.


Added plots to show the stats for specific features we might want to highlight ("mean radius," "mean symmetry"). This is an arbitrary selection: in practice, you might want to highlight the most important model features, for example.

Added a couple of new plots related to the quality of probabilistic classification: distribution of predicted probabilities and ROC Curve.


Added a table showing the alternative classification decision threshold and a comment about possibly modifying it. Consider including similar notes for business stakeholders when annotating some findings or metrics.

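To see why a note about the decision threshold is useful, here is a small numpy sketch of how precision and recall change when the probability cutoff moves. This is not how Evidently computes ClassificationPRTable; it only illustrates the trade-off such a table summarizes. The function name and the toy labels and probabilities are made up for illustration.

```python
import numpy as np


def precision_recall_at(y_true, y_prob, threshold):
    """Precision and recall when y_prob >= threshold is labeled as the positive class (1)."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tp = int(np.sum((y_pred == 1) & (y_true == 1)))  # correctly flagged positives
    fp = int(np.sum((y_pred == 1) & (y_true == 0)))  # false alarms
    fn = int(np.sum((y_pred == 0) & (y_true == 1)))  # missed positives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels and predicted probabilities, made up for illustration.
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.6, 0.4, 0.9]

for threshold in (0.3, 0.5, 0.7):
    p, r = precision_recall_at(y_true, y_prob, threshold)
    print(f"threshold={threshold}: precision={p:.2f}, recall={r:.2f}")
# threshold=0.3: precision=0.67, recall=1.00
# threshold=0.5: precision=0.50, recall=0.50
# threshold=0.7: precision=1.00, recall=0.50
```

Raising the threshold makes the model more conservative: it flags fewer cases, which tends to increase precision at the cost of recall. Which side matters more is a business decision, which is why the card addresses it to stakeholders.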

Here is the resulting code to generate an updated model card:

model_card = Report(metrics=[
   Comment(model_details),
   ClassificationClassBalance(),
   Comment(training_dataset),
   DatasetSummaryMetric(),
   DatasetCorrelationsMetric(),
   ColumnSummaryMetric(column_name="mean radius"),
   ColumnSummaryMetric(column_name="mean symmetry"),
   Comment(model_evaluation),
   ClassificationQualityMetric(),
   ClassificationConfusionMatrix(),
   ClassificationProbDistribution(),
   ClassificationRocCurve(),
   Comment(threshold_comment),
   ClassificationPRTable(),
   Comment(considerations)
])

model_card.run(current_data=bcancer_cur, reference_data=bcancer_ref)
model_card
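The updated report also uses a threshold_comment variable whose definition the text does not show. A hypothetical version is sketched below; because it sits right before ClassificationPRTable() in the metrics list, it can introduce that table. Adjust the wording to your own findings rather than copying it as is.

```python
# Hypothetical text for threshold_comment. In the report it appears right before
# ClassificationPRTable(), so it introduces the table of alternative thresholds.
threshold_comment = """
## Decision threshold
The quality metrics above are calculated for a single decision threshold on the predicted probability.
The table below shows precision and recall at alternative thresholds.
If missing a positive case is costlier than a false alarm, consider a lower threshold;
discuss the trade-off with business stakeholders before changing it.
"""
```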

🛠 Adapt it to your data

Do you want to create a similar model card for your data? Here are the general steps you should follow:

Prepare one or two datasets that contain model input features, predictions, and true labels. If you only want to document the data, you can pass only the input features and skip the metrics related to model quality.
Create a column mapping object to map the data structure to the Evidently format.
Design the contents of the model card using the custom text comment fields and available Evidently Metrics. You can use the list of all Metrics or the example notebook to select what to include. There are corresponding metrics for both Classification and Regression models.
Generate and save the Reports as HTML files to document your ML model cards.

What else can you do?

Generating visual Reports is only one of the features of Evidently. You can also use it to:

Log the results of the calculations and dataset metrics. You can export them as JSON or a Python dictionary.
Create Test Suites that perform structured checks on data quality, data drift, and model quality. To explore this functionality, start with the Getting Started tutorial.
Implement monitoring for production ML models. For example, you can integrate Evidently into your prediction pipelines using tools like Airflow or Prefect, or host monitoring dashboards.