How do different companies start and scale up their MLOps practices?
We teamed up with Javier López Peña, a Data Science Manager at Wayflyer, who shared their story on adopting model cards for machine learning models.
Keep reading to learn more about:
- ML applications at Wayflyer,
- how they create ML model cards using open-source tools,
- what each ML model card contains,
- how they incorporate data quality and drift analysis,
- why the model card is a great tool to adopt early in the MLOps journey,
- and even what Stable Diffusion has to do with it!
Machine learning at Wayflyer
Wayflyer is a revenue-based financing platform for e-commerce brands. They help e-commerce businesses achieve growth with various products: from sales forecasts to cash advances.
E-commerce companies often have to pay upfront for the inventory they will only sell months later. This creates a cash flow gap. Wayflyer helps bridge this gap by providing instant capital and insights on key financial metrics.
This business is highly data-driven: Wayflyer has over 20 people in data roles and tens of ML models in production. They typically work with time series, tabular, and text data. Here are some of the ML use cases that power day-to-day operations:
- Lead generation. One example is the identification of leads for the sales team to reach out to. There are millions of companies out there, and Wayflyer uses machine learning to identify those that will be a good match for their products. For each company, they need to answer very simple questions. Is this an e-commerce company? Is it in a country that Wayflyer operates in? Is it the right size to target?
- Prediction of cash flow, sales, and inventory levels. Once a company becomes a customer, they connect certain data sources to the Wayflyer platform, including data on sales, marketing campaigns, and transactions. Wayflyer uses all this data to create ML-driven insights. For example, they use time series modeling to predict cashflows and various statistical and ML models to forecast sales and inventory levels.
- Default and fraud prediction. To back up funding decisions, they need to model default ("How likely is a customer to go out of business before they repay the cash advance?") and predict the probability of fraud.
- Categorizing banking transactions. To make sense of the banking transaction data, Wayflyer also uses NLP modeling. They categorize each banking transaction based on its description: Is this a salary payment? Is this a marketing expenditure? Is this a product return?
With such a variety of ML use cases, Wayflyer does not have a central machine learning team. Instead, they operate in product-oriented pods. Each team may be responsible for one or more ML models.
For example, they have a team focused on growth. They take care of lead generation and are responsible for the ML model that generates the list of potential customers to reach out to. This team consists of specialists from various domains: product managers, engineers, designers—and data scientists.
Another example is the underwriting team. They take care of the funding product and focus on assessing the financial stability of potential customers. The team consists of financial analysts that review the data, data scientists that create fraud and default models, and engineers that build pipelines and put everything together.
This distributed approach creates an interesting challenge. The teams, of course, communicate with each other, but they may be unfamiliar with the fine details of what others are working on. At the same time, they want other people in the company to use the ML models they create.
Adopting ML model cards
Why model cards?
As the number of product pods and ML models grew, Wayflyer faced the challenge of sharing the models across the company. For example, the fraud model designed by the underwriting team might later be picked up by someone else. But if the model is used outside of the team that created it, how will they know the model's characteristics?
"We need to be able to share the information about the models: what the models do, how they do it, what the limitations are, what the potential use cases are, and what they should not be used for. That led us to the question: 'How do we document ML models?'"
To answer that question, Wayflyer turned to model cards. Introducing a way to share information about the ML models with both technical and non-technical users became one of the priority tasks in building company-wide MLOps practices at Wayflyer.
What is an ML Model Card?
This concept was first proposed by Mitchell et al. in the 2018 paper "Model Cards for Model Reporting."
Simply put, a Model Card is a document that provides key information about a machine learning model. The goal of the Model Card is to increase transparency and become a single source of truth about an ML model, including its intended use, performance evaluation, and other characteristics.
Model card toolset
The Wayflyer team looked for tooling to help automatically generate an ML model card for each model. Initially, they tested the open-source Google Model Card Toolkit, which seemed to be exactly what the team needed. Except they could not use it.
"Back when we looked at it, using the Model Card Toolkit required tight integration with TensorFlow. TensorFlow has a bunch of dependencies that are hard to build, and it only worked with older Python versions. All that made using the stack really complicated. So we wrote our own version of the toolkit based on components that were easier to use within our development stack."
They set out to build a system on top of open-source components that would be easy to use and have few external dependencies. They settled on Pydantic classes, which generate JSON output, and a Jinja template that ingests those classes and formats the resulting HTML for each card. Instead of limiting the charting to static graphs, they relied on Plotly to get interactive plots. Later, they discovered Evidently and incorporated its visual components into the model card template. The result is a very barebones HTML file that can be styled using CSS.
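The flow described above can be sketched with standard-library stand-ins. Wayflyer uses Pydantic models and a Jinja template; here `dataclasses` and `string.Template` illustrate the same structured-data-to-HTML pipeline, and all field names are made up for illustration rather than taken from Wayflyer's actual schema.

```python
# Stdlib-only sketch of a model card pipeline: structured card data
# (dataclass standing in for Pydantic) rendered into barebones HTML
# (string.Template standing in for Jinja). Field names are illustrative.
from dataclasses import dataclass, asdict
from string import Template

@dataclass
class ModelCard:
    name: str
    version: str
    trained_on: str
    owner: str
    description: str
    limitations: list

CARD_TEMPLATE = Template("""<html><body>
<h1>$name (v$version)</h1>
<p>Owner: $owner. Trained on: $trained_on.</p>
<p>$description</p>
<h2>Limitations</h2>
<ul>$limitations</ul>
</body></html>""")

def render_card(card: ModelCard) -> str:
    """Serialize the card fields and render them into an HTML page."""
    fields = asdict(card)
    fields["limitations"] = "".join(f"<li>{item}</li>" for item in card.limitations)
    return CARD_TEMPLATE.substitute(fields)

card = ModelCard(
    name="Lead Scoring Model",
    version="1.2.0",
    trained_on="2023-01-15",
    owner="Growth team",
    description="Scores companies as potential leads.",
    limitations=["Not suitable for credit risk assessment."],
)
html = render_card(card)
print(html)
```

The resulting HTML file is self-contained, so it can be styled with CSS and shared in any browser, which matches the low-dependency goal described above.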
When designing the model card contents, Wayflyer took an iterative approach: they started with a simple, barebones system and added new components as needed.
Model card components
The model card at Wayflyer is a well-structured dossier intended for various groups of stakeholders. Let's have a look!
The model description first provides basic information about the model in text form. It states what the model does, its version number, when it was trained, and who owns it.
It also has a "Considerations" section, which is essential for other stakeholders in the company—and helps communicate what the model was designed to do to the people who did not create it. It includes:
- the intended users
- the use cases that drove the development of the model
- its limitations (e.g., known data quality risks or a warning against an unintended usage scenario: for example, you could create a certain ML model to score leads but would not expect it to be used for credit risk assessment)
- trade-offs that have been made while developing the model (e.g., the number of features was narrowed by a domain expert to make the model more explainable)
- ethical considerations (e.g., the possibility of inheriting human biases, faulty data, etc.).
The next section covers how the model works. Aimed at other data scientists rather than non-technical stakeholders, it contains:
- Information about the model architecture, based on the visual representation that scikit-learn can generate. For example, it shows the model type and the parameters that went into the model.
- Overview of the input data: features used to train the model, feature types, and some basic information about them, like their min and max values for numerical features or the range of possible values for categorical ones. The first implementation of the model card contained a simple feature table, which was later expanded with an additional Data Quality section.
- Indicators of model performance. This section includes all the results of the exploratory analysis a data scientist performs when deciding whether a model is good enough to be shipped: plots like Actual vs. Expected values, the calibration curve, confusion matrices, the gain chart, the ROC curve, the precision-recall curve, model accuracy, and feature importances for the training and validation datasets.
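To make the performance section concrete, here is a minimal sketch of two of the indicators it bundles: a binary confusion matrix and accuracy. In practice Wayflyer renders these as interactive Plotly charts; the labels below are made up for illustration.

```python
# Minimal sketch: a binary confusion matrix and accuracy, two of the
# performance indicators a model card can include. Toy labels only.
from collections import Counter

def confusion_matrix(y_true, y_pred):
    """Count (actual, predicted) label pairs."""
    return Counter(zip(y_true, y_pred))

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

cm = confusion_matrix(y_true, y_pred)
tp, tn = cm[(1, 1)], cm[(0, 0)]
fp, fn = cm[(0, 1)], cm[(1, 0)]
accuracy = (tp + tn) / len(y_true)
print(f"TP={tp} TN={tn} FP={fp} FN={fn} accuracy={accuracy:.2f}")
```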
One of the issues Wayflyer faced when building ML models was the number of data quality problems. The data comes from different sources and sometimes isn't aligned correctly or is incomplete, which can affect the quality of the model itself.
Wayflyer decided to add data quality assessment to the model cards to make it easy to explore the training data. This is where Evidently came in.
"We previously had a basic table summarizing the features and simple statistics. We wanted something better, and we thought the Data Quality report from Evidently would be exactly what we needed. So that's what we added."
They started with the Evidently data quality report and included the dataset summary and feature-by-feature analysis in their model card template. This helps visualize how each feature behaves over time and whether it has spikes or extreme outliers.
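A simplified stand-in for the per-feature statistics such a report computes is sketched below. The real Evidently Data Quality report adds distributions, correlations, and behavior-over-time plots; the feature name and values here are illustrative.

```python
# Simplified per-feature summary in the spirit of a data quality report:
# count, missing share, and basic numeric stats, tolerating missing values.
# The feature name and values are made up for illustration.
import math

def feature_summary(values):
    """Basic stats for one numerical feature with possible missing values."""
    present = [v for v in values
               if v is not None and not (isinstance(v, float) and math.isnan(v))]
    return {
        "count": len(values),
        "missing_share": 1 - len(present) / len(values),
        "min": min(present),
        "max": max(present),
        "mean": sum(present) / len(present),
    }

monthly_sales = [120.0, 95.5, None, 210.0, 180.5, float("nan"), 160.0]
summary = feature_summary(monthly_sales)
print(summary)
```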
"Before assessing the model, we can explore whether our data is stable over time. That is a very, very important consideration for us. If the feature is unstable over time, we probably should not make predictions based on it."
Wayflyer also incorporated the Evidently Data Drift report in the model card. Though this type of analysis is typically performed on live data, Wayflyer implemented it to approximate future feature behavior. They want to see which features are volatile and understand trends and potential data drift in production.
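The idea behind a drift check can be sketched with the Population Stability Index (PSI), a common rule-of-thumb drift metric. This is not Wayflyer's exact setup: Evidently selects appropriate statistical tests per feature, and the 0.2 threshold below is a conventional heuristic, not a universal constant.

```python
# Minimal drift check using the Population Stability Index (PSI) over
# fixed bins. A common heuristic flags drift when PSI exceeds 0.2.
# Bin edges, samples, and threshold are illustrative.
import math

def psi(reference, current, bins):
    """PSI between two samples over shared bin edges."""
    def shares(sample):
        counts = [0] * (len(bins) - 1)
        for v in sample:
            for i in range(len(bins) - 1):
                if bins[i] <= v < bins[i + 1]:
                    counts[i] += 1
                    break
        # A small epsilon keeps log() finite for empty bins.
        return [(c + 1e-6) / (len(sample) + 1e-6 * len(counts)) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

bins = [0, 50, 100, 150, 200]
reference = [20, 45, 70, 110, 130, 160, 90, 60]   # training-period feature
current = [150, 160, 170, 180, 190, 155, 165, 175]  # shifted distribution

score = psi(reference, current, bins)
print(f"PSI = {score:.2f}, drift detected: {score > 0.2}")
```

Running the same check of a sample against itself yields a PSI near zero, which is what makes it usable as a "has this feature moved?" signal.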
As e-commerce businesses tend to have high seasonality, the data Wayflyer deals with is also time-sensitive. When evaluating the model quality, the company uses a time-based train-validation split in addition to the usual stratified random split in training.
They keep the last few months—could be 3 or 4—of the available data as a hold-out dataset. Evaluating the model quality on this period indicates how much it will deteriorate if put into production. The comparison between the test set and the hold-out set shows if the model is robust and generalizes well when deployed.
Essentially, Wayflyer is simulating what would have happened if they had deployed the model 3-4 months ago. They do the same for data drift analysis.
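The time-based split described above can be sketched as follows. The column name, the 3-month window, and the 30-day month approximation are illustrative assumptions, not Wayflyer's exact implementation.

```python
# Sketch of a time-based train/hold-out split: the most recent months are
# held out to simulate having deployed the model that long ago.
# Keys and the 3-month window are illustrative.
from datetime import date, timedelta

def time_based_split(rows, date_key, holdout_months=3):
    """Split rows into (train, holdout), holding out the last N months."""
    latest = max(row[date_key] for row in rows)
    # Approximate a month as 30 days to keep the sketch dependency-free.
    cutoff = latest - timedelta(days=30 * holdout_months)
    train = [r for r in rows if r[date_key] <= cutoff]
    holdout = [r for r in rows if r[date_key] > cutoff]
    return train, holdout

# Toy dataset: one record every two weeks.
rows = [{"dt": date(2023, 1, 1) + timedelta(days=14 * i), "sales": 100 + i}
        for i in range(20)]
train, holdout = time_based_split(rows, "dt")
print(len(train), len(holdout))
```

Comparing model quality on `holdout` against the usual random test split is what indicates how much performance would deteriorate in production.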
"If there has been a significant change in data, we want to see how our model will react to these scenarios. And the data drift report from Evidently came in very handy for that. What are the features that tend to drift? Could we have done something about it? At what point could we have detected something happening, and what action could we have taken to mitigate it? That is the type of internal discussion we have on the Data Science team."
A bit of fun!
Each ML model also gets a personalized AI touch.
Usually, each newly trained ML model gets a hashed name. Wayflyer put a creative spin on this and decided that giving each model a more personal name is better than something like "83been13aa."
To give each model a proper name, the Wayflyer team developed an automated code name generator. It turns a hash value that describes the model into something readable. Each code name consists of three parts: a funky descriptor, an adjective, and the name of an animal. The names are hilarious!
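The generator can be sketched as a deterministic mapping from the model's hash into three word lists. The vocabulary below is invented for illustration; Wayflyer's actual word lists are not public.

```python
# Hedged sketch of a "descriptor + adjective + animal" code name generator:
# the model hash deterministically picks one word from each list.
# The word lists are made up for illustration.
import hashlib

DESCRIPTORS = ["Discombobulated", "Bamboozled", "Flabbergasted"]
ADJECTIVES = ["Addicted", "Sleepy", "Radiant"]
ANIMALS = ["Mastiff", "Axolotl", "Capybara"]

def code_name(model_hash: str) -> str:
    """Map an arbitrary model hash to a readable three-part name."""
    digest = hashlib.sha256(model_hash.encode()).digest()
    return " ".join(words[digest[i] % len(words)]
                    for i, words in enumerate([DESCRIPTORS, ADJECTIVES, ANIMALS]))

name = code_name("83been13aa")
print(name)
```

Because the name is derived from the hash, the same model always gets the same name, so the readable label stays as stable an identifier as the hash itself.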
But the team didn't stop there. They also used the model's name to create an AI-generated image for each model with the help of Stable Diffusion. Meet Discombobulated Addicted Mastiff!
Using model cards
Every newly created model at Wayflyer gets a model card. This way, the company ensures that the models are documented and available to other teams.
There are now three main user groups of model cards at Wayflyer: data scientists who created the model, data scientists looking to implement similar solutions or researching the model features, and business stakeholders.
"The first problem that we are looking to solve with model cards, beyond creating nice-looking documentation, is to develop a practice and a discipline of making informed decisions of when and how to deploy a new model and make sure it is reproducible, understandable, and can be consumed by other parties. The second problem we wanted to solve was to have a mostly automated way of generating documentation, because no data scientist ever enjoys having to write reports about a model that is already done!"
A new card is also created whenever the model is updated (e.g., retrained on a new dataset). The new model is then compared with its older version. If the improvement is statistically significant, the team will deploy the new model.
As Wayflyer sticks to an iterative approach, model cards, like other areas of MLOps, are always a work in progress. The next task the team is working on is integration with MLflow. They intend to register all the relevant data from the model card as metadata attached to the model in a model registry. This will allow users to easily compare models with each other.
When it comes to adopting new tools and practices, there is always a trade-off between the cost of introducing a change and its potential benefits. Some changes are harder to implement than others: changing vendors, rewriting legacy code, or retraining the entire team cannot be done overnight.
In contrast, model cards can be adopted incrementally and bring value fast. One can start small by simply documenting model use cases and limitations in text form and add more visual components as needed.
"The biggest friction with many MLOps tools is the work needed to adapt those thousands of lines of existing code to work with these tools. In contrast, model cards are a low-hanging fruit. They don't need a service or a platform. You can run it in a notebook, and the output is shareable with everybody simply in a web browser. That is why we implemented model cards and Evidently even before MLflow for model and experiment tracking."
To learn more about Evidently open-source tools for ML model evaluation and monitoring, check out GitHub and documentation. Here is how you can create a custom HTML Report as a base for the model card.