This suggestion comes up from time to time. Imagine you have a machine learning model in production, and some features are very volatile. Their distributions are not stable. What should you do with those? Should you just throw them away? Can you do better?
Of course, the answer depends on the context. Here is one way to think it through.
1. Why is the model using these features?
Assuming you built the model, you should know the answer to this question.
It might be that you performed solid feature engineering, consulted with domain experts, and can interpret the meaning of each feature. On the opposite end of the spectrum, it might be that you just threw in all the data you had, built a quick model, and put it to trial use, and this is the first time you are looking at the feature patterns in detail.
It is good to acknowledge how much thought went into feature selection and engineering before you start throwing features away.
If you are still in the quick-and-dirty phase, you can treat this as a model development experiment and rebuild your model with different feature combinations. Throwing away low-quality, noisy features is a reasonable thing to do: better sooner than later.
You might as well do nothing if the model operates in a non-critical setting and performs fine otherwise.
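As a sketch, such a rebuild experiment can be as simple as cross-validating the model with and without the suspect features. The dataset, model choice, and the list of "volatile" feature indices below are synthetic placeholders, not a prescription:

```python
# Compare model quality with and without volatile features.
# The dataset and the "volatile" index list are synthetic placeholders.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=10, n_informative=5,
                           random_state=0)

volatile_idx = [7, 8, 9]  # assumed indices of noisy, drifting features
stable_idx = [i for i in range(X.shape[1]) if i not in volatile_idx]

model = RandomForestClassifier(n_estimators=100, random_state=0)
score_full = cross_val_score(model, X, y, cv=5).mean()
score_stable = cross_val_score(model, X[:, stable_idx], y, cv=5).mean()

print(f"with volatile features:    {score_full:.3f}")
print(f"without volatile features: {score_stable:.3f}")
```

If the score without the volatile features is comparable, dropping them is an easy win.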
If you put more thought into the feature selection, that is a different story. You might face training-serving skew: the reality does not match the training data. Or, the features might have been stable for a while, and now they are drifting. Let’s think a bit more before getting rid of those features!
Pro tip: you can test the feature behavior in advance and evaluate historical drift patterns.
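For instance, you could replay past data in monthly windows and compare each window against a reference period with a two-sample Kolmogorov-Smirnov test. This is a minimal sketch on a synthetic feature with a simulated upward drift; the window size and the 0.05 threshold are arbitrary choices:

```python
# Replay historical data window by window and test each against a
# reference period with a two-sample Kolmogorov-Smirnov test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
days = 360
# Simulated daily feature values over a year with a slow upward drift.
feature = rng.normal(loc=np.linspace(0, 1, days), scale=1.0)

reference = feature[:30]            # first month as the reference window
for month in range(1, 12):
    current = feature[month * 30:(month + 1) * 30]
    stat, p_value = stats.ks_2samp(reference, current)
    verdict = "drift" if p_value < 0.05 else "ok"
    print(f"month {month + 1:2d}: p={p_value:.3f} -> {verdict}")
```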
2. Evaluate the feature importance
Now, if you plan to dig deeper, the first step is to review the feature importance.
If the drifting features are unimportant, you can probably do without them. The ML model won’t grasp any meaningful patterns if they are truly volatile. If you have a lot of features with relatively low importance, you can safely experiment with dropping some of them.
It might also help to review your drift detection approach. The features might still be mildly useful, and you simply do not want to overreact to an occasional shift that is unlikely to affect the model's performance. You can keep the volatile features in the model but exclude them from drift monitoring, and then regularly evaluate drift only in the top features.
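A minimal sketch of this idea, assuming a scikit-learn model with impurity-based importances and an arbitrary cutoff of the top 5 features (the feature names are hypothetical):

```python
# Select only the top-N most important features for drift monitoring.
# N=5 is an arbitrary cutoff; feature names are hypothetical.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=12, n_informative=4,
                       random_state=1)
feature_names = [f"f{i}" for i in range(X.shape[1])]

model = RandomForestRegressor(n_estimators=100, random_state=1).fit(X, y)
ranked = sorted(zip(feature_names, model.feature_importances_),
                key=lambda pair: pair[1], reverse=True)

top_n = 5
monitored = [name for name, _ in ranked[:top_n]]
print("features to include in drift monitoring:", monitored)
```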
If the drifting features are key to the model, you cannot just eliminate them. If you detect a shift, it is best to investigate its nature. What is the real-world phenomenon behind it? Did they become entirely unpredictable, or is it a trend that a model can learn?
In the latter case, you can estimate how much data you need to collect to retrain your model to capture new patterns. While you wait for that, you might want to pause your model, use a fallback or continue with the decreased model performance for a while. You can also explore if there are other data sources you can use to augment the model.
Further reading: how to be intentional about model retraining needs and evaluate how much data you need to retrain.
3. Consider a dual-model setup
As you explore the changes in detail, you might notice some patterns. One recurring pattern is when you have some features that are both important and stable and a lot of features that are still useful but volatile or not even always present in the data.
In this case, you can consider a dual-model setup.
The idea is to build the first ML model using only stable, reliable features, and a second ML model that corrects the output of the first using the bigger feature set. This helps balance reliability (thanks to the core, robust model built on a limited feature set) and maximize performance (thanks to all the additional features that improve the output of the first one).
You can use the models in different combinations. For example, apply the models sequentially, or choose the output of one model over another depending on the input data or a rule that you define.
You can also set up input feature drift monitoring only for the “core” model to react to important shifts.
4. Think about seasonality
Here is another thing to dig into: is this true feature drift, or is it a temporal pattern?
Often it is some seasonality that affects the feature behavior. This might be directly related to the time dimension, and you need to choose appropriate comparison windows to avoid false positive drift detection. For example, you might need to compare your data month-by-month and not week-by-week.
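To illustrate the effect of window choice, here is a synthetic feature with a 30-day cycle: windows shorter than the cycle can look drifted against each other, while cycle-aligned monthly windows compare like with like. The cycle length and noise level are assumptions for the sake of the example:

```python
# Effect of comparison window choice on a feature with a ~30-day cycle.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
t = np.arange(120)
feature = np.sin(2 * np.pi * t / 30) + rng.normal(scale=0.2, size=t.size)

# Windows shorter than the cycle: the positive half of the month vs the
# negative half. These genuinely differ and can fire a false drift alarm.
_, p_short = stats.ks_2samp(feature[0:15], feature[15:30])

# Cycle-aligned windows: this month vs the next month look alike.
_, p_month = stats.ks_2samp(feature[0:30], feature[30:60])

print(f"within-cycle windows p-value: {p_short:.4f}")
print(f"month-over-month p-value:     {p_month:.4f}")
```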
You can also deal with other types of seasonality. For example, if you predict the estimated time of the taxi's arrival, the weather is a very impactful factor. When it rains, things are very different regarding car availability, traffic speed, and user demand.
If you look at the behavior of the individual features, rainy weather might look like data drift, while it is, in fact, a pattern that you can single out.
You might end up with two or more models for different weather types and set a rule to switch between them (e.g., relying on the humidity levels to distinguish when the rain starts or using some other heuristic).
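Such a switching rule can be as simple as a threshold check. The humidity cutoff and the two placeholder "models" below are purely hypothetical:

```python
# Rule-based switch between two models depending on weather conditions.
# The humidity threshold and both predictors are hypothetical placeholders.
RAIN_HUMIDITY_THRESHOLD = 90.0  # assumed heuristic, in percent

def predict_eta_dry(features: dict) -> float:
    return features["distance_km"] * 2.0   # placeholder "dry weather" model

def predict_eta_rain(features: dict) -> float:
    return features["distance_km"] * 3.5   # placeholder "rainy weather" model

def predict_eta(features: dict) -> float:
    """Route the request to the model trained for the current regime."""
    if features["humidity"] >= RAIN_HUMIDITY_THRESHOLD:
        return predict_eta_rain(features)
    return predict_eta_dry(features)

print(predict_eta({"distance_km": 4.0, "humidity": 55.0}))  # dry model -> 8.0
print(predict_eta({"distance_km": 4.0, "humidity": 95.0}))  # rain model -> 14.0
```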
Further reading: what else you can do to handle drift.
5. Treat drift as a heuristic
We started with the assumption that "drift is detected" and the features are volatile.
However, there is no such thing as absolute drift. A drift check is a valid comparison of feature distributions or of specific expectations about feature behavior, but a positive result does not necessarily mean that the feature behavior is abnormal or that the model quality is affected.
There is no substitute for the data scientist's judgment. You need to form expectations about your features when you choose the drift detection methods, decide which features to check, and set the comparison windows. You should also consider the context when interpreting the drift detection results and deciding whether to throw the features away (or not).
Getting rid of the drifting features is a possible approach, but it can be simplistic: just like blind retraining on the new data once drift is detected (check the quality first!).
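To make the point concrete, here is a synthetic example where a drift test fires on a feature the model does not actually use, and the model quality stays intact. The depth-1 tree and the artificial shift are deliberate simplifications:

```python
# A drift alert alone does not prove the model quality is affected.
# The depth-1 tree uses only feature 0; shifting feature 1 fires a KS
# drift alert while accuracy is untouched. Data and shift are synthetic.
import numpy as np
from scipy import stats
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
X = rng.normal(size=(1000, 2))
y = (X[:, 0] > 0).astype(int)          # the label depends only on feature 0

model = DecisionTreeClassifier(max_depth=1, random_state=0).fit(X, y)

X_prod = X.copy()
X_prod[:, 1] += 3.0                    # feature 1 drifts in "production"

_, p_drift = stats.ks_2samp(X[:, 1], X_prod[:, 1])
acc_ref = model.score(X, y)
acc_prod = model.score(X_prod, y)      # same labels: feature 0 is unchanged

print(f"feature 1 drift p-value: {p_drift:.1e}")
print(f"accuracy before: {acc_ref:.3f}, after drift: {acc_prod:.3f}")
```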
Further reading: how to build intuition on different drift detection methods.
Get started with open-source ML monitoring
Evaluate, test, and monitor ML models in production with Evidently. From tabular data to NLP and LLM. Built for data scientists and ML engineers.