Model Drift in Machine Learning
Utkarsh Sharma
SME & Manager | SAP Certified Application Associate | Certified Data Scientist | Intel certified Machine Learning Instructor| Mentor
“Change is the only constant in life.” - Heraclitus (Greek philosopher)
The world is not static; it is dynamic and continually changing. The technology we rely on today will become less effective over time as requirements change and technology advances. Nothing lasts forever. Our youth does not last, the best can become the worst, and our machine learning models degrade as time does its thing.
For example, consider the following case:
You built a machine learning model, trained it on your training data, validated it against several metrics with promising results, and put it into production. Then something unexpected happens (a pandemic like COVID-19) and the model's forecasts go haywire. Do you have any idea what happened?
This is the phenomenon called model drift, specifically concept drift.
Another example is an email spam filter. If you use a model built in 2000 to identify spam today, there is a high probability that it will not filter mail correctly. A spam email from the 2000s isn’t the same as a spam email in 2022, and the features that help detect spam in 2022 differ significantly from those of the 2000s.
What is model drift?
Model drift (also known as model decay) refers to the degradation of a model's predictive power due to changes in the environment, and thus in the relationships between variables. If the environment, the variables, and the data stayed the same forever, model drift would not be a problem, but we all know that will not happen. If you are building a model for learning purposes or to gain experience, there is nothing to worry about. If your model is going into production to be used in real time, you need to account for model drift.
So, before dealing with the problem of model drift, you should understand the types of model drift that exist.
Concept Drift
Concept drift in machine learning and data mining refers to the change in the relationships between input and output data in the underlying problem over time.
As an example, think of a product recommendation system in eCommerce. Do you think a model trained before COVID-19 would work equally well during the COVID-19 pandemic?
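To make concept drift concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the data is synthetic, the "model" is a single learned threshold, and the helper names (make_data, fit_threshold, accuracy) are hypothetical. The inputs are drawn from the same distribution before and after the drift; only the rule that links the input to the label changes.

```python
import random

def make_data(n, boundary, rng):
    """Draw x uniformly from [0, 1); the label is 1 when x exceeds `boundary`."""
    xs = [rng.random() for _ in range(n)]
    ys = [1 if x > boundary else 0 for x in xs]
    return xs, ys

def accuracy(threshold, xs, ys):
    """Share of points where the rule 'x > threshold' matches the label."""
    correct = sum((1 if x > threshold else 0) == y for x, y in zip(xs, ys))
    return correct / len(xs)

def fit_threshold(xs, ys):
    """'Train' by picking the candidate threshold with the best training accuracy."""
    candidates = [i / 100 for i in range(101)]
    return max(candidates, key=lambda t: accuracy(t, xs, ys))

rng = random.Random(0)  # fixed seed so the run is repeatable

# Train while the true decision boundary is 0.5.
train_x, train_y = make_data(2000, boundary=0.5, rng=rng)
threshold = fit_threshold(train_x, train_y)

# New data with the same concept, then new data after the concept has changed.
same_x, same_y = make_data(2000, boundary=0.5, rng=rng)
drift_x, drift_y = make_data(2000, boundary=0.8, rng=rng)

acc_before = accuracy(threshold, same_x, same_y)
acc_after = accuracy(threshold, drift_x, drift_y)

print(f"learned threshold:         {threshold:.2f}")
print(f"accuracy, same concept:    {acc_before:.2f}")
print(f"accuracy, drifted concept: {acc_after:.2f}")
```

The inputs look the same in both test sets, so checking the input distribution alone would not reveal the problem. The drop only shows up once predictions are compared with the new true labels.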
Data Drift
Simply put, data drift occurs when the input data a model sees in production no longer resembles the data it was trained on. This change in the input data, or independent variables, leads to poor model performance. Microsoft has stated data drift to be one of the top reasons model accuracy degrades over time.
Data drift is generally a consequence of seasonal changes or changes in consumer preferences over time. For instance, educational data collected before COVID-19 shows a lower preference for online learning than data collected afterwards. Similarly, demand for lipstick fell considerably after COVID-19 while face masks became the norm. As a result, models trained on the earlier data can perform poorly. Because the input data has changed, the distribution of the variables differs from what the model learned.
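One common way to check a single numeric feature for data drift is the Population Stability Index (PSI), which compares how a reference sample and a recent sample fall into the same set of bins. The sketch below is only an illustration under stated assumptions: the data is synthetic, the function name psi and its parameters are hypothetical, the bins are equal-width over the reference range, and the reference sample is assumed to contain more than one distinct value.

```python
import math
import random

def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between two samples of one numeric feature."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins  # assumes the reference sample is not constant

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)
            counts[idx] += 1
        # eps avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))

rng = random.Random(42)  # fixed seed so the run is repeatable

# Reference sample (training-time data) and two "production" samples.
reference = [rng.gauss(0.0, 1.0) for _ in range(5000)]
stable = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # same distribution
shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]  # mean moved by one std dev

psi_stable = psi(reference, stable)
psi_shifted = psi(reference, shifted)

print(f"PSI, same distribution:    {psi_stable:.3f}")
print(f"PSI, shifted distribution: {psi_shifted:.3f}")
```

A frequently quoted rule of thumb treats a PSI below 0.1 as no meaningful shift and above 0.25 as a significant one, but these cut-offs are conventions, not guarantees. PSI also looks only at inputs, so it can flag data drift but says nothing about whether the relationship between inputs and outputs has changed.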