Model Drift in Machine Learning

“Change is the only constant in life.” - Heraclitus (Greek philosopher)

The world is not static; it is dynamic and continually changing. The technology we rely on today will become less effective as requirements shift and newer technology arrives. Nothing lasts forever: youth fades, the best becomes the worst, and our machine learning models degrade as time does its thing.

For example, consider the following case:

Suppose you build a machine learning model, train it on your training data, validate it against several metrics that look promising, and put it into production. Then something unexpected happens (a pandemic like COVID-19) and the model's forecasts go haywire. Do you have any idea what happened?

This is the phenomenon called model drift, specifically concept drift.

Another example is an email spam filter. If you use a model built in the year 2000 to identify spam today, there is a high probability that it will not filter mail correctly. A spam email from the 2000s isn’t the same as a spam email in 2022, and the features used to detect fraudulent emails today differ significantly from those of the 2000s.

What is model drift?

Model drift (also known as model decay) is the degradation of a model's predictive power caused by changes in the environment and, with them, in the relationships between variables. If the environment, the variables, and the data stayed the same forever, model drift would not be a problem, but we all know that will not happen. If you are building a model for learning purposes or to gain experience, there is nothing to worry about. If your model is going into production to be used in real time, however, you need to account for model drift.

So, before dealing with model drift, you should understand the types of model drift that exist.

Concept Drift

Concept drift in machine learning and data mining refers to the change in the relationships between input and output data in the underlying problem over time.

As an example, think of a product recommendation system in eCommerce. Do you think a model that was trained before COVID-19 would work equally well during the COVID-19 pandemic?
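To make the idea concrete, here is a minimal sketch of concept drift on synthetic data. Everything in it is an assumption made for illustration: the helper names (fit_line, mean_abs_error, make_data), the linear relationship, and the slopes are invented and do not come from any real recommendation system. A simple model is fitted while the output follows one relationship with the input, then scored again after that relationship has changed. The inputs are drawn from the same range both times, so only the input-to-output relationship moves.

```python
import random


def fit_line(xs, ys):
    """Fit y = a*x + b by ordinary least squares and return (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    return a, mean_y - a * mean_x


def mean_abs_error(model, xs, ys):
    """Average absolute gap between the model's predictions and the targets."""
    a, b = model
    return sum(abs(a * x + b - y) for x, y in zip(xs, ys)) / len(xs)


def make_data(rng, slope, n=500):
    """Synthetic data: inputs uniform on [0, 10], output = slope * x + noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [slope * x + rng.gauss(0, 0.5) for x in xs]
    return xs, ys


rng = random.Random(0)  # fixed seed so the run is repeatable

# "Before": the relationship the model learns (hypothetical slope of 2.0).
train_x, train_y = make_data(rng, slope=2.0)
model = fit_line(train_x, train_y)

# Fresh data from the same relationship, then data after the concept changed.
same_x, same_y = make_data(rng, slope=2.0)
drift_x, drift_y = make_data(rng, slope=-1.0)

error_before = mean_abs_error(model, same_x, same_y)
error_after = mean_abs_error(model, drift_x, drift_y)

print(f"error on unchanged concept: {error_before:.2f}")
print(f"error after concept drift:  {error_after:.2f}")
```

The model itself never changes between the two measurements. Its error grows only because the world it was trained to describe has moved on, which is why concept drift usually shows up first as a drop in live performance metrics.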

Data Drift

Simply put, data drift occurs when the data a model sees in production no longer resembles the data it was trained on. A change in the input data, or independent variables, leads to poor model performance. Microsoft has stated that data drift is one of the top reasons model accuracy degrades over time.

Data drift is generally a consequence of seasonal changes or changes in consumer preferences over time. For instance, educational data collected before COVID-19 shows a lower preference for online learning than data collected afterwards. Similarly, the demand for lipstick fell considerably after COVID-19, while face masks became the norm. As a result, models trained on the earlier data lose much of their usefulness. Since the input data has changed, the distribution of the variables is different from what the model learned, and its predictions suffer.
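One common way to check for this kind of change is to compare the distribution of a feature in the training data with the distribution of the same feature in recent data. The sketch below is one possible approach, not a prescribed method: it computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical distribution functions) in plain Python. The feature ("weekly hours of online learning"), the sample values, and the helper names are hypothetical, and the threshold uses the usual large-sample approximation for a 5% significance level.

```python
import math
import random
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    ref = sorted(reference)
    cur = sorted(current)
    gap = 0.0
    # The largest gap is always reached at one of the observed values.
    for value in ref + cur:
        cdf_ref = bisect_right(ref, value) / len(ref)
        cdf_cur = bisect_right(cur, value) / len(cur)
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def ks_threshold(n, m, coefficient=1.358):
    """Approximate critical value at the 5% level for sample sizes n and m."""
    return coefficient * math.sqrt((n + m) / (n * m))


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical feature: weekly hours of online learning per student.
training = [rng.gauss(2.0, 1.0) for _ in range(1000)]    # data the model saw
recent_same = [rng.gauss(2.0, 1.0) for _ in range(1000)]  # same distribution
recent_shifted = [rng.gauss(5.0, 1.5) for _ in range(1000)]  # shifted usage

threshold = ks_threshold(len(training), len(recent_same))
stat_same = ks_statistic(training, recent_same)
stat_shifted = ks_statistic(training, recent_shifted)

print(f"threshold:            {threshold:.3f}")
print(f"KS, same source:      {stat_same:.3f}  drift flagged: {stat_same > threshold}")
print(f"KS, shifted feature:  {stat_shifted:.3f}  drift flagged: {stat_shifted > threshold}")
```

A check like this needs only the inputs, not the true labels, so it can raise a warning before any drop in accuracy can be measured. In practice it would be run per feature on a schedule, and a flagged feature is a prompt to investigate rather than proof that the model has failed.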
