Model Drift in Machine Learning

Utkarsh Sharma

SME & Manager | SAP Certified Application Associate | Certified Data Scientist | Intel certified Machine Learning Instructor| Mentor

Published: April 14, 2022

“Change is the only constant in life.” - Heraclitus (Greek philosopher)

The world is not static; it is dynamic and continually changing. The technology we rely on today will eventually become inefficient, whether because requirements change, technology advances, or something else shifts. Nothing lasts forever: our youth fades, the best becomes the worst, and our machine learning models degrade as time does its thing.

For example, consider the following case:

You created a machine learning model, trained it on your training data, validated its performance across various metrics that looked promising, and put it into production. Then something unexpected happened (a pandemic like COVID-19), and the model's forecasts went haywire. Do you have any idea what happened?

This is the phenomenon called model drift, specifically concept drift.

Another example is an email spam filter. If you used a spam-detection model built in 2000 to filter today's email, it would very likely misclassify many messages. A spam email from the 2000s isn't the same as a spam email in 2022, and the features that distinguish spam today differ significantly from those of the 2000s.

What is model drift?

Model drift (also known as model decay) refers to the degradation of a model's predictive power due to changes in the environment, and therefore in the relationships between variables. If the environment, the variables, and the data stayed the same forever, model drift would not be a problem, but as we all know, that will not happen. If you are building a model for learning purposes or to gain experience, there is nothing to worry about. But if your model is going into production to make real-time predictions, you need to plan for model drift.
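Because drift shows up as a gradual loss of predictive power, one common way to catch it is to keep scoring the production model once true labels arrive and compare recent performance with the accuracy measured at validation time. The sketch below assumes labelled production data is available; `rolling_accuracy`, `flag_decay`, the window size, and the 0.05 tolerance are hypothetical choices for illustration, not a standard API.

```python
def rolling_accuracy(y_true, y_pred, window):
    """Accuracy over consecutive, non-overlapping windows of production predictions.

    Assumes true labels for production data eventually become available.
    A trailing window shorter than `window` is ignored.
    """
    scores = []
    for start in range(0, len(y_true) - window + 1, window):
        truth = y_true[start:start + window]
        preds = y_pred[start:start + window]
        correct = sum(t == p for t, p in zip(truth, preds))
        scores.append(correct / window)
    return scores


def flag_decay(window_scores, baseline, tolerance=0.05):
    """Return indices of windows whose accuracy fell more than `tolerance`
    below `baseline` (e.g. the accuracy measured during validation).

    The 0.05 default is an illustrative assumption, not a recommended value.
    """
    return [i for i, score in enumerate(window_scores) if score < baseline - tolerance]
```

For example, if validation accuracy was 0.95, `flag_decay(rolling_accuracy(y_true, y_pred, window=500), baseline=0.95)` lists the windows where accuracy dropped below 0.90, which is a signal to investigate drift and possibly retrain.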

So, before dealing with model drift, you should understand the types of model drift that exist.

Concept Drift

Concept drift in machine learning and data mining refers to the change in the relationships between input and output data in the underlying problem over time.

As an example, think of a product recommendation system in eCommerce. Do you think a model trained before COVID-19 would work equally well during the pandemic?
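To make the definition concrete, the toy simulation below keeps the input distribution fixed (x is uniform on [0, 1] in both periods) and changes only the rule that maps inputs to labels. A model fitted under the old rule then loses accuracy even though the inputs look the same, which is exactly what concept drift means. The data generator, the thresholds 0.5 and 0.8, and the one-parameter threshold "model" are all assumptions made for illustration.

```python
import random


def make_data(n, threshold, seed):
    """Inputs x ~ Uniform(0, 1); the label is 1 when x exceeds `threshold`."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n)]
    ys = [int(x > threshold) for x in xs]
    return xs, ys


def accuracy(cut, xs, ys):
    """Accuracy of the rule 'predict 1 when x > cut'."""
    return sum(int(x > cut) == y for x, y in zip(xs, ys)) / len(xs)


def fit_threshold(xs, ys):
    """Pick the cut-off (from the observed inputs) with the best training accuracy."""
    best_cut, best_acc = 0.0, -1.0
    for cut in sorted(xs):
        acc = accuracy(cut, xs, ys)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut


# "Before": the label flips at 0.5. "After": same input distribution,
# but the input-output relationship has moved to 0.8.
x_old, y_old = make_data(500, threshold=0.5, seed=0)
x_new, y_new = make_data(500, threshold=0.8, seed=1)

cut = fit_threshold(x_old, y_old)
print(f"accuracy before drift: {accuracy(cut, x_old, y_old):.2f}")
print(f"accuracy after drift:  {accuracy(cut, x_new, y_new):.2f}")
```

The fitted model is perfect on the old data but now mislabels every input between the two thresholds, roughly 30% of the new samples, even though nothing about the inputs themselves has changed.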

Data Drift

Simply put, data drift occurs when the distribution of the input data a model sees in production moves away from the data it was trained on. A change in the input data (the independent variables) leads to poor model performance. Microsoft has named data drift as one of the top reasons model accuracy degrades over time.
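Since data drift is a change in the input distribution, it can be checked without any labels by comparing a feature's values in production with its values in the training set. One common choice is the two-sample Kolmogorov-Smirnov (KS) statistic: the largest vertical gap between the two empirical cumulative distribution functions. The pure-Python sketch below computes it; the `has_drifted` helper and its 0.1 threshold are hypothetical and would need tuning per feature. In practice, `scipy.stats.ks_2samp` computes the same statistic along with a p-value.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the two samples, checked at every observed value."""
    ref, cur = sorted(reference), sorted(current)
    d = 0.0
    for v in ref + cur:
        cdf_ref = bisect_right(ref, v) / len(ref)  # share of reference values <= v
        cdf_cur = bisect_right(cur, v) / len(cur)  # share of current values <= v
        d = max(d, abs(cdf_ref - cdf_cur))
    return d


def has_drifted(reference, current, threshold=0.1):
    """Hypothetical drift flag; the 0.1 threshold is an illustrative assumption."""
    return ks_statistic(reference, current) > threshold


# Hypothetical feature values: production data is the training data shifted by 3.
train_feature = [0.1 * i for i in range(100)]
prod_feature = [0.1 * i + 3.0 for i in range(100)]
print(ks_statistic(train_feature, prod_feature), has_drifted(train_feature, prod_feature))
```

Running this per feature on a schedule (for example, daily batches against the training set) gives an early warning before labelled performance data is available.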

Data drift is generally a consequence of seasonal changes or of shifts in consumer preferences over time. For instance, educational data collected before COVID-19 shows a weaker preference for online learning than data collected after it. Similarly, demand for lipsticks fell considerably after the onset of COVID-19, while face masks became the norm. As a result, models trained on earlier data can perform poorly: because the input data has changed, the distributions of the variables no longer match what the model saw during training, and its predictions become unreliable.
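Shifts like the lipstick and face-mask example can be quantified per feature with the Population Stability Index (PSI), which compares the share of observations in each category (or bin) between a reference period and a current one: PSI = sum over categories of (a_i - e_i) * ln(a_i / e_i), where e_i is the expected (reference) share and a_i the actual (current) share. The counts below are invented purely to illustrate the calculation, and the small `eps` floor for empty categories is an implementation assumption. A commonly quoted rule of thumb treats PSI above about 0.25 as a significant shift, but thresholds vary between teams.

```python
import math


def population_stability_index(expected_counts, actual_counts, eps=1e-6):
    """PSI between two categorical distributions given as {category: count}."""
    categories = set(expected_counts) | set(actual_counts)
    e_total = sum(expected_counts.values())
    a_total = sum(actual_counts.values())
    psi = 0.0
    for c in categories:
        # Floor proportions at eps so a category missing from one period
        # does not cause division by zero or log(0).
        e = max(expected_counts.get(c, 0) / e_total, eps)
        a = max(actual_counts.get(c, 0) / a_total, eps)
        psi += (a - e) * math.log(a / e)
    return psi


# Hypothetical order counts per category, before and after a change in preferences.
before = {"lipstick": 400, "face_mask": 50, "other": 550}
after = {"lipstick": 150, "face_mask": 300, "other": 550}

print(round(population_stability_index(before, after), 3))  # prints 0.693
```

Identical distributions give a PSI of 0, and the value grows as category shares move apart, so the same function can be run on binned numeric features as well.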
