Data Science: Overfitting

#MeherLearningML

While testing a classification model, you may find that classification error is low on the training data but noticeably higher on the test data. This is the classic sign that the model has overfitted.

Overfitting means the model has not only learned the underlying patterns in the training data but has also picked up its noise and random fluctuations. When the model is applied to test data, performance drops noticeably; as a rough rule of thumb, a gap of more than about 5% between training and test metrics is a warning sign.
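As an illustration (a minimal sketch using scikit-learn and a synthetic dataset, not code from the original post), you can spot overfitting by comparing accuracy on the training split and the test split; a large gap is the warning sign:

# Minimal sketch: detect overfitting by comparing train vs. test accuracy.
# The dataset and the deliberately unconstrained tree are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)

# An unconstrained tree can memorize the training data, including its noise.
model = DecisionTreeClassifier(max_depth=None, random_state=42)
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"Train accuracy: {train_acc:.3f}")  # typically close to 1.0
print(f"Test accuracy:  {test_acc:.3f}")   # noticeably lower
print(f"Gap: {train_acc - test_acc:.3f}")  # a large gap signals overfitting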

Reasons for this discrepancy could be:

a) Complex Model: The model might be too complex (e.g., too many features), capturing irrelevant details in the training data.

b) Insufficient Data: The training data is too limited or not representative of the real-world scenarios the model will encounter, so the model performs poorly on the test set.

c) Data Leakage: Unintentionally including test data (or features derived from it) during training can also produce apparently good training performance that does not hold up on genuinely unseen data (see the sketch after this list).
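For the data-leakage point above, here is a hypothetical scikit-learn sketch (the dataset, model, and parameters are illustrative assumptions): the key idea is to fit preprocessing such as scaling only on the training split, for example inside a Pipeline.

# Sketch of avoiding one common form of data leakage: fit preprocessing on
# the training split only, rather than scaling the full dataset before splitting.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Leaky pattern (avoid): StandardScaler().fit_transform(X) before the split
# lets test-set statistics influence the training features.

# Safer pattern: the scaler is fitted only on the training fold.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print("Test accuracy:", round(model.score(X_test, y_test), 3))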

To address overfitting, you can simplify the model (reduce its complexity), add more training data, and/or use cross-validation to assess how well the model generalizes, as sketched below.
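A minimal sketch of those remedies, again assuming a scikit-learn workflow with a synthetic dataset: constrain the model (here, a shallow decision tree) and use k-fold cross-validation to estimate generalization.

# Sketch of two remedies: a simpler (depth-limited) model plus cross-validation.
# Dataset and model choices are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=42)

# Constrain the model (e.g. limit tree depth) instead of letting it memorize.
simpler_model = DecisionTreeClassifier(max_depth=3, random_state=42)

# 5-fold cross-validation: each fold is held out once as a validation set,
# giving a more honest estimate of performance on unseen data.
scores = cross_val_score(simpler_model, X, y, cv=5)
print("CV accuracy per fold:", scores.round(3))
print("Mean CV accuracy:", scores.mean().round(3))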
