Data Science: Overfitting

#MeherLearningML

While testing a classification model, you may find that classification error is low on the training data but noticeably higher on the test data. This is the classic sign that the model has overfitted.

Overfitting means the model has not only learned the underlying patterns in the training data but has also picked up its noise and random fluctuations. When the model is applied to test data, performance drops noticeably; as a rough rule of thumb, a gap of more than about 5% between training and test metrics is a warning sign.
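As an illustration (a minimal sketch using scikit-learn and a synthetic dataset, not code from the original post), you can spot overfitting by comparing accuracy on the training split and the test split; a large gap is the warning sign:

# Minimal sketch: detect overfitting by comparing train vs. test accuracy.
# The dataset and the deliberately unconstrained tree are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)

# An unconstrained tree can memorize the training data, including its noise.
model = DecisionTreeClassifier(max_depth=None, random_state=42)
model.fit(X_train, y_train)

train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)
print(f"Train accuracy: {train_acc:.3f}")  # typically close to 1.0
print(f"Test accuracy:  {test_acc:.3f}")   # noticeably lower
print(f"Gap: {train_acc - test_acc:.3f}")  # a large gap signals overfitting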

Reasons for this discrepancy could be:

a) Complex Model: The model might be too complex (e.g., too many features), capturing irrelevant details in the training data.

b) Insufficient Data: The training data is too limited or not representative of the real-world scenarios the model will encounter, so the model performs poorly on the test set.

c) Data Leakage: Unintentionally including test data (or features derived from it) during training can also produce apparently good training performance that does not hold up on genuinely unseen data (see the sketch after this list).
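For the data-leakage point above, here is a hypothetical scikit-learn sketch (the dataset, model, and parameters are illustrative assumptions): the key idea is to fit preprocessing such as scaling only on the training split, for example inside a Pipeline.

# Sketch of avoiding one common form of data leakage: fit preprocessing on
# the training split only, rather than scaling the full dataset before splitting.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Leaky pattern (avoid): StandardScaler().fit_transform(X) before the split
# lets test-set statistics influence the training features.

# Safer pattern: the scaler is fitted only on the training fold.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print("Test accuracy:", round(model.score(X_test, y_test), 3))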

To address overfitting, you can simplify the model (reduce its complexity), add more training data, and/or use cross-validation to assess how well the model generalizes, as sketched below.
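A minimal sketch of those remedies, again assuming a scikit-learn workflow with a synthetic dataset: constrain the model (here, a shallow decision tree) and use k-fold cross-validation to estimate generalization.

# Sketch of two remedies: a simpler (depth-limited) model plus cross-validation.
# Dataset and model choices are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           random_state=42)

# Constrain the model (e.g. limit tree depth) instead of letting it memorize.
simpler_model = DecisionTreeClassifier(max_depth=3, random_state=42)

# 5-fold cross-validation: each fold is held out once as a validation set,
# giving a more honest estimate of performance on unseen data.
scores = cross_val_score(simpler_model, X, y, cv=5)
print("CV accuracy per fold:", scores.round(3))
print("Mean CV accuracy:", scores.mean().round(3))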
