Your Machine Learning (ML) Model is Wrong, now what?

In ML parlance, the bias-variance trade-off means that a model must find a happy medium between underfitting and overfitting.

If an ML model has high bias, it underfits because it has not learned enough about the relationship between the features and the labels. In contrast, a high-variance model has overlearned, or memorized, the training data and cannot generalize to unseen data, resulting in an overfit model.

As ML practitioners, we are seekers of the low-bias, low-variance model, the "Holy Grail." By increasing model complexity, we decrease bias but increase variance; by reducing model complexity, we increase bias but decrease variance. The bias-variance trade-off is a perfect illustration of the no free lunch theorem.
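
To make that "dial knob" concrete, here is a minimal sketch, assuming scikit-learn and a synthetic dataset (neither of which comes from this article): as a decision tree is allowed to grow deeper, training accuracy keeps climbing (bias falls) while test accuracy eventually plateaus or drops (variance rises).

```python
# Illustrative only: synthetic data and arbitrary depths chosen for the demo.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in (1, 2, 4, 8, 16, None):  # None lets the tree grow until its leaves are pure
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(X_train, y_train)
    # Shallow trees tend to underfit (train and test scores both low); very deep
    # trees tend to overfit (train score near 1.0, test score lagging behind).
    print(f"depth={depth}: train={model.score(X_train, y_train):.3f}, "
          f"test={model.score(X_test, y_test):.3f}")
```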

To improve model performance, we need meta-algorithms that combine several ML models and give us access to the proverbial bias/variance "dial knob," which leads us to a quick synopsis of ensemble methods.

  1. Bagging (bootstrap aggregating, e.g., Random Forest) reduces high variance. How? Assuming the models in the ensemble do not all make the same errors on the test set, averaging their individual predictions cancels out those errors and yields better predictions, which is quite akin to asking the audience (the wisdom of the crowd).
  2. Boosting (e.g., XGBoost, AdaBoost, GBM) constructs an ensemble with more capacity than its individual member models. It reduces bias more than variance: each successive model focuses "the learning" on the examples the previous model got wrong.
  3. Stacking is similar to boosting but uses different types of ML models and feeds their outputs into a secondary ML model that produces the final prediction. It decreases variance while also keeping high bias in check. Minimal code sketches of all three approaches follow this list.
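
To ground item 1, here is a minimal bagging sketch, again assuming scikit-learn and a synthetic dataset; the estimators and hyperparameters are illustrative choices, not something prescribed in this article.

```python
# Bagging in a nutshell: many bootstrapped trees, averaged, usually show lower
# variance than one deep tree trained on the same data.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

single_tree = DecisionTreeClassifier(random_state=0)                # high-variance base learner
forest = RandomForestClassifier(n_estimators=200, random_state=0)   # bagged, decorrelated trees

print("single tree  :", cross_val_score(single_tree, X, y, cv=5).mean())
print("random forest:", cross_val_score(forest, X, y, cv=5).mean())
```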
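
For item 2, a comparable boosting sketch under the same assumptions: shallow trees are added sequentially, and each stage concentrates on the errors left by the previous ones, which mainly attacks bias.

```python
# Boosting in a nutshell: an additive ensemble of weak (shallow) learners,
# each new stage fit to the errors of the ensemble built so far.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

gbm = GradientBoostingClassifier(n_estimators=200, max_depth=2,
                                 learning_rate=0.1, random_state=0)
print("gradient boosting:", cross_val_score(gbm, X, y, cv=5).mean())
```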
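
And for item 3, a minimal stacking sketch: heterogeneous base models feed their out-of-fold predictions to a secondary (meta) model, here a logistic regression. The particular base models are illustrative assumptions.

```python
# Stacking in a nutshell: the base models' predictions become the features of a
# secondary model that learns how best to combine them.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("svc", SVC(probability=True, random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # the secondary (meta) model
    cv=5,  # out-of-fold predictions feed the meta-model to limit leakage
)
print("stacking:", cross_val_score(stack, X, y, cv=5).mean())
```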

Back to the no free lunch theorem: although ensemble learning lets us regulate the bias-variance trade-off, it also increases training time (i.e., compute resources) and design time (i.e., choosing which models and architectures to combine), and it decreases model interpretability.

Jimmy Haidar

Low Voltage Systems Contractor

4y

This is almost like a formula to life... moderation.

Carlos Mercado

economics, ai, and crypto research @ flipside

4y

Variance: biased toward training data. Bias: high variance in training data. Don't you love ML vocabulary?
