Understanding the statistical definition of Bias and Variance
Why do Bias and Variance matter?
In practical Machine Learning, we want to know whether a trained model is overfitting or under-fitting.
Overfitting is associated with high Variance and under-fitting is associated with high Bias.
But the term Variance here is not used in exactly the same sense as the statistical variance.
Why is overfitting associated with high Variance?
Overfitting means that if we pick a different training set, we would get a different model. That is roughly what we mean by Variance.
But Variance has a very specific definition in statistics. How would that relate to the above definition?
In more exact terms, the Variance here is the variance of the model's prediction ŷ_0 for the true value y_0 = f(x_0).
Statistical definition of variance
In statistics, the variance of a set of numbers is the average of the squared differences from their Mean.
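Written as a formula (my notation, the standard population-variance form), for a set of numbers x_1, ..., x_n with mean x̄:

```latex
\mathrm{Var}(x) \;=\; \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2
```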
This brings us to our next question: Variance over which set? The answer: over the set of all possible training sets. Yeah, a set of sets sounds too nerdy to me as well.
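One way to write this down (my own notation, not something standardized in this post): if f̂_D denotes the model trained on a training set D, then the Variance of the prediction at a fixed test point x_0 is the variance of f̂_D(x_0) as D varies over possible training sets:

```latex
\mathrm{Var}\big(\hat{y}_0\big) \;=\; \mathbb{E}_D\!\left[\Big(\hat{f}_D(x_0) - \mathbb{E}_D\big[\hat{f}_D(x_0)\big]\Big)^{2}\right]
```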
To make this more concrete, I use a 1D input feature X and sample 1000 Y values for it.
On the left side you see the whole data set X in grey and the training set we have chosen in blue.
On the right side you see the line that was fitted to that training set.
The red dot is the test data point x0 = 10 and its predicted value y0.
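To make this idea reproducible, here is a minimal sketch (my own code, not the code behind the figures; the synthetic true function, noise level, and training-set size are assumptions for illustration). It repeatedly samples a training set, fits a line to it, predicts at x0 = 10, and then computes the variance of those predictions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "population": 1D feature X and noisy responses Y.
# The true function (2x + 5) and noise level are assumptions for illustration.
def sample_population(n=1000):
    X = rng.uniform(0, 20, size=n)
    Y = 2.0 * X + 5.0 + rng.normal(scale=8.0, size=n)
    return X, Y

X_pop, Y_pop = sample_population()

x0 = 10.0          # fixed test point
n_train = 30       # size of each training set
n_repeats = 500    # number of training sets we sample

predictions = []
for _ in range(n_repeats):
    # Pick a different training set each time
    idx = rng.choice(len(X_pop), size=n_train, replace=False)
    X_tr, Y_tr = X_pop[idx], Y_pop[idx]

    # Fit a straight line (degree-1 polynomial) to this training set
    slope, intercept = np.polyfit(X_tr, Y_tr, deg=1)

    # Prediction y0_hat for the test point x0
    predictions.append(slope * x0 + intercept)

predictions = np.array(predictions)

# Variance of the predictions at x0, over the sampled training sets:
# the average squared difference from their mean.
print("mean prediction at x0:", predictions.mean())
print("Variance of prediction at x0:", np.mean((predictions - predictions.mean()) ** 2))
```

Each loop iteration plays the role of one blue training set plus its fitted line from the figures above; the printed number is exactly the "average squared difference from the mean" applied to the predictions at x0.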
Now you might ask why we would sample different training sets at all. Why don't we just use the whole data set?
The answer is that we don't know the whole data set. The whole data set is the population, and we can only make assumptions about it.
I will talk about the population data set and the Bias-Variance tradeoff in the next post.