Bias-Variance Tradeoff: What is it and why is it important?
What is the Bias-Variance Tradeoff?
The bias-variance tradeoff is an important aspect of machine/statistical learning.
All learning algorithms rely on a mathematical/statistical formulation that contains an “error” term, which can be split into two components: reducible and irreducible error. As the name suggests, irreducible error is the inherent uncertainty in the problem, arising from natural variability in the system; it cannot be reduced no matter what we do with the model. Reducible error, on the other hand, can and should be minimized further to maximize accuracy.
In supervised learning, this reducible error can be further decomposed into “error due to squared bias” and “error due to variance”. The goal of a learning algorithm is to reduce both bias and variance simultaneously in order to obtain the best model possible. Achieving that, however, is not easy: in real life there is a tradeoff to be made when selecting models of different flexibility or complexity in order to minimize these sources of error.
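For squared-error loss, this decomposition can be written out explicitly. The formula below is the standard textbook formulation, not an equation from this post, with f the true relationship, f̂ the model fitted on a random training set, and σ² the variance of the noise (expectations are over training sets):

```latex
\mathbb{E}\!\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{squared bias}}
  + \underbrace{\mathbb{E}\!\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^{2}}_{\text{irreducible error}}
```

The first two terms make up the reducible error discussed above; the σ² term is the irreducible noise that no model choice can remove.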
Are you a singer?
Let’s relate this to music, as most of us love music and can easily relate to it. Imagine you are a singer, you have recorded yourself singing, and you now use a digital equalizer to edit the track. After all, you don’t want your fans to hear all that noise in the background. The noise could be a fan, someone shouting behind you, or the wind; it can be anything. You can reduce it to a minimum but you can never fully remove it. This noise parallels the error in a model. What you want to do is remove it. As you start removing it, the clarity of the song increases and you feel better about it. But if you keep tuning beyond a point, the quality starts to go down, because you have begun stripping out qualities of your own voice that made up the song. You started to remove the very character of the music you began with, and this is what we call bias. So what needs to be done? There is a sweet spot in the middle. There is no formula to find it, but that is where your listeners will be happiest.
What is Bias?
Bias arises when your model fails to capture the connection between the predictors in the data and the response. It refers to the error introduced by approximating a problem, which may be extremely complicated, with a much simpler model. In fig 1 (left side), we can see that when we try to fit a simple model to complex data, there is high bias; in simpler terms, a simple model cannot fit complex data well and will therefore have high bias. For example, one of the assumptions of a linear regression model is that there is a linear relationship between Y (the response variable) and X1, X2, ..., Xn (the predictors or independent variables). It is unlikely that any real-life problem has such a simple linear relationship, so performing linear regression will almost always introduce some bias into the estimate. This bias appears because the model cannot capture the true relationship between the predictors and the response.
In the figure below, there are three panels. The first two from the left show a substantially linear relationship between the predictor(s) and the response (Y), while the third clearly has a non-linear relationship. For the third panel, no matter how many training observations we are given, it will not be possible to produce an accurate estimate using a linear regression model; in other words, linear regression results in high bias there. The first two, however, are very close to linear, so given enough data, linear regression should be able to produce an accurate estimate.
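To make the high-bias case concrete, here is a minimal sketch (not code from the original article) that fits plain linear regression to data generated from a sine curve, assuming scikit-learn and NumPy are available; the synthetic data and constants are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# A clearly non-linear relationship, like the third panel described above.
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=500)

# No matter how much data we add, a straight line cannot capture sin(x):
# the training error stays well above the noise level. That persistent
# gap is the bias.
for n in (50, 200, 500):
    model = LinearRegression().fit(X[:n], y[:n])
    mse = mean_squared_error(y[:n], model.predict(X[:n]))
    print(f"n={n:3d}  training MSE of a linear fit: {mse:.3f}")
```

The training MSE of the straight line stays stuck near the same floor as n grows, which is exactly the behaviour described for the non-linear panel.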
What is Variance?
Think of variance like this: suppose I ask you to give me 20 numbers between 1 and 1000. There are many possible choices in front of you. Assuming our brain acts as a learning algorithm, it is quite possible that the values you give are spread widely between 1 and 1000. If I then ask you for another 20 numbers in the same range, the second set might look quite different from the first. What happened? A slight change in the data/conditions produced completely different values.
Along with the squared-bias error, the error due to variance is the amount by which the prediction learned from one training set differs from the expected prediction averaged over all training sets. Variance measures how inconsistent the predictions are with one another across different training sets, not whether they are accurate. Unlike bias, we do not compare predicted values against actual values, but different sets of predicted values against each other. For example, if a trained model gives wildly different predictions across data sets, the model has large variance.
A model starts to have large variance primarily because it begins to model the noise within the data, which is inherent and cannot be removed. Ideally, the predicted values should not vary too much between training sets. If a method has high variance, however, small changes in the training data can result in large changes in the predicted values. In general, more flexible/complex statistical methods have higher variance.
If we again use the regression model from above to learn the pattern in the data, and we assume the same functional form to estimate the target function, then the number of possible estimated functions is limited. Even though we get different functions for different training data, the search space is constrained by the linear functional form. If, instead, we used a decision tree to estimate the target function in a high-dimensional space, we might get very different predictions for different training data on the same variables, with the estimated function changing substantially each time.
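The contrast in the previous paragraph can be simulated directly. The sketch below (an illustration on assumed synthetic data, not code from the post) refits a linear model and a fully grown decision tree on many freshly drawn training sets and measures how much their predictions at one fixed point bounce around; scikit-learn is assumed.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
x_query = np.array([[0.5]])  # a fixed test point we predict repeatedly

def predictions_over_training_sets(make_model, n_sets=200, n_samples=50):
    """Refit the same model class on many freshly drawn training sets and
    record its prediction at one fixed point."""
    preds = []
    for _ in range(n_sets):
        X = rng.uniform(-3, 3, size=(n_samples, 1))
        y = np.sin(X).ravel() + rng.normal(scale=0.3, size=n_samples)
        preds.append(make_model().fit(X, y).predict(x_query)[0])
    return np.array(preds)

linear_preds = predictions_over_training_sets(LinearRegression)
tree_preds = predictions_over_training_sets(DecisionTreeRegressor)  # fully grown tree

# The spread of predictions across training sets is the variance term.
print("spread of linear regression predictions:", linear_preds.var())
print("spread of deep decision tree predictions:", tree_preds.var())
```

The linear model's predictions barely move because its functional form constrains the search space, while the unpruned tree chases the noise in each sample and its predictions spread out far more; that spread is the variance term.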
What’s the Solution? Which is Better? Bias or Variance!
Ideally, a tilt towards either of them is not desired, but when modelling real-world problems it is impossible to get rid of both at the same time. This is where the term “tradeoff” comes in.
The “tradeoff” between bias and variance can be viewed in this manner: a learning algorithm with low bias must be “flexible” so that it can fit the data well. But if the learning algorithm is too flexible, it will fit each training data set differently, and hence have high variance. By tuning supervised learning models, it is possible to achieve the right amount of tradeoff, i.e. a sweet spot. Two common metrics used in machine learning are training error and test error. The training set is used for model fitting, the validation set is used for estimating the prediction error so that an appropriate model can be chosen, and the test set is used to assess the model (and its error) once the final model has been chosen. Underfitting happens when both errors are large; overfitting happens when there is a considerable difference between the errors on the training and test sets, even though the training error itself is small.
In the figure above, it can be observed that reducing one of bias or variance tends to increase the other, so the optimum point in between should be chosen.
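One common way to find that sweet spot in practice is to sweep model complexity and watch the training and test errors diverge. Here is a hedged sketch (illustrative data and parameter choices, assuming scikit-learn) using polynomial degree as the complexity knob.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Sweep model complexity (polynomial degree). Low degrees typically
# underfit (both errors high); very high degrees typically overfit
# (training error keeps falling while test error climbs). The degree
# with the lowest test error is the practical sweet spot.
for degree in (1, 3, 5, 10, 15):
    model = make_pipeline(PolynomialFeatures(degree, include_bias=False), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree:2d}:  train MSE {train_mse:.3f}   test MSE {test_mse:.3f}")
```

Degree 1 underfits (high bias), the highest degrees overfit (high variance), and an intermediate degree gives the lowest test error, which is the tradeoff the article describes.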
Still not clear? Please check out my post at https://analyticsbot.ml/2017/01/bias-variance-tradeoff-what-is-it-and-why-is-it-important/ for a detailed understanding.
Comments

AI Leader at Amazon:
The more complex the model is, the more it overfits, so the bias (the difference between actual and desired output) is lower. The variance can be seen in terms of how the performance is affected when a new data set (test data) is presented to the trained model; this performance will go down as you increase the model's complexity, and thus the variance increases. The case for less complex models can be explained in a similar manner.
Engineering Leadership/Distributed Systems:
Ravi Shankar - shouldn't the graph be the reverse? Increasing model complexity reduces variance because of overfitting but introduces bias; reducing the model complexity reduces bias but increases variance due to underfitting.
DevRel Consultant | Ex Developer Advocate at Jina AI, Founder of Invide dev community and GitCommit.Show conf | #OpenSourceDiscovery newsletter | Author:
Great post! In simple terms, what I have understood from the bias-variance trade-off is that you can do one of the following things: A. Either your solution can be more precise for a specific set of problems, giving you good results (low bias, high variance), or B. The scope of your problem can be much more extensive so that it predicts accurately on a larger data set (low variance, high bias).