Bias-Variance Tradeoff: What is it and why is it important?
What is the Bias-Variance Tradeoff?
The bias-variance tradeoff is an important aspect of machine/statistical learning.
All learning algorithms rely on a mathematical/statistical formulation that contains an “error” term, which can be split into two components: reducible and irreducible error. As the name suggests, irreducible error is the inherent uncertainty in the problem, arising from natural variability in the system; it cannot be reduced no matter what we do with the model. Reducible error, on the other hand, can and should be minimized further to maximize accuracy.
In supervised learning, this reducible error can be further decomposed into “error due to squared bias” and “error due to variance”. The goal of a learning algorithm is to reduce both bias and variance simultaneously in order to obtain the best model possible. Achieving that, however, is not easy: in real life there is a tradeoff to be made when selecting models of different flexibility or complexity in order to minimize these sources of error.
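For squared-error loss, this decomposition can be written out explicitly. The formula below is the standard textbook formulation, not an equation from this post, with f the true relationship, f̂ the model fitted on a random training set, and σ² the variance of the noise (expectations are over training sets):

```latex
\mathbb{E}\!\left[(y - \hat{f}(x))^2\right]
  = \underbrace{\left(\mathbb{E}[\hat{f}(x)] - f(x)\right)^2}_{\text{squared bias}}
  + \underbrace{\mathbb{E}\!\left[\left(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^{2}}_{\text{irreducible error}}
```

The first two terms make up the reducible error discussed above; the σ² term is the irreducible noise that no model choice can remove.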
Are you a singer?
Let’s relate this to music, as most of us love music and can easily relate to it. Imagine you are a singer, you have recorded yourself singing, and you now use a digital equalizer to edit the track. After all, you don’t want your fans to hear all that noise in the background. The noise could be a fan, someone shouting behind you, or the wind; it can be anything. You can reduce it to a minimum but you can never fully remove it. This noise parallels the error in a model. What you want to do is remove it. As you start removing it, the clarity of the song increases and you feel better about it. But if you keep tuning beyond a point, the quality starts to go down, because you have begun stripping out qualities of your own voice that made up the song. You started to remove the very character of the music you began with, and this is what we call bias. So what needs to be done? There is a sweet spot in the middle. There is no formula to find it, but that is where your listeners will be happiest.
What is Bias?
Bias arises when your model fails to capture the connection between the predictors in the data and the response. It refers to the error introduced by approximating a problem, which may be extremely complicated, with a much simpler model. In fig 1 (left side), we can see that when we try to fit a simple model to complex data, there is high bias; in simpler terms, a simple model cannot fit complex data well and will therefore have high bias. For example, one of the assumptions of a linear regression model is that there is a linear relationship between Y (the response variable) and X1, X2, ..., Xn (the predictors or independent variables). It is unlikely that any real-life problem has such a simple linear relationship, so performing linear regression will almost always introduce some bias into the estimate. This bias appears because the model cannot capture the true relationship between the predictors and the response.
In the figure below, there are three panels. The first two from the left show a substantially linear relationship between the predictor(s) and the response (Y), while the third clearly has a non-linear relationship. For the third panel, no matter how many training observations we are given, it will not be possible to produce an accurate estimate using a linear regression model; in other words, linear regression results in high bias there. The first two, however, are very close to linear, so given enough data, linear regression should be able to produce an accurate estimate.
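To make the high-bias case concrete, here is a minimal sketch (not code from the original article) that fits plain linear regression to data generated from a sine curve, assuming scikit-learn and NumPy are available; the synthetic data and constants are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# A clearly non-linear relationship, like the third panel described above.
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=500)

# No matter how much data we add, a straight line cannot capture sin(x):
# the training error stays well above the noise level. That persistent
# gap is the bias.
for n in (50, 200, 500):
    model = LinearRegression().fit(X[:n], y[:n])
    mse = mean_squared_error(y[:n], model.predict(X[:n]))
    print(f"n={n:3d}  training MSE of a linear fit: {mse:.3f}")
```

The training MSE of the straight line stays stuck near the same floor as n grows, which is exactly the behaviour described for the non-linear panel.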
What is Variance?
Think of variance like this: suppose I ask you to give me 20 numbers between 1 and 1000. There are many possible choices in front of you. Assuming our brain acts as a learning algorithm, it is quite possible that the values you give are spread widely between 1 and 1000. If I then ask you for another 20 numbers in the same range, the second set might look quite different from the first. What happened? A slight change in the data/conditions produced completely different values.
Along with the squared-bias error, the error due to variance is the amount by which the prediction learned from one training set differs from the expected prediction averaged over all training sets. Variance measures how inconsistent the predictions are with one another across different training sets, not whether they are accurate. Unlike bias, we do not compare predicted values against actual values, but different sets of predicted values against each other. For example, if a trained model gives wildly different predictions across data sets, the model has large variance.
A model starts to have large variance primarily because it begins to model the noise within the data, which is inherent and cannot be removed. Ideally, the predicted values should not vary too much between training sets. If a method has high variance, however, small changes in the training data can result in large changes in the predicted values. In general, more flexible/complex statistical methods have higher variance.
If we again use the regression model from above to learn the pattern in the data, and we assume the same functional form to estimate the target function, then the number of possible estimated functions is limited. Even though we get different functions for different training data, the search space is constrained by the linear functional form. If, instead, we used a decision tree to estimate the target function in a high-dimensional space, we might get very different predictions for different training data on the same variables, with the estimated function changing substantially each time.
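The contrast in the previous paragraph can be simulated directly. The sketch below (an illustration on assumed synthetic data, not code from the post) refits a linear model and a fully grown decision tree on many freshly drawn training sets and measures how much their predictions at one fixed point bounce around; scikit-learn is assumed.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
x_query = np.array([[0.5]])  # a fixed test point we predict repeatedly

def predictions_over_training_sets(make_model, n_sets=200, n_samples=50):
    """Refit the same model class on many freshly drawn training sets and
    record its prediction at one fixed point."""
    preds = []
    for _ in range(n_sets):
        X = rng.uniform(-3, 3, size=(n_samples, 1))
        y = np.sin(X).ravel() + rng.normal(scale=0.3, size=n_samples)
        preds.append(make_model().fit(X, y).predict(x_query)[0])
    return np.array(preds)

linear_preds = predictions_over_training_sets(LinearRegression)
tree_preds = predictions_over_training_sets(DecisionTreeRegressor)  # fully grown tree

# The spread of predictions across training sets is the variance term.
print("spread of linear regression predictions:", linear_preds.var())
print("spread of deep decision tree predictions:", tree_preds.var())
```

The linear model's predictions barely move because its functional form constrains the search space, while the unpruned tree chases the noise in each sample and its predictions spread out far more; that spread is the variance term.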
What’s the Solution? Which is Better? Bias or Variance!
Ideally, a tilt towards either of them is not desired, but when modelling real-world problems it is impossible to get rid of both at the same time. This is where the term “tradeoff” comes in.
The “tradeoff” between bias and variance can be viewed in this manner: a learning algorithm with low bias must be “flexible” so that it can fit the data well. But if the learning algorithm is too flexible, it will fit each training data set differently, and hence have high variance. By tuning supervised learning models, it is possible to achieve the right amount of tradeoff, i.e. a sweet spot. Two common metrics used in machine learning are training error and test error. The training set is used for model fitting, the validation set is used for estimating the prediction error so that an appropriate model can be chosen, and the test set is used to assess the model (and its error) once the final model has been chosen. Underfitting happens when both errors are large; overfitting happens when there is a considerable difference between the errors on the training and test sets, even though the training error itself is small.
In the figure above, it can be observed that reducing one of bias or variance tends to increase the other, so the optimum point in between should be chosen.
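One common way to find that sweet spot in practice is to sweep model complexity and watch the training and test errors diverge. Here is a hedged sketch (illustrative data and parameter choices, assuming scikit-learn) using polynomial degree as the complexity knob.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=60)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

# Sweep model complexity (polynomial degree). Low degrees typically
# underfit (both errors high); very high degrees typically overfit
# (training error keeps falling while test error climbs). The degree
# with the lowest test error is the practical sweet spot.
for degree in (1, 3, 5, 10, 15):
    model = make_pipeline(PolynomialFeatures(degree, include_bias=False), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"degree {degree:2d}:  train MSE {train_mse:.3f}   test MSE {test_mse:.3f}")
```

Degree 1 underfits (high bias), the highest degrees overfit (high variance), and an intermediate degree gives the lowest test error, which is the tradeoff the article describes.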
Still not clear? Please check out my post at https://analyticsbot.ml/2017/01/bias-variance-tradeoff-what-is-it-and-why-is-it-important/ for a detailed understanding.
Comments

AI Leader at Amazon:
The more complex the model is, the more it overfits, so the bias (the difference between actual and desired output) is lower. The variance can be seen in terms of how the performance is affected when a new data set (test data) is presented to the trained model; this performance will go down as you increase the model's complexity, and thus the variance increases. The case for less complex models can be explained in a similar manner.
Engineering Leadership/Distributed Systems:
Ravi Shankar - shouldn't the graph be the reverse? Increasing model complexity reduces variance because of overfitting but introduces bias; reducing the model complexity reduces bias but increases variance due to underfitting.
DevRel Consultant | Ex Developer Advocate at Jina AI, Founder of Invide dev community and GitCommit.Show conf | #OpenSourceDiscovery newsletter | Author:
Great post! In simple terms, what I have understood from the bias-variance trade-off is that you can do one of the following things: A. Either your solution can be more precise for a specific set of problems, giving you good results (low bias, high variance), or B. The scope of your problem can be much more extensive so that it predicts accurately on a larger data set (low variance, high bias).