Your Machine Learning (ML) Model Is Wrong, Now What?
In ML parlance, the bias-variance trade-off means that a model must strike a happy medium between underfitting and overfitting.
If the model has high bias, it underfits because it has not learned enough about the relationship between the features and the labels. In contrast, a high-variance model has overlearned, or memorized, the training data and cannot generalize to unseen data, resulting in an overfit model.
As ML practitioners, we are seekers of the low-bias, low-variance model, the "Holy Grail." By increasing model complexity, we decrease bias but increase variance; by reducing model complexity, we increase bias but decrease variance. The bias-variance trade-off is a perfect illustration of the no free lunch theorem.
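To make that "complexity dial" concrete, here is a minimal sketch, assuming scikit-learn and a DecisionTreeRegressor whose max_depth acts as the complexity knob (these choices are mine for illustration, not the article's): shallow trees underfit (high bias), while very deep trees fit the training set almost perfectly yet do worse on held-out data (high variance).

```python
# Illustrative sketch (assumed tooling: scikit-learn). Tree depth plays the role
# of the "complexity dial": low depth -> underfitting (high bias),
# unconstrained depth -> overfitting (high variance), visible as a growing gap
# between training and test error.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=500, n_features=10, noise=20.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for depth in [1, 2, 4, 8, 16, None]:  # None lets the tree grow fully
    model = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"max_depth={depth}: train MSE={train_mse:.1f}, test MSE={test_mse:.1f}")
```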
To improve model performance, we need a super algorithm, or meta-algorithm, that combines several ML models and gives us access to the proverbial bias/variance "dial," which leads us to a quick synopsis of the ensemble methods.
- Bagging (bootstrap aggregating, e.g., Random Forest) reduces high variance. How? Assuming that the models in the ensemble do not all make the same errors on the test set, averaging their individual predictions cancels out those errors and yields better predictions, much like asking the audience (the wisdom of the crowd).
- Boosting (e.g., XGBoost, AdaBoost, GBM) constructs an ensemble model with more capacity than the individual member models, reducing bias more than variance. Each successive model focuses "the learning" on the examples the previous model got wrong.
- Stacking is similar to boosting but combines the outputs of several different ML models (the base learners) and feeds them into a secondary model that produces the final prediction. It decreases variance but also helps control high bias. (A minimal sketch of all three methods follows this list.)
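Below is a minimal sketch of the three ensemble families, assuming scikit-learn; the synthetic dataset, the chosen estimators, and the hyperparameters are illustrative assumptions rather than recommendations from the article.

```python
# Illustrative comparison of bagging, boosting, and stacking (assumed tooling:
# scikit-learn). A single decision tree serves as a baseline.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import (
    RandomForestClassifier,      # bagging: averages many decorrelated trees
    GradientBoostingClassifier,  # boosting: each tree corrects the previous ones' errors
    StackingClassifier,          # stacking: a meta-learner combines base-model outputs
)

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

models = {
    "single tree (baseline)": DecisionTreeClassifier(random_state=0),
    "bagging (random forest)": RandomForestClassifier(n_estimators=200, random_state=0),
    "boosting (GBM)": GradientBoostingClassifier(random_state=0),
    "stacking": StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
            ("gbm", GradientBoostingClassifier(random_state=0)),
        ],
        final_estimator=LogisticRegression(),  # the secondary (meta) model
    ),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```

In practice the bagged and boosted ensembles typically beat the single tree, and stacking can squeeze out a little more by letting the meta-learner weigh the base models' strengths.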
Back to the no free lunch theorem: although ensemble learning lets us regulate the bias-variance trade-off, it also increases training time (i.e., compute resources) and design time (i.e., which models and architectures to choose), and it decreases model interpretability.