The need for ensembling
Debi Prasad Rath
@AmazeDataAI- Technical Architect | Machine Learning | Deep Learning | NLP | Gen AI | Azure | AWS | Databricks
Hi connections. Trust you are doing well. In this article we will take a deep dive into "ensembling". Let us get started.
Ensemble modelling, as the name suggests, combines a collective set of models and averages their predictions to arrive at a better overall model. This offers the flexibility to build a robust model by comparing and combining several models, with the idea that each one captures different aspects of the data's behaviour. When the ensemble makes predictions, the errors of the individual models tend to cancel out. In simpler terms, ensembling helps us build a model that generalizes well and is therefore less prone to overfitting. In bagging-style ensembles, each model is trained on a different subset of the data (sampled with replacement), so that collectively the models learn as much about the data's behaviour as possible.
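To make the idea concrete, here is a minimal sketch of that core loop: train several trees on bootstrap samples and average their predictions. The synthetic dataset and scikit-learn estimators are assumptions chosen purely for illustration, not a prescribed setup.

```python
# Minimal sketch: fit trees on bootstrap samples (drawn with replacement)
# and average their predictions. Dataset and models are illustrative.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
predictions = []
for _ in range(25):
    # Draw a bootstrap sample: same size as the training set, with replacement.
    idx = rng.integers(0, len(X_train), size=len(X_train))
    tree = DecisionTreeRegressor().fit(X_train[idx], y_train[idx])
    predictions.append(tree.predict(X_test))

# Averaging across trees smooths out each individual tree's errors.
ensemble_pred = np.mean(predictions, axis=0)
```

Each tree overfits its own bootstrap sample in a slightly different way, and averaging is what cancels those individual errors out.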
In general there are three types of ensembling: bagging, boosting and stacking. Bagging, short for bootstrap aggregating, builds models on different bootstrap subsets of the data and then averages all of their predictions to arrive at a robust final model. Bagging runs in parallel, meaning each individual model can be trained independently of the others. Conversely, boosting creates a sequential chain of models where each model tries to correct the errors of the previous one by reweighting the training examples; it is built on the idea of combining weak learners into a strong learner. Stacking trains several different base models on the same data and then combines their predictions with a second-level model, often called a meta-model.
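The three families map directly onto standard scikit-learn estimators. The sketch below is illustrative only; the specific estimators and hyperparameters are assumptions chosen for brevity.

```python
# Hedged sketch of the three ensemble families with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier, GradientBoostingClassifier,
                              RandomForestClassifier, StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# Bagging: independent trees fit in parallel on bootstrap samples.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, n_jobs=-1)

# Boosting: trees fit sequentially, each correcting its predecessors.
boosting = GradientBoostingClassifier(n_estimators=50)

# Stacking: a meta-model (logistic regression) combines base predictions.
stacking = StackingClassifier(
    estimators=[("rf", RandomForestClassifier()), ("dt", DecisionTreeClassifier())],
    final_estimator=LogisticRegression(),
)

for name, model in [("bagging", bagging), ("boosting", boosting), ("stacking", stacking)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```

Note the design difference: bagging can use `n_jobs=-1` because its members are independent, while boosting cannot be parallelized across estimators since each one depends on the previous model's errors.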
Ensembling, then, combines the strengths of different models to seek better predictions than any single model can deliver. Intuitively, it involves fitting many (typically tree-based) models on different samples of the same data and averaging their predictions. Tree ensembles are non-parametric, and the averaging makes them stable: small changes in the data will not drastically affect their predictions. They are also less prone to overfitting, because the final model is diversified across learners that capture different patterns in the data, which gives us more "confidence" that its estimates are reliable. Through boosting, ensembling can also build a strong learner sequentially by correcting the predictions of weak learners, and through stacking it can combine the predictions of different models trained on the same data in the best possible way.
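As a rough illustration of that reliability claim, the sketch below (again with an assumed synthetic dataset) compares a single decision tree against a random forest under cross-validation; the forest will typically show a higher mean score with lower variance across folds.

```python
# Illustrative comparison: one tree versus an ensemble of trees.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

single_tree = DecisionTreeClassifier(random_state=42)
forest = RandomForestClassifier(n_estimators=100, random_state=42)

tree_scores = cross_val_score(single_tree, X, y, cv=5)
forest_scores = cross_val_score(forest, X, y, cv=5)

# The ensemble usually wins on both mean accuracy and fold-to-fold spread.
print(f"tree:   {tree_scores.mean():.3f} +/- {tree_scores.std():.3f}")
print(f"forest: {forest_scores.mean():.3f} +/- {forest_scores.std():.3f}")
```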
Thanks for all your time and support. Happy learning.