BxD Primer Series: Bagging Ensemble Models
Hey there!
Welcome to BxD Primer Series where we are covering topics such as Machine Learning models, Neural Nets, GPT, Ensemble models, Hyper-automation in a ‘one-post-one-topic’ format. Today’s post is on Bagging Ensemble Models. Let’s get started:
The What:
Bagging (short for Bootstrap aggregating) works by randomly selecting a subset of training data (with replacement) and training a model on that subset. This process is repeated multiple times, with each iteration producing a new model. Then, all trained models are combined by averaging their predictions (in regression or probabilistic problems) or taking a majority vote (in classification problems).
Bagging reduces overfitting and increases the stability and accuracy of the final model. It helps improve the performance of models that are prone to high variance, such as decision trees.
The number of models to include in the ensemble, also known as the bagging size, is an important parameter to tune in bagging. A larger number of models will generally produce a more accurate ensemble model, but at the cost of increased complexity and training time.
Bagging is used with a variety of machine learning algorithms, including decision trees, random forests, K-nearest neighbors (KNN), support vector machines (SVM), and neural networks.
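As a sketch of this workflow, here is bagging of decision trees with scikit-learn's `BaggingClassifier`; the synthetic dataset and parameter values are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bag 50 decision trees: each tree is fit on a bootstrap sample of the
# training set, and predictions are combined by majority vote.
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
bag.fit(X_train, y_train)
acc = bag.score(X_test, y_test)
```

Increasing `n_estimators` (the bagging size) typically improves accuracy with diminishing returns, at the cost of training time.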
We will cover random forests in more detail below:
Random Forests:
Random forests are an extension of the decision tree algorithm (check our editions on decision trees here and here) that uses the concepts of bagging and feature randomization. They are widely used for a variety of applications, including classification, regression, and feature selection.
The name "random forest" comes from the fact that each decision tree in the ensemble is trained on a random subset of data and a random subset of features.
To build a random forest:
1. Draw a bootstrap sample (a random sample with replacement) from the training data.
2. Grow a decision tree on that sample, considering only a random subset of features at each split.
3. Repeat until the desired number of trees is reached.
To make a prediction, simply aggregate the predictions of all decision trees in the ensemble: take a majority vote for classification, or average the individual tree outputs for regression.
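The build-and-predict cycle can be sketched with scikit-learn's `RandomForestClassifier` (synthetic data and parameter values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree is fit on a bootstrap sample, and each split considers only a
# random subset of features (max_features): bagging + feature randomization.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X_train, y_train)

# Prediction aggregates the trees: majority vote for class labels,
# averaged class probabilities for predict_proba.
pred = forest.predict(X_test)
proba = forest.predict_proba(X_test)
```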
Advantages of random forests:
• Robust to overfitting compared to a single decision tree, thanks to bagging and feature randomization.
• Handle both classification and regression, with little data preprocessing required.
• Provide built-in estimates of feature importance and of generalization error (out-of-bag error).
Disadvantages of random forests:
• Less interpretable than a single decision tree.
• Slower to train and predict, and more memory-hungry, as the number of trees grows.
• In regression, predictions cannot extrapolate beyond the range of training targets.
Main hyper-parameters of random forests, which are tuned using grid search, random search, or Bayesian optimization with k-fold cross-validation:
• Number of trees in the ensemble.
• Number of features considered at each split.
• Maximum depth of each tree.
• Minimum number of samples required to split a node or to form a leaf.
Using Out-of-Bag as Substitute for Validation Set:
When we create a bagging ensemble, we randomly select a subset of the training data (with replacement) to train each model in the ensemble. This means that some of the data points in the original dataset are not used to train any of the models in the ensemble. These unused data points are referred to as "out-of-bag" (OOB) samples.
The out-of-bag error (OOB error) is calculated by evaluating the performance of each model in the ensemble on the out-of-bag samples that were not used to train that particular model. By averaging the performance of all the models on their respective out-of-bag samples, we can get an estimate of the generalization error of the entire ensemble model.
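In scikit-learn, this OOB estimate is available directly by setting `oob_score=True`; a minimal sketch on illustrative synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# oob_score=True evaluates each sample using only the trees that did not
# see it in their bootstrap draw -- a built-in validation estimate,
# so no separate held-out set is needed.
forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X, y)
oob_accuracy = forest.oob_score_
```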
Estimating Feature Importance:
Several methods can be used to rank features by their contribution to the predictive performance of the ensemble model. The most common methods are:
• Mean Decrease Impurity (MDI): It is commonly used for tree-based ensemble models such as random forests and gradient boosting machines. MDI calculates the total reduction in impurity (e.g., Gini impurity or entropy) that is achieved by splitting on a particular feature, averaged over all trees in the ensemble. Features with higher MDI scores are considered to be more important.
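In scikit-learn, MDI scores are exposed as `feature_importances_` on fitted tree ensembles; a minimal sketch on illustrative synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, n_informative=3, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# MDI scores: impurity reduction from splits on each feature, averaged
# over all trees; scikit-learn normalizes them to sum to 1.
mdi = forest.feature_importances_
ranking = mdi.argsort()[::-1]  # feature indices, most important first
```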
• Permutation Feature Importance (PFI): It is a permutation-based method that randomly shuffles the values of each feature in the test set and measures the resulting decrease in model performance. This process is repeated for each feature. The idea is that if a feature is important, shuffling its values should lead to a significant drop in accuracy. Features with higher PFI scores are considered to be more important.
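PFI is implemented in scikit-learn as `permutation_importance`; a minimal sketch on illustrative synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Shuffle each feature column of the test set n_repeats times and record
# the drop in score; large average drops mark important features.
result = permutation_importance(forest, X_test, y_test, n_repeats=10, random_state=0)
pfi = result.importances_mean
```

Unlike MDI, PFI is computed on held-out data, so it is less biased toward high-cardinality features.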
• SHapley Additive exPlanations (SHAP): SHAP is a game-theoretic method that assigns an importance score to each feature based on the contribution of that feature to the prediction for each instance in the test set. SHAP values can be used to explain the predictions of individual instances, as well as to estimate feature importance at a global level.
SHAP can also provide additional insights into how each feature contributes to the model's output, which can be useful in understanding the model's behavior and identifying areas for improvement. Read detailed interpretation and calculation here.
• Drop-Column Importance: This method estimates feature importance by training the ensemble model with and without each feature and comparing the difference in performance. The idea is that a drop in performance when a feature is removed indicates the importance of that feature.
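Drop-column importance has no dedicated scikit-learn function, but it is a short retraining loop; a minimal from-scratch sketch on illustrative synthetic data:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=6, n_informative=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

def fit_score(X_tr, X_te):
    """Train a fresh forest on X_tr and return its test accuracy."""
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(X_tr, y_train)
    return model.score(X_te, y_test)

baseline = fit_score(X_train, X_test)

# Retrain without each feature; the drop from the baseline score is
# that feature's drop-column importance.
importance = [
    baseline - fit_score(np.delete(X_train, j, axis=1), np.delete(X_test, j, axis=1))
    for j in range(X_train.shape[1])
]
```

Note this requires one full retraining per feature, so it is the most expensive of the four methods.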
In general, it's a good idea to use multiple methods to estimate feature importance and compare the results to determine the ranking of features.
Time for you to support:
In the coming posts, we will cover three more Ensemble Models: Boosting, Ensemble of Experts, and Bayesian Model Averaging.
Let us know your feedback!
Until then,
Have a great time!