BxD Primer Series: Bagging Ensemble Models

Hey there!

Welcome to the BxD Primer Series, where we cover topics such as machine learning models, neural nets, GPT, ensemble models, and hyper-automation in a ‘one-post-one-topic’ format. Today’s post is on Bagging Ensemble Models. Let’s get started:

The What:

Bagging (short for Bootstrap aggregating) works by randomly selecting a subset of training data (with replacement) and training a model on that subset. This process is repeated multiple times, with each iteration producing a new model. Then, all trained models are combined by averaging their predictions (in regression or probabilistic problems) or taking a majority vote (in classification problems).

Bagging reduces overfitting and increases the stability and accuracy of the final model. It helps improve the performance of models that are prone to high variance, such as decision trees.

The number of models to include in the ensemble, also known as the bagging size, is an important parameter to tune. A larger number of models generally produces a more accurate ensemble, but at the cost of increased complexity and training time.
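The bootstrap-then-vote procedure described above can be sketched in a few lines. This is a minimal illustration (the synthetic dataset, number of models, and use of scikit-learn decision trees as base learners are all assumptions for the example, not from the post):

```python
import numpy as np
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Toy classification dataset, purely for illustration
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

rng = np.random.default_rng(0)
n_models = 25
models = []
for _ in range(n_models):
    # Bootstrap sample: draw n rows WITH replacement
    idx = rng.integers(0, len(X), size=len(X))
    models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

# Majority vote across the ensemble for each point
votes = np.stack([m.predict(X) for m in models])   # shape: (n_models, n_samples)
bagged_pred = np.apply_along_axis(
    lambda col: Counter(col).most_common(1)[0][0], 0, votes)
```

Each tree sees a different resampled view of the data, and the vote smooths out the individual trees' high-variance mistakes.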


Bagging is used with a variety of machine learning algorithms, including decision trees, random forests, K-nearest neighbors (KNN), support vector machines (SVM), and neural networks.

We will cover random forests in more detail below:

Random Forests:

Random forests are an extension of the decision tree algorithm (check our editions on decision trees here and here) that uses the concepts of bagging and feature randomization. They are widely used for a variety of applications, including classification, regression, and feature selection.

The name "random forest" comes from the fact that each decision tree in the ensemble is trained on a random subset of data and a random subset of features.

To build a random forest:

  • Create a large number of decision trees, each trained on a different bootstrap sample of the training data. This reduces the variance of the model and improves its generalization performance.
  • At each split of a decision tree, instead of considering all available features, randomly select a subset of features to consider. This introduces additional randomness into the model, which further reduces overfitting.

To make a prediction, simply aggregate the predictions of all decision trees in the ensemble:

  • For classification problems, use a majority vote to determine the final prediction.
  • For regression problems, average the predictions to get the final prediction.
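Both aggregation rules above can be checked directly against scikit-learn's random forests (the datasets and forest sizes below are illustrative assumptions). For regression, the forest's prediction is exactly the average of its trees' predictions; for classification, we can recover a majority vote from the individual trees:

```python
import numpy as np
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Classification: majority vote over the individual trees
Xc, yc = make_classification(n_samples=200, n_features=10, random_state=1)
clf = RandomForestClassifier(n_estimators=50, random_state=1).fit(Xc, yc)
tree_votes = np.stack([t.predict(Xc) for t in clf.estimators_])
majority = (tree_votes.mean(axis=0) > 0.5).astype(int)  # binary vote

# Regression: the forest's prediction is the mean of the trees' predictions
Xr, yr = make_regression(n_samples=200, n_features=10, random_state=1)
reg = RandomForestRegressor(n_estimators=50, random_state=1).fit(Xr, yr)
tree_means = np.stack([t.predict(Xr) for t in reg.estimators_]).mean(axis=0)
```

(One nuance: scikit-learn's classifier actually averages class probabilities rather than hard votes, so the two can differ slightly on borderline points.)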

Advantages of random forests:

  • Ability to handle high-dimensional data with many features.
  • Robust to noise and outliers in data, which makes them a good choice for real-world applications.

Disadvantages of random forests:

  • Because a random forest is an ensemble of decision trees, it can be difficult to interpret how the model is making its predictions.
  • Computationally expensive to train.

Main hyper-parameters of random forests, typically tuned using grid search, random search, or Bayesian optimization with k-fold validation:

  1. n_estimators: The number of decision trees in the forest.
  2. max_depth: The maximum depth of each decision tree.
  3. max_features: The maximum number of features to consider at each split.
  4. min_samples_split: The minimum number of samples required to split an internal node.
  5. min_samples_leaf: The minimum number of samples required to be at a leaf node.
  6. criterion: The function used to measure the quality of a split (e.g. Gini impurity or entropy).
  7. min_impurity_decrease: The minimum impurity decrease required to split an internal node.
  8. ccp_alpha: The complexity parameter used for post-pruning the decision trees to reduce overfitting.
  9. max_leaf_nodes: The maximum number of leaf nodes in each decision tree.
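A tuning run over a few of the hyper-parameters above might look like the sketch below, using grid search with k-fold validation (the dataset and the particular grid values are illustrative assumptions; real searches would span wider ranges):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, n_features=12, random_state=0)

# Small illustrative grid over a subset of the hyper-parameters listed above
param_grid = {
    "n_estimators": [25, 50],
    "max_depth": [None, 5],
    "max_features": ["sqrt", 0.5],
}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,                  # k-fold validation with k=5
    scoring="accuracy",
)
search.fit(X, y)
best = search.best_params_  # the combination with the highest CV accuracy
```

`RandomizedSearchCV` has the same interface and is usually preferred when the grid gets large.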

Using Out-of-Bag as Substitute for Validation Set:

When we create a bagging ensemble, we randomly select a subset of the training data (with replacement) to train each model in the ensemble. On average, each bootstrap sample leaves out roughly one-third of the data points. For a given model, these unused data points are referred to as its "out-of-bag" (OOB) samples.

The out-of-bag error (OOB error) is calculated by evaluating the performance of each model in the ensemble on the out-of-bag samples that were not used to train that particular model. By averaging the performance of all the models on their respective out-of-bag samples, we can get an estimate of the generalization error of the entire ensemble model.

  • It allows us to evaluate the performance of the model without the need for a separate validation set. This is useful when the dataset is small.
  • It can also be used to tune the hyper-parameters of a bagging ensemble. For example, we can vary the number of estimators in the ensemble and choose the value that gives the lowest out-of-bag error.
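In scikit-learn, setting `oob_score=True` makes the forest score each training point using only the trees that did not see it, giving exactly this free validation estimate. A sketch (the dataset and the estimator counts being compared are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# oob_score=True evaluates each sample with only the trees
# whose bootstrap sample did not contain it
scores = {}
for n in (25, 100, 300):
    rf = RandomForestClassifier(n_estimators=n, oob_score=True,
                                random_state=0).fit(X, y)
    scores[n] = rf.oob_score_   # accuracy on out-of-bag samples
```

Comparing `scores` across values of `n_estimators` is the OOB-based tuning described in the second bullet above.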

Estimating Feature Importance:

Several methods can be used to rank features by their contribution to the predictive performance of the ensemble model. The most common methods are:

• Mean Decrease Impurity (MDI): Commonly used for tree-based ensemble models such as random forests and gradient boosting machines. MDI calculates the total reduction in impurity (e.g., Gini impurity or entropy) achieved by splitting on a particular feature, averaged over all trees in the ensemble. Features with higher MDI scores are considered more important.
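For scikit-learn forests, MDI is exposed directly as `feature_importances_`, which is normalized to sum to 1. A sketch (the synthetic dataset, where only the first three columns are informative, is an assumption for the example):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# 3 informative features among 10; shuffle=False keeps them in columns 0-2
X, y = make_classification(n_samples=400, n_features=10, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

mdi = rf.feature_importances_       # impurity-based importances, sum to 1
ranking = np.argsort(mdi)[::-1]     # feature indices, most important first
```

A known caveat of MDI is that it is computed on the training data and tends to inflate the importance of high-cardinality features.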

• Permutation Feature Importance (PFI): A permutation-based method that randomly shuffles the values of each feature in the test set and measures the resulting decrease in model performance. This process is repeated for each feature. The idea is that if a feature is important, shuffling its values should lead to a significant drop in accuracy. Features with higher PFI scores are considered more important.

The PFI score of a feature is the baseline accuracy minus the average accuracy over the shuffled repetitions:

PFI(X_j) = Acc − (1/m) · Σ_{i=1..m} Acc_i(X_j)

Where,

  • X_j is the j’th feature
  • m is the number of test samples
  • Acc is the accuracy of the model on the (unshuffled) test set
  • Acc_i(X_j) is the accuracy of the model on the test set when the values of feature j are randomly shuffled for sample i
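scikit-learn implements this shuffle-and-score loop as `permutation_importance`; it repeats the shuffle `n_repeats` times per feature and reports the mean accuracy drop. A sketch (dataset and split are illustrative assumptions):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# shuffle=False keeps the 3 informative features in columns 0-2
X, y = make_classification(n_samples=400, n_features=8, n_informative=3,
                           n_redundant=0, shuffle=False, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)

# Shuffle each feature in the test set 10 times and record the
# average drop in accuracy relative to the unshuffled baseline
result = permutation_importance(rf, X_te, y_te, n_repeats=10,
                                random_state=0, scoring="accuracy")
pfi = result.importances_mean   # higher drop = more important feature
```

Unlike MDI, PFI is computed on held-out data, so it reflects importance for generalization rather than for fitting the training set.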

• SHapley Additive exPlanations (SHAP): A game-theoretic method that assigns an importance score to each feature based on that feature's contribution to the prediction for each instance in the test set. SHAP values can be used to explain the predictions of individual instances, as well as to estimate feature importance at a global level.

SHAP can also provide additional insight into how each feature contributes to the model's output, which is useful for understanding the model's behavior and identifying areas for improvement. Read the detailed interpretation and calculation here.

• Drop-Column Importance: This method estimates feature importance by training the ensemble model with and without each feature and comparing the difference in performance. The idea is that a drop in performance when a feature is removed indicates the importance of that feature.
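Drop-column importance is simple to implement but expensive, since it retrains the model once per feature. A sketch using cross-validated accuracy as the performance measure (the dataset and the choice of cross-validation are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=6, n_informative=2,
                           n_redundant=0, shuffle=False, random_state=0)

def cv_accuracy(features):
    """Mean 5-fold CV accuracy of a fresh forest on the given columns."""
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    return cross_val_score(rf, features, y, cv=5).mean()

baseline = cv_accuracy(X)
# Retrain without each column; the bigger the drop, the more important it is
drops = [baseline - cv_accuracy(np.delete(X, j, axis=1))
         for j in range(X.shape[1])]
```

Because every feature requires a full retraining, this method is usually reserved for models with a modest number of features.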

In general, it is a good idea to use multiple methods to estimate feature importance and compare the results when determining the final ranking.

Time for you to support:

  1. Reply to this article with your question
  2. Forward/Share to a friend who can benefit from this
  3. Chat on Substack with BxD (here)
  4. Engage with BxD on LinkedIn (here)

In upcoming posts, we will cover three more ensemble models: Boosting, Ensemble of Experts, and Bayesian Model Averaging.

Let us know your feedback!

Until then,

Have a great time!

#businessxdata #bxd #Bagging #Ensemble #timeseries #primer
