Bootstrap Aggregating (Bagging)

We discussed Bootstrapping in my previous article. Building on that, we can take a look at another popular machine learning algorithm: Bootstrap Aggregating, also called Bagging. As Wikipedia describes it, Bagging is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression.

Before we dive into Bagging, we need to know what an ensemble algorithm is. There is a Chinese saying, "the wisdom of the masses exceeds that of the wisest individual," or more simply, "two heads are better than one." That is the core idea of ensemble algorithms: multiple weak classifiers can be better than one strong classifier (I use "classifier" rather than "learner" or another term because I will focus on classification rather than regression in this article). Specifically, assume there is some degree of difference between the weak classifiers (such as different algorithms, or different parameters of the same algorithm), which leads to different classification results. In other words, the weak classifiers will make different mistakes, especially on the data points close to the decision boundary. If we can somehow combine the results from those weak classifiers to get a more reasonable boundary and reduce the overall error, we should be able to achieve better classification results. That "somehow" is where the ensemble algorithm comes in, and the small calculation below illustrates why combining helps at all.
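To make that intuition concrete, here is a small back-of-the-envelope calculation (the numbers 11 and 0.35 are hypothetical, chosen only for illustration): if the classifiers' mistakes really were independent, a majority vote would be wrong far less often than any single voter.

```python
from math import comb

# Hypothetical setup: 11 independent classifiers, each wrong with probability 0.35.
# The majority vote is wrong only when 6 or more of them are wrong at the same time.
p, m = 0.35, 11
vote_error = sum(comb(m, k) * p**k * (1 - p)**(m - k)
                 for k in range(m // 2 + 1, m + 1))
print(f"single-classifier error: {p:.2f}, majority-vote error: {vote_error:.3f}")
# Prints roughly 0.149: less than half the individual error, and it keeps
# shrinking as the number of voters grows, but only because the mistakes
# were assumed to be independent.
```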

As mentioned above, Bagging is a machine learning ensemble meta-algorithm. Given a training data set D of size n, Bagging generates m new training sets Di, each of size n', by sampling from D with replacement (bootstrap sampling). It then uses these m new training sets to train m weak classifiers, and finally it combines their outputs by voting for classification (or by averaging the values for regression). That is why this algorithm is called bootstrap aggregating. The weak classifiers can be any algorithm/model, such as a decision tree, an SVM, etc., and they do not all have to be the same within one Bagging ensemble. The different weak classifiers will return different results, and the result that receives the most votes becomes the final output of the Bagging ensemble.
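The procedure above is short enough to sketch directly. Here is a minimal sketch, not a reference implementation, assuming numpy arrays, non-negative integer class labels, and scikit-learn's DecisionTreeClassifier as the weak classifier; the names bagging_fit and bagging_predict are mine, chosen only for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, n_models=25, sample_size=None, seed=None):
    """Train n_models weak classifiers, each on a bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    n_prime = n if sample_size is None else sample_size   # size n' of each bootstrap sample
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, n, size=n_prime)             # sampling with replacement
        tree = DecisionTreeClassifier(max_depth=3)         # deliberately weak learner
        tree.fit(X[idx], y[idx])
        models.append(tree)
    return models

def bagging_predict(models, X):
    """Combine the outputs by majority vote (use the mean instead for regression)."""
    votes = np.stack([clf.predict(X) for clf in models])   # shape: (n_models, n_samples)
    # For each sample (column), return the most frequent predicted class label.
    return np.apply_along_axis(
        lambda col: np.bincount(col.astype(int)).argmax(), axis=0, arr=votes)
```

Any other estimator with fit/predict methods could stand in for the decision tree here, and mixing different estimator types within one ensemble is also valid, exactly as described above.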

If we tweak Bagging a little bit, we get another very powerful classifier called Random Forest. Compared to Bagging, Random Forest has the following unique features:

  • The weak classifiers are all decision trees. That’s why it is a forest.
  • For each decision tree, when selecting a split point, the learning algorithm is limited to a random subset of features to search over, instead of searching all features and all feature values (if you are not familiar with the decision tree algorithm, I have a previous article that talks about it). That's why it is called random.

Why is it helpful to split on a random subset of features? Because Bagging works best when its weak classifiers are uncorrelated or only weakly correlated. If all classifiers produce similar predictions, the Bagging ensemble degenerates into little more than a single weak classifier. Unfortunately, the decision trees within a Bagging ensemble are likely to be correlated, because the decision tree algorithm is greedy: they all choose which variable to split on by greedily minimizing error. As a result, the trees tend to share a lot of structural similarities and, in turn, produce highly correlated predictions. Therefore, Random Forest forces each split to consider only a random sample of the features, which reduces the correlation between the trees. The sketch below shows where this restriction enters the split search.
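Here is a minimal sketch of that split search, assuming Gini impurity as the error measure; the helpers gini and best_split and the max_features parameter are my own names, used only to show where the random feature sampling plugs in. A plain bagged tree would call best_split with max_features=None (search every feature), while a Random Forest tree would pass something like max_features=int(sqrt(d)).

```python
import numpy as np

def gini(y):
    """Gini impurity of a label array."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y, max_features=None, seed=None):
    """Find the (feature, threshold) split with the lowest weighted impurity.

    With max_features=None this is the ordinary greedy search over all features.
    Random Forest restricts each call to a random subset of the features,
    which is what decorrelates the trees in the ensemble.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    candidates = np.arange(d)
    if max_features is not None:
        candidates = rng.choice(d, size=max_features, replace=False)

    best = (None, None, np.inf)  # (feature index, threshold, weighted impurity)
    for j in candidates:
        for t in np.unique(X[:, j]):
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best
```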
