Random Forest



Decision trees (DTs) use purity or information gain to decide on the most "helpful" feature to split the data. In most datasets a few features are the most impactful, so when you build an ensemble of DTs, many trees end up using the same features for splitting. This phenomenon is called feature dominance. Random forest avoids it by considering only a random sample of the features at each split. To eliminate this correlation, Breiman suggested selecting a subset of features from the full pool when growing each model on its bootstrapped sample. Because each model is trained on a random subset of features, the correlation between the different trees is reduced and the model becomes more robust.

A random subset of observations is chosen every time a new tree is built in a forest.

A random subset of features is chosen every time a node is being split inside a tree.
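
As a minimal sketch of these two sources of randomness (using NumPy; the toy data, variable names, and sizes below are illustrative only, not part of any particular library's implementation):

import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 20))      # 1000 observations, 20 features (toy data)

# 1) Per tree: a bootstrap sample of the observations (sampling rows with replacement)
bootstrap_rows = rng.integers(0, X.shape[0], size=X.shape[0])
X_tree = X[bootstrap_rows]

# 2) Per split: a random subset of the features (here sqrt(20) ~ 4 of them)
n_split_features = int(np.sqrt(X.shape[1]))
split_features = rng.choice(X.shape[1], size=n_split_features, replace=False)
X_split_candidates = X_tree[:, split_features]   # only these columns are searched for the best split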


Random forest, one of the bagging-based models, combines many deep decision trees into a forest that has low variance. This ensemble therefore mitigates the overfitting problem we face when working with individual decision trees.

"Random Forests is a substantial modification of bagging that builds a large collection of?de-correlated?trees, and averages them."

When constructing a tree within a bagging ensemble, all input features are considered when determining the best split. If the data contains one or two dominant features, those features are selected first in every tree of the ensemble, resulting in high correlation among the trees.

The Random Forest algorithm further reduces the model error due to variance by de-correlating the trees within the ensemble. This is achieved by considering only a sample of the available features at each split in the decision tree. The procedure produces a more diverse set of trees, which reduces the overall variance.

There is no need to prune trees in a random forest because even if some trees overfit the training set, it will not matter when the results of all the trees are aggregated.
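
A short scikit-learn sketch of this contrast (the synthetic dataset and parameter values are illustrative only): a BaggingClassifier over unrestricted decision trees considers every feature at every split, while a RandomForestClassifier restricts each split to a random feature subset, which de-correlates the trees.

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=25, n_informative=5, random_state=0)

# Bagging: every tree may split on any of the 25 features -> trees stay correlated
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=200, random_state=0)

# Random forest: each split sees only sqrt(25) = 5 candidate features -> de-correlated trees
forest = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=0)

print("bagging:", cross_val_score(bagging, X, y, cv=5).mean())
print("forest :", cross_val_score(forest, X, y, cv=5).mean())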


The most important hyperparameters in RF (Random Forest)

Number of trees - the number of trees to be built in the forest (by default: n_estimators=100).

Depth of each tree - recall that tree depth refers to the number of splits a tree can make before arriving at a prediction. The deeper the tree, the more complex it is; the shallower the tree, the simpler it is. Since the assumption is that an ensemble combines weak learners, the depth threshold is usually kept small (by default: max_depth=None, i.e. trees are grown until the leaves are pure).

Max features to consider - the number of features to be subsampled from the full pool of features. As a rule of thumb, use the square root of the total number of features. While this is a good place to start, it comes from experience and is not set in stone; start there and experiment to find what suits your model (max_features: {"auto", "sqrt", "log2"}, int or float, default="auto"; if "auto", then max_features=sqrt(n_features)).

Max samples - the number of samples to include in each bootstrapped sample (int or float, default=None; if bootstrap is True, this is the number of samples drawn from X to train each base estimator).

min_samples_leaf: int or float, default=1. The minimum number of samples required to be at a leaf node. A split point at any depth will only be considered if it leaves at least min_samples_leaf training samples in each of the left and right branches. This may have the effect of smoothing the model, especially in regression.
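
Putting the hyperparameters above together in scikit-learn (the specific values are illustrative starting points, not recommendations, and X_train / y_train are assumed to exist):

from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(
    n_estimators=100,      # number of trees in the forest
    max_depth=None,        # grow each tree until its leaves are pure (no pruning)
    max_features="sqrt",   # features subsampled at each split
    max_samples=0.8,       # fraction of rows drawn for each bootstrapped sample
    min_samples_leaf=1,    # minimum samples required at a leaf node
    random_state=0,
)
# rf.fit(X_train, y_train)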


"""Tuning"""


Grid Search: the traditional method of hyperparameter tuning. It creates a grid from predefined values of the hyperparameters, tests every combination, and returns the one that performed best within that set. Grid search is, however, computationally expensive.
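
A minimal grid search sketch with scikit-learn's GridSearchCV (the grid values are arbitrary examples):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [None, 10, 20],
    "max_features": ["sqrt", "log2"],
}
grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5, n_jobs=-1)
# grid.fit(X_train, y_train)   # tries all 3 * 3 * 2 = 18 combinations
# grid.best_params_, grid.best_score_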

Random Search: tests random points in a predefined hyperparameter space and keeps the best-performing one. With random search you only hit the actual minimum of the model error by luck, but the more points it tries, the more likely it is to land close to that minimum.
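
The equivalent sketch with RandomizedSearchCV, sampling 50 random points from the distributions below (the ranges are illustrative):

from scipy.stats import randint
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

param_distributions = {
    "n_estimators": randint(100, 1000),
    "max_depth": randint(3, 30),
    "min_samples_leaf": randint(1, 10),
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=50,          # number of random points to try
    cv=5,
    n_jobs=-1,
    random_state=0,
)
# search.fit(X_train, y_train)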

Bayesian Methods: more intelligent than grid search or random search. Bayesian methods use the performance of earlier attempts to choose the next attempt. They are preferred when evaluating the actual model is computationally expensive.
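
One way to sketch this is with BayesSearchCV from the scikit-optimize package (an assumption here: scikit-optimize is installed; the search ranges are illustrative):

from skopt import BayesSearchCV            # requires: pip install scikit-optimize
from skopt.space import Categorical, Integer
from sklearn.ensemble import RandomForestClassifier

search_spaces = {
    "n_estimators": Integer(100, 1000),
    "max_depth": Integer(3, 30),
    "max_features": Categorical(["sqrt", "log2"]),
}
opt = BayesSearchCV(
    RandomForestClassifier(random_state=0),
    search_spaces,
    n_iter=32,          # each new trial is chosen using the results of earlier trials
    cv=5,
    random_state=0,
)
# opt.fit(X_train, y_train)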



