a. What is a random forest? b. What is the process of growing a random forest?
Example 8: Solution
a. Decision-tree learners can create over-complex trees that do not generalize the data well; this is called overfitting. Decision trees can also be unstable, because small variations in the data may produce a completely different tree, and the learning algorithms cannot guarantee to return the globally optimal tree. Decision-tree learners also create biased trees if some classes dominate.
A random forest addresses these weaknesses by growing many decision trees, each built by repeatedly applying a simple splitting rule that chooses the variable best predicting the target. Whereas a single decision tree searches for a split over every variable in every node, a random forest searches each node for a split on only one variable: the variable with the largest association with the target, chosen from among only those explanatory variables that have been randomly selected to be tested for that node.
b. The process of growing random forests
· First, a small subset of explanatory variables is selected at random.
· Next, the node is split with the best variable among this small number of randomly selected variables, not the best variable of all the variables, as would be the case when growing a single decision tree (a minimal sketch of this per-node selection follows this list).
· Once the best variable from the eligible random subset has been used to split the node in question, a new list of eligible explanatory variables is selected at random to split the next node.
· This continues until the tree is fully grown, ideally with one observation in each terminal node.
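A minimal sketch of this per-node behavior, assuming a NumPy feature matrix X, a label vector y, and a hypothetical helper best_split_for_node (none of these names come from the original text). It only shows how a node draws its own random subset of candidate variables and then picks the best split among those candidates; it is not a complete random forest implementation.

import numpy as np

def best_split_for_node(X, y, mtry, rng):
    """At one node, draw `mtry` candidate variables at random and return the
    (variable, threshold, impurity) triple that best separates the classes
    among those candidates only (Gini impurity as the split criterion)."""
    def gini(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    candidates = rng.choice(X.shape[1], size=mtry, replace=False)  # eligible variables for this node
    best = (None, None, np.inf)
    for j in candidates:
        for t in np.unique(X[:, j])[:-1]:                          # candidate thresholds for variable j
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[2]:
                best = (j, t, score)
    return best

# Each node of each tree would call this with a fresh random subset, e.g.:
# best_split_for_node(X, y, mtry=int(np.sqrt(X.shape[1])), rng=np.random.default_rng(0))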
· With a large number of explanatory variables, the set of eligible variables will be quite different from node to node.
· However, important variables will eventually make it into the tree, and their relative success in predicting the target variable will earn them larger and larger numbers of "votes" in their favor.
· Importantly, each tree is grown on a different randomly selected sample of bagged data, with the remaining out-of-bag (OOB) data available to test the accuracy of each tree.
· For each tree, the bagging process selects about 60% of the original sample, while the resulting tree is tested against the remaining 40% of the sample (a library-based sketch of bagging with OOB scoring follows this list).
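As a concrete illustration, the sketch below uses scikit-learn's RandomForestClassifier (the library choice is an assumption; the text above does not prescribe one) to grow a forest in which each split considers only a random subset of variables (max_features) and each tree is trained on a bagged bootstrap sample, with the out-of-bag observations used to estimate accuracy (oob_score):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)   # any labeled dataset will do

forest = RandomForestClassifier(
    n_estimators=500,      # number of trees, each grown on its own bagged sample
    max_features="sqrt",   # size of the random variable subset eligible at each node
    bootstrap=True,        # bagging: sample the training data with replacement per tree
    oob_score=True,        # evaluate the forest on out-of-bag observations
    random_state=0,
)
forest.fit(X, y)
print("Out-of-bag accuracy estimate:", forest.oob_score_)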
Example 9: [KTO1, KTO2, KTO3, STO1]
a. What is Bayes' theorem?
b. What is the Naïve Bayes classifier?
c. What are smoothing and diagnostics for the Naïve Bayes classifier?
Example 9: Solution
a. Bayes' theorem defines P(C|A) = P(A|C)·P(C) / P(A), where C is the class of interest (for example, having a disease) and A is the observed evidence (for example, testing positive).
The probability of the evidence, here the probability of testing positive, that is P(A), needs to be computed first. By the law of total probability:

P(A) = P(A|C)·P(C) + P(A|¬C)·P(¬C)
According to Bayes' theorem, the probability of the class given a positive test is then

P(C|A) = P(A|C)·P(C) / [ P(A|C)·P(C) + P(A|¬C)·P(¬C) ]
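A small numeric check of these two formulas; the prevalence and test-accuracy values below are made up purely for the arithmetic and are not taken from the original example:

# Hypothetical numbers: P(C) = 0.01 (prevalence), P(A|C) = 0.99 (sensitivity),
# P(A|not C) = 0.05 (false-positive rate).
p_c = 0.01
p_a_given_c = 0.99
p_a_given_not_c = 0.05

# Law of total probability: P(A) = P(A|C)P(C) + P(A|not C)P(not C)
p_a = p_a_given_c * p_c + p_a_given_not_c * (1 - p_c)

# Bayes' theorem: P(C|A) = P(A|C)P(C) / P(A)
p_c_given_a = p_a_given_c * p_c / p_a
print(round(p_a, 4), round(p_c_given_a, 4))   # ~0.0594 and ~0.1667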
b. With two simplifications, Bayes' theorem can be extended to become a naïve Bayes classifier. The first simplification is to use the conditional independence assumption. That is, each attribute is conditionally independent of every other attribute given a class label ci. See Equation 7-13.
P(a1, a2, ..., am | ci) = P(a1|ci) · P(a2|ci) · ... · P(am|ci)        (Equation 7-13)
Therefore, this naïve assumption simplifies the computation of P(a1, a2, ..., am|ci).
The second simplification is to ignore the denominator P(a1, a2, ..., am).
Because P(a1, a2, ..., am) appears in the denominator of P(ci|A) for all values of i, removing the denominator will have no impact on the relative probability scores and will simplify calculations.
Naïve Bayes classification applies the two simplifications mentioned earlier and, as a result, P(ci|a1, a2, ..., am) is proportional to the product of the P(aj|ci) terms times P(ci). This is shown in Equation 7-14:

P(ci | a1, a2, ..., am) ∝ P(ci) · P(a1|ci) · P(a2|ci) · ... · P(am|ci),   for i = 1, 2, ..., n        (Equation 7-14)
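A minimal sketch of this scoring rule on a tiny, made-up categorical dataset (the attribute names, values, and counts are hypothetical and only show how the per-class products P(ci)·P(a1|ci)·...·P(am|ci) are formed):

from collections import Counter, defaultdict

# Hypothetical training data: each row is ({attribute: value}, class_label)
train = [
    ({"income": "low",  "credit": "low"},  "default"),
    ({"income": "low",  "credit": "high"}, "no_default"),
    ({"income": "high", "credit": "high"}, "no_default"),
    ({"income": "high", "credit": "low"},  "no_default"),
]

classes = Counter(label for _, label in train)    # class counts, used for P(ci)
cond = defaultdict(Counter)                       # (class, attribute) -> value counts, used for P(aj|ci)
for row, label in train:
    for attr, val in row.items():
        cond[(label, attr)][val] += 1

def score(row):
    """P(ci | a1..am) is proportional to P(ci) * product over j of P(aj | ci)."""
    total = sum(classes.values())
    scores = {}
    for ci, n_ci in classes.items():
        s = n_ci / total                                  # P(ci)
        for attr, val in row.items():
            s *= cond[(ci, attr)][val] / n_ci             # P(aj | ci)
        scores[ci] = s
    return scores

print(score({"income": "low", "credit": "low"}))   # {'default': 0.25, 'no_default': 0.0833...}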
c. Smoothing and diagnostics for the Naïve Bayes classifier
If one of the attribute values does not appear with one of the class labels within the training set, the corresponding P(aj|ci) will equal zero. When this happens, the resulting P(ci|A) from multiplying all the P(aj|ci), j ∈ [1, m], immediately becomes zero regardless of how large some of the conditional probabilities are.
Therefore, overfitting occurs. Smoothing techniques can be employed to adjust the probabilities P(aj|ci) and to ensure a nonzero value of P(ci|A). A smoothing technique assigns a small nonzero probability to rare events not included in the training dataset.
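A minimal sketch of Laplace (add-one) smoothing, a common smoothing technique that the text above does not name explicitly; the counts in the usage line are hypothetical:

def smoothed_conditional(count_val_and_class, count_class, n_distinct_values, alpha=1.0):
    """Laplace-smoothed estimate of P(aj | ci): add alpha to every value's count so
    that an attribute value never seen with a class still gets a small nonzero probability."""
    return (count_val_and_class + alpha) / (count_class + alpha * n_distinct_values)

# An attribute value never observed with class ci (count 0), out of 3 training rows
# for that class, with 2 distinct values for the attribute:
print(smoothed_conditional(0, 3, 2))   # 0.2 instead of 0.0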
Smoothing techniques are available in most standard software packages for naïve Bayes classifiers. However, if for some reason (such as performance concerns) the naïve Bayes classifier needs to be coded directly into an application, the smoothing and logarithm calculations should be incorporated into the implementation. Summing logarithms of probabilities, rather than multiplying the probabilities themselves, avoids the numerical underflow that can occur when many small values are multiplied together.
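A minimal sketch of the log-space scoring such an implementation would typically use (the function and variable names are hypothetical); comparing sums of logarithms ranks the classes exactly as the raw products would, while avoiding underflow:

import math

def log_score(prior, conditionals):
    """Return log(P(ci)) + sum over j of log(P(aj|ci)); the class with the largest
    log-score is the same class that would win on the raw product."""
    return math.log(prior) + sum(math.log(p) for p in conditionals)

# Hypothetical smoothed probabilities for two classes:
print(log_score(0.25, [0.5, 0.4]))   # about -3.00
print(log_score(0.75, [0.2, 0.3]))   # about -3.10, so the first class wins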
Diagnostics
Unlike logistic regression, naïve Bayes classifiers can handle missing values. Naïve Bayes is also robust to irrelevant variables, that is, variables that are distributed among all the classes and whose effects are not pronounced.
The model is simple to implement even without using libraries. The prediction is based on counting occurrences of events, making the classifier efficient to run. Naïve Bayes is computationally efficient and handles high-dimensional data well.
Compared to decision trees, naïve Bayes is more resistant to overfitting, especially in the presence of a smoothing technique. Despite these benefits, naïve Bayes also comes with a few disadvantages. It assumes the variables in the data are conditionally independent and is therefore sensitive to correlated variables, because the algorithm may double count their effects. As an example, assume that people with low income and low credit tend to default. If the task is to score "default" based on both income and credit as two separate attributes, naïve Bayes would experience the double-counting effect on the default outcome, thus reducing the accuracy of the prediction.
Although probabilities are provided as part of the output for the prediction, naïve Bayes classifiers in general are not very reliable for probability estimation and should be used only for assigning class labels. Naïve Bayes in its simple form is used only with categorical variables; any continuous variable should be converted into a categorical one through a process known as discretization, as shown earlier. In common statistical software packages, however, naïve Bayes is implemented in a way that enables it to handle continuous variables as well.
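A minimal sketch of discretizing a continuous variable into categories, assuming NumPy; the values, bin edges, and labels are arbitrary and purely illustrative:

import numpy as np

incomes = np.array([12_000, 35_000, 58_000, 91_000, 240_000])   # hypothetical continuous values
edges = [30_000, 70_000, 150_000, np.inf]                        # arbitrary illustrative bin edges
labels = ["low", "medium", "high", "very_high"]

# np.digitize returns, for each value, the index of the bin it falls into.
bins = np.digitize(incomes, edges, right=False)
print([labels[b] for b in bins])   # ['low', 'medium', 'medium', 'high', 'very_high']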