a. What is a random forest? b. What is the process of growing a random forest?
Example 8: Solution
a. Decision-tree learners can create over-complex trees that do not generalize the data well; this is called overfitting. Decision trees can also be unstable, because small variations in the data may produce a completely different tree, and the learning algorithms cannot guarantee to return the globally optimal tree. Decision-tree learners also create biased trees if some classes dominate.
A random forest addresses these weaknesses by growing many decision trees, each built by repeatedly applying a simple splitting rule that chooses the variable best predicting the target. Whereas a single decision tree searches for a split over every variable in every node, a random forest searches each node for a split on only one variable: the variable with the largest association with the target, chosen from among only those explanatory variables that have been randomly selected to be tested for that node.
b. The process of growing random forests
· First, a small subset of explanatory variables is selected at random.
· Next, the node is split with the best variable among this small number of randomly selected variables, not the best variable of all the variables, as would be the case when growing a single decision tree (a minimal sketch of this per-node selection follows this list).
· Once the best variable from the eligible random subset has been used to split the node in question, a new list of eligible explanatory variables is selected at random to split the next node.
· This continues until the tree is fully grown, ideally with one observation in each terminal node.
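A minimal sketch of this per-node behavior, assuming a NumPy feature matrix X, a label vector y, and a hypothetical helper best_split_for_node (none of these names come from the original text). It only shows how a node draws its own random subset of candidate variables and then picks the best split among those candidates; it is not a complete random forest implementation.

import numpy as np

def best_split_for_node(X, y, mtry, rng):
    """At one node, draw `mtry` candidate variables at random and return the
    (variable, threshold, impurity) triple that best separates the classes
    among those candidates only (Gini impurity as the split criterion)."""
    def gini(labels):
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    candidates = rng.choice(X.shape[1], size=mtry, replace=False)  # eligible variables for this node
    best = (None, None, np.inf)
    for j in candidates:
        for t in np.unique(X[:, j])[:-1]:                          # candidate thresholds for variable j
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best[2]:
                best = (j, t, score)
    return best

# Each node of each tree would call this with a fresh random subset, e.g.:
# best_split_for_node(X, y, mtry=int(np.sqrt(X.shape[1])), rng=np.random.default_rng(0))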
· With a large number of explanatory variables, the set of eligible variables will be quite different from node to node.
· However, important variables will eventually make it into the tree, and their relative success in predicting the target variable will earn them larger and larger numbers of "votes" in their favor.
· Importantly, each tree is grown on a different randomly selected sample of bagged data, with the remaining out-of-bag (OOB) data available to test the accuracy of each tree.
· For each tree, the bagging process selects about 60% of the original sample, while the resulting tree is tested against the remaining 40% of the sample (a library-based sketch of bagging with OOB scoring follows this list).
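As a concrete illustration, the sketch below uses scikit-learn's RandomForestClassifier (the library choice is an assumption; the text above does not prescribe one) to grow a forest in which each split considers only a random subset of variables (max_features) and each tree is trained on a bagged bootstrap sample, with the out-of-bag observations used to estimate accuracy (oob_score):

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)   # any labeled dataset will do

forest = RandomForestClassifier(
    n_estimators=500,      # number of trees, each grown on its own bagged sample
    max_features="sqrt",   # size of the random variable subset eligible at each node
    bootstrap=True,        # bagging: sample the training data with replacement per tree
    oob_score=True,        # evaluate the forest on out-of-bag observations
    random_state=0,
)
forest.fit(X, y)
print("Out-of-bag accuracy estimate:", forest.oob_score_)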
Example 9: [KTO1, KTO2, KTO3, STO1]
a. What is Bayes' theorem?
b. What is the Naïve Bayes classifier?
c. What are smoothing and diagnostics for the Naïve Bayes classifier?
Example 9: Solution
a. Bayes' theorem defines P(C|A) = P(A|C)·P(C) / P(A), where C is the class of interest (for example, having a disease) and A is the observed evidence (for example, testing positive).
The probability of the evidence, here the probability of testing positive, that is P(A), needs to be computed first. By the law of total probability:

P(A) = P(A|C)·P(C) + P(A|¬C)·P(¬C)
According to Bayes' theorem, the probability of the class given a positive test is then

P(C|A) = P(A|C)·P(C) / [ P(A|C)·P(C) + P(A|¬C)·P(¬C) ]
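A small numeric check of these two formulas; the prevalence and test-accuracy values below are made up purely for the arithmetic and are not taken from the original example:

# Hypothetical numbers: P(C) = 0.01 (prevalence), P(A|C) = 0.99 (sensitivity),
# P(A|not C) = 0.05 (false-positive rate).
p_c = 0.01
p_a_given_c = 0.99
p_a_given_not_c = 0.05

# Law of total probability: P(A) = P(A|C)P(C) + P(A|not C)P(not C)
p_a = p_a_given_c * p_c + p_a_given_not_c * (1 - p_c)

# Bayes' theorem: P(C|A) = P(A|C)P(C) / P(A)
p_c_given_a = p_a_given_c * p_c / p_a
print(round(p_a, 4), round(p_c_given_a, 4))   # ~0.0594 and ~0.1667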
b. With two simplifications, Bayes' theorem can be extended to become a naïve Bayes classifier. The first simplification is to use the conditional independence assumption. That is, each attribute is conditionally independent of every other attribute given a class label ci. See Equation 7-13.
P(a1, a2, ..., am | ci) = P(a1|ci) · P(a2|ci) · ... · P(am|ci)        (Equation 7-13)
Therefore, this naïve assumption simplifies the computation of P(a1, a2, ..., am|ci).
The second simplification is to ignore the denominator P(a1, a2, ..., am).
Because P(a1, a2, ..., am) appears in the denominator of P(ci|A) for all values of i, removing the denominator will have no impact on the relative probability scores and will simplify calculations.
Naïve Bayes classification applies the two simplifications mentioned earlier and, as a result, P(ci|a1, a2, ..., am) is proportional to the product of the P(aj|ci) terms times P(ci). This is shown in Equation 7-14:

P(ci | a1, a2, ..., am) ∝ P(ci) · P(a1|ci) · P(a2|ci) · ... · P(am|ci),   for i = 1, 2, ..., n        (Equation 7-14)
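A minimal sketch of this scoring rule on a tiny, made-up categorical dataset (the attribute names, values, and counts are hypothetical and only show how the per-class products P(ci)·P(a1|ci)·...·P(am|ci) are formed):

from collections import Counter, defaultdict

# Hypothetical training data: each row is ({attribute: value}, class_label)
train = [
    ({"income": "low",  "credit": "low"},  "default"),
    ({"income": "low",  "credit": "high"}, "no_default"),
    ({"income": "high", "credit": "high"}, "no_default"),
    ({"income": "high", "credit": "low"},  "no_default"),
]

classes = Counter(label for _, label in train)    # class counts, used for P(ci)
cond = defaultdict(Counter)                       # (class, attribute) -> value counts, used for P(aj|ci)
for row, label in train:
    for attr, val in row.items():
        cond[(label, attr)][val] += 1

def score(row):
    """P(ci | a1..am) is proportional to P(ci) * product over j of P(aj | ci)."""
    total = sum(classes.values())
    scores = {}
    for ci, n_ci in classes.items():
        s = n_ci / total                                  # P(ci)
        for attr, val in row.items():
            s *= cond[(ci, attr)][val] / n_ci             # P(aj | ci)
        scores[ci] = s
    return scores

print(score({"income": "low", "credit": "low"}))   # {'default': 0.25, 'no_default': 0.0833...}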
c. Smoothing and diagnostics for the Naïve Bayes classifier
If one of the attribute values does not appear with one of the class labels within the training set, the corresponding P(aj|ci) will equal zero. When this happens, the resulting P(ci|A) from multiplying all the P(aj|ci), j ∈ [1, m], immediately becomes zero regardless of how large some of the conditional probabilities are.
Therefore, overfitting occurs. Smoothing techniques can be employed to adjust the probabilities P(aj|ci) and to ensure a nonzero value of P(ci|A). A smoothing technique assigns a small nonzero probability to rare events not included in the training dataset.
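A minimal sketch of Laplace (add-one) smoothing, a common smoothing technique that the text above does not name explicitly; the counts in the usage line are hypothetical:

def smoothed_conditional(count_val_and_class, count_class, n_distinct_values, alpha=1.0):
    """Laplace-smoothed estimate of P(aj | ci): add alpha to every value's count so
    that an attribute value never seen with a class still gets a small nonzero probability."""
    return (count_val_and_class + alpha) / (count_class + alpha * n_distinct_values)

# An attribute value never observed with class ci (count 0), out of 3 training rows
# for that class, with 2 distinct values for the attribute:
print(smoothed_conditional(0, 3, 2))   # 0.2 instead of 0.0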
Smoothing techniques are available in most standard software packages for naïve Bayes classifiers. However, if for some reason (such as performance concerns) the naïve Bayes classifier needs to be coded directly into an application, the smoothing and logarithm calculations should be incorporated into the implementation. Summing logarithms of probabilities, rather than multiplying the probabilities themselves, avoids the numerical underflow that can occur when many small values are multiplied together.
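A minimal sketch of the log-space scoring such an implementation would typically use (the function and variable names are hypothetical); comparing sums of logarithms ranks the classes exactly as the raw products would, while avoiding underflow:

import math

def log_score(prior, conditionals):
    """Return log(P(ci)) + sum over j of log(P(aj|ci)); the class with the largest
    log-score is the same class that would win on the raw product."""
    return math.log(prior) + sum(math.log(p) for p in conditionals)

# Hypothetical smoothed probabilities for two classes:
print(log_score(0.25, [0.5, 0.4]))   # about -3.00
print(log_score(0.75, [0.2, 0.3]))   # about -3.10, so the first class wins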
Diagnostics
Unlike logistic regression, naïve Bayes classifiers can handle missing values. Naïve Bayes is also robust to irrelevant variables, that is, variables that are distributed among all the classes and whose effects are not pronounced.
The model is simple to implement even without using libraries. The prediction is based on counting occurrences of events, making the classifier efficient to run. Naïve Bayes is computationally efficient and handles high-dimensional data well.
Compared to decision trees, naïve Bayes is more resistant to overfitting, especially in the presence of a smoothing technique. Despite these benefits, naïve Bayes also comes with a few disadvantages. It assumes the variables in the data are conditionally independent and is therefore sensitive to correlated variables, because the algorithm may double count their effects. As an example, assume that people with low income and low credit tend to default. If the task is to score "default" based on both income and credit as two separate attributes, naïve Bayes would experience the double-counting effect on the default outcome, thus reducing the accuracy of the prediction.
Although probabilities are provided as part of the output for the prediction, naïve Bayes classifiers in general are not very reliable for probability estimation and should be used only for assigning class labels. Naïve Bayes in its simple form is used only with categorical variables; any continuous variable should be converted into a categorical one through a process known as discretization, as shown earlier. In common statistical software packages, however, naïve Bayes is implemented in a way that enables it to handle continuous variables as well.
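A minimal sketch of discretizing a continuous variable into categories, assuming NumPy; the values, bin edges, and labels are arbitrary and purely illustrative:

import numpy as np

incomes = np.array([12_000, 35_000, 58_000, 91_000, 240_000])   # hypothetical continuous values
edges = [30_000, 70_000, 150_000, np.inf]                        # arbitrary illustrative bin edges
labels = ["low", "medium", "high", "very_high"]

# np.digitize returns, for each value, the index of the bin it falls into.
bins = np.digitize(incomes, edges, right=False)
print([labels[b] for b in bins])   # ['low', 'medium', 'medium', 'high', 'very_high']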