Random Forest: The Math of Intelligence
We’re going to talk about building and evaluating random forests.
Random forests are built from decision trees.
Decision trees are easy to build, easy to use and easy to interpret, but in practice they are not that awesome…
That quote is from The Elements of Statistical Learning.
The good news is that random forests combine the simplicity of decision trees with flexibility, resulting in a vast improvement in accuracy.
So let’s illustrate a Random Forest!
Step 1.
Create a Bootstrap Dataset:
Imagine that these 4 samples are the entire dataset. We are going to build a tree from this original dataset.
To create a bootstrap dataset that is the same size as the original, we just randomly select samples from the original dataset. The important detail is that we're allowed to pick the same sample more than once. This is the first sample that we randomly select:
This is the second randomly selected sample from the original dataset.
So it's the second sample in our bootstrap dataset:
Lastly, here's the fourth randomly selected sample. Note that it's the same as the third:
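As a rough sketch, bootstrapping is just sampling with replacement, which the Python standard library can do directly (the sample names below are made up for illustration):

```python
import random

random.seed(0)  # for a reproducible example

# Hypothetical stand-ins for the 4 samples in the original dataset
original = ["sample_1", "sample_2", "sample_3", "sample_4"]

# random.choices samples WITH replacement, so the same sample
# can be picked more than once -- that is the key detail
bootstrap = random.choices(original, k=len(original))

print(bootstrap)  # same size as the original, possibly with repeats
```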
Step 2.
Create a decision tree using the bootstrap dataset, but only consider a random subset of variables at each step.
Note, we’ll talk more about how to determine the optimal number of variables to consider later.
Thus, instead of considering all four variables to figure out how to split the root node, in this case we randomly selected Good Blood Circulation and Blocked Arteries as candidates for the root node. Just for the sake of the example, assume that Good Blood Circulation did the best job separating the samples.
Since we used Good Blood Circulation, I'm going to gray it out so that we can focus on the remaining variables.
Now we need to figure out how to split the samples at this node. Just like for the root, we randomly select two variables as candidates, instead of all three remaining columns.
We just build the tree as usual, but only considering a random subset of variables at each step:
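A minimal sketch of that per-node variable selection, assuming the variable names from the example (random.sample picks without replacement, so the two candidates at a node are distinct):

```python
import random

random.seed(1)  # reproducible example

# The four variables from the example above
all_variables = ["chest_pain", "good_blood_circulation",
                 "blocked_arteries", "weight"]

# Root node: consider only 2 randomly chosen candidates, not all 4
root_candidates = random.sample(all_variables, k=2)

# Suppose good_blood_circulation gave the best split; gray it out
# and pick 2 candidates from the 3 remaining variables for the next node
remaining = [v for v in all_variables if v != "good_blood_circulation"]
next_candidates = random.sample(remaining, k=2)

print(root_candidates, next_candidates)
```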
Here’s the tree we just made.
Now go back to step one and repeat:
Make a new bootstrap data set and build a tree considering a subset of variables at each step.
Using a bootstrap sample and considering only a subset of variables at each step results in a wide variety of trees.
The variety is what makes random forests more effective than individual decision trees.
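Putting the two steps together, here is a toy end-to-end sketch. To keep it short, each "tree" is simplified to a one-split stump, and the data, variable names, and labels are all invented for illustration. Real random forests grow full trees, but the loop structure (bootstrap, then build while restricting candidate variables, then let the trees vote) is the same:

```python
import random
from collections import Counter

random.seed(42)

FEATURES = ["chest_pain", "good_circulation", "blocked_arteries", "weight_high"]

# Toy rows: (binary feature dict, binary label) -- purely illustrative
rows = [
    ({"chest_pain": 1, "good_circulation": 0, "blocked_arteries": 1, "weight_high": 1}, 1),
    ({"chest_pain": 0, "good_circulation": 1, "blocked_arteries": 0, "weight_high": 0}, 0),
    ({"chest_pain": 1, "good_circulation": 1, "blocked_arteries": 1, "weight_high": 0}, 1),
    ({"chest_pain": 0, "good_circulation": 0, "blocked_arteries": 0, "weight_high": 1}, 0),
]

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def build_stump(sample):
    # Step 2: consider only a random subset (2 of 4) of the variables
    candidates = random.sample(FEATURES, k=2)

    def misclassified(feat):
        # Count errors if each side of the split predicts its majority label
        err = 0
        for value in (0, 1):
            side = [y for x, y in sample if x[feat] == value]
            if side:
                err += len(side) - Counter(side).most_common(1)[0][1]
        return err

    feat = min(candidates, key=misclassified)
    left = [y for x, y in sample if x[feat] == 0]
    right = [y for x, y in sample if x[feat] == 1]
    return feat, (majority(left) if left else 0), (majority(right) if right else 1)

# Repeat Steps 1 and 2 to grow a varied forest
forest = []
for _ in range(25):
    bootstrap = random.choices(rows, k=len(rows))  # Step 1: bootstrap
    forest.append(build_stump(bootstrap))          # Step 2: build a tree

def predict(forest, x):
    # Each tree votes; the forest returns the majority vote
    votes = [right if x[feat] == 1 else left for feat, left, right in forest]
    return majority(votes)

print(predict(forest, rows[0][0]))
```

Because each stump saw a different bootstrap sample and a different pair of candidate variables, the stumps disagree with each other, and the majority vote is what smooths out their individual mistakes.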