Random Forest: The Math of Intelligence

We’re going to talk about building and evaluating random forests.

Random forests are built from decision trees. 

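Decision trees are easy to build, easy to use, and easy to interpret, but in practice they are not that awesome: they tend to work well on the data used to build them but are not flexible when it comes to classifying new samples. As The Elements of Statistical Learning puts it, the main drawback of trees as a predictive tool is their inaccuracy.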


The good news is that random forests combine the simplicity of decision trees with flexibility, resulting in a vast improvement in accuracy.

So let’s illustrate a Random Forest!

Step 1.

Create a Bootstrap Dataset:

Imagine that these 4 samples are the entire dataset that we are going to build a tree from.

To create a bootstrap dataset that is the same size as the original, we just randomly select samples from the original dataset. The important detail is that we’re allowed to pick the same sample more than once.

This is the first sample that we randomly select.

This is the second randomly selected sample from the original dataset, so it’s the second sample in our bootstrap dataset.

Lastly, here’s the fourth randomly selected sample. Note that it’s the same as the third.
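Step 1 fits in a few lines of Python. This is a minimal sketch, assuming the dataset is a plain list of rows; the sample names are hypothetical placeholders for the 4 original samples. The key detail is drawing with replacement, which `random.Random.choices` does.

```python
import random


def bootstrap_sample(rows, rng):
    """Build a bootstrap dataset: the same size as `rows`, drawn at random
    *with replacement*, so the same sample can be picked more than once."""
    return rng.choices(rows, k=len(rows))


# Hypothetical stand-ins for the 4 original samples.
original = ["sample 1", "sample 2", "sample 3", "sample 4"]
rng = random.Random(0)  # fixed seed so the draw is reproducible
print(bootstrap_sample(original, rng))
```

Using `choices` rather than `sample` is the important design choice here: `rng.sample` draws without replacement, so with `k=len(rows)` it would only shuffle the original rows instead of producing a bootstrap dataset.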

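Step 2.

Create a Decision Tree:

Next, we create a decision tree using the bootstrap dataset, but we only consider a random subset of variables (columns) at each step. In this example, we’ll consider only 2 variables at each step.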

Note: we’ll talk more later about how to determine the optimal number of variables to consider.

Thus, instead of considering all four variables to figure out how to split the root node, we randomly select 2 of them as candidates.

In this case, we randomly selected good blood circulation and blocked arteries as candidates for the root node. Just for the sake of the example, assume that good blood circulation did the best job separating the samples.
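Deciding which candidate "did the best job" requires a score. One common choice for classification trees is Gini impurity (1 minus the sum of squared class proportions), weighted by the share of samples in each branch; the candidate with the lowest weighted impurity wins. This is a minimal sketch, assuming yes/no (boolean) features and rows stored as dicts. The toy data is made up, and the column names `chest_pain` and `overweight` are hypothetical (only good blood circulation and blocked arteries are named above).

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions (0.0 = pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(rows, feature, target):
    """Gini impurity of the two branches after splitting on a boolean feature,
    weighted by the fraction of rows that land in each branch."""
    true_side = [r[target] for r in rows if r[feature]]
    false_side = [r[target] for r in rows if not r[feature]]
    n = len(rows)
    return len(true_side) / n * gini(true_side) + len(false_side) / n * gini(false_side)


def best_of_random_candidates(rows, features, target, m, rng):
    """Randomly pick m candidate features, then keep the one with the lowest impurity."""
    candidates = rng.sample(features, m)
    return min(candidates, key=lambda f: split_impurity(rows, f, target))


# Hypothetical bootstrap dataset (values made up; the last row repeats the third).
boot = [
    {"chest_pain": False, "good_blood_circulation": False, "blocked_arteries": False, "overweight": True, "heart_disease": "no"},
    {"chest_pain": True, "good_blood_circulation": True, "blocked_arteries": True, "overweight": False, "heart_disease": "yes"},
    {"chest_pain": True, "good_blood_circulation": True, "blocked_arteries": False, "overweight": False, "heart_disease": "no"},
    {"chest_pain": True, "good_blood_circulation": True, "blocked_arteries": False, "overweight": False, "heart_disease": "no"},
]
features = ["chest_pain", "good_blood_circulation", "blocked_arteries", "overweight"]
print(best_of_random_candidates(boot, features, "heart_disease", m=2, rng=random.Random(0)))
```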

Since we used good blood circulation, I’m going to gray it out so that we can focus on the remaining variables.

Now we need to figure out how to split the samples at this node. Just like for the root, we randomly select two variables as candidates instead of considering all three remaining columns.

Then we just build the tree as usual, considering only a random subset of variables at each step.

Here’s the tree we just made.

Now go back to Step 1 and repeat: make a new bootstrap dataset and build a tree, considering only a random subset of variables at each step.
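Putting Step 1 and Step 2 together and repeating them gives the whole forest-building loop. This is a minimal sketch under stated assumptions: boolean features, rows stored as dicts, Gini impurity as the split score, trees stored as nested dicts (a hypothetical format chosen for this sketch), and made-up toy data with hypothetical column names `chest_pain` and `overweight`. Production implementations also handle numeric features, stopping rules, and far more trees.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def split_impurity(rows, feature, target):
    """Size-weighted Gini impurity of the two branches of a boolean split."""
    true_side = [r[target] for r in rows if r[feature]]
    false_side = [r[target] for r in rows if not r[feature]]
    return (len(true_side) * gini(true_side) + len(false_side) * gini(false_side)) / len(rows)


def build_tree(rows, features, target, m, rng):
    """Grow a tree on boolean features, considering only m randomly chosen
    unused features as split candidates at every node."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not features:
        return majority  # leaf: predict the most common class
    candidates = rng.sample(features, min(m, len(features)))
    best = min(candidates, key=lambda f: split_impurity(rows, f, target))
    true_rows = [r for r in rows if r[best]]
    false_rows = [r for r in rows if not r[best]]
    if not true_rows or not false_rows:
        return majority  # simplification: stop if the chosen split separates nothing
    remaining = [f for f in features if f != best]  # "gray out" the feature just used
    return {
        "feature": best,
        "true": build_tree(true_rows, remaining, target, m, rng),
        "false": build_tree(false_rows, remaining, target, m, rng),
    }


def build_forest(rows, features, target, n_trees, m, seed=0):
    """Repeat Step 1 (bootstrap) and Step 2 (randomized tree) n_trees times."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        boot = rng.choices(rows, k=len(rows))  # Step 1: sample with replacement
        forest.append(build_tree(boot, features, target, m, rng))  # Step 2
    return forest


# Hypothetical toy data (values made up for illustration).
data = [
    {"chest_pain": False, "good_blood_circulation": False, "blocked_arteries": False, "overweight": True, "heart_disease": "no"},
    {"chest_pain": True, "good_blood_circulation": True, "blocked_arteries": True, "overweight": False, "heart_disease": "yes"},
    {"chest_pain": True, "good_blood_circulation": True, "blocked_arteries": False, "overweight": False, "heart_disease": "no"},
    {"chest_pain": True, "good_blood_circulation": False, "blocked_arteries": True, "overweight": True, "heart_disease": "yes"},
]
features = ["chest_pain", "good_blood_circulation", "blocked_arteries", "overweight"]
forest = build_forest(data, features, "heart_disease", n_trees=10, m=2, seed=0)
for tree in forest:
    # Print each tree's root split to see how much the trees differ.
    print(tree["feature"] if isinstance(tree, dict) else f"leaf: {tree}")
```

Here `m` plays the role of "the number of variables to consider at each step", fixed at 2 to match the example.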

Using a bootstrap sample and considering only a subset of variables at each step results in a wide variety of trees.

The variety is what makes random forests more effective than individual decision trees.

Now that we’ve created a random forest, how do we use it?
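For classification, a random forest is typically used by running a new sample down every tree and letting the trees vote: the class with the most votes is the forest's prediction (regression forests average the trees' outputs instead). This is a minimal sketch, assuming trees are stored as nested dicts with hypothetical `"feature"`, `"true"` and `"false"` keys whose leaves are class labels, and using three hand-written toy trees rather than trees learned from data.

```python
from collections import Counter


def predict_tree(tree, sample):
    """Follow the boolean tests from the root down to a leaf (a class label)."""
    while isinstance(tree, dict):
        tree = tree["true"] if sample[tree["feature"]] else tree["false"]
    return tree


def predict_forest(forest, sample):
    """Run the sample down every tree and return the class with the most votes."""
    votes = Counter(predict_tree(tree, sample) for tree in forest)
    return votes.most_common(1)[0][0]


# Hypothetical hand-written trees, not learned from data.
forest = [
    {"feature": "good_blood_circulation", "true": "no", "false": "yes"},
    {"feature": "blocked_arteries", "true": "yes", "false": "no"},
    {"feature": "chest_pain", "true": "yes", "false": "no"},
]
patient = {"chest_pain": True, "good_blood_circulation": True, "blocked_arteries": True}
print(predict_forest(forest, patient))  # prints: yes (2 votes to 1)
```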

Written by Lorenzo Castagno.
