What is Random Forest?

I just can't decide: should I play a round of golf today? Let's build a decision tree to find out. First off, do I have the time? Is it sunny? Do I have my clubs with me? This decision tree is an example of a classification problem, where the class labels are "golf" and "no golf". While decision trees are helpful, they can be prone to problems such as bias and overfitting. That's where random forest comes in. Random forest is a type of machine learning model that uses an ensemble of decision trees to make its predictions. And why do we call it random forest?

Can't see the forest for the (decision) trees

Well, it's because the model is built by repeatedly taking random samples of my data and training a separate decision tree on each of those subsets. So we're essentially creating a bunch of smaller decision trees that work together as one larger model. Chances are other people have built different, and maybe better, decision trees to answer the same question. Maybe those trees consider things like time of day, or the difficulty of the course. The more decision trees I use with different criteria, the better my random forest tends to perform, because each tree contributes a slightly different view of the data and their combined vote increases prediction accuracy. And if one or two of these smaller trees make a poor call on a certain day, they're simply outvoted, so they don't drag down the overall prediction.
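To make that concrete, here's a minimal from-scratch sketch of the idea: each "tree" is just a one-level decision stump trained on a bootstrap sample (a random draw of the data with replacement), and the forest predicts by majority vote. The golf features, thresholds, and data rows are all invented for illustration, not taken from any real dataset.

```python
import random
from collections import Counter

# Toy dataset: each row is (hours_free, is_sunny, has_clubs) -> "golf"/"no golf".
# Entirely made-up values, just to mimic the decision-tree questions above.
DATA = [
    ((3, 1, 1), "golf"),
    ((2, 1, 1), "golf"),
    ((4, 0, 1), "golf"),
    ((1, 0, 0), "no golf"),
    ((0, 1, 1), "no golf"),
    ((1, 1, 0), "no golf"),
]

def train_stump(sample):
    """Pick the single feature/threshold split with the fewest errors --
    a one-level 'decision tree' standing in for a full tree."""
    best = None
    for feat in range(3):
        for thresh in {x[feat] for x, _ in sample}:
            left = [y for x, y in sample if x[feat] <= thresh]
            right = [y for x, y in sample if x[feat] > thresh]
            if not left or not right:
                continue  # degenerate split, skip
            left_lab = Counter(left).most_common(1)[0][0]
            right_lab = Counter(right).most_common(1)[0][0]
            errs = sum(1 for x, y in sample
                       if (left_lab if x[feat] <= thresh else right_lab) != y)
            if best is None or errs < best[0]:
                best = (errs, feat, thresh, left_lab, right_lab)
    if best is None:  # sample had no usable split: fall back to majority label
        maj = Counter(y for _, y in sample).most_common(1)[0][0]
        return lambda x: maj
    _, feat, thresh, left_lab, right_lab = best
    return lambda x: left_lab if x[feat] <= thresh else right_lab

def train_forest(data, n_trees, rng):
    trees = []
    for _ in range(n_trees):
        # Bootstrap: draw a random sample of the data *with replacement*.
        sample = [rng.choice(data) for _ in data]
        trees.append(train_stump(sample))
    return trees

def predict(trees, x):
    # Majority vote across the ensemble.
    return Counter(t(x) for t in trees).most_common(1)[0][0]

rng = random.Random(0)
forest = train_forest(DATA, n_trees=25, rng=rng)
print(predict(forest, (3, 1, 1)))  # plenty of time, sunny, clubs in hand
```

A real random forest grows full decision trees and also randomizes which features each split may consider; the stump version above keeps only the two core ingredients, bootstrap sampling and majority voting.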

One of the primary benefits of random forest is that it can reduce overfitting, which occurs when your model starts to memorize the training data rather than learning to generalize to future data. Essentially, it helps me get around the limitations of my data, which might not be fully representative of all golfers or include the best features for my model. It can also help reduce bias, the systematic error introduced when a model only sees part of the picture. For example, if the way I set up a single tree means it effectively trains on only half of my instance space rather than all of my data points, its predictions will be skewed; averaging over many trees trained on different random samples helps wash that error out.

How many trees is too many?

To set up a random forest algorithm, you'll set parameters for node size, number of trees, and number of features to randomly sample from your training data set. It can be challenging at first because you'll want a lot of trees to get the best predictive accuracy, but you don't want too many trees because it will take a long time to train the model and use a lot of memory space.
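As one concrete illustration of those parameters, here's how they might map onto scikit-learn's RandomForestClassifier. The dataset is a synthetic stand-in generated on the fly, not real golfer data, and the specific parameter values are arbitrary choices for the sketch.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the golf data (invented, purely for illustration).
X, y = make_classification(n_samples=300, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(
    n_estimators=100,     # number of trees: more improves accuracy, costs time/memory
    max_features="sqrt",  # features randomly sampled when choosing each split
    min_samples_leaf=2,   # node size: minimum samples allowed in a leaf
    random_state=0,
)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on held-out test data
```

A common starting point is to leave most settings at their defaults and tune the number of trees first, since past a certain point adding trees stops helping accuracy but keeps adding training time.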

Once you set up your parameters, you'll use a random forest model to make predictions on your test data, and you can even segment or slice your results by different criteria. Maybe you want to know how your random forest does on certain types of golf courses, or how it performs during different times of day.
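Slicing results like that is just grouping test predictions by a criterion before scoring each group. A minimal sketch, with made-up course types and outcomes:

```python
from collections import defaultdict

# Hypothetical evaluation records: (course_type, predicted, actual).
results = [
    ("links", "golf", "golf"),
    ("links", "no golf", "golf"),
    ("parkland", "golf", "golf"),
    ("parkland", "no golf", "no golf"),
    ("parkland", "golf", "golf"),
]

def accuracy_by_group(records):
    """Slice test-set accuracy by a grouping criterion such as course type."""
    hits, totals = defaultdict(int), defaultdict(int)
    for group, pred, actual in records:
        totals[group] += 1
        hits[group] += (pred == actual)
    return {g: hits[g] / totals[g] for g in totals}

print(accuracy_by_group(results))  # prints {'links': 0.5, 'parkland': 1.0}
```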

It's not just about a tee time

Random forest is pretty popular among data science professionals, and with good reason. It can be extremely helpful in all sorts of classification problems. In finance it can be used to predict the likelihood of default. In medical diagnostics, it can be used to predict prognosis or survival rates depending on treatment options. And in economics it can help me understand whether a policy is effective or not.

So what do you think? Should I play golf today? The sum of my random forest decision trees says... yes. I'll see you out on the course.
