Gradient Boosting: Introduction, Implementation, and Mathematics behind it.
A beginner-friendly introduction and an implementation in Python.
Introduction
Gradient Boosting is powerful and one of the most important topics in ensemble learning. It is a type of Boosting ensemble. I explained Boosting in our last two articles—AdaBoost: Introduction, Implementation, and Mathematics behind it and Boosting: Introduction.
If you haven’t already done so, I would highly recommend that you read them. They will help you understand more about Boosting and, ultimately, help you understand Gradient Boosting more deeply.
Now with that said, let's just quickly recap.
Boosting is an ensemble technique that combines multiple weak learners to form a strong learner. Weak learners are learners/models that perform only slightly better than a random guess.
Similar to AdaBoost, Gradient Boosting combines the weak learners sequentially to form a strong learner that performs better, but how it does so is different.
How does it actually work?
In a nutshell, Gradient Boosting works like this:
Take the data → initialize the model with a constant value → compute the residuals → train a model to predict the residuals → update the predictions using the predicted residuals → compute new residuals from the updated predictions and repeat.
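Here is a minimal sketch of that loop in Python, assuming squared-error loss and scikit-learn decision trees as the weak learners (the function names and default values are illustrative, not a reference implementation):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_trees=100, learning_rate=0.1):
    """Toy gradient boosting regressor with squared-error loss."""
    # Initialize with a constant value: the mean minimizes squared error.
    f0 = y.mean()
    pred = np.full_like(y, f0, dtype=float)
    trees = []
    for _ in range(n_trees):
        # Pseudo-residuals = negative gradient of the loss = y - current prediction.
        residuals = y - pred
        # Train a small tree to predict the residuals.
        tree = DecisionTreeRegressor(max_depth=2)
        tree.fit(X, residuals)
        trees.append(tree)
        # Update the predictions with a shrunken step.
        pred += learning_rate * tree.predict(X)
    return f0, trees

def gradient_boost_predict(X, f0, trees, learning_rate=0.1):
    pred = np.full(X.shape[0], f0, dtype=float)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred
```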
Before we start breaking down the steps of Gradient Boosting, let’s start with the housing price data that we will be working on.
Here, Bedroom and Bathroom are the features and Price is the target. We will use the features to predict the price of a house.
Let’s break down the steps of Gradient Boosting now.
Remember that in the Boosting method, we have to start with a Base Learner. In Gradient Boosting, before we create the Base Learner, we have to go through a couple of steps.
Initialize the model with an initial constant value/initial guess
In regression, there are many loss functions we can use. We will use Mean Squared Error (MSE) here.
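For concreteness, one common way to write this loss (scaled by ½ so its derivative comes out clean, which is the form consistent with the residual we derive later) is:

$$L(y_i, \hat{y}) = \frac{1}{2}\,(y_i - \hat{y})^2 \qquad \ldots \text{(i)}$$

Over the whole dataset, the total loss is just the sum of this term for every house.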
Here, y is the original value and ŷ is the predicted value.
We want to find a value of ŷ here, since ŷ is the value we do not know.
Now, equation (i) tells us that we have to find the value of ŷ that minimizes the loss function.
We know that, in order to find the minimum of a function, we have to take its first derivative and set it equal to zero.
In our data above, y is the Price and we need to find ŷ. For simplicity, we will omit the 000 from the price, so instead of 313000 we will write 313.
Let’s find the initial constant/guess value, i.e. ŷ. Here is the calculation of the value.
One concept that you should remember is that the first derivative of a function helps us find a minimum: we set it to zero and solve.
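As a sketch of that calculation with the squared-error loss, setting the derivative with respect to ŷ to zero gives:

$$\frac{d}{d\hat{y}} \sum_{i=1}^{n} \frac{1}{2}(y_i - \hat{y})^2 = -\sum_{i=1}^{n}(y_i - \hat{y}) = 0 \;\Rightarrow\; \hat{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$$

In other words, the initial constant is simply the average of the Price column.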
Upon calculation, we found that the initial constant/guess value ŷ is 1013.
Great. Here is our updated table.
Calculate the Pseudo Residuals (r)
This is one of the important steps in this algorithm.
In this step, we find a residual (r) given by the formula:
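A standard way to write this pseudo-residual (the negative gradient of the loss with respect to the current prediction) is:

$$r_{im} = -\left[\frac{\partial L\big(y_i, F(x_i)\big)}{\partial F(x_i)}\right]_{F(x) = F_{m-1}(x)}$$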
One important concept to understand here is that F(xi) is the ŷ value (the prediction) of the previous model.
Let’s make this equation easier to understand.
Now let’s find what that partial derivative is.
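Working it out for our squared-error loss:

$$\frac{\partial}{\partial F(x_i)}\,\frac{1}{2}\big(y_i - F(x_i)\big)^2 = -\big(y_i - F(x_i)\big), \qquad \text{so} \qquad r_i = y_i - F(x_i) = y_i - \hat{y}_i$$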
Upon calculation, you can see that the residual (r) is nothing but (y - ŷ).
After calculating our first residual values (r1), our table looks like this:
Good, we successfully calculated our first residual (r).
Now, we use this value of r as the target and our features (bedroom and bathroom) to train a Decision Tree. ← This will be our next Base Learner.
Train a Base Learner h_m(x)
In this step, we train our first tree, where r is the dependent variable and bedroom and bathroom are the independent features.
After training the Decision Tree, we use it to predict a value of r for each house.
Note: The predicted values here are just examples (they may not reflect the true values).
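As a quick sketch of this step in Python (the bedroom/bathroom values and prices below are placeholders, not the exact numbers from the table above):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Placeholder data: [bedrooms, bathrooms] and prices in thousands.
X = np.array([[2, 1], [3, 2], [4, 3]])
y = np.array([313.0, 420.0, 550.0])

f0 = y.mean()    # initial constant prediction from step 1
r = y - f0       # pseudo-residuals from step 2

# Train the base learner h_m(x) on the residuals.
h1 = DecisionTreeRegressor(max_depth=1)
h1.fit(X, r)
r_hat = h1.predict(X)   # predicted residuals
```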
Find a value of Gamma (γ) that minimizes the loss for our model
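Written out in the usual gradient boosting notation, the quantity we are after is:

$$\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} L\big(y_i,\; F_{m-1}(x_i) + \gamma\, h_m(x_i)\big)$$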
Plugging our MSE loss into this equation, γ_m becomes:
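$$\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n} \frac{1}{2}\Big(y_i - \big(F_{m-1}(x_i) + \gamma\, h_m(x_i)\big)\Big)^2$$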
Now we find the value of γ_m that minimizes this loss, similar to how we did in step 1 (by taking the first derivative with respect to γ and setting it to zero).
Here we got the optimal value of γ as 0.
Update the model and Find the Prediction.
After finding the optimal value of γ, we update the model to get the new prediction.
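The update rule is typically written with a learning rate ν (shrinkage) that scales each step; in the hand calculation above you can take ν = 1:

$$F_m(x) = F_{m-1}(x) + \nu\,\gamma_m\, h_m(x)$$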
After updating the model and plugging in the values, we get the next predictions.
Here, we got 1013 for all 3 data points. (Note: this is just an example, and it is skewed from the real values.)
Now, we go back to step 2 and repeat the process as many times as we want (typically we set the number of decision trees in advance).
Although the mathematics seems a little intimidating, the coding part is not hard at all.
Here is the implementation of Gradient Boosting Regressor in Scikit-Learn.
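A minimal sketch of that usage, with a tiny placeholder dataset and illustrative (not prescriptive) hyperparameters:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Placeholder data: [bedrooms, bathrooms] -> price in thousands.
X = np.array([[2, 1], [3, 2], [4, 3], [3, 1], [5, 3]])
y = np.array([313.0, 420.0, 550.0, 360.0, 610.0])

model = GradientBoostingRegressor(
    n_estimators=100,      # how many decision trees (weak learners) to train
    learning_rate=0.1,     # how much each tree contributes to the prediction
    loss="squared_error",  # the loss function (MSE case; older sklearn versions used 'ls')
    max_depth=3,
)
model.fit(X, y)
print(model.predict([[3, 2]]))
```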
It’s that easy to implement in Python. We just have to choose a learning rate, a loss function, and how many decision trees (weak learners) we want to train.
Here is the example of Gradient Boosting used for House Price prediction.
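Here is one way such a house-price example might look, using scikit-learn’s built-in California housing dataset as a stand-in:

```python
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Load a public housing dataset and split it into train and test sets.
data = fetch_california_housing()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.2, random_state=42
)

model = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1, max_depth=3)
model.fit(X_train, y_train)

preds = model.predict(X_test)
print("Test MSE:", mean_squared_error(y_test, preds))
```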
I hope you learned about Gradient Boosting Regression and the mathematics behind it.
We will talk about Gradient Boosting Classification next.
This is the 8th article in the series Forming a strong foundation. Here are the links to the previous articles: