Gradient Boosting: Introduction, Implementation, and Mathematics behind it.


A beginner-friendly introduction and an implementation in Python.

Introduction

Gradient Boosting is a powerful technique and one of the most important topics in ensemble learning. It is a type of Boosting ensemble. I explained Boosting in the last two articles—AdaBoost: Introduction, Implementation, and Mathematics behind it and Boosting: Introduction.

If you haven’t already done so, I would highly recommend that you read them. They will help you understand more about Boosting and ultimately help you understand Gradient Boosting more deeply.


Now with that said, let's just quickly recap.

Boosting is an ensemble technique that combines multiple weak learners to form a strong learner. Weak learners are models that perform only slightly better than a random guess.

Similar to AdaBoost, Gradient Boosting combines the weak learners sequentially to form a strong learner that performs better, but how it does so is different.

How does it actually work?

In a nutshell, Gradient Boosting works like this:

Take the data → initialize the model with a constant value → find the residuals → train a model to predict the residuals → calculate new predictions based on the predicted residuals → find new residuals based on the previous predictions, and repeat.
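To make this loop concrete, here is a minimal from-scratch sketch in Python. It assumes squared-error loss and uses scikit-learn's DecisionTreeRegressor as the weak learner; the function and variable names are my own illustration, not part of any library. Don't worry if parts of it aren't clear yet—each step is broken down below.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost_fit(X, y, n_trees=100, learning_rate=0.1, max_depth=2):
    """Minimal gradient boosting for regression with squared-error loss."""
    y = np.asarray(y, dtype=float)
    # Initialize with a constant value: the mean minimizes the squared error
    f0 = float(np.mean(y))
    prediction = np.full(len(y), f0)
    trees = []
    for _ in range(n_trees):
        # Find the residuals (the negative gradient of the squared-error loss)
        residuals = y - prediction
        # Train a weak learner to predict the residuals
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)
        # Update the predictions using the predicted residuals
        prediction = prediction + learning_rate * tree.predict(X)
        trees.append(tree)
    return f0, trees

def gradient_boost_predict(X, f0, trees, learning_rate=0.1):
    prediction = np.full(len(X), f0)
    for tree in trees:
        prediction = prediction + learning_rate * tree.predict(X)
    return prediction
```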

Before we start breaking down the steps of Gradient Boosting, let’s start with the housing price data that we will be working on.

Here, Bedroom and Bathroom are the features and Price is the target. We will use the features to predict the price of a house.

Let’s break down the steps of Gradient Boosting now.

Remember that in the Boosting method, we have to start with a base learner. In Gradient Boosting, before we create the base learner, we have to go through a couple of steps.


Initialize the model with an initial constant value/initial guess


In regression, there are lots of loss functions we can use. We will use Mean Squared Error (MSE) here.

Here, y is the original value and ŷ (y-hat) is the predicted value.

We want to find a value of ŷ here, since ŷ is the value we do not know.
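Written out, the initialization step looks like this (a sketch in standard notation; the ½ in front of the squared error is a common convention that just makes the derivative cleaner):

```latex
F_0(x) \;=\; \arg\min_{\hat{y}} \sum_{i=1}^{n} L(y_i, \hat{y})
       \;=\; \arg\min_{\hat{y}} \sum_{i=1}^{n} \tfrac{1}{2}\,(y_i - \hat{y})^2
\qquad \text{(i)}
```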

Now, equation (i) tells us that we have to find the minimum value of the loss function.

We know that, in order to find the minimum of a function, we have to:

  1. Take the first derivative of that function (equation (ii))
  2. Set the derivative to 0
  3. Solve for the unknown value (for us it is ŷ)

In our data above, y is the Price and we need to find ŷ. For simplicity, we will omit the 000 from the price, so instead of 313000 we will write 313.

Let’s find the initial constant/guess value, i.e. ŷ. Here is the calculation of the value.
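One way to sketch that calculation:

```latex
\frac{\partial}{\partial \hat{y}} \sum_{i=1}^{n} \tfrac{1}{2}\,(y_i - \hat{y})^2
  \;=\; -\sum_{i=1}^{n} (y_i - \hat{y}) \;=\; 0
\qquad \text{(ii)}

\Rightarrow \quad \hat{y} \;=\; \frac{1}{n}\sum_{i=1}^{n} y_i
\qquad \text{(the mean of the Price values, which works out to 1013 for our table)}
```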

One concept that you should remember is that the first derivative of a function helps us find a minimum value.

Upon calculation, we found that the initial constant/guess value ŷ is 1013 (the mean of the Price column).

Great. Here is our updated table.


Calculate the Pseudo-Residuals (r)

This is one of the important steps in this algorithm.

In this step, we find a residual (r) given by the formula:
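In standard notation, the pseudo-residual for example i at iteration m is the negative gradient of the loss with respect to the previous model's prediction:

```latex
r_{im} = -\left[\frac{\partial L\bigl(y_i, F(x_i)\bigr)}{\partial F(x_i)}\right]_{F(x) = F_{m-1}(x)}
```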


One important concept to understand here is that F(xᵢ) is the predicted value (ŷ) from the previous model.

Let’s make this equation easier to understand.


Now let’s find what that partial derivative is.
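A sketch of that calculation, again using the ½-scaled squared error so the constant cancels:

```latex
L\bigl(y_i, F(x_i)\bigr) = \tfrac{1}{2}\,\bigl(y_i - F(x_i)\bigr)^2

\frac{\partial L}{\partial F(x_i)} = -\bigl(y_i - F(x_i)\bigr)

r_{im} = -\frac{\partial L}{\partial F(x_i)} = y_i - F(x_i) = y - \hat{y}
```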

Upon calculation, you can see that the residual (r) is nothing but (y − ŷ).


Upon the calculation of our first residual value (r1), our table looks like this:

Good, we successfully calculated our first residual (r).

Now, we use this value of r as the target and our features (Bedroom and Bathroom) to train a Decision Tree. ← This will be our next base learner.

Train a Base Learner h_m(x)

In this step, we train our first model, where r is the dependent variable (target) and Bedroom and Bathroom are the independent features.

After training the Decision Tree, we use it to predict the residual, giving us a predicted value of r.

Note: The predicted values here are just examples (they may not reflect the true values).
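As a sketch, this step with scikit-learn could look like the following. The table values are made up for illustration (chosen so that one price is 313 and the mean is 1013, as in the article); the column names and tree depth are assumptions as well.

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Hypothetical stand-in for the housing table (prices in thousands)
df = pd.DataFrame({
    "Bedroom":  [2, 3, 4],
    "Bathroom": [1, 2, 3],
    "Price":    [313, 1100, 1626],
})

X = df[["Bedroom", "Bathroom"]]   # independent features
f0 = df["Price"].mean()           # initial constant prediction (1013)
r1 = df["Price"] - f0             # pseudo-residuals, the new target

# Train a shallow decision tree (the weak learner) on the residuals
h1 = DecisionTreeRegressor(max_depth=2, random_state=0)
h1.fit(X, r1)

predicted_residuals = h1.predict(X)
print(predicted_residuals)
```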


Find a value of gamma (γ) that minimizes the loss for our model

From this equation, our MSE loss will be:

So our γ_m becomes
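In standard notation (a sketch, again with the ½-scaled squared error), γ_m is the step size that minimizes the loss when the new tree h_m is added to the previous model F_{m−1}:

```latex
\gamma_m = \arg\min_{\gamma} \sum_{i=1}^{n}
  L\bigl(y_i,\; F_{m-1}(x_i) + \gamma\, h_m(x_i)\bigr)
        = \arg\min_{\gamma} \sum_{i=1}^{n}
  \tfrac{1}{2}\bigl(y_i - F_{m-1}(x_i) - \gamma\, h_m(x_i)\bigr)^2
```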

Now we calculate the minimum value of γ_m, similar to how we did in step 1 (by finding the first derivative with respect to γ and setting it to 0).
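Writing r_i = y_i − F_{m−1}(x_i) for the pseudo-residuals, one way to sketch that calculation is:

```latex
\frac{\partial}{\partial \gamma} \sum_{i=1}^{n}
  \tfrac{1}{2}\bigl(r_i - \gamma\, h_m(x_i)\bigr)^2
  = -\sum_{i=1}^{n} h_m(x_i)\,\bigl(r_i - \gamma\, h_m(x_i)\bigr) = 0

\Rightarrow \quad
\gamma_m = \frac{\sum_{i=1}^{n} h_m(x_i)\, r_i}{\sum_{i=1}^{n} h_m(x_i)^2}
```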

Here we got the optimal value of γ as 0.


Update the model and Find the Prediction.

After finding the optimal value of γ, we update the model to get the new predictions.
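The update is conventionally written with a learning rate ν (between 0 and 1) that shrinks each step:

```latex
F_m(x) = F_{m-1}(x) + \nu\,\gamma_m\, h_m(x)
```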

After updating the model and plugging in the values, we get the next predictions.

Here, we got 1013 for all 3 data points. (Note: this is just an example, and it is skewed from the real values.)

Now, we go back to step 2 and repeat the process as many times as we want (typically we set in advance how many decision trees we want).


Although the mathematics may seem a little intimidating, the coding part is not hard at all.

Here is the implementation of Gradient Boosting Regressor in Scikit-Learn.
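A minimal version looks like the following; the synthetic data and hyperparameter values are just illustrative assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

# Small synthetic regression problem, just to show the API
X, y = make_regression(n_samples=200, n_features=2, noise=10.0, random_state=0)

# The three main knobs: number of trees, learning rate, and the loss function
model = GradientBoostingRegressor(
    n_estimators=100,        # how many decision trees (weak learners) to train
    learning_rate=0.1,       # shrinkage applied to each tree's contribution
    loss="squared_error",    # MSE; older scikit-learn versions called this "ls"
    random_state=0,
)
model.fit(X, y)
print(model.predict(X[:3]))
```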


It’s that easy to implement in Python. We just have to decide which learning rate to use, which loss function to use, and how many decision trees (weak learners) we want to train.

Here is an example of Gradient Boosting used for house price prediction.
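As a stand-in for the article's housing table, here is a sketch using scikit-learn's built-in California housing data, which is the same kind of task (predicting house prices from features); the hyperparameters are illustrative.

```python
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Load a real house-price dataset (downloads on first use)
X, y = fetch_california_housing(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

gbr = GradientBoostingRegressor(
    n_estimators=100,
    learning_rate=0.1,
    max_depth=3,
    loss="squared_error",
    random_state=42,
)
gbr.fit(X_train, y_train)

y_pred = gbr.predict(X_test)
print("Test MSE:", mean_squared_error(y_test, y_pred))
```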


I hope you learned about Gradient Boosting Regression and the mathematics behind it.

We will talk about Gradient Boosting Classification next.


This is the 8th article in the series Forming a Strong Foundation. Here are the links to the previous articles:

  1. Why Should I Learn from the Beginning?
  2. Linear Regression: Introduction
  3. Regression: Evaluation Metrics/Loss Functions
  4. Decision Tree: Introduction
  5. Random Forest: Introduction & Implementation in Python
  6. Boosting: Introduction
  7. AdaBoost: Introduction, Implementation and Mathematics behind it.


References and Further Resources

https://en.wikipedia.org/wiki/Gradient_boosting

https://machinelearningmastery.com/gentle-introduction-gradient-boosting-algorithm-machine-learning/

https://dataaspirant.com/gradient-boosting-algorithm/

Awesome Gradient Boosting Research Papers

https://www.kaggle.com/code/kashnitsky/topic-10-gradient-boosting
