"Unravelling the Power of XGBoost: Boosting Performance with Extreme Gradient Boosting"
Dishant Kharkar
Physics Faculty | Data Science & AI Enthusiast | Educator & Analyst | Exploring the Intersection of Physics & Data Science
XGBoost is a powerful machine-learning algorithm that has been dominating the world of data science in recent years.
XGBoost offers a great deal of control to the user. In this blog post, we’ll look at what XGBoost is and how it works, so that you can get started using it in your projects.
What is XGBoost?
XGBoost stands for Extreme Gradient Boosting. It is an open-source implementation of gradient-boosted decision trees: trees are built sequentially, each new tree correcting the errors of the ones before it, with regularization, parallelized tree construction, and built-in handling of missing values added on top of the classic gradient boosting framework.
Why we use XGBoost:
- Speed: optimized, parallelized training that scales to large datasets.
- Accuracy: consistently strong predictive performance on structured data.
- Interpretability: feature importance scores and inspectable trees.
- Versatility: supports regression, classification, and ranking tasks.
Overall, the combination of speed, accuracy, interpretability, and versatility makes XGBoost a popular choice for data scientists and machine learning practitioners when working with structured/tabular data.
However, it's essential to choose the right algorithm based on the specific characteristics of the data and the problem at hand. While XGBoost excels in many scenarios, there are instances where other algorithms may be more appropriate, such as deep learning for unstructured data or time-series analysis with specialized models.
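As a quick illustration, here is a minimal sketch of training XGBoost on tabular data with its scikit-learn-style API. It assumes the xgboost and scikit-learn packages are installed, and the dataset is synthetic, purely for demonstration.

```python
# Minimal XGBoost workflow on synthetic tabular data.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor

# Build a small synthetic regression dataset (illustrative only).
X, y = make_regression(n_samples=1000, n_features=10, noise=0.1, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a gradient-boosted tree ensemble.
model = XGBRegressor(n_estimators=100, learning_rate=0.1, max_depth=3)
model.fit(X_train, y_train)

# Evaluate on held-out data.
print("Test MSE:", mean_squared_error(y_test, model.predict(X_test)))
```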
Key components:
- Gradient boosting framework: trees are added sequentially, each one fitted to the errors of the current ensemble.
- Decision trees as base learners.
- Regularization (L1 and L2) to penalize overly complex trees and prevent overfitting.
- Gain-based tree pruning.
- Parallelized tree construction and built-in handling of missing values.
How does XGBoost work?
1. Start with an initial prediction for every sample (0.5 by default).
2. Calculate the residuals: the differences between the actual values and the current predictions.
3. Fit a new tree to those residuals.
4. Add the new tree's predictions, scaled by the learning rate, to the running predictions.
5. Repeat steps 2-4 until the specified number of trees is reached or the residuals stop improving.
That's it! XGBoost builds an ensemble of decision trees, where each tree learns from the errors of the previous trees, and together they create a powerful and accurate predictive model. The process of learning from errors and building more trees continues until the model is robust and performs well on new data.
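To make the idea of learning from errors concrete, here is a bare-bones sketch of a gradient boosting loop for regression written with plain scikit-learn decision trees. It illustrates the general boosting recipe described above, not XGBoost's exact tree-building internals, and the function names and parameter values are illustrative only.

```python
# A minimal gradient boosting loop: each tree is fit to the residuals
# (errors) left behind by the trees before it.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_boosted_ensemble(X, y, n_trees=50, learning_rate=0.1, initial_prediction=0.5):
    trees = []
    pred = np.full(len(y), initial_prediction)   # step 1: constant initial prediction
    for _ in range(n_trees):
        residuals = y - pred                     # step 2: errors of the current ensemble
        tree = DecisionTreeRegressor(max_depth=3)
        tree.fit(X, residuals)                   # step 3: new tree learns the errors
        pred += learning_rate * tree.predict(X)  # step 4: nudge predictions toward y
        trees.append(tree)
    return trees

def predict_boosted(trees, X, learning_rate=0.1, initial_prediction=0.5):
    pred = np.full(X.shape[0], initial_prediction)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred
```

Each added tree only has to model what the ensemble still gets wrong, which is why the ensemble keeps improving as trees are added.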
How XGBoost works mathematically:
Here we will use simple training data with Drug Dosage on the x-axis and Drug Effectiveness on the y-axis. The top two observations (6.5 and 7.5) have relatively large positive values for Drug Effectiveness, which means the drug was helpful, and the bottom two observations (-10.5 and -7.5) have negative values, which means the drug did more harm than good.
The very first step in fitting XGBoost to the training data is to make an initial prediction. This prediction could be anything, but by default it is 0.5, regardless of whether you are using XGBoost for regression or classification. The prediction of 0.5 corresponds to a thick black horizontal line across the plot, and the residuals are the vertical differences between each observation and that line.
Unlike regular Gradient Boosting, which typically uses off-the-shelf regression trees, XGBoost uses a unique kind of regression tree called an XGBoost Tree.
Now we need to calculate the Quality Score, or Similarity Score, for the residuals. For regression, the Similarity Score of a leaf is:

Similarity Score = (sum of residuals)² / (number of residuals + λ)

Here λ is a regularization parameter: larger values of λ shrink the Similarity Scores, making the tree less sensitive to individual observations.
So we split the observations into two groups, based on whether or not Dosage < 15. The observation on the left, with residual -10.5, is the only one with Dosage < 15; all of the other residuals go to the leaf on the right.
When we calculate the Similarity Scores for the residuals -10.5, -7.5, 6.5, and 7.5, putting λ = 0:

For the root, which holds all four residuals, we get Similarity = (-10.5 - 7.5 + 6.5 + 7.5)² / (4 + 0) = (-4)² / 4 = 4.

For the split at Dosage < 15, the left leaf holds only -10.5, so Similarity = (-10.5)² / (1 + 0) = 110.25, while the right leaf holds -7.5, 6.5, and 7.5, so Similarity = (-7.5 + 6.5 + 7.5)² / (3 + 0) = 6.5² / 3 ≈ 14.08.

Hence the result we get is the Gain of the split: Gain = Left Similarity + Right Similarity - Root Similarity = 110.25 + 14.08 - 4 ≈ 120.33. The larger the Gain, the better the split; XGBoost tries candidate thresholds and keeps the one with the highest Gain.
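The same arithmetic can be checked in a few lines of Python. This is a sketch of the Similarity Score and Gain calculations described above; the function name is mine, not part of the XGBoost API.

```python
# Similarity Score and Gain for the worked example above (lambda = 0).
def similarity(residuals, lam=0.0):
    # Similarity = (sum of residuals)^2 / (number of residuals + lambda)
    return sum(residuals) ** 2 / (len(residuals) + lam)

residuals = [-10.5, -7.5, 6.5, 7.5]   # residuals relative to the initial prediction of 0.5
root = similarity(residuals)           # (-4)^2 / 4 = 4.0
left = similarity([-10.5])             # (-10.5)^2 / 1 = 110.25
right = similarity([-7.5, 6.5, 7.5])   # 6.5^2 / 3 ≈ 14.08
gain = left + right - root             # ≈ 120.33
print(root, left, right, gain)
```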
Hyperparameters in XGBoost:
- n_estimators: the number of boosting rounds (trees).
- learning_rate (eta): the shrinkage applied to each tree's contribution.
- max_depth: the maximum depth of each tree.
- min_child_weight: the minimum amount of evidence required in a leaf.
- subsample and colsample_bytree: the fraction of rows and features sampled for each tree.
- gamma: the minimum Gain required to keep a split.
- reg_lambda and reg_alpha: L2 and L1 regularization strengths (reg_lambda is the λ used above).
These are just some of the hyperparameters available in XGBoost. There are more advanced hyperparameters and configurations to explore, such as using different booster types (gbtree, gblinear, dart), tree construction types (hist vs. exact), and others. Proper hyperparameter tuning is crucial to find the optimal combination for your specific dataset and problem, and it often involves using techniques like grid search, random search, or Bayesian optimization.
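As a concrete example, here is a sketch of tuning a few common XGBoost hyperparameters with scikit-learn's grid search. The parameter values are illustrative starting points, not recommendations, and the dataset is synthetic.

```python
# Illustrative hyperparameter search over a few common XGBoost knobs.
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from xgboost import XGBRegressor

X, y = make_regression(n_samples=500, n_features=10, noise=0.1, random_state=0)

param_grid = {
    "n_estimators": [100, 300],     # number of boosting rounds
    "learning_rate": [0.05, 0.1],   # shrinkage applied to each tree
    "max_depth": [3, 6],            # maximum depth of each tree
    "reg_lambda": [1.0, 10.0],      # L2 regularization (the lambda above)
}

search = GridSearchCV(XGBRegressor(), param_grid, cv=3, scoring="neg_mean_squared_error")
search.fit(X, y)
print("Best parameters:", search.best_params_)
```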
Comparison between XGBoost and traditional Gradient Boosting:
- Regularization: XGBoost adds L1 and L2 penalties to its objective; traditional gradient boosting typically has none.
- Speed: XGBoost parallelizes tree construction and is heavily optimized; traditional implementations build each tree on a single core.
- Missing values: XGBoost learns a default direction for missing values at each split; traditional gradient boosting usually requires imputation beforehand.
- Pruning: XGBoost prunes splits whose Gain falls below gamma; traditional gradient boosting relies mainly on depth limits.
A small side-by-side sketch follows.
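Here is a minimal side-by-side sketch comparing scikit-learn's classic GradientBoostingRegressor with XGBoost on the same synthetic data. It is illustrative only; timings and errors will vary with the data and settings.

```python
# Fit a traditional gradient boosting model and XGBoost on identical data.
import time
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

X, y = make_regression(n_samples=5000, n_features=20, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, model in [
    ("Traditional gradient boosting", GradientBoostingRegressor(n_estimators=100)),
    ("XGBoost", XGBRegressor(n_estimators=100)),
]:
    start = time.perf_counter()
    model.fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: trained in {elapsed:.2f}s, test MSE {mse:.2f}")
```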
Applications of XGBoost:
XGBoost, being a versatile and powerful machine learning algorithm, finds applications in a wide range of domains. Here are some common and notable applications of XGBoost:
- Fraud detection and credit risk scoring in finance.
- Customer churn prediction and click-through-rate modelling.
- Ranking and recommendation systems.
- Disease risk prediction in healthcare.
- Winning entries in many machine learning competitions on structured data.
If you learned something from this blog, make sure you give it a like.
Will meet you in some other blog; till then, peace.
Thank You!