Unlocking the Power of Extreme Gradient Boosting with SAP IBP 2402

This article is part of the SAP IBP Hands-On Series by Bolders Consulting Group, where consultants write about practical, real-world applications and share insider tips on using SAP Integrated Business Planning effectively. Each piece in this series draws on the direct expertise of our consultants, offering unique insights and actionable strategies for optimizing your supply chain management.

Greetings, dear readers. With the recent launch of SAP IBP 2402, SAP has introduced several impressive functionalities tailored to different business requirements. Notably, the integration of the extreme gradient boosting (XGBoost) algorithm into demand planning is one of the key additions. In this article, I aim to provide concise insights for a conceptual understanding of XGBoost, as the underlying principles of gradient boosting algorithms can be challenging to grasp. Given the similarities between gradient boosting and XGBoost, both of which leverage multiple independent variables and prediction factors, understanding the conceptual intricacies is paramount. Accordingly, I will explain how XGBoost generates its final predictions before delving into the specific settings introduced in SAP IBP 2402.

In this blog, I have intentionally focused on qualitative content to give readers a comprehensive understanding of the extreme gradient boosting algorithm. Given that the blog is specific to SAP, my aim is to align the content closely with the documentation available from SAP while ensuring clarity and depth. If you find this topic interesting, I welcome your feedback, and I can write a separate blog post dedicated to the foundational mathematics behind extreme gradient boosting, which is applicable beyond the scope of SAP. Please feel free to leave a comment if you would like to see this topic covered in future posts.

Deciphering Machine Learning Fundamentals: Regression, Classification, Clustering

In machine learning, understanding patterns, making predictions, and extracting insights from data are paramount. Three fundamental techniques, regression, classification, and clustering, play a vital role in achieving these objectives, be it predicting future sales or segmenting the products in your supply chain. Regression and classification are termed supervised learning (using input and output data to form predictions), whereas clustering is termed unsupervised learning (using data to uncover underlying patterns).

Extreme gradient boosting is an ensemble machine learning algorithm (combining multiple individual models in the form of decision trees) that creates a more robust predictive model. It supports both regression (data with a continuous target, for example historical sales vs. statistical forecast) and classification (data with categorical variables, for example predicting stock-out risk with a binary yes/no output).
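To make the regression/classification distinction concrete, here is a minimal sketch using the open-source xgboost Python library (outside SAP IBP, with made-up data): the same boosted-tree algorithm powers both a regressor and a classifier.

```python
import numpy as np
from xgboost import XGBRegressor, XGBClassifier

X = np.array([[12.0], [15.0], [20.0], [25.0]])   # one independent variable (e.g., temperature)

# Regression: continuous target (e.g., historical sales volume)
sales = np.array([50.0, 100.0, 250.0, 300.0])
reg = XGBRegressor(n_estimators=50, max_depth=3, learning_rate=0.5)
reg.fit(X, sales)
print(reg.predict(np.array([[16.5]])))           # continuous forecast

# Classification: binary target (e.g., stock-out risk yes/no)
stock_out = np.array([0, 0, 1, 1])
clf = XGBClassifier(n_estimators=50, max_depth=3, learning_rate=0.5)
clf.fit(X, stock_out)
print(clf.predict(np.array([[16.5]])))           # 0 or 1
```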

Discerning the Boosting Landscape:

Ensemble machine learning techniques such as bagging, boosting, and stacking offer powerful methods for improving predictive performance. In this discussion, the focus is on boosting, a method that iteratively builds subsequent weak learners by learning from past mistakes. With each iteration, boosting corrects the errors of previous models, taking small steps toward the optimal solution. This iterative process minimizes the loss function by assigning greater weight to the data that preceding models got wrong, using the gradient descent algorithm. While this explanation already delves a bit into technical details, the subsequent sections will provide more clarity and further simplify the concepts.

Conceptual Overview: Extreme Gradient Boosting


(Figure: Conceptual process flow of XGBoost)

Additive Strategy: The term refers to the approach used to combine the predictions of multiple weak learners (decision trees) to arrive at a single strong learner. This strategy involves sequentially adding new decision trees to the ensemble in a way that minimizes the overall loss function.
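As a minimal illustration of this additive strategy (a generic gradient boosting loop for squared loss, not SAP's internal implementation), each new tree is fit to the residuals of the current ensemble, and its scaled prediction is added to the running total:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def additive_boosting(X, y, n_trees=10, learning_rate=0.5, max_depth=1):
    prediction = np.full(len(y), y.mean())  # initial prediction: simple average
    trees = []
    for _ in range(n_trees):
        residuals = y - prediction                     # what the ensemble still gets wrong
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)                         # weak learner fit to the residuals
        prediction += learning_rate * tree.predict(X)  # small step toward the target
        trees.append(tree)
    return trees, prediction
```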

Let's take the simple data below as an example, which uses temperature (the independent variable) to predict the expected sales volume in the future.
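For illustration, assume the following four observations (the temperature values are notional, chosen to be consistent with the calculations that follow):

Temperature (°C) | Sales Volume
12 | 50
15 | 100
20 | 250
25 | 300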

If we plot the above data, the graph looks as follows.

XGBoost starts by making an initial prediction. In this case, I have taken a simple average of the target variable in the dataset for conceptual understanding.

(50 + 100 + 250 + 300) / 4 = 175

The second step is to calculate the residuals, that is, the difference between the initial predicted value and the actual sales volume; the residuals are shown below. The algorithm then calculates something called a similarity (or quality) score for the residuals.

Actual Sales | Initial Prediction | Residual
50 | 175 | -125
100 | 175 | -75
250 | 175 | +75
300 | 175 | +125
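For regression, XGBoost's similarity score for a set of residuals is (sum of residuals)² / (number of residuals + lambda), where lambda is a regularization parameter. A minimal sketch, using our residuals and lambda = 0:

```python
# Similarity (quality) score for a group of residuals in an XGBoost regression tree.
def similarity_score(residuals, lam=0.0):
    return sum(residuals) ** 2 / (len(residuals) + lam)

residuals = [-125, -75, 75, 125]
print(similarity_score(residuals))  # 0.0 at the root: the residuals cancel out
```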

So we can plot the initial prediction along with the residuals in our existing graph.

The next step is to see if we can cluster the residuals to improve the final predictions. As I mentioned earlier, in each of the steps below the algorithm takes small steps in the correct direction.

The first split we make is at the midpoint between the first two data points, forming a decision tree so we can see how well the model performs. In the tree below, the node at the top is called the root, and the nodes at the end are called the leaves; similarity scores are then calculated for the leaves as well. For the sake of simplicity, I am keeping the maximum tree depth at a single level so that the example doesn't get out of hand.

The algorithm then calculates something called gain in order to quantify how well the leaves cluster similar residuals, using the similarity scores it has calculated for the first decision tree. It then changes the threshold on the input data to build different decision trees and sees which one fits the available data best.

The algorithm then calculates similarity scores, gain, and new residuals for the second decision tree. It keeps changing the threshold of the decision tree and repeating the same process until the defined maximum number of trees is reached.

The algorithm then picks the decision tree with the highest gain. In our case, let's assume that the split at temperature < 17.5 has the highest gain among all the trees it could make.
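A minimal sketch of this threshold search, assuming our illustrative data and lambda = 0: the gain of a split is the sum of the similarity scores of the two leaves minus the similarity score of the root.

```python
def similarity_score(residuals, lam=0.0):
    return sum(residuals) ** 2 / (len(residuals) + lam)

temps     = [12, 15, 20, 25]
residuals = [-125, -75, 75, 125]
root_sim  = similarity_score(residuals)

# Candidate thresholds: midpoints between consecutive temperatures
for threshold in [13.5, 17.5, 22.5]:
    left  = [r for t, r in zip(temps, residuals) if t < threshold]
    right = [r for t, r in zip(temps, residuals) if t >= threshold]
    gain = similarity_score(left) + similarity_score(right) - root_sim
    print(threshold, gain)
# The split at 17.5 yields the highest gain: the leaves {-125, -75} and
# {75, 125} group similar residuals together.
# (During pruning, a branch whose gain falls below the pruning parameter
# gamma would be removed.)
```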

The algorithm then tries to prune the decision trees it has built. The basic idea behind pruning is to remove unnecessary branches from the tree and thereby prevent overfitting (an overfitted model has learned the noise and outliers in the training data rather than the underlying pattern).

Finally, after pruning, the algorithm calculates the output values for each leaf of the tree, and here it uses a regularization parameter (lambda).

Now that we have the output value for each leaf, the system makes the new prediction scores for each leaf using the initial predicted value, the learning rate, and the output values.
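A minimal sketch of this final step for the tree split at temperature < 17.5, assuming lambda = 0 and a learning rate of 0.5 so that the numbers line up with the predictions mentioned below: the output value of a leaf is (sum of residuals) / (number of residuals + lambda), and the new prediction is the initial prediction plus the learning rate times the output value.

```python
lam, learning_rate, initial_prediction = 0.0, 0.5, 175.0

def output_value(residuals, lam=0.0):
    return sum(residuals) / (len(residuals) + lam)

left_leaf  = [-125, -75]   # residuals where temperature < 17.5
right_leaf = [75, 125]     # residuals where temperature >= 17.5

for leaf in (left_leaf, right_leaf):
    print(initial_prediction + learning_rate * output_value(leaf, lam))
# -> 125.0 for the left leaf, 225.0 for the right leaf
```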

As we can see below, the graph plots the new residuals (green) based on the notional tree we created above. By decreasing the residuals from the initial prediction, the algorithm has made new prediction scores, taking a small step in the correct direction.

Now, if we get a new record for the future with a temperature of 16.5, the system would forecast a value of 125, and if we get a new record with a temperature of 21, the system would forecast a value of 225.

Note: All the values I have used in the examples are purely notional. There are many ways to build XGBoost trees, but the underlying concept remains the same, so this should give the reader the conceptual clarity to connect the dots before delving into the settings of the algorithm.

Settings for Extreme Gradient Boosting in SAP IBP 2402:

1. As of the 2402 release, XGBoost can be used only with weekly periodicity; it cannot be used at the daily, monthly, or other levels.

2. SAP recommends using at least thirty weeks of training data for reliable outputs.

3. Maximum Number of Trees: This is the maximum number of decision trees that we want the model to build, clustering and splitting with different thresholds, before it arrives at the optimal solution. The value must be between 1 and 50.

4. Learning Rate: This is the weight applied when calculating the output (prediction scores) of the decision trees; the value must be between 0 and 1. It decreases the contribution of each decision tree when they are added up.

5. Maximum Tree Depth: The maximum number of levels in a tree. Each additional level increases the number of leaves (terminal nodes in the tree) by a factor of 2, which means the number of leaves in a tree is 2^L, where L represents the number of levels (for example, a depth of 3 gives 2^3 = 8 leaves). Since the algorithm builds many trees, the individual trees don't need to be very deep. The value must be an integer between 1 and 5 inclusive.

6. Independent Variables and Impact Analysis: The system runs an analysis to determine the impact of the various independent variables on the final calculation.

7. Baseline for Forecast: To calculate the impact analysis, we need to select a key figure to store the baseline values, which are calculated based on the zero-impact setting you define for each independent variable.

8. Independent Variables: These are the variables the system uses to build the decision trees. In our conceptual example I used temperature, but the following variables are available as standard:

a. Ordered Quantity

b. Delivered Quantity

c. Forecast (optional)

d. Forecast Snapshot (optional)

e. Consider System Calendar (optional)

f. Consider Change Points (optional)

9. Date Range: Use this setting to specify whether the algorithm should consider present and future values of the key figure in addition to historical values. Any key figure derived from the main input key figure should be set as historical.

10. Period Offset: For some independent variables, you can define a period offset in the past and an offset in the future to mimic the impact of these variables on the periods before and after each period for which the variable contains values.

11. Zero-Impact Setting: The independent variable value that should be considered as having no impact on the forecast during impact analysis.

12. Impact Key Figure: The key figure in which you want the results of the independent variable impact analysis to be stored. You can choose the same key figure for more than one variable if you wish, except for the system calendar variable, which must be unique. If you don't choose an impact key figure for an independent variable, the impact calculated for it is added to the baseline value.

13. Additional Independent Variables: These are additional independent variables that act as signals for the algorithm to consider when building trees. The following restrictions apply when using additional independent variables:

a. They must have daily or weekly periodicity.

b. Their base planning level must be either the same level at which the algorithm is run (for example, product-location-customer) or a more granular level that can be directly aggregated up to the level at which the algorithm is executed.

14. For each additional signal, you can choose a key figure, a date range, period offsets, an importance key figure, an impact key figure, and a zero-impact setting, and you can also define whether the additional signal is categorical.

For more information on the available settings, please refer to the SAP IBP help link at the end of this article.

Recommendations & Considerations for Picking Extreme Gradient Boosting:

1. For reliable outputs, thirty weeks of training data should be considered the minimum; however, XGBoost can process as few as six weeks of training data. To capture seasonality effects, additional data covering at least two seasonal cycles is required.

2. The quality of the training data is important for good predictive accuracy. If the quality of the data is poor, for example, minimal training data, data with many null values, or data that is not reflective of future growth or contraction, then the predictive accuracy will reflect that.

3. Complex training datasets may increase run time.

4. Consider choosing extreme gradient boosting:

a. When you want to perform lag-based demand forecasting.

b. When your data has non-linear relationships.

c. When you have large and complex data.

d. When you want to perform feature engineering by including external factors as additional features in the forecast model.

Looking forward to reading your insights on the latest machine learning algorithm for supply chain optimization! Shivanesh Kumar, CPIM, DDPP
