A Deep Dive into Finding Best Performing Methods for Lead-time Demand Planning and Forecasting


Most business (macro) and demand (micro) planners and forecast practitioners know there are many different kinds of forecasting techniques available, ranging from elementary methods and more complex ARIMA time series models to econometric models and newer computer-intensive statistical learning/machine learning (SL/ML) approaches, recently popularized in the M5 forecasting competition. In the supply chain, e-commerce forecasts are needed when:

· planning product mix based on future patterns of retail demand at the item, product group, and store level over a planning horizon

· setting safety stock levels for SKUs at multiple locations for inventory planning

· conducting S&OP and annual budget planning meetings for demand planning

When participating in the M3 Competition some thirty years ago, I was prompted to submit a modelling approach that could achieve high forecasting accuracy and best performance for lead-time demand forecasts in a 'big data' setting. The technology available at the time (a 10 MHz 8088 on an IBM PC-AT with an optional math co-processor chip and a 20 MB hard disk drive included as standard) did not allow for much data analysis. So, now I would like to explore the M3 competition data with exploratory data analysis (EDA) tools to get better insights into the effectiveness of the various methods and models.

I have become somewhat skeptical of the published M3-competition results, as my preliminary explorations revealed significant input data quality issues, such as negative forecasts, clear outliers, and unnoted data anomalies that can affect forecasting performance results with unintended consequences. The M3 competition's 'big dataset' comprises 3003 business time series. From these, I selected the 1428 monthly series and proceeded with a careful data cleansing exercise.

Why Use Exploratory Data Analysis (EDA)?


The most widely used accuracy measure in demand planning is the Mean Absolute Percentage Error (MAPE). While it is the King of the APEs, the MAPE has to be used with caution because a zero actual (e.g., intermittent demand) produces an indeterminate result (i.e., an infinitely large APE). For performance measurement purposes, however, the key objective is to determine a typical value of central tendency. In this case, as for other accuracy measures, a smarter option is to use M-estimation to create a Typical APE (TAPE) measure instead of using an arithmetic mean to average results (the Gaussian mindset). To start with, the median APE (MdAPE) is also a typical measure of central tendency and performs better than the arithmetic mean.

The Gaussian mindset: An arithmetic mean is something you COULD always calculate, but SHOULD you always?
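To see why the choice of central tendency matters, here is a minimal sketch in base R; the APE values are made up for illustration. A single extreme APE pulls the arithmetic mean far away from the bulk of the errors, while the median barely moves. A trimmed mean is shown only as one simple robust alternative, not as the TAPE M-estimator itself.

# Hypothetical APEs (in %) for ten forecasts; one intermittent-demand period
# produces a huge APE that would dominate a simple average.
ape <- c(8, 12, 9, 11, 7, 10, 13, 9, 12, 450)

mean(ape)    # arithmetic mean APE (the "Gaussian mindset"): about 54%
median(ape)  # median APE (MdAPE): about 10.5%, a far more typical value

# A crude trimmed mean is one simple robust alternative to full M-estimation.
mean(ape, trim = 0.1)  # drops the most extreme 10% in each tail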


What you may find, once you actually start looking at data, is that just a few unusual values or outliers can have a big impact on how you should be making comparisons and summarizing results. I learned this early in my career from John W. Tukey (1923-2000), who looms large over the field of data visualization and data science generally. He famously coined the term "bit" and invented the box plot. Tukey developed EDA best practices in his 1977 book Exploratory Data Analysis and also directed the Princeton Robustness Study (1970-1971).

Other accuracy measures based on the arithmetic mean, like sMAPE, MASE, sMASE, and MAE, can be misinterpreted for the same reasons. When forecasting with a moving origin, overrides and management adjustments are commonly made over the horizon period, and it may then be a best practice to use MSE, MAE, MAPE, sMAPE, MASE, and variations thereof for performance analysis; but this may not be appropriate for lead-time demand forecasts.

A Profile Analysis for Forecast Model Evaluations

When a lead-time forecast is created, it is defined as a multi-step-ahead forecast from a fixed origin over a predetermined time horizon. The M-competition forecasts are lead-time forecasts in that sense. In practice, the lead time is regarded as a frozen period in which no changes or overrides are made. Hence, a lead-time forecast can be viewed as a one-step-ahead forecast of a historical pattern or profile. It is a questionable practice to identify a "best method" without repeating lead-time forecasting cycles with holdout samples, as a best practice would recommend. For practical reasons, a best practice should require multiple, follow-up lead-time forecasts to be created with identical time horizons. Though possible, that has not been the practice in the M-competitions.

Using information-theoretic concepts for profile analysis, demand planners and forecast practitioners can assess accuracy measures and forecast performance evaluations of lead-time demand forecasts in a more objective manner.

This information-theoretic methodology was previously introduced in several of my articles on my LinkedIn profile: (1, Dec 2020), (2, Jan 2021), (3, Feb 2021), and (4, March 2021). The performance of an e-commerce forecasting process that creates trend/seasonal forecast profiles is outlined next. The first step is to encode or map a lead-time forecast into a Forecast Alphabet Profile (FAP), which has the same pattern as the forecast except that the data have been rescaled (see the spreadsheet example below). In this step, the FAP values are created by dividing each component of the forecast profile by the lead-time total, i.e., the sum of the forecasts over the horizon. Likewise, the Actual Alphabet Profile (AAP) is obtained by dividing each actual by the sum of the actuals over the predetermined horizon. This makes all profiles comparable for forecast performance evaluations.
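As a concrete illustration, here is a minimal base R sketch of the rescaling step, using made-up actuals and forecasts over a short six-period horizon (the M3 evaluation horizon is 18 months):

# Hypothetical actuals and lead-time forecasts over a 6-period horizon
actuals  <- c(120, 135, 150, 160, 145, 130)
forecast <- c(115, 140, 155, 150, 150, 125)

# Alphabet profiles: divide each period by the lead-time total,
# so both profiles sum to 1 and become directly comparable.
aap <- actuals  / sum(actuals)   # Actual Alphabet Profile
fap <- forecast / sum(forecast)  # Forecast Alphabet Profile

round(aap, 3); round(fap, 3)
sum(aap); sum(fap)               # both equal 1 by construction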

A Profile Error is defined as a 'difference' between the profile of the actuals and a forecast profile, but not in the conventional sense of measuring accuracy by 'Actual minus Forecast' differences. Here, the difference between the Actual and Forecast alphabet profiles is measured period by period on the rescaled profile values.


Accuracy can then be measured with a 'distance' measure between a Forecast Alphabet Profile (FAP) and an Actual Alphabet Profile (AAP). A forecast Profile Miss (PMISS) is given by a sum of these period-by-period profile differences over the lead-time horizon.


A forecast Profile Accuracy is defined by the Kullback-Leibler divergence measure D(a|f). This sum can be interpreted as a measure of ignorance or uncertainty about Profile Accuracy. When D(a|f) = 0, the alphabet profiles overlap, which is considered to be 100% accuracy.
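Written out, the standard form of the Kullback-Leibler divergence between the actual and forecast alphabet profiles over an h-period lead time is (natural logarithms assumed here):

D(a\,\|\,f) \;=\; \sum_{i=1}^{h} a_i \,\ln\!\left(\frac{a_i}{f_i}\right)

where a_i and f_i are the AAP and FAP values for period i of the lead time.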

The profile accuracy D(a|f) is always greater than or equal to zero, and it equals zero if and only if a_i = f_i for all i; in other words, when the forecast profile is identical to the profile of the actuals.
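A minimal base R sketch of the profile accuracy calculation, reusing the made-up numbers from the rescaling example above (any zero actuals or forecasts would need separate handling before taking logarithms):

# Hypothetical actuals and lead-time forecasts over a 6-period horizon
actuals  <- c(120, 135, 150, 160, 145, 130)
forecast <- c(115, 140, 155, 150, 150, 125)

aap <- actuals  / sum(actuals)    # Actual Alphabet Profile
fap <- forecast / sum(forecast)   # Forecast Alphabet Profile

# Kullback-Leibler divergence D(a|f): zero only when the profiles coincide
d_af <- sum(aap * log(aap / fap))
d_af

# A forecast proportional to the actuals has an identical profile, so D = 0
sum(aap * log(aap / (2 * actuals / sum(2 * actuals))))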

For an initial exploration of trend/seasonal profiles, I have selected the 1428 monthly series from the M3-competition data along with seven of the 24 Methods that are most commonly used and easily reproducible, so that additional forecasting cycles can be analyzed with the data. These M3 Methods were designed to produce level, trend, or trend/seasonal profiles over a forecast time horizon.

In addition, I am including three modified methods to make the analysis more objective and more readily reproducible (a minimal R sketch of these three profile generators follows the list):


1. Naïve-1 has a level forecast profile: the latest actual is repeated as the forecast over the 18-month holdout horizon. Unlike Naïve-1, Naïve-2 has been re-seasonalized with factors that are not readily reproducible from the documentation.

2. SES is a simple exponential smoothing model that also has a level profile. M3 SINGLE is a seasonalized SES that is not readily reproducible from the documentation.

3. Holt (2) is the algorithm with a straight-line forecast profile as defined by the original Holt Method. M3 HOLT is a seasonalized Holt Method.
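As a sketch of how these three profile generators can be reproduced, here is a minimal base R example on a made-up monthly series. It uses stats::HoltWinters with the seasonal component (and, for SES, the trend component) switched off, so the smoothing parameters are estimated from the data rather than fixed to the original M3 settings.

set.seed(42)
# Hypothetical monthly history (48 observations) with trend plus noise
y <- ts(100 + 0.8 * (1:48) + rnorm(48, sd = 5), frequency = 12)
h <- 18                                    # 18-month lead-time horizon

# 1. Naive-1: latest actual repeated over the horizon (level profile)
naive1 <- rep(tail(as.numeric(y), 1), h)

# 2. SES: simple exponential smoothing (level profile)
ses_fit <- HoltWinters(y, beta = FALSE, gamma = FALSE)
ses_fc  <- as.numeric(predict(ses_fit, n.ahead = h))

# 3. Holt: linear-trend smoothing (straight-line profile)
holt_fit <- HoltWinters(y, gamma = FALSE)
holt_fc  <- as.numeric(predict(holt_fit, n.ahead = h))

round(cbind(naive1, ses_fc, holt_fc), 1)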

Point Forecast Accuracy May Not Be Adequate for Assessing the Performance of Forecast Profiles

While striving for the best method may not be feasible or seen as a best practice, it is not uncommon for lead-time forecasting to be misinterpreted as a multiple, one-step-ahead point forecasting approach created with moving or rolling origins. This is not the same as a one-step-ahead lead-time profile forecast from a fixed origin over a predetermined time horizon. With point forecasting and a moving origin, overrides and management adjustments are commonly made over the horizon period, and it is then a best practice to use MSE, MAE, MAPE, sMAPE, MASE, and variations thereof for performance analysis; but this may not be appropriate for lead-time demand forecasts.

Identifying Effective Methods for Lead-time Forecasting

We need to recognize that there can be no best forecasting method, only useful ones, as uncertainty always enters the fray as a certain factor driving demand. A relevant quote attributed to the world-renowned statistician George E. P. Box (1919-2013) is worth remembering: "All Models Are Wrong, Some Are Useful".


For purposes of this data-analytic exploration, I have paraphrased it to say "All Data Are Wrong, Some Are Useful", which applies to the lead-time forecasts in the M-competitions as well.

Using a proper skill score, we can rank every series forecasted by a Method with the D(a|f) accuracy measure for forecast profiles. I define a Levenbach L-Skill score as 1 - [D(a|Method)/D(a|Benchmark)]. The L-Skill score ranges from minus infinity to +1. The Naïve-1 and Naïve-2 benchmarks have L-Skill scores of 0, by definition.

Positive L-Skill scores are associated with an effective profile forecasting Method. The Methods with the highest percentage of positively contributing profile forecasts should be of interest.
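Written as a small helper in base R, the definition reads as follows (a sketch; the D values below are hypothetical, not taken from the M3 results):

# L-Skill score: 1 - D(a|Method) / D(a|Benchmark)
# Positive values mean the Method beats the benchmark on profile accuracy.
l_skill <- function(d_method, d_benchmark) 1 - d_method / d_benchmark

l_skill(d_method = 0.004, d_benchmark = 0.020)   #  0.80: effective
l_skill(d_method = 0.020, d_benchmark = 0.020)   #  0.00: no better than benchmark
l_skill(d_method = 0.050, d_benchmark = 0.020)   # -1.50: worse than benchmark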

The first two columns in the table below show the results for seven M3 Methods using Naïve-1 (N1) and Naïve-2 (N2) as benchmark methods. Thus, Naïve-1, as a Method, produces better forecasts than the Naïve-2 benchmark Method for only 16% of the series. Likewise, the Naïve-2 Method is more effective than the Naïve-1 benchmark Method for 49% of the series. For the other Methods, the percentages are comparable, but note that the series calculated with sMAPE 'accuracy difference' scores do not have identical distributions between benchmark methods (see the scatter diagram below).

[Table: percentage of the 1428 monthly series for which each of the seven M3 Methods has a positive skill score against the Naïve-1 and Naïve-2 benchmarks]

How to Associate Effective Methods with Skill Scores

The M3 competition organizers calculated a spread between accuracy measures as a means of ranking Methods. In particular, the spread sMAPE(Naïve-2 Benchmark) minus sMAPE(selected Method) was calculated and averaged over all 1428 series to rank Methods with a simple average. The box-and-whisker plots in the left diagram show the distribution of the 1428 'sMAPE accuracy differences' for the THETA Method with the two benchmark methods.

[Figure: box-and-whisker plots of the sMAPE accuracy spreads for the THETA Method under the Naïve-1 and Naïve-2 benchmarks (left) and a scatter plot comparing the spreads (right)]

The series with positive spreads contribute to the effectiveness of a Method. The simple averages of the sMAPE accuracy spreads are 3 and 4, respectively (left box-and-whisker plots), and the median of the sMAPE accuracy spread is 2 for each benchmark. However, with the very skewed distribution of the underlying numbers, I have reservations that differences in arithmetic means calculated from these data are very meaningful for ranking Methods. As seen in the right scatter diagram above, the series identified with positive accuracy spreads are not the same for each benchmark method, either. Thus, many series can be effective contributors (positive spreads) when using one benchmark method but not with another (contrast quadrants I and III in the scatter plot).

[Figure: scatter plots of the sMAPE accuracy spreads versus benchmark sMAPE accuracy (left) and of the spreads under the two benchmarks against each other (right)]

Note, in the left scatter diagram above, that the 'spreads in accuracy' are a function of the benchmark method used. Like a regression scatter plot, the diagram shows the 'sMAPE(Naïve-2) minus sMAPE(THETA Method)' spreads versus the sMAPE(Naïve-2) point accuracy for the M3 monthly series with the 18-period holdout sample. So, simple averaging does not tell the right story. Rather than using spreads in accuracy measures, we should be using ratios instead. The right scatter diagram shows the relationship between the sMAPE accuracy spreads with the two benchmarks for the entire 1428-series monthly dataset. They are not closely related. This leads to using 'proper' skill scores for performance analysis.

We can get further insights into effective Methods with proper skill scores. The sMAPE Skill score is given by [1 - sMAPE(selected Method)/sMAPE(benchmark)], whose numerator is the same as the accuracy spread [sMAPE(benchmark) - sMAPE(selected Method)]. Using the Naïve-1 and Naïve-2 benchmarks for comparison, we find a 72% effectiveness rating for the THETA Method in this single, static planning-cycle evaluation. The benchmark matters if you need to make comparisons. In a comparison using the L-Skill score, the percent effective is about the same (68-69%), although the scatter diagram suggests that the two scores are not necessarily closely related. For comparison, I have added the MAPE Skill score and the RAE Skill score to the table.

[Table: percentage of series with positive L-Skill, sMAPE, MAPE, and RAE Skill scores under the Naïve-1 and Naïve-2 benchmarks]
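For reference, here is a minimal base R sketch of the sMAPE Skill score calculation on made-up numbers; it uses the symmetric MAPE variant with the sum of actual and forecast in the denominator, and the skill score is then the ratio form rather than the spread.

# Symmetric MAPE (in %): 200 * |A - F| / (|A| + |F|), averaged over the horizon
smape <- function(actual, forecast) {
  mean(200 * abs(actual - forecast) / (abs(actual) + abs(forecast)))
}

# Hypothetical 6-period actuals and two competing lead-time forecasts
actual    <- c(120, 135, 150, 160, 145, 130)
method_fc <- c(118, 138, 149, 158, 148, 128)   # a candidate Method
bench_fc  <- rep(130, 6)                       # a Naive-style level benchmark

smape_method <- smape(actual, method_fc)
smape_bench  <- smape(actual, bench_fc)

# sMAPE Skill score: positive when the Method is more accurate than the benchmark
1 - smape_method / smape_bench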

In an earlier data exploration, only three of the 24 Methods were used on the 1428 monthly time series with the 18-month lead-time holdout samples. Method A was shown to be effective for about two-thirds of the time series, as were Methods B and C (not used in the table above). Getting experience with dynamic lead-time forecasting cycles is essential for a demand planner to determine whether changes in effectiveness ('doing the right things') rankings are important in a particular context.

What Does Data Exploration Reveal?

With experience, we can get smarter at the task and more agile in the forecasting process when we monitor data quality throughout, i.e., both before and after using a lead-time forecast. In an article posted on LinkedIn and Delphus.com, I demonstrate the importance of identifying and correcting even a single outlier in a highly seasonal, readily forecastable series (M3 series N2796). This was not the only instance in which I found consequential problems with isolated outliers.
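As a simple sketch of the kind of screening involved (the series below is made up, not N2796), Tukey's boxplot fences in base R flag a single aberrant value quickly; for strongly seasonal data the fences are usually applied to seasonally adjusted values or residuals rather than the raw series.

set.seed(7)
# Hypothetical monthly series with a repeating seasonal pattern and one bad value
x <- rep(c(80, 90, 120, 200, 150, 100), 8) + rnorm(48, sd = 5)
x[25] <- 650   # a single recording error

# Tukey fences: values beyond 1.5 * IQR outside the quartiles are flagged
q   <- quantile(x, c(0.25, 0.75))
iqr <- diff(q)
which(x < q[1] - 1.5 * iqr | x > q[2] + 1.5 * iqr)   # returns 25

boxplot.stats(x)$out   # the same outlier, via Tukey's boxplot statistics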

Paraphrasing Ed Deming, I find that "bad data beats a good forecast every time".

N1906 is an M3 series with an L-Skill score of 0.99, the highest encountered, for 19 of the 24 methods. As you can see in the table below, series N1906 is also ranked high, if not highest, by Methods using Skill scores based on more familiar accuracy measures.

[Table: skill-score rankings of series N1906 across Methods, based on several accuracy measures]

It may also be useful to examine the data visually, so that you can understand the difference between point forecast accuracy, based on vertical distances between actuals and forecasts, and profile accuracy measured with D(a|f). For example, N2519 also has a high profile skill score for a number of Methods but may end up with quite different results using accuracy measures based on absolute differences between actuals and forecasts.

Practical Takeaways

1. To validate effective forecasting methods for demand planning, one should follow transparent empirical findings that complete dynamic, multiple profile forecast cycles over a predetermined lead time (time horizon) with real-world data. Then, calculate and calibrate the skill scores.
2. It is smarter forecasting, in my experience, to understand what forecast profiles a Method generates than to specialize in the details of how the model was estimated. You will achieve greater agility and productivity in your demand forecasting process by not getting too tied up with tweaking parameter estimates and "optimal fitting" issues.
3. The spreadsheet environment is more than adequate, and perhaps more flexible for data explorations than established commercial systems. Data exploration generally requires simple operations and visualizations, at which the spreadsheet environment shines. Also, an open-source language like R is a free, modern, and convenient software tool for analyzing and comparing forecasting results.

If you contact me at [email protected], I will be happy to share the spreadsheet used for the calculations in this article. The M3 data are freely available in a convenient format (csv) for use in Excel and R. A paper (pdf) referencing the M3 Competition results can be downloaded as well.

Temper your trained Gaussian Mindset and remain skeptical of simple averaging as the best means of comparing things. Central tendencies may depend a lot on how the rest of the numbers are distributed in your measurement and correlated with corresponding values in a different measure.


Hans Levenbach, PhD, is Owner/CEO of Delphus, Inc. and Executive Director of the CPDF Professional Development Training and Certification Programs.


Dr. Hans is the author of a forecasting book (Change&Chance Embraced) recently updated with the new LZI method for intermittent demand forecasting in the Supply Chain.

With endorsement from the International Institute of Forecasters (IIF), he created CPDF, the first IIF certification curriculum for the professional development of demand forecasters, and has conducted numerous hands-on Professional Development Workshops for Demand Planners and Operations Managers at multinational supply chain companies worldwide.


The 2021 CPDF Workshop manual is available for self-study, online workshops, or in-house professional development courses.


Hans is a Fellow, Past President and former Treasurer, and member of the Board of Directors of the International Institute of Forecasters.

He is Owner/Manager of these LinkedIn groups: (1) Demand Forecaster Training and Certification, Blended Learning, Predictive Visualization, and (2) New Product Forecasting and Innovation Planning, Cognitive Modeling, Predictive Visualization.

I invite you to join these groups and share your thoughts and practical experiences with demand data quality and demand forecasting performance in the supply chain. Feel free to send me the details of your findings, including the underlying data without identifying proprietary descriptions. If possible, I will attempt an independent analysis and see if we can collaborate on something that will be beneficial to everyone.
