Improving Accuracy Measurement and Forecasting Performance for Lead-Time Demand Planning and Forecasting Applications
In a previous article on forecast accuracy and performance analysis, I explored how a Profile Analysis can help you get more effective models with improved forecasting performance. Assessing forecasting performance has always been a challenging task, even in times without disruption.
A lead-time forecast is a multi-step-ahead forecast from a fixed origin for a predetermined time horizon.
In many demand planning and forecasting applications, a lead-time forecast is commonly used for
- planning product mix based on future patterns of demand at the item, product group and store level, or
- setting inventory safety stock levels for SKUs at multiple locations, or
- conducting S&OP and annual budget planning meetings, and
- selecting best methods in forecasting competitions.
In demand planning and forecasting organizations, a variety of accuracy measures are used to evaluate forecasting performance, often for the purpose of finding the ‘best’ (interpreted as the most accurate) Method going forward. While striving for a Best Method may be an ill-conceived and illusory practice, it is not uncommon for a lead-time forecast to be misinterpreted as a sequence of one-step-ahead point forecasts (with a moving or rolling origin), in contrast to a single lead-time forecast with a fixed origin and prescribed forecast horizon.
For point forecasting with a rolling origin, overrides and management adjustments are commonly made over the horizon period, and it is then a best practice to apply MAE, MSE, MAPE, sMAPE, MASE, sMASE and other measures that use the arithmetic mean to derive a typical measure of accuracy. This Gaussian mindset may not be appropriate for lead-time demand forecasting once you start examining the data. Instead, I will propose an objective, data-driven approach, using information-theoretic concepts, that planners and managers can use to measure the accuracy and performance of forecasts for both intermittent and regular demand.
I participated in the M3 competition several decades ago with the PP-Autocast entry, which used a family of exponential smoothing algorithms for trend/seasonal patterns. PP-Autocast created out-of-sample patterns or forecast profiles with automated algorithms designed by Everette Gardner and Ed McKenzie around 1985 for trends and trend/seasonal forecasting with damping (see Damped Trend forecast profiles below). The methods were originally shown to be very effective for use with trending telecommunications data, and later in PEER Planner, a commercial client-server forecast decision support system I developed for business forecasting applications in supply chain organizations.
The methods are illustrated with examples in my Change&Chance Embraced book. For intermittent data, Croston Methods, SES and other ‘level’ profile methods make suitable benchmark forecasts to use with the conditional LZI method for intermittent demand forecasting. My interest here is to gain insight into how these methods can be made more useful and efficient for e-commerce forecasting through a data-driven exploratory data analysis (EDA) process. At the time of the M3 competition, computer power was still too limited and expensive for large-scale testing and evaluations.
Encoding Forecast Profiles into Alphabet Profiles
To apply information-theoretic concepts, the individual forecasts and actuals are first ‘encoded’ or transformed into a corresponding FAP (Forecast Alphabet Profile) and AAP (Actual Alphabet Profile) without changing the underlying data pattern. This is done by dividing each component of the respective profile by its lead-time Total. For a given forecast profile, the Forecast Alphabet Profile is denoted FAP = {f(1), f(2), . . ., f(m)}, where each forecast is divided by the sum of the forecasts over the horizon m. Likewise, the Actual Alphabet Profile is AAP = {a(1), a(2), . . ., a(m)}, where each actual is divided by the sum of the actuals over the lead-time m. The components of each alphabet profile therefore sum to one.
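The encoding step can be sketched in a few lines of Python. This is a minimal illustration, not the spreadsheet template used for the analysis: the function name `alphabet_profile` and the four-period numbers are made up for the example, and the sketch assumes non-negative values with a positive lead-time total.

```python
from typing import Sequence


def alphabet_profile(values: Sequence[float]) -> list[float]:
    """Encode a lead-time profile as shares of its lead-time total.

    Hypothetical helper: assumes non-negative demand values whose
    total over the horizon is strictly positive.
    """
    if any(v < 0 for v in values):
        raise ValueError("profile values must be non-negative")
    total = sum(values)
    if total <= 0:
        raise ValueError("lead-time total must be positive")
    return [v / total for v in values]


# Illustrative (made-up) forecasts and actuals over a 4-period lead-time
forecast = [20.0, 30.0, 30.0, 20.0]
actual = [10.0, 40.0, 30.0, 20.0]

fap = alphabet_profile(forecast)  # Forecast Alphabet Profile
aap = alphabet_profile(actual)    # Actual Alphabet Profile
print("FAP:", fap)
print("AAP:", aap)
```

Because each value is only rescaled by a positive constant, the shape of the pattern over the horizon is unchanged; only the level information is removed.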
A forecast Profile Error (FPE) is measured, period by period, by the difference:
FPE(i) = a(i) ln a(i) – f(i) ln f(i), for i = 1, . . ., m
The sum of the FPE over the horizon period, called Profile Miss, can be interpreted as a measure of ignorance about the Forecast Profile Error. The closer to zero the better. The units are known as ‘nats’ (for natural logarithms).
Thus, a forecast Profile Miss measures how much a forecast profile (or alphabet pattern) differs from the profile of actuals over a fixed horizon.
For alphabet profiles, there is a measure of information or entropy H(a) that gives the information about the actual profile (AAP) and an entropy H(f) that gives a measure of information about a forecast profile (FAP). The FAP Miss = H(f) – H(a).
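As a rough sketch of these quantities, the snippet below computes the entropies and the Profile Miss in nats. It assumes the per-period error takes the form FPE(i) = a(i) ln a(i) – f(i) ln f(i), which is the form whose sum over the horizon equals H(f) – H(a); the function names and the example profiles are illustrative only, and zero shares are handled with the usual convention 0 ln 0 = 0.

```python
import math


def entropy(profile):
    """Entropy of an alphabet profile in nats, using 0 * ln(0) = 0."""
    return -sum(p * math.log(p) for p in profile if p > 0)


def profile_errors(aap, fap):
    """Per-period FPE(i) = a(i) ln a(i) - f(i) ln f(i) (assumed form)."""
    def plogp(p):
        return p * math.log(p) if p > 0 else 0.0
    return [plogp(a) - plogp(f) for a, f in zip(aap, fap)]


def profile_miss(aap, fap):
    """Profile Miss = H(f) - H(a); closer to zero is better."""
    return entropy(fap) - entropy(aap)


# Illustrative alphabet profiles (each sums to one)
aap = [0.1, 0.4, 0.3, 0.2]
fap = [0.2, 0.3, 0.3, 0.2]

fpe = profile_errors(aap, fap)
miss = profile_miss(aap, fap)
print("FPE by period:", fpe)
print("Profile Miss (nats):", miss)
```

Summing the per-period errors reproduces the Profile Miss, which is a convenient check when the calculation is laid out column by column in a spreadsheet.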
The performance of the process that creates a forecast profile is of interest and can now be measured by a ‘distance’ measure between a Forecast Alphabet Profile (FAP) and an Actual Alphabet Profile (AAP). A performance measure for forecast alphabet Profile Accuracy is given by the Kullback-Leibler divergence measure D(a|f):
D(a|f) = Σ a(i) ln[a(i)/f(i)], summed over i = 1, . . ., m
The sum can be interpreted as a measure of ignorance or uncertainty about Profile Accuracy.
Divergence measures are widely used as accuracy measures in various applications in climatology, neuroscience and machine learning. Here D(a|f) is interpreted as a measure of dissimilarity or ‘distance’ between the profiles FAP and AAP. This profile accuracy measure is non-negative and is equal to zero if and only if the actual profile equals the forecast profile.
When D(a|f) = 0, the alphabet profiles overlap, which is considered to be 100% accuracy.
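A minimal sketch of the Profile Accuracy calculation follows, using the standard Kullback-Leibler form D(a|f) = Σ a(i) ln[a(i)/f(i)] in nats. The function name and example profiles are illustrative. The handling of zeros is a convention chosen for this sketch: periods with a(i) = 0 contribute nothing, and a forecast share of zero where the actual share is positive makes the divergence infinite, which can matter for intermittent demand.

```python
import math


def kl_divergence(aap, fap):
    """Profile Accuracy D(a|f) = sum a(i) * ln(a(i) / f(i)), in nats."""
    if len(aap) != len(fap):
        raise ValueError("profiles must cover the same horizon")
    total = 0.0
    for a, f in zip(aap, fap):
        if a == 0:
            continue          # convention: 0 * ln(0 / f) = 0
        if f == 0:
            return math.inf   # forecast share of zero where demand occurred
        total += a * math.log(a / f)
    return total


# Illustrative alphabet profiles (each sums to one)
aap = [0.1, 0.4, 0.3, 0.2]
fap = [0.2, 0.3, 0.3, 0.2]

print("D(a|f):", kl_divergence(aap, fap))
print("D(a|a):", kl_divergence(aap, aap))  # identical profiles give zero
```

Note that the measure is not symmetric: swapping the roles of the actual and forecast profiles generally gives a different value.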
This profile accuracy measure can be used as a means of identifying an effective Method with the
Levenbach L-Skill score = 1 - [D(a|Method)/D(a|Naïve Benchmark)]
for lead-time forecasting applications.
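The score can be sketched as follows, with a flat profile standing in for the naïve benchmark. The function names and numbers are illustrative assumptions, and the sketch returns NaN when the benchmark divergence is zero, a choice made here only to avoid dividing by zero.

```python
import math


def kl_divergence(aap, fap):
    """D(a|f) = sum a(i) * ln(a(i) / f(i)), in nats (0 * ln 0 = 0)."""
    total = 0.0
    for a, f in zip(aap, fap):
        if a == 0:
            continue
        if f == 0:
            return math.inf
        total += a * math.log(a / f)
    return total


def l_skill(aap, method_fap, benchmark_fap):
    """L-Skill score = 1 - D(a|Method) / D(a|Benchmark)."""
    d_benchmark = kl_divergence(aap, benchmark_fap)
    if d_benchmark == 0:
        return math.nan  # benchmark profile already matches the actuals
    return 1.0 - kl_divergence(aap, method_fap) / d_benchmark


# Illustrative profiles over a 4-period lead-time
aap = [0.1, 0.4, 0.3, 0.2]         # actuals
method_fap = [0.2, 0.3, 0.3, 0.2]  # a hypothetical Method
flat_fap = [0.25] * 4              # level (flat) benchmark profile

score = l_skill(aap, method_fap, flat_fap)
print("L-Skill score:", score)
```

A score of 1 means the Method's profile matches the actual profile exactly, a score of 0 means it does no better than the benchmark, and a negative score means it does worse.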
A Profile Accuracy Measure Decomposition
There is also a measure of information H(a|f), which is the information about the FAP given that we know the AAP. The Profile Accuracy measure D(a|f) = H(a|f) – H(a) can be decomposed into two components: (1) a Profile Miss = H(f) – H(a) and (2) a Profile Relative Skill measure = H(a|f) – H(f).
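The decomposition can be checked numerically. The sketch below reads H(a|f) as the cross-entropy –Σ a(i) ln f(i), which is the reading under which H(a|f) – H(a) equals the Kullback-Leibler divergence; that reading, the function names and the example profiles are assumptions made for illustration.

```python
import math


def entropy(profile):
    """H(p) = -sum p(i) ln p(i), in nats, with 0 * ln(0) = 0."""
    return -sum(p * math.log(p) for p in profile if p > 0)


def cross_entropy(aap, fap):
    """H(a|f) read as -sum a(i) ln f(i); needs f(i) > 0 wherever a(i) > 0."""
    return -sum(a * math.log(f) for a, f in zip(aap, fap) if a > 0)


def decompose(aap, fap):
    """Split Profile Accuracy into Profile Miss and Relative Skill."""
    h_a, h_f, h_af = entropy(aap), entropy(fap), cross_entropy(aap, fap)
    return {
        "profile_accuracy": h_af - h_a,  # D(a|f)
        "profile_miss": h_f - h_a,       # H(f) - H(a)
        "relative_skill": h_af - h_f,    # H(a|f) - H(f)
    }


# Illustrative alphabet profiles (each sums to one)
aap = [0.1, 0.4, 0.3, 0.2]
fap = [0.2, 0.3, 0.3, 0.2]

parts = decompose(aap, fap)
for name, value in parts.items():
    print(f"{name}: {value:.5f}")
```

The two components always add up to the Profile Accuracy, but either one can be positive or negative, so they should be read together rather than in isolation.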
In the context of accurately aiming darts (forecasts) at a dart board (S&OP meeting), a forecaster using judgment, methods, and models should aim for the bullseye on the dart board. This requires both accuracy and skill in getting the darts to strike the board nearest the bullseye.
A Profile Relative Skill measure, designed for lead-time forecast performance evaluation, is generally nonzero. The smaller it is in absolute value, the better the result. When an FAP is constant, as is the case with the Croston methods, SES, Naive1 and MAVG-12 projections, the Relative Skill is always zero, meaning that with approaches producing level forecasts, very accurate profile forecasts are not possible. On the other hand, they might be useful as naïve benchmarks for profile forecasting.
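The zero Relative Skill of a level forecast can be verified with a short derivation. It assumes, as an interpretation, that H(a|f) denotes the cross-entropy –Σ a(i) ln f(i), and it uses the fact that the AAP components sum to one.

```latex
% Flat forecast profile over a horizon of m periods: f(i) = 1/m for all i
\begin{align*}
H(f)          &= -\sum_{i=1}^{m} \tfrac{1}{m}\,\ln\tfrac{1}{m} = \ln m, \\
H(a|f)        &= -\sum_{i=1}^{m} a(i)\,\ln\tfrac{1}{m}
               = \ln m \sum_{i=1}^{m} a(i) = \ln m, \\
H(a|f) - H(f) &= \ln m - \ln m = 0, \\
D(a|f)        &= H(a|f) - H(a) = \ln m - H(a) = H(f) - H(a).
\end{align*}
```

So for a flat profile the Profile Accuracy consists entirely of the Profile Miss, and the ratio of Profile Miss to D(a|f) equals 1 whenever D(a|f) > 0.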
I have divided my exploration into four steps using about half of the M3 data (1428 monthly series) with examples of data checking, data correction, and performance measurement.
- The relative skill score for a lead-time forecast with a Method equals zero when the time series has a flat forecast profile over the lead-time horizon. For example, in the 1428 M3 monthly dataset, Methods 6 and 10 have about 7% of the forecasts with PMiss/D(a|f) = 1.0. That can also occur with perfect forecasts, for then PMiss/D(a|f) = 0/0 = 1.0. That being extremely unlikely, we must infer that the forecasts are not seasonal, but rather smooth and trending.
- Because Naïve-2 is a re-seasonalized level forecast (with Ratio-to-Moving Average (RMA) seasonal factors), it suggests that about 28% of the series are not seasonal with the Naïve-2 method.
- When the L-Skill score is negative, a Method is not effective. Then the e-commerce forecaster would be inclined to stick with a simpler benchmark Method.
- A Method contributes to the forecasting process when the L-Skill score for a series is positive. The most effective method of the three shown in the table is the THETA Method.
- For the methods used in the M3 competition, the L-Skill scores ranged from about 30% effective to 65% effective. This is not a definitive ranking of Methods, as the M3 competition is only a multi-step-ahead forecast of a single hold-out period. The lead-time period would need to be repeated over a moving profile horizon to get a valid comparison of performance measures. In practice, the forecaster would need to examine the L-Skill scores on an ongoing basis.
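One way to repeat the lead-time evaluation over a moving profile horizon is sketched below. It is only an illustration: the series is made up, a seasonal-naïve projection (repeating the last `horizon` observations) stands in for a real Method, a flat Naive1 projection serves as the benchmark, and the function names are hypothetical. Neither forecast is a method from the M3 results.

```python
import math


def alphabet_profile(values):
    """Shares of the lead-time total (assumes a positive total)."""
    total = sum(values)
    return [v / total for v in values]


def kl_divergence(aap, fap):
    """D(a|f) = sum a(i) * ln(a(i) / f(i)), in nats (0 * ln 0 = 0)."""
    total = 0.0
    for a, f in zip(aap, fap):
        if a == 0:
            continue
        if f == 0:
            return math.inf
        total += a * math.log(a / f)
    return total


def rolling_l_skill(series, horizon, first_origin):
    """L-Skill score for each fixed-origin lead-time window in turn."""
    if first_origin < horizon:
        raise ValueError("need at least `horizon` observations of history")
    scores = []
    for origin in range(first_origin, len(series) - horizon + 1):
        history = series[:origin]
        actuals = series[origin:origin + horizon]
        method_fc = history[-horizon:]          # seasonal-naive stand-in
        benchmark_fc = [history[-1]] * horizon  # Naive1: flat profile
        aap = alphabet_profile(actuals)
        d_method = kl_divergence(aap, alphabet_profile(method_fc))
        d_bench = kl_divergence(aap, alphabet_profile(benchmark_fc))
        scores.append(1.0 - d_method / d_bench if d_bench > 0 else math.nan)
    return scores


# Made-up series with a repeating 4-period pattern
series = [10, 20, 30, 20, 12, 22, 33, 21, 11, 24, 31, 22, 13, 23, 35, 20]
scores = rolling_l_skill(series, horizon=4, first_origin=4)
for k, s in enumerate(scores, start=4):
    print(f"origin {k}: L-Skill = {s:.3f}")
```

Tracking the scores origin by origin, rather than relying on a single hold-out period, shows whether a Method's advantage over the benchmark is persistent or an accident of one window.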
I invite you to share your data-driven experience with data quality and performance issues. For those interested, drop me an email message and I will forward you the spreadsheet template with M3 monthly data calculations I used to perform this EDA. As there are no programming steps involved, you should be able to drop in your own comparable data and see if you find any helpful parallels in data quality and forecasting performance for your lead-time demand applications.
Hans Levenbach, PhD is Owner/CEO of Delphus, Inc and Executive Director, CPDF Professional Development Training and Certification Programs.
Hans is the author of a forecasting book (Change&Chance Embraced) recently updated with the LZI method for intermittent demand forecasting in the Supply Chain.
With endorsement from the International Institute of Forecasters, he created the first certification curriculum for demand forecasters (CPDF) and has conducted numerous hands-on Professional Development Workshops for Demand Planners and Operations Managers in multi-national supply chain companies worldwide.
The 2021 CPDF Workshop manual is available for self-study, online workshops, or in-house professional development courses.
Hans is a Fellow, Past President and former Treasurer, and member of the Board of Directors of the International Institute of Forecasters.
He is Owner/Manager of these LinkedIn groups: (1) Demand Forecaster Training and Certification, Blended Learning, Predictive Visualization, and (2) New Product Forecasting and Innovation Planning, Cognitive Modeling, Predictive Visualization.
I invite you to join these groups and share your thoughts and practical experiences with demand data quality and demand forecasting performance in the supply chain. Feel free to send me the details of your findings, including the underlying data without identifying proprietary descriptions. If possible, I will attempt an independent analysis and see if we can collaborate on something that will be beneficial to everyone.