A Universal Forecast Error Metric Triad
Disclaimer: this article is highly technical in nature and assumes familiarity with forecasting and forecast error metrics.
For years I have been searching for a forecast error metric that could be used in all demand forecast scenarios, and have even suggested my own new metric (MANE = Mean Absolute Normalized Error) along the way. There are mountains of research suggesting that I am not the only one on this quest, and also that a satisfactory result remains elusive.
In this article I put forward three new, closely related metrics that I believe combined cover most scenarios - all the practical ones in the supply chain domain - and with minor variations would cover the remainder. The fact that I propose three metrics, and not just one, reflects my conclusion that at least three different aspects of forecast error need to be measured:
- Precision - the magnitude of the error.
- Bias - the central direction of the error.
- Value-add - how much value a forecast adds compared to a reference forecast.
Of the existing error metrics, bias and forecast value-add (FVA) come close to being universal, both satisfying all but one of my requirements. Precision is the one with the most issues, and as a result it has spawned many competing metrics, none of which is perfect. As an example, the most commonly used metric for forecast error precision is the MAPE (Mean Absolute Percentage Error), which has the huge drawback that it breaks down as soon as even one period has zero demand.
Note that of the three, FVA is the only aspect of forecast error that should ever be used to compare forecasts across different sets (such as different companies, different product groups, different accounts, different forecasters, etc.). The other two aspects are never fair comparison metrics.
The Requirements
The requirements that each of the metrics in this triad must satisfy are the following:
- Must work for ALL demand patterns
- Can be weighted to convert it to different types of values
- Must allow direct comparison of traditional deterministic forecasts with the more advanced stochastic forecasts
- When used on a traditional deterministic forecast ideally degrades to a commonly used existing metric.
Most existing metrics satisfy requirement 2. However, all existing precision metrics violate requirement 1, and all existing metrics for any aspect violate requirement 3, which means requirement 4 never comes into play.
What Is a Stochastic Forecast?
Before I pose the new set of metrics, I should explain what is meant by a "stochastic forecast", since this will be new to many readers. Traditional forecast approaches predict one demand value for each future time period for each item; hence they are also known as point forecasts (or, in jargon: they are deterministic). Stochastic forecasts, on the other hand, predict a whole continuous range of values, with a probability assigned to each value, for each time period and each item.

An example we all experience is the weather forecast. In past decades the weather services would predict that it either would or would not rain tomorrow (deterministic). Nowadays they predict a 30% chance of rain (stochastic). With these modern stochastic weather forecasts we can plan our day much better than with the old deterministic ones: if we are planning an outdoor event, we know what chance we are taking that we may need to cancel it or find alternative accommodations. The stochastic forecast gives us more information to base our plans upon. The same is true for demand forecasting and the downstream planning processes. Most of the firefighting that is pervasive when using traditional forecasts simply does not occur when using stochastic forecasts, since risks can be properly anticipated.
Note that using confidence levels or optimistic and pessimistic scenarios is NOT equivalent to a stochastic forecast. These are certainly an improvement over single-point forecasts, but they merely turn them into 3-point forecasts, whereas a stochastic forecast is an entire range with infinitely many possible values, each with its own probability, usually given as a probability density function.
An important side effect of satisfying requirement 3 is that if upper and/or lower confidence levels are provided with a traditional forecast, these can be incorporated into the error metrics, since they constitute a small step towards stochasticity. This is something I have not seen done with any existing metric.
The Metrics
Each of the three metrics is based on one of two new formulas: a signed error and an absolute error. All the stochasticity resides in these two formulas, allowing everything built on top of them to remain fully equivalent to its deterministic counterpart. The first is the Stochastic Error (SE):

SEit = ∫ (Ait − x) · pit(x) dx

The second is the Stochastic Absolute Error (SAE):

SAEit = ∫ |Ait − x| · pit(x) dx

where Ait is the actual demand for item i in time period t, and pit(x) is the probability density function of item i in time period t, with both integrals taken across all possible demand values x. When calculating these over multiple periods or multiple items, simply take the sum and divide by the number of terms.
The SE reduces to Ait − Eit, where Eit is the expectation (mean) of the forecast distribution for item i at period t. This equals the bias, with the forecast value in the bias definition replaced by the mean of the probability distribution. The SAE is the stochastic equivalent of the traditional Mean Absolute Error (MAE, aka MAD) metric. These can now be used wherever you would otherwise use those traditional counterparts. In fact, as error metrics in the unit of measure of the forecast, they can be used directly, with the SE being equal to the traditional bias as demonstrated further down.
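As a concrete illustration (my own sketch, not from the original article), the SE and SAE integrals can be estimated by Monte Carlo: draw samples from the forecast distribution pit(x), and the integrals become simple averages. The function names are my own.

```python
def stochastic_error(actual, samples):
    # SE = integral of (A - x) * p(x) dx, i.e. the actual minus the
    # forecast distribution's mean, estimated here from samples
    return actual - sum(samples) / len(samples)

def stochastic_absolute_error(actual, samples):
    # SAE = integral of |A - x| * p(x) dx, estimated as the mean absolute
    # deviation of the actual from the sampled forecast values
    return sum(abs(actual - x) for x in samples) / len(samples)
```

For a deterministic forecast, where every sample equals the point forecast μ, these reduce to A − μ and |A − μ|, exactly as described below.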
For the universal metrics I recommend using percentage errors, since those allow evaluation without requiring deeper knowledge of the underlying data set. Also, there is relatively little business value in using unit-based metrics for precision and value-add. The three metrics are then as follows. The Stochastic Percentage Error (SPE) for Universal Bias:

SPE = [ Σi,t cit · ∫ (Ait − x) · pit(x) dx ] / [ Σi,t cit · Ait ]

the Stochastic Absolute Percentage Error (SAPE) for Universal Precision:

SAPE = [ Σi,t cit · ∫ |Ait − x| · pit(x) dx ] / [ Σi,t cit · Ait ]

and the Stochastic Forecast Value Add (SFVA) for Universal Value-Add:

SFVA = 1 − [ Σi,t cit · ∫ |Ait − x| · pit(x) dx ] / [ Σi,t cit · ∫ |Ait − x| · qit(x) dx ]
Where all sums are taken over all items i and all time periods t. The coefficients cit are the weights, and qit(x) is the probability density function of a reference forecast. For an unweighted version all cit are equal to 1, but in general they could depend on time t, item i, or both. Whenever looking at only one time period or only one item, the sum (and the accompanying subscript) can simply be removed from these formulas.
Note that the SPE is the stochastic equivalent of the traditional bias in percentage terms, and that the SAPE is the stochastic equivalent of the MAD-Mean ratio, which is frequently preferred over the MAPE since it does not share the latter's division-by-zero issue (except in the trivial case where all periods for all included items have zero demand). The SFVA compares a forecast under test to another forecast used as a reference. The formula is arranged to make the results intuitive: positive values mean value is added and negative values mean value is removed. A result of 25% means the tested forecast had 25% lower absolute error than the reference forecast. A big practical benefit over other FVA metrics such as the MASE is that it can be used for multiple incremental comparisons, not just a comparison against a naive forecast, without adding an extra level of relativity, which would be highly non-intuitive.
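The three universal metrics can likewise be sketched in sample form. This is my own illustration (names and the sample-based representation are assumptions, not the article's); note that the zero-demand period in the test below, which would break the MAPE, causes no problem here.

```python
def spe(actuals, forecast_samples, weights=None):
    # Universal bias: weighted sum of per-period SE over weighted sum of actuals
    w = weights or [1.0] * len(actuals)
    num = sum(wi * (a - sum(s) / len(s))
              for wi, a, s in zip(w, actuals, forecast_samples))
    return num / sum(wi * a for wi, a in zip(w, actuals))

def sape(actuals, forecast_samples, weights=None):
    # Universal precision: weighted sum of per-period SAE over weighted sum of actuals
    w = weights or [1.0] * len(actuals)
    num = sum(wi * sum(abs(a - x) for x in s) / len(s)
              for wi, a, s in zip(w, actuals, forecast_samples))
    return num / sum(wi * a for wi, a in zip(w, actuals))

def sfva(actuals, test_samples, ref_samples, weights=None):
    # Universal value-add: 1 - SAE(test)/SAE(reference); positive means
    # the tested forecast has lower absolute error than the reference
    return 1.0 - sape(actuals, test_samples, weights) / sape(actuals, ref_samples, weights)
```

Each forecast is passed as a list of samples per period, so a point forecast is simply a one-sample list.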
How to Apply to Traditional Forecasts
In the above it is mentioned that the suggested universal metrics are equivalent to traditional metrics. Here I describe how to make them identical when the forecast is a traditional deterministic one, which is requirement 4. The trick is in how to apply the probability density function pit(x).
Insight: the connection between a stochastic forecast and a deterministic forecast is that the value of the latter equals the mean (or average) of the distribution of the former.
A deterministic forecast thus concentrates all probability mass at the mean μ: a single spike at μ, with zero probability everywhere else. The cumulative density is zero for all values less than μ and 1 for all values greater than or equal to μ.
As an example, consider the following stochastic forecast for one item at one specific time period. Along the horizontal axis are the possible demand values, and along the vertical axis is the associated probability of each value, shown both as a pdf (on the left) and a cdf (on the right):
Stochastic view of a single period forecast
The deterministic equivalent of this using the probability suggested above is:
Traditional single period deterministic forecast seen from a stochastic perspective
The average of the probability distribution is roughly 36 units, which shows as a spike in the pdf graph of the single-point forecast and a jump from zero to one in the cdf graph. When we apply this to the universal error metrics the integral of the SE degrades to a single value evaluation (A-μ) and the SPE becomes identical to the traditional Percentage Error (PE, aka relative bias). Similarly the integral of the SAE degrades to a single value evaluation (|A-μ|) and the SAPE becomes identical to the MAD-Mean ratio.
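This degradation can be checked numerically. The sketch below (my own, with made-up series values) represents each forecast as a list of (value, probability) pairs; with all mass on the point forecast, the SAPE comes out identical to the MAD-Mean ratio.

```python
def sae(actual, dist):
    # dist: list of (value, probability) pairs approximating the pdf
    return sum(p * abs(actual - x) for x, p in dist)

actuals   = [120.0, 80.0, 0.0, 50.0]   # includes a zero-demand period
forecasts = [100.0, 90.0, 10.0, 60.0]  # point forecasts = distribution means

# Each point forecast becomes a degenerate distribution: all mass at mu
dists = [[(f, 1.0)] for f in forecasts]

sape_value = sum(sae(a, d) for a, d in zip(actuals, dists)) / sum(actuals)
mad_mean   = sum(abs(a - f) for a, f in zip(actuals, forecasts)) / sum(actuals)
```

Both expressions evaluate to 50/250 = 0.2 for these numbers: the SAPE of a deterministic forecast is the MAD-Mean ratio.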
Applying to Confidence Levels
It is relatively straightforward to extend this to the case where a point forecast is given together with confidence levels. Let's assume in this example a lower confidence level (LCL) at 5% (which evaluates to roughly 15 units with the given distribution) and an upper confidence level (UCL) at 95% (which evaluates to roughly 76 units with the given distribution). The probability density functions would look like this:
Traditional single period forecast including confidence levels seen from stochastic perspective.
Small spikes with probability 0.05 each occur at the 5th and 95th percentiles, leaving a probability of 0.9 for the main forecast point. Note that the point where the central spike occurs is no longer the mean μ of the entire distribution: since the distribution is typically not symmetrical (as in the example), the average of the central range shifts once the two tails beyond the confidence levels have been cut off. For the 5% and 95% confidence levels, the adjusted mean μ* of the remaining range within the confidence bounds is given by:

μ* = (1 / 0.9) · ∫ from LCL to UCL of x · p(x) dx
This evaluates to roughly 44 units for the given distribution where the overall mean is 36 units.
The SE remains equal to the bias of the mean of the distribution: A - μ
Whilst the SAE reduces to: 0.05 * |A - LCL| + 0.9 * |A - μ*| + 0.05 * |A - UCL|
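This 3-point SAE is easy to compute directly. A minimal sketch (my own; the function name is an assumption), exercised with the example's approximate values LCL = 15, μ* = 44, UCL = 76:

```python
def sae_three_point(actual, lcl, adj_mean, ucl, tail=0.05):
    # Three-spike approximation of the pdf: `tail` probability mass at each
    # confidence bound and the remaining mass at the adjusted mean of the
    # central range between the bounds
    return (tail * abs(actual - lcl)
            + (1.0 - 2.0 * tail) * abs(actual - adj_mean)
            + tail * abs(actual - ucl))
```

For an actual of 40 units this gives 0.05·25 + 0.9·4 + 0.05·36 = 6.65 units of stochastic absolute error.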
Regardless of whether one chooses to use the suggested stochastic metrics, this same approach can be used whenever forecast accuracy needs to be determined where confidence levels are provided.
Extensions
The innovations are in the definition of the Stochastic Error (SE) and the Stochastic Absolute Error (SAE), and in how the probability density function is applied to point forecasts with and without confidence levels. Everything else is run-of-the-mill. It is very easy to extend the approach outlined above to mimic most traditional forecast error metrics, in case the suggested universal ones are not to your liking.
Since the SE and SAE are specific to one time period and one item, they can be used as a replacement for the ME and MAE (or MAD) in any other metric you typically use. For example, one can mirror the MASE to create a more generic form of it that covers both deterministic and stochastic forecasts, using the SAE in the numerator and the traditional denominator. The MASE was introduced to avoid the issue that the MAD-Mean ratio is not ideal when data is heavily trended or otherwise non-stationary over long time horizons. Since the SAPE degrades to the MAD-Mean ratio for deterministic forecasts, it will have this same limitation. In such cases you may want to use weighting coefficients in the SAPE that counter the long-term trend, or use a stochastic MASE instead.
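Such a stochastic MASE might be sketched as follows (my own interpretation: a sample-based SAE in the numerator, the traditional in-sample MAE of the m-step naive forecast in the denominator):

```python
def stochastic_mase(actuals, forecast_samples, history, m=1):
    # Numerator: average SAE across the evaluation periods, with each
    # forecast given as a list of samples from its distribution
    sae_avg = sum(sum(abs(a - x) for x in s) / len(s)
                  for a, s in zip(actuals, forecast_samples)) / len(actuals)
    # Denominator: the traditional MASE scale, i.e. the in-sample MAE of
    # the m-step naive forecast over the historical series
    naive_mae = sum(abs(history[t] - history[t - m])
                    for t in range(m, len(history))) / (len(history) - m)
    return sae_avg / naive_mae
```

A value below 1 then means the (stochastic or deterministic) forecast beats the naive benchmark on this scale.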
In this article I have also demonstrated how to calculate the pdf for single-point forecasts, for 3-point forecasts using confidence levels, and for the full stochastic version, which assumes a closed form of the pdf is known. However, some stochastic forecasting systems may only provide numeric approximations of a pdf. In those cases the 3-point approach can be extended to whatever raster size the pdf is provided in.
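For such a rastered pdf the SAE integral becomes a weighted sum over the grid points. A minimal sketch (my own; the raster format is an assumption):

```python
def sae_from_raster(actual, grid, masses):
    # grid: demand values at which the numeric pdf is sampled
    # masses: (possibly unnormalized) probability mass at each grid point
    total = sum(masses)
    return sum(m / total * abs(actual - x) for x, m in zip(grid, masses))
```

A single-point raster recovers the deterministic |A − μ|, and finer rasters converge to the full integral.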
Conclusion
I have presented an approach to measuring forecast error which I believe to be novel. I have not been able to find anything similar in my quest. I have tested this only on a few small sets of time series.
I would be very interested to get criticism, comments, other feedback. Especially would love to hear from anyone who wants to test this approach on other data sets.
UPDATE: I placed an Excel file in a shared folder with some examples. You can find it here: https://www.dropbox.com/s/71s7r2dknsiumkl/M3C%20-%20SPE%20and%20SAPE%20example.xlsx?dl=0
Please see comments below for more info.
Partner and Managing Director at Macrologística
Very interesting approach, I'll try to run some tests. I think that the most important thing here is to have an open mind.
Senior Manager - Network Operations Hill's EMEA
Stochastic forecasts sound very interesting, risk-management wise; however, a lot of business processes and systems would need to be changed in most environments in order to implement them, especially when it comes to replenishment and production planning. In replenishment you can use the pdf as a factor in calculating safety stock levels (in conjunction with bias metrics); but when it comes to production and raw materials procurement, stochastic forecasts might be hard to utilize properly.
Stefan, Very interesting work, but don't be too hasty claiming originality. There is a vast literature on accuracy (bias and precision) out there that can be found in the International Journal of Forecasting, the Journal of Forecasting and other academically oriented journals in specific disciplines like economics, finance, etc. This is not to discourage you, but to keep validating your approach with real data. There is much less known about how accuracy measurement works in real-world cases, because practitioners may be constrained to report on them or less motivated to share results with other practitioners through posts like these.
Senior Director Product Delivery at Pharming Group N.V.
Good article Stefan. These kinds of challenges keep your mind busy. We are however far from implementing something that advanced in our processes. But this may be very useful to some of the tool providers.
Chief Technology Officer at AutoScheduler.AI
Stefan - a very interesting idea and one that comes at a good time for me. I'm going to try this out on some of my data. Thanks for posting. Andrew