A Universal Forecast Error Metric Triad
Disclaimer: this article is highly technical in nature and assumes familiarity with forecasting and forecast error metrics.
For years I have been searching for a forecast error metric that could be used in all demand forecast scenarios, and have even suggested my own new metric (MANE = Mean Absolute Normalized Error) along the way. There are mountains of research suggesting that I am not the only one on this quest, and also that a satisfactory result remains elusive.
In this article I put forward three new, closely related metrics that I believe combined cover most scenarios - all the practical ones in the supply chain domain - and with minor variations would cover the remainder. The fact that I propose three metrics, and not just one, reflects my conclusion that at least three different aspects of forecast error need to be measured:
- Precision - the magnitude of the error.
- Bias - the central direction of the error.
- Value-add - how much value a forecast adds compared to a reference forecast.
Of the existing error metrics, bias and forecast value-add (FVA) come close to being universal, both satisfying all but one of my requirements. Precision is the one with the most issues, and as a result it has spawned many competing metrics, none of which is perfect. As an example, the most commonly used metric for forecast error precision is the MAPE (Mean Absolute Percentage Error), which has the huge drawback that it breaks down as soon as even one period has zero demand.
Note that of the three, FVA is the only aspect of forecast error that should ever be used to compare forecasts across different sets (such as different companies, different product groups, different accounts, different forecasters, etc.). The other two aspects are never fair comparison metrics.
The Requirements
The requirements that each of the metrics in this triad must satisfy are the following:
- Must work for ALL demand patterns
- Can be weighted to convert it to different types of values
- Must allow direct comparison of traditional deterministic forecasts with the more advanced stochastic forecasts
- When used on a traditional deterministic forecast ideally degrades to a commonly used existing metric.
Most existing metrics satisfy requirement 2. However, all existing precision metrics violate requirement 1, and all existing metrics for any aspect violate requirement 3, which means requirement 4 never comes into play.
What Is a Stochastic Forecast?
Before I pose the new set of metrics, I should explain what is meant by a "stochastic forecast", since this will be new to many readers. Traditional forecast approaches predict one demand value for each future time period for each item; hence they are also known as point forecasts (or, in jargon: they are deterministic). Stochastic forecasts, on the other hand, predict a whole continuous range of values, with a probability assigned to each value, for each time period and each item.

An example we all experience is the weather forecast. In past decades the weather services would predict that it either would or would not rain tomorrow (deterministic). Nowadays they predict a 30% chance of rain (stochastic). With these modern stochastic weather forecasts we can plan our day much better than with the old deterministic ones: if we are planning an outdoor event, we know what chance we are taking that we may need to cancel it or find alternative accommodations. The stochastic forecast gives us more information to base our plans upon. The same is true for demand forecasting and the downstream planning processes. Most of the firefighting that is pervasive when using traditional forecasts simply does not occur when using stochastic forecasts, since risks can be properly anticipated.
Note that using confidence levels or optimistic and pessimistic scenarios is NOT equivalent to a stochastic forecast. These are certainly an improvement over single-point forecasts, but they merely turn them into 3-point forecasts, whereas a stochastic forecast is an entire range with infinitely many possible values, each with its own probability, usually given as a probability density function.
An important side effect of satisfying requirement 3 is that if upper and/or lower confidence levels are provided with a traditional forecast, these can be incorporated into the error metrics, since they constitute a small step towards stochasticity. This is something I have not seen done with any existing metric.
The Metrics
Each of the three metrics is based on one of two new formulas: a signed error and an absolute error. All the stochasticity resides in these two formulas, allowing everything built on top of them to remain fully equivalent to its deterministic counterpart. The first is the Stochastic Error (SE):

SEit = ∫ (Ait − x) · pit(x) dx

The second is the Stochastic Absolute Error (SAE):

SAEit = ∫ |Ait − x| · pit(x) dx

where Ait is the actual demand for item i in time period t, and pit(x) is the probability density function of item i in time period t, with both integrals taken across all possible demand values x. When calculating these over multiple periods or multiple items, simply take the sum and divide by the number of terms.
The SE reduces to Ait − Eit, where Eit is the expectation (mean) of the forecast distribution for item i at period t. This equals the bias, with the forecast value in the bias definition replaced by the mean of the probability distribution. The SAE is the stochastic equivalent of the traditional Mean Absolute Error (MAE, aka MAD) metric. These can now be used wherever you would otherwise use those traditional counterparts. In fact, as error metrics in the unit of measure of the forecast, they can be used directly, with the SE being equal to the traditional bias as demonstrated further down.
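As a concrete illustration (my own sketch, not from the original article), the SE and SAE integrals can be estimated by Monte Carlo: draw samples from the forecast distribution pit(x), and the integrals become simple averages. The function names are my own.

```python
def stochastic_error(actual, samples):
    # SE = integral of (A - x) * p(x) dx, i.e. the actual minus the
    # forecast distribution's mean, estimated here from samples
    return actual - sum(samples) / len(samples)

def stochastic_absolute_error(actual, samples):
    # SAE = integral of |A - x| * p(x) dx, estimated as the mean absolute
    # deviation of the actual from the sampled forecast values
    return sum(abs(actual - x) for x in samples) / len(samples)
```

For a deterministic forecast, where every sample equals the point forecast μ, these reduce to A − μ and |A − μ|, exactly as described below.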
For the universal metrics I recommend using percentage errors, since those allow evaluation without requiring deeper knowledge of the underlying data set. Also, there is relatively little business value in using unit-based metrics for precision and value-add. The three metrics are then as follows. The Stochastic Percentage Error (SPE) for Universal Bias:

SPE = [ Σi,t cit · ∫ (Ait − x) · pit(x) dx ] / [ Σi,t cit · Ait ]

the Stochastic Absolute Percentage Error (SAPE) for Universal Precision:

SAPE = [ Σi,t cit · ∫ |Ait − x| · pit(x) dx ] / [ Σi,t cit · Ait ]

and the Stochastic Forecast Value Add (SFVA) for Universal Value-Add:

SFVA = 1 − [ Σi,t cit · ∫ |Ait − x| · pit(x) dx ] / [ Σi,t cit · ∫ |Ait − x| · qit(x) dx ]
Where all sums are taken over all items i and all time periods t. The coefficients cit are the weights, and qit(x) is the probability density function of a reference forecast. For an unweighted version all cit are equal to 1, but in general they could depend on time t, item i, or both. Whenever looking at only one time period or only one item, the sum (and the accompanying subscript) can simply be removed from these formulas.
Note that the SPE is the stochastic equivalent of the traditional bias in percentage terms, and that the SAPE is the stochastic equivalent of the MAD-Mean ratio, which is frequently preferred over the MAPE since it does not share the latter's division-by-zero issue (except in the trivial case where all periods for all included items have zero demand). The SFVA compares a forecast under test to another forecast used as a reference. The formula is arranged to make the results intuitive: positive values mean value is added and negative values mean value is removed. A result of 25% means the tested forecast had 25% lower absolute error than the reference forecast. A big practical benefit over other FVA metrics such as the MASE is that it can be used for multiple incremental comparisons, not just a comparison against a naive forecast, without adding an extra level of relativity, which would be highly non-intuitive.
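The three universal metrics can likewise be sketched in sample form. This is my own illustration (names and the sample-based representation are assumptions, not the article's); note that the zero-demand period in the test below, which would break the MAPE, causes no problem here.

```python
def spe(actuals, forecast_samples, weights=None):
    # Universal bias: weighted sum of per-period SE over weighted sum of actuals
    w = weights or [1.0] * len(actuals)
    num = sum(wi * (a - sum(s) / len(s))
              for wi, a, s in zip(w, actuals, forecast_samples))
    return num / sum(wi * a for wi, a in zip(w, actuals))

def sape(actuals, forecast_samples, weights=None):
    # Universal precision: weighted sum of per-period SAE over weighted sum of actuals
    w = weights or [1.0] * len(actuals)
    num = sum(wi * sum(abs(a - x) for x in s) / len(s)
              for wi, a, s in zip(w, actuals, forecast_samples))
    return num / sum(wi * a for wi, a in zip(w, actuals))

def sfva(actuals, test_samples, ref_samples, weights=None):
    # Universal value-add: 1 - SAE(test)/SAE(reference); positive means
    # the tested forecast has lower absolute error than the reference
    return 1.0 - sape(actuals, test_samples, weights) / sape(actuals, ref_samples, weights)
```

Each forecast is passed as a list of samples per period, so a point forecast is simply a one-sample list.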
How to Apply to Traditional Forecasts
In the above it is mentioned that the suggested universal metrics are equivalent to traditional metrics. Here I describe how to make them identical when the forecast is a traditional deterministic one, which is requirement 4. The trick is in how to apply the probability density function pit(x).
Insight: the connection between a stochastic forecast and a deterministic forecast is that the value of the latter equals the mean (or average) of the distribution of the former.
A deterministic forecast thus concentrates all probability mass at the mean μ: a single spike at μ, with zero probability everywhere else. The cumulative density is zero for all values less than μ and 1 for all values greater than or equal to μ.
As an example, consider the following stochastic forecast for one item at one specific time period. Along the horizontal axis are the possible demand values, and along the vertical axis is the associated probability of each value, shown both as a pdf (on the left) and a cdf (on the right):
Stochastic view of a single period forecast
The deterministic equivalent of this using the probability suggested above is:
Traditional single period deterministic forecast seen from a stochastic perspective
The average of the probability distribution is roughly 36 units, which shows as a spike in the pdf graph of the single-point forecast and a jump from zero to one in the cdf graph. When we apply this to the universal error metrics the integral of the SE degrades to a single value evaluation (A-μ) and the SPE becomes identical to the traditional Percentage Error (PE, aka relative bias). Similarly the integral of the SAE degrades to a single value evaluation (|A-μ|) and the SAPE becomes identical to the MAD-Mean ratio.
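This degradation can be checked numerically. The sketch below (my own, with made-up series values) represents each forecast as a list of (value, probability) pairs; with all mass on the point forecast, the SAPE comes out identical to the MAD-Mean ratio.

```python
def sae(actual, dist):
    # dist: list of (value, probability) pairs approximating the pdf
    return sum(p * abs(actual - x) for x, p in dist)

actuals   = [120.0, 80.0, 0.0, 50.0]   # includes a zero-demand period
forecasts = [100.0, 90.0, 10.0, 60.0]  # point forecasts = distribution means

# Each point forecast becomes a degenerate distribution: all mass at mu
dists = [[(f, 1.0)] for f in forecasts]

sape_value = sum(sae(a, d) for a, d in zip(actuals, dists)) / sum(actuals)
mad_mean   = sum(abs(a - f) for a, f in zip(actuals, forecasts)) / sum(actuals)
```

Both expressions evaluate to 50/250 = 0.2 for these numbers: the SAPE of a deterministic forecast is the MAD-Mean ratio.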
Applying to Confidence Levels
It is relatively straightforward to extend this to the case where a point forecast is given together with confidence levels. Let's assume in this example a lower confidence level (LCL) at 5% (which evaluates to roughly 15 units with the given distribution) and an upper confidence level (UCL) at 95% (which evaluates to roughly 76 units with the given distribution). The probability density functions would look like this:
Traditional single period forecast including confidence levels seen from stochastic perspective.
Small spikes with probability 0.05 each occur at the 5th and 95th percentiles, leaving a probability of 0.9 for the main forecast point. Note that the point where the central spike occurs is no longer the mean μ of the entire distribution: since the distribution is typically not symmetrical (as in the example), the average of the central range shifts once the two tails beyond the confidence levels have been cut off. For the 5% and 95% confidence levels, the adjusted mean μ* of the remaining range within the confidence bounds is given by:

μ* = (1 / 0.9) · ∫ from LCL to UCL of x · p(x) dx
This evaluates to roughly 44 units for the given distribution where the overall mean is 36 units.
The SE remains equal to the bias of the mean of the distribution: A - μ
Whilst the SAE reduces to: 0.05 * |A - LCL| + 0.9 * |A - μ*| + 0.05 * |A - UCL|
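This 3-point SAE is easy to compute directly. A minimal sketch (my own; the function name is an assumption), exercised with the example's approximate values LCL = 15, μ* = 44, UCL = 76:

```python
def sae_three_point(actual, lcl, adj_mean, ucl, tail=0.05):
    # Three-spike approximation of the pdf: `tail` probability mass at each
    # confidence bound and the remaining mass at the adjusted mean of the
    # central range between the bounds
    return (tail * abs(actual - lcl)
            + (1.0 - 2.0 * tail) * abs(actual - adj_mean)
            + tail * abs(actual - ucl))
```

For an actual of 40 units this gives 0.05·25 + 0.9·4 + 0.05·36 = 6.65 units of stochastic absolute error.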
Regardless of whether one chooses to use the suggested stochastic metrics, this same approach can be used whenever forecast accuracy needs to be determined where confidence levels are provided.
Extensions
The innovations are in the definition of the Stochastic Error (SE) and the Stochastic Absolute Error (SAE), and in how the probability density function is applied to point forecasts with and without confidence levels. Everything else is run-of-the-mill. It is very easy to extend the approach outlined above to mimic most traditional forecast error metrics, in case the suggested universal ones are not to your liking.
Since the SE and SAE are specific to one time period and one item, they can be used as a replacement for the ME and MAE (or MAD) in any other metric you typically use. For example, one can mirror the MASE to create a more generic form of it that covers both deterministic and stochastic forecasts, using the SAE in the numerator and the traditional denominator. The MASE was introduced to avoid the issue that the MAD-Mean ratio is not ideal when data is heavily trended or otherwise non-stationary over long time horizons. Since the SAPE degrades to the MAD-Mean ratio for deterministic forecasts, it will have this same limitation. In such cases you may want to use weighting coefficients in the SAPE that counter the long-term trend, or use a stochastic MASE instead.
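Such a stochastic MASE might be sketched as follows (my own interpretation: a sample-based SAE in the numerator, the traditional in-sample MAE of the m-step naive forecast in the denominator):

```python
def stochastic_mase(actuals, forecast_samples, history, m=1):
    # Numerator: average SAE across the evaluation periods, with each
    # forecast given as a list of samples from its distribution
    sae_avg = sum(sum(abs(a - x) for x in s) / len(s)
                  for a, s in zip(actuals, forecast_samples)) / len(actuals)
    # Denominator: the traditional MASE scale, i.e. the in-sample MAE of
    # the m-step naive forecast over the historical series
    naive_mae = sum(abs(history[t] - history[t - m])
                    for t in range(m, len(history))) / (len(history) - m)
    return sae_avg / naive_mae
```

A value below 1 then means the (stochastic or deterministic) forecast beats the naive benchmark on this scale.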
In this article I have also demonstrated how to calculate the pdf for single-point forecasts, for 3-point forecasts using confidence levels, and for the full stochastic version, which assumes a closed form of the pdf is known. However, some stochastic forecasting systems may only provide numeric approximations of a pdf. In those cases the 3-point approach can be extended to whatever raster size the pdf is provided in.
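For such a rastered pdf the SAE integral becomes a weighted sum over the grid points. A minimal sketch (my own; the raster format is an assumption):

```python
def sae_from_raster(actual, grid, masses):
    # grid: demand values at which the numeric pdf is sampled
    # masses: (possibly unnormalized) probability mass at each grid point
    total = sum(masses)
    return sum(m / total * abs(actual - x) for x, m in zip(grid, masses))
```

A single-point raster recovers the deterministic |A − μ|, and finer rasters converge to the full integral.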
Conclusion
I have presented an approach to measuring forecast error which I believe to be novel. I have not been able to find anything similar in my quest. I have tested this only on a few small sets of time series.
I would be very interested to get criticism, comments, other feedback. Especially would love to hear from anyone who wants to test this approach on other data sets.
UPDATE: I placed an Excel file in a shared folder with some examples. You can find it here: https://www.dropbox.com/s/71s7r2dknsiumkl/M3C%20-%20SPE%20and%20SAPE%20example.xlsx?dl=0
Please see comments below for more info.
Partner and Managing Director at Macrologística
Very interesting approach, I'll try to run some tests. I think that the most important thing here is to have an open mind.
Senior Manager - Network Operations Hill's EMEA
Stochastic forecasts sound very interesting, risk-management wise; however, a lot of business processes and systems would need to be changed in most environments in order to implement them, especially when it comes to replenishment and production planning. In replenishment you can use the pdf as a factor in calculating safety stock levels (in conjunction with bias metrics); but when it comes to production and raw materials procurement, stochastic forecasts might be hard to utilize properly.
Stefan, Very interesting work, but don't be too hasty claiming originality. There is a vast literature on accuracy (bias and precision) out there that can be found in the International Journal of Forecasting, the Journal of Forecasting and other academically oriented journals in specific disciplines like economics, finance, etc. This is not to discourage you, but to keep validating your approach with real data. There is much less known about how accuracy measurement works in real-world cases, because practitioners may be constrained to report on them or less motivated to share results with other practitioners through posts like these.
Senior Director Product Delivery at Pharming Group N.V.
Good article Stefan. These kinds of challenges keep your mind busy. We are however far from implementing something that advanced in our processes. But this may be very useful to some of the tool providers.
Chief Technology Officer at AutoScheduler.AI
Stefan - a very interesting idea and one that comes at a good time for me. I'm going to try this out on some of my data. Thanks for posting. Andrew