Forecasting in practice: Is forecast accuracy really the issue?
Why bother with forecast congruence?
Last December, Kandrika Pritularga and I posted an online survey asking whether firms try to enforce some stability in their forecasts. Do they accept the outputs of forecasting models as is, or do they attempt to reduce the variability of forecasts produced over different periods?
The results are summarized in the table below. First, we asked whether models are revised every period or not. There can be different motivations for not revising them every period, one of which is to keep the forecasting models more stable. We found that about 28.5% of the respondents follow this practice, but keep in mind that we do not distinguish between the various reasons firms may have for doing so. The second question was more targeted, asking whether forecasts are adjusted to be less 'jittery' over time: only 19% of the respondents do not act in some way to make forecasts more stable.
Although the survey has many limitations, it validated what we have seen several times in practice. There is a school of thought that prefers forecasts that are less 'jittery' or 'nervous'. To avoid any confusion, this volatility does not refer to the forecast varying across the horizon due to trend or seasonality, but to differences between forecasts made at different periods. We introduce the term 'congruence' to express this. 'Stable' is a reserved term, meaning something different in statistical forecasting, and 'nervousness' or 'jitteriness' have been used in many different ways, so congruence it is! The figure below exemplifies what a less or more congruent forecast looks like. Observe that the more congruent forecasts are more similar over time. Another way to say this is: forecasts from different origins (dates) for the same period are more similar. More generally, we would say that forecasts conditioned on different information sets are similar.
Why would someone opt for a more congruent forecast and not just pick the most accurate one? In brief, the most accurate forecast has a statistical objective that does not necessarily (I advocate a stronger term here: ever; but let us leave this discussion for another time!) match the use case of a forecast. Take the example of inventory management. The most accurate forecast will typically minimize some mean squared error type of objective, as a proxy for correctly capturing the underlying structure. (There are good reasons for this statement to be correct, but it is not foolproof!) However, a firm has to contend with ordering costs, supplier constraints, batching of orders, the bullwhip effect, and so on. This may lead to preferring a different forecast that is more aligned with the supply chain reality of the firm. Measuring all that is in no way trivial, but it does not take too much imagination to see that the forecast preferred on monetary or supply chain objectives may not be the most accurate one.
Why is accuracy so important in forecasting then? It is easy to calculate, it gives a key performance indicator, and it is not entirely misplaced. Nonetheless, it is not the objective. We do not forecast for the sake of forecasting (well, some of us do, my research is in forecasting!) but to support decisions. However, the exact connection of forecast accuracy with the decision, and whether there is a better alternative, remain open questions. This is the focus of this post.
What are the modelling implications of congruence? Imagine a model that does not capture the characteristics of the data well. Its forecast could be rather volatile across time, as it will not follow the observations well. A good example is the random walk, which assumes there is no structure to exploit in a time series, so its forecast simply follows the latest observation and can differ dramatically from each forecast origin (time period). That would be a very incongruent forecast. We can also have the opposite case, where a forecast ignores any new information. A constant forecast would be doing that, always outputting the same value no matter what the data suggest. That would be an over-congruent forecast: one that is too similar across periods and does a poor job of updating with new information.
We may be tempted to connect the congruence of forecasts with accuracy, as the argument so far has been about the ability of forecasts to correctly capture the structure of a time series. However, there is a fundamental difference between the two; otherwise, all these firms would have no motivation to think about forecast congruence on top of selecting the best possible model. With some statistical intuition, one might quickly conclude that the two objectives lead to the same outcome. This is not the case though, so let us understand why.
Forecast congruence
Although I want to spare you the technicalities, we need some. (You can enjoy all the details in our working paper here.) Suppose we have a forecast over many periods and for multiple steps ahead. We can measure its errors and organize them as follows (h is the forecast horizon, t is the time period, t+1|t means a forecast targeting period t+1 from forecast origin t):
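A sketch of this layout, reconstructed from the notation just given (each row collects errors targeting the same period, each column collects errors of the same horizon; the exact arrangement in the working paper may differ):

$$E = \begin{pmatrix} e_{t+1|t} & e_{t+1|t-1} & \cdots & e_{t+1|t-H+1} \\ e_{t+2|t+1} & e_{t+2|t} & \cdots & e_{t+2|t-H+2} \\ \vdots & \vdots & \ddots & \vdots \\ e_{t+n|t+n-1} & e_{t+n|t+n-2} & \cdots & e_{t+n|t+n-H} \end{pmatrix}$$

where $H$ is the longest horizon and $n$ the number of target periods considered.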
Skipping notation details: each column collects forecast errors of a given horizon, and each row collects the errors targeting a given time period. Column-wise averages measure accuracy. Row-wise averages measure congruence. This “simplifies” to: congruence is the average of the variances of forecasts across different horizons, calculated for different periods. Let us take this apart:
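To make this concrete, here is a minimal sketch in Python (my own illustration, with an array layout chosen for convenience, not code from the paper). Forecasts are stored with rows as forecast origins and columns as horizons; accuracy is the column-wise mean squared error, and congruence is measured by realigning the forecasts by target period and averaging the across-origin variances:

```python
import numpy as np

def accuracy_and_congruence(forecasts, actuals):
    """forecasts: (n_origins, H) array, where forecasts[t, h - 1] targets
    period t + h. actuals: 1-D array with actuals[t] the observation of
    period t; it must cover n_origins + H periods. Returns the per-horizon
    MSE (accuracy) and the average across-origin variance of forecasts for
    the same target period (congruence; lower means more congruent)."""
    forecasts = np.asarray(forecasts, dtype=float)
    actuals = np.asarray(actuals, dtype=float)
    n_origins, H = forecasts.shape

    # Accuracy: column-wise mean squared error, one value per horizon.
    errors = np.empty_like(forecasts)
    for t in range(n_origins):
        errors[t, :] = actuals[t + 1 : t + H + 1] - forecasts[t, :]
    mse_per_horizon = np.mean(errors ** 2, axis=0)

    # Congruence: realign by target period. Target tau is forecast from
    # origins tau - 1, ..., tau - H; the variance of those forecasts says
    # how much the forecast for tau changed as the origin moved.
    variances = []
    for tau in range(H, n_origins + 1):
        same_target = [forecasts[tau - h, h - 1] for h in range(1, H + 1)]
        variances.append(np.var(same_target))
    congruence = float(np.mean(variances))
    return mse_per_horizon, congruence
```

Note that a lower value here means a more congruent forecast; the working paper develops the measure more formally, but these are the ingredients.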
Why do we do this funny business with the horizons? Well, it turns out that the raw information contained in forecast accuracy and forecast congruence is the same, but using it differently results in different metrics. We in fact always do this funny business with the horizons when we measure multi-step-ahead accuracy, but as we do it intuitively, we do not pay much attention to it. Without going through the derivation this may appear a bit cryptic. For those of you who want to follow the equations, I suggest a read of the working paper, but the matrix with the errors above should be illustrative of why the raw information contained in congruence and accuracy is identical.
So, are accuracy and congruence the same then? No, not at all. I presented an earlier version of this at the International Symposium on Forecasting last year, and as we had not worked out the theory fully, I was confused about that myself. (By the way, this is a great conference with good practitioner and academic tracks, and I highly recommend it!) Now I can confidently say that these two are different. Let us understand why, and what the implications are.
Before we go into examples, there is another interesting property that pops out of the theory. Can we ever get the forecast error to zero? In principle, no, as there will always be some randomness in the time series; at best we can hope that we have somehow found an excellent approximation of the underlying (and unknown) demand-generating process, which gives us the minimum possible error. If we look at the accuracy on training data (in-sample), we may go below that minimum error. In that case, we have an overfit forecast. This means that the forecast has over-explained the data and has tried to capture the random noise. As noise is random, this can have adverse effects on the performance of forecasts in the future. Similarly, we can have an underfit forecast, where the forecast omits key aspects of the data. You may have heard of the bias-variance decomposition; modelling approaches that are aware of it aim for predictions that are risk-averse to both under- and over-fitting, hoping to fit just right.
What about congruence? Suppose we knew how demand is generated. We could take that model and calculate the ideal congruence. It is easy to imagine forecasts that are more, or less, congruent than that, as discussed before. It turns out that under-congruent forecasts are best avoided, while over-congruent forecasts are beneficial, as long as they do not become overfit. This is an important statement. An over-congruent forecast that fits well or underfits can be beneficial. A forecast that underfits without gaining in congruence is probably not going to be great. An overfit forecast is never great. This gives us some ideas about how to select forecasts.
Why does congruence matter?
It is easier to illustrate this point with some simulations. Using sales data from a firm, we produce forecasts from different models and track their accuracy and congruence. We also track the so-called pinball loss, which measures how accurately we forecast the quantile of the sales distribution that corresponds to the desired inventory service rate. Subsequently, we proceed to generate orders and track the inventory performance of these forecasts. When we track the inventory performance we look at:

- the achieved service rate against the target;
- lost sales;
- stock-on-hand;
- the volatility of orders;
- the volatility of the ordering frequency.
An ideal forecast should result in the target service rate, not over- or undershooting it, with minimal lost sales and minimal stock-on-hand, as both carry some cost. Lower volatility in the various metrics makes the inventory easier to manage. The decisions that an analyst has to make are how much and when to order, which connect directly with the last two points. Low volatility of orders suggests we order more or less the same quantity every time. Low volatility of the ordering frequency suggests we keep a nice cadence to our ordering. All these make our lives easier, as we have to take fewer actions, but more importantly, they reduce the friction in the supply chain as a whole. If you are consistent in your orders, your suppliers will really, really like you! Some of you will recognize that what I am referring to is the bullwhip effect, where the volatility of orders increases as we move upstream in the supply chain, increasing costs and pain for all members.
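For reference, the pinball loss mentioned above has a standard form; a minimal sketch in Python (my own illustration, not code from the paper):

```python
import numpy as np

def pinball_loss(actuals, quantile_forecasts, tau):
    """Average pinball (quantile) loss at quantile level tau, e.g.
    tau=0.95 when targeting a 95% service rate. Under-forecasting is
    penalized by tau and over-forecasting by 1 - tau, so minimizing it
    pushes the forecast towards the tau-quantile of the distribution."""
    diff = np.asarray(actuals, dtype=float) - np.asarray(quantile_forecasts, dtype=float)
    return np.mean(np.maximum(tau * diff, (tau - 1.0) * diff))
```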
What do we find? First, congruence is mildly correlated with accuracy and pinball loss (correlations below 0.5 in our results). This agrees with the theory that I glossed over here. However, the correlation is not zero. What does this mean in practice? Bad forecasts will show up as bad forecasts in both congruence and accuracy. Once you reach a threshold in performance, you will need to sacrifice accuracy for congruence and vice versa. That also settles that the most accurate forecast will not be the most congruent forecast; the most congruent forecast has different properties. Therefore, firms that aim for congruent forecasts over accuracy are onto something.
All metrics are only mildly connected with inventory performance as measured by the deviation to the target service rate, lost sales, and stock-on-hand. This is not a new finding and remains a pain-point in forecasting research. Many research papers strive for more accurate forecasts, but what this translates to in practice is less straightforward. I should stress here that this is not a critique of the literature. This remains a very difficult problem and many of my papers fall into this trap as well. Many colleagues have highlighted this and there are a lot of well-done papers that take the extra step to evaluate forecasts on accuracy and decisions as well. This work on congruence is an attempt to address part of this issue.
Accuracy and pinball loss remain mildly connected with the volatility of orders and the volatility of their frequency. Congruence is strongly connected. The more congruent a forecast is, the less volatile the orders are and the more regular their frequency. The figure below summarizes the results from one of the experiments in the working paper. The left-hand side refers to the volatility of orders, while the right-hand side to the volatility of the ordering frequency. The vertical axis provides the correlation (actually, the normalized coefficients of a regression, as we want to control for various experimental design choices) and the coefficient of determination, which measures the percentage of the variability in the data that is explained. Results are provided for three target service rates (90%, 95%, and 99%) and for two lead times of 3 and 5 periods ahead. Observe how accuracy (Mean Squared Error) and pinball loss demonstrate only a mild connection, while congruence exhibits double the strength of connection.
Let us take a step back: is this unexpected? No, firms have been onto this for a while. Research has highlighted several times that accuracy metrics are not the best performance indicator for firms. What this work offers is a metric to start measuring this objectively, and an understanding of the theoretical implications. Why do we need that? So that we can guide forecast evaluation and selection to help firms improve their forecasting process.
I mentioned in passing that there is a connection with the bullwhip effect. It turns out that forecast congruence is a reformulation (with some additional perks!) of the bullwhip ratio. Those interested can explore the relevant section in the working paper. Choosing forecasts on congruence will lead to more bullwhip-friendly forecasts.
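For those unfamiliar with it, the bullwhip ratio is commonly measured as the variance of orders over the variance of demand, with values above one indicating amplification upstream; a toy calculation (hypothetical numbers, not the paper's formulation):

```python
import numpy as np

def bullwhip_ratio(orders, demand):
    """Variance of orders relative to the variance of demand; a value
    above 1 means the ordering policy amplifies volatility upstream."""
    return np.var(np.asarray(orders, dtype=float)) / np.var(np.asarray(demand, dtype=float))

# Hypothetical example: lumpy, batched reordering against steady demand.
demand = [10, 11, 9, 10, 12, 10, 9, 11]
orders = [0, 30, 0, 0, 31, 0, 0, 30]   # batched orders covering the same total
print(bullwhip_ratio(orders, demand))  # well above 1: strong bullwhip
```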
How to use congruence in practice?
It is important to avoid giving the wrong impression about the usefulness of congruence. It does not replace accuracy, it complements it. If you choose the forecast that maximizes congruence you will end up with a constant forecast, irrespective of what your data suggests. Do not do that! Accuracy will steer you away from this. Suppose you have two fairly accurate forecasts, which one to prefer? Choose the most congruent. Suppose you have two fairly congruent forecasts, which one to prefer? Choose the most accurate. This is simpler to operationalize than it may initially sound. It is in principle a multi-objective problem, but our experiments suggest that this is not a difficult one to resolve. The reason for this is that alternative good forecasts will quickly end up having very similar accuracies.
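One simple way to operationalize the rule above (my own sketch, not a procedure from the paper): treat all candidates whose accuracy is within a small tolerance of the best as ties on accuracy, and break the tie on congruence.

```python
import numpy as np

def select_forecast(mse, congruence, tol=0.05):
    """mse and congruence: one entry per candidate forecast, lower is
    better for both. Candidates within tol (here 5%) of the best MSE are
    treated as equally accurate; among those, the most congruent
    (lowest variance across origins) wins. Returns the winning index."""
    mse = np.asarray(mse, dtype=float)
    congruence = np.asarray(congruence, dtype=float)
    near_best = np.flatnonzero(mse <= mse.min() * (1.0 + tol))
    return near_best[np.argmin(congruence[near_best])]
```

The tolerance plays the role of "fairly accurate" in the text; swapping the roles of the two metrics gives the mirror rule of picking the most accurate among fairly congruent forecasts.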
Another outcome of our analysis is that imposing congruence in an ad hoc manner may not have the desired effect. This is due to what congruence implies for a forecasting model. Congruence connects with the values of a model's parameters and how these interact, not with their estimation uncertainty. For example, retaining a fixed model to keep forecasts from changing all the time (due to switching models or re-estimating parameters) may not result in congruent forecasts, because the fixed forecasting model may have unfavorable parameters. On the other hand, methods and models that are designed so that their forecasts end up being congruent will perform very well without too much intervention from analysts. Examples of these are temporal hierarchy forecasts (I have talked about THieF and the Multiple Aggregation Prediction Algorithm, MAPA, in previous posts; there is also a related article in Foresight, a great resource for forecasting practitioners) and models that use shrinkage in the estimation of their parameters. These are not the only approaches, but as I have worked extensively with both, I have a good working understanding of what they do and why, so I am happy to make this claim. Our empirical analysis validates this as well.
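As a toy illustration of the parameter point (my own example, not from the paper): simple exponential smoothing with a heavily shrunken smoothing parameter reacts less to noise, and its rolling-origin forecasts come out more congruent.

```python
import numpy as np

def ses_forecasts(y, alpha, H):
    """Rolling-origin simple exponential smoothing: at each origin the
    H-step-ahead forecast is the current level (flat forecast function)."""
    level = y[0]
    fcs = []
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
        fcs.append([level] * H)  # the same level for every horizon
    return np.array(fcs)

rng = np.random.default_rng(1)
y = 100 + rng.normal(0, 10, 200)  # a noisy level with no structure
for alpha in (0.9, 0.1):
    fcs = ses_forecasts(y, alpha, H=3)
    # variance of forecasts targeting the same period from different origins
    v = [np.var([fcs[tau - h, h - 1] for h in range(1, 4)])
         for tau in range(3, len(fcs) + 1)]
    print(f"alpha={alpha}: average variance (lower = more congruent) {np.mean(v):.2f}")
```

The shrunken parameter (alpha=0.1) yields a much lower average variance, that is, more congruent forecasts, at the risk of underfitting if the series does have structure.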
In conclusion, congruence is a new quantity to qualify the performance of your forecasts. It is very easy to calculate, and it connects strongly with the volatility of the decisions that forecasts support. Here this was discussed in the context of inventory management, but it is easy to draw similar examples from other applications. Congruence complements accuracy. Choose the forecast that is the most congruent without sacrificing accuracy. Finally, the important disclaimer: this is ongoing research. The paper is not published yet, which means that the final version may differ quite a bit from the working paper. Have trust in the review process to improve papers and ideas (even if sometimes it can be quite a journey)!
Comments

Using data to improve Operations | Digital @ The LEGO Group (6 months ago):
Nanna Burmeister

FWO Postdoc Researcher at KU Leuven (9 months ago):
Nice work Kandrika Pritularga and Nikos Kourentzes! While I very much like the theoretical analysis presented, as well as the effort put into evaluating the impact on downstream decision making, forecast congruence is not a new forecast quality dimension/concept, but rather a different name for rolling origin forecast instability (vs. model selection forecast instability) as we term it in our IJF and Foresight papers (see links below), or just forecast stability as it is referred to by Stefan de Kok (see link below). We not only quantify rolling origin forecast instability, but also propose a methodology to account for it in the optimization phase of global neural forecasting models. Our premise is also that rolling origin forecast (in)stability does not replace accuracy, but complements it. This is operationalized by focusing on improving forecast stability while maintaining forecast accuracy. We show that our methodology results in more stable forecasts without causing a loss in forecast accuracy. Moreover, our results (often) show improvements in both forecast stability and forecast accuracy; hence, our methodology can also be used as a time-series-specific regularization mechanism.

President, I2S (9 months ago):
Forecasting models should neither exaggerate nor dampen the true expected volatility of demands or supplies. Models which are overly sensitive to small changes in a few key parameters are typically ignoring (or understating) the elasticity of the business architecture, i.e., the ability of the business to absorb and buffer small waves of change with minimal disruption. Models which are designed (or configured) to create an appearance of stability (to accommodate the nervousness of stakeholders or the analytic capacity of those who must explain change) can lead a business to be blindsided and inflexible. The ability to inquire "How sure are you?" is essential, as is the transparency of the answer. The most important accuracy of all is in the articulation of uncertainty, whether it concerns the integrity of information gathered, the robustness of the model itself, or the degree of confidence in the human interpretation of the model results. "Trustworthy" is a relational term, and should apply equally to human and digital systems, as well as the relationships between them.

Chief Investment Officer and Managing Partner - Europa Capital (9 months ago):
It would be helpful if it wasn't?

Catalysing the journey to data driven cultures by design using Human Centric Analytics. Proud to be dyspraxic/dyslexic and tail end menopausal! I heat the sea! (9 months ago):
Removing high variability is the job of production control and customer management. The forecast is the messenger but not the problem. I look forward to reading this.