Best Practices for Demand Planning: Forecasting Models Review
Nicolas Vandeput
The article below is a summary of one of my LinkedIn posts. If you are interested in such debates, let's connect! I would like to thank the following people for their insightful remarks in the original discussion: Timothy Brennan, Chris Davies, Valery Manokhin, Leonardo Cabrera, Charlie Kantz, Karl-Eric Devaux, Spyros Makridakis, and Dyci Manns Sfregola.
Supply chain demand planners often ask themselves: What is the best model to forecast my demand? What are the best practices I should follow to improve my forecast?
It is impossible to give a definitive, absolute answer to those questions. There is no silver-bullet model that would be the best for every single company and every single product. But I will try in two articles (this is the first one; you can find the second one, about the forecasting process, here) to review most of the existing forecasting models and show you tips, tricks, and best practices for tweaking and selecting them. This first article reviews forecasting models and how well they perform when forecasting supply chain demand. Forecasting demand is nothing like forecasting electricity consumption, airplane passengers, or online connections. Supply chain demand datasets are often short (5 years of history is already a lot) and especially volatile. This means that not all forecasting models are appropriate.
For each model, I will briefly explain how it works, list its pros and cons, and suggest further reading (either books or academic papers), with a link to the document if available.
Source: Data Science for Supply Chain Forecasting 2nd Edition. You can pre-order it here.
Statistical Methods
Exponential Smoothing
In supply chain demand forecasting, exponential smoothing (ETS) models are king. In my experience, most current supply chain forecasting tools rely on some variation of exponential smoothing to forecast demand.
These models forecast demand components (level, trend, and seasonality) by updating them slightly after each demand observation.
Pro: Easy to understand, implement, and interpret. Flexible, with additive and multiplicative seasonalities.
Con: Difficult to add external features. Not able to forecast new products.
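To make the "update slightly after each observation" idea concrete, here is a minimal sketch of exponential smoothing with a level and a trend (no seasonality). The function name, the default smoothing parameters, and the initialization (first observation as level, first difference as trend) are assumptions chosen for illustration, not the settings of any particular tool.

```python
def double_exponential_smoothing(demand, alpha=0.4, beta=0.2, horizon=3):
    """Level + trend exponential smoothing (illustrative sketch).

    alpha: weight given to the latest observation when updating the level.
    beta:  weight given to the latest level change when updating the trend.
    """
    if len(demand) < 2:
        raise ValueError("need at least two demand observations")

    # Assumed initialization: first observation and first difference.
    level = demand[0]
    trend = demand[1] - demand[0]

    for observed in demand[1:]:
        previous_level = level
        # Blend the new observation with the previous one-step-ahead forecast.
        level = alpha * observed + (1 - alpha) * (previous_level + trend)
        # Blend the latest change in level with the previous trend estimate.
        trend = beta * (level - previous_level) + (1 - beta) * trend

    # Extrapolate the last level and trend over the forecast horizon.
    return [level + (h + 1) * trend for h in range(horizon)]


monthly_demand = [100, 104, 109, 111, 118, 121]  # made-up numbers
print(double_exponential_smoothing(monthly_demand))
```

A higher alpha or beta makes the model react faster to recent demand, at the cost of also reacting to noise.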
Exponential smoothing models are often called "Holt-Winters," after the researchers who proposed them. An early form of exponential smoothing forecasting was proposed by R.G. Brown in 1956. His equations were refined in 1957 by Charles C. Holt, a US engineer from MIT and the University of Chicago. The models were improved again three years later by Peter Winters. Holt's and Winters's names stuck and are now used for the whole family of exponential smoothing techniques.
This is the model that we use at SKU Science (an online platform for demand planning—free to try), and that I generally advise for anyone starting with demand planning.
Brown, R. (1956). Exponential smoothing for predicting demand.
Holt, C. C. (1957). Forecasting seasonals and trends by exponentially weighted moving averages.
Winters, P. R. (1960). Forecasting sales by exponentially weighted moving averages.
Rob J Hyndman and George Athanasopoulos, Forecasting: Principles and Practice, https://otexts.com/fpp2/
Nicolas Vandeput, Data Science for Supply Chain Forecasting
(S)ARIMA(X)
ARIMA models are often used by academics and forecasters to forecast time series with plenty of historical data. In my experience, I have never seen ARIMA give accurate results (within a reasonable computation time) on supply chain demand datasets.
ARIMA models use multiple linear regressions of historical values and errors to predict the future. SARIMA is the seasonal version of ARIMA, and ARIMAX can use external features to make predictions.
Pro: Can use external variables.
Con: Needs a long history, and computation time is usually (very) long. Not able to forecast new products.
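To illustrate the "regression on historical values" part, here is a deliberately reduced sketch: a single autoregressive term fitted by ordinary least squares. It leaves out the regression on past errors (the MA part), differencing, seasonality, and external features, so it is only the simplest building block of an ARIMA model. The function names are hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = c + phi * y[t-1] (autoregressive part only)."""
    lagged = series[:-1]   # y[t-1]
    current = series[1:]   # y[t]
    n = len(lagged)
    if n < 2:
        raise ValueError("need at least three observations")

    mean_lagged = sum(lagged) / n
    mean_current = sum(current) / n
    variance = sum((x - mean_lagged) ** 2 for x in lagged)
    if variance == 0:
        raise ValueError("series is constant; the slope is undefined")

    covariance = sum(
        (x - mean_lagged) * (y - mean_current) for x, y in zip(lagged, current)
    )
    phi = covariance / variance
    c = mean_current - phi * mean_lagged
    return c, phi


def forecast_ar1(series, horizon=3):
    """Apply the fitted equation recursively to forecast future periods."""
    c, phi = fit_ar1(series)
    forecasts = []
    last = series[-1]
    for _ in range(horizon):
        last = c + phi * last
        forecasts.append(last)
    return forecasts


print(forecast_ar1([120, 132, 128, 141, 137, 150]))  # made-up demand history
```

A full ARIMA fit estimates several such coefficients at once, which is one reason it needs a long history and more computation time.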
Rob J Hyndman and George Athanasopoulos, Forecasting: Principles and Practice, https://otexts.com/fpp2/
Croston
The Croston model (and its later variations, SBA and TSB) was created to forecast intermittent demand. I wrote an article about Croston models (available here) and discuss how to deal with intermittent demand in another article. In short, as they add little (if any) accuracy compared to simple exponential smoothing, I would not recommend using Croston models to forecast intermittent demand.
Croston models are close to exponential smoothing. They learn both the probability of having some demand and the demand level, if any.
Pro: Interesting concept of demand probability and level.
Con: Does not deliver good accuracy compared to simpler models (such as simple exponential smoothing). Not able to forecast new products. Not able to take external features into account.
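A minimal sketch of the classic Croston update may help. In that formulation, the "probability" is tracked through the smoothed interval between non-zero demands (its inverse), and both estimates are updated only in periods where demand occurs. The function name, the single shared smoothing parameter, and the initialization on the first non-zero demand are assumptions of this sketch; the SBA and TSB variants are not covered.

```python
def croston(demand, alpha=0.3):
    """Classic Croston forecast per period (illustrative sketch).

    Returns the smoothed demand size divided by the smoothed interval
    between non-zero demands.
    """
    level = None              # smoothed size of non-zero demands
    interval = None           # smoothed number of periods between demands
    periods_since_demand = 1

    for observed in demand:
        if observed > 0:
            if level is None:
                # Assumed initialization on the first non-zero demand.
                level = observed
                interval = periods_since_demand
            else:
                level = alpha * observed + (1 - alpha) * level
                interval = alpha * periods_since_demand + (1 - alpha) * interval
            periods_since_demand = 1
        else:
            # No demand: the estimates stay frozen, only the counter moves.
            periods_since_demand += 1

    if level is None:
        return 0.0  # no demand was ever observed
    return level / interval


print(croston([0, 0, 6, 0, 0, 0, 4, 0, 7, 0, 0, 5]))  # made-up sparse demand
```

Note that the forecast does not change during a long run of zeros, which is the behavior the TSB variation was designed to address.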
J. D. Croston, Forecasting and Stock Control for Intermittent Demands
Ruud H. Teunter, Aris A. Syntetos, M. Zied Babai, Intermittent demand: Linking forecasting to inventory obsolescence
Nicolas Vandeput, Forecasting Intermittent Demand with the Croston Model, https://towardsdatascience.com/croston-forecast-model-for-intermittent-demand-360287a17f5f
MAPA
In 2014, Nikolaos Kourentzes proposed a new forecasting technique: the Multiple Aggregation Prediction Algorithm (MAPA).
His idea can be summarized as:
- Aggregate the time series with various temporal hierarchies (monthly, quarterly, half-year, etc.)
- Generate a forecast for each temporal aggregation
- Use a disaggregation technique to transform all those high-level temporal forecasts into a single unified temporality (weekly or monthly).
Pro: Usually more accurate than exponential smoothing, especially for intermittent products.
Con: Difficult to interpret, longer running time, challenging (and time-intensive) optimization, damped seasonality, usually (slightly) negatively biased. Not able to forecast new products. Not able to take external features into account.
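The three steps above can be sketched in a heavily simplified form. This sketch assumes simple exponential smoothing at every aggregation level and a plain average of the per-period forecasts as the "disaggregation" step; the published method is more elaborate, so the function names and the combination rule here are illustrative assumptions only.

```python
def simple_exp_smoothing(series, alpha=0.3):
    """Return the final smoothed level, used as a flat forecast."""
    level = series[0]
    for observed in series[1:]:
        level = alpha * observed + (1 - alpha) * level
    return level


def aggregate(series, bucket):
    """Sum non-overlapping buckets, dropping the oldest periods that
    do not fill a complete bucket."""
    start = len(series) % bucket
    return [sum(series[i:i + bucket]) for i in range(start, len(series), bucket)]


def multi_aggregation_forecast(series, buckets=(1, 2, 3), alpha=0.3):
    """Forecast at several temporal aggregation levels and combine them."""
    per_period_forecasts = []
    for bucket in buckets:
        aggregated = aggregate(series, bucket)        # step 1: aggregate
        if not aggregated:
            continue  # history too short for this aggregation level
        forecast = simple_exp_smoothing(aggregated, alpha)  # step 2: forecast
        per_period_forecasts.append(forecast / bucket)      # step 3: back to base periods
    if not per_period_forecasts:
        raise ValueError("series is too short for every requested bucket size")
    # Assumed combination rule: unweighted average across levels.
    return sum(per_period_forecasts) / len(per_period_forecasts)


monthly_demand = [0, 3, 0, 0, 5, 0, 2, 0, 0, 4, 0, 1]  # made-up intermittent demand
print(multi_aggregation_forecast(monthly_demand))
```

Aggregating an intermittent series into larger buckets removes many of the zeros, which is one intuition for why this family of methods can help on such products.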
Nikolaos Kourentzes, Improving your forecasts using multiple temporal aggregation
Nikolaos Kourentzes, Fotios Petropoulos, Juan Trapero, 2014, Improving forecasting by estimating time series structural components across multiple frequencies
Machine Learning
At the core of many machine learning models lie decision trees.
Decision trees make predictions based on input features by asking consecutive yes/no questions. I explain their basic working in my article Machine Learning for Supply Chain Forecasting (if you are not familiar with decision trees, I advise you to read this article first before going further).
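As a small illustration of "consecutive yes/no questions," here is a sketch of a tiny regression tree. Each question has the form "is feature j at or below this threshold?" and is chosen to minimize the squared error of the two resulting groups. The function names, the depth limit, and the use of past demand as features are assumptions made for this example.

```python
def squared_error(values):
    """Sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, targets):
    """Return the (feature, threshold) question with the lowest error, or None."""
    best_error, best_question = None, None
    for feature in range(len(rows[0])):
        values = sorted(set(row[feature] for row in rows))
        # Candidate thresholds sit halfway between consecutive distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [t for r, t in zip(rows, targets) if r[feature] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[feature] > threshold]
            error = squared_error(left) + squared_error(right)
            if best_error is None or error < best_error:
                best_error, best_question = error, (feature, threshold)
    return best_question


def build_tree(rows, targets, depth=2):
    """A node is (feature, threshold, yes_branch, no_branch); a leaf is a number."""
    question = best_split(rows, targets) if depth > 0 else None
    if question is None:
        return sum(targets) / len(targets)  # leaf: average of the targets
    feature, threshold = question
    yes = [(r, t) for r, t in zip(rows, targets) if r[feature] <= threshold]
    no = [(r, t) for r, t in zip(rows, targets) if r[feature] > threshold]
    return (
        feature,
        threshold,
        build_tree([r for r, _ in yes], [t for _, t in yes], depth - 1),
        build_tree([r for r, _ in no], [t for _, t in no], depth - 1),
    )


def predict(tree, row):
    """Answer the questions from the root down until a leaf is reached."""
    while isinstance(tree, tuple):
        feature, threshold, yes_branch, no_branch = tree
        tree = yes_branch if row[feature] <= threshold else no_branch
    return tree


# Made-up example: features are the demand of the two previous months.
history = [[10, 12], [12, 15], [15, 11], [11, 30], [30, 28]]
next_month = [15, 11, 30, 28, 25]
tree = build_tree(history, next_month, depth=2)
print(predict(tree, [28, 25]))
```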
Many models were then created based on ensembles of trees. Ensemble models make predictions by combining multiple sub-models.
Ensemble #1: Bagging Models
The most famous ensemble model is the forest.
A forest creates hundreds of trees and averages their respective predictions (this technique is called bagging). The trick is to populate the forest with trees that are both different and accurate. Generating different trees is done by restricting each decision node's available features, randomly sampling the training data set of each tree, and (for the extremely randomized trees model) only allowing random feature splits.
Pro: Straightforward method.
Con: Not so accurate.
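Here is a minimal bagging sketch. To keep it short, each "tree" is a single-question stump, every stump is trained on a random resample (with replacement) of the training data, and it may only look at a random subset of the features. These simplifications, the function names, and the parameter defaults are assumptions of this sketch; a real forest grows much deeper trees.

```python
import random


def fit_stump(rows, targets, allowed_features):
    """Best single split among the allowed features (or a constant if none helps)."""
    mean = sum(targets) / len(targets)
    # Start from "no question at all": predict the mean everywhere.
    best = (sum((t - mean) ** 2 for t in targets), None, None, mean, mean)
    for feature in allowed_features:
        values = sorted(set(row[feature] for row in rows))
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [t for r, t in zip(rows, targets) if r[feature] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[feature] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = sum((t - left_mean) ** 2 for t in left)
            error += sum((t - right_mean) ** 2 for t in right)
            if error < best[0]:
                best = (error, feature, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_mean, right_mean = stump
    if feature is None:
        return left_mean
    return left_mean if row[feature] <= threshold else right_mean


def fit_forest(rows, targets, n_trees=50, max_features=1, seed=0):
    """Bagging: each stump sees a resampled data set and a random feature subset."""
    rng = random.Random(seed)
    n_rows, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        sample = [rng.randrange(n_rows) for _ in range(n_rows)]  # with replacement
        features = rng.sample(range(n_features), max_features)
        forest.append(
            fit_stump([rows[i] for i in sample], [targets[i] for i in sample], features)
        )
    return forest


def predict_forest(forest, row):
    """Average the predictions of all sub-models."""
    return sum(predict_stump(stump, row) for stump in forest) / len(forest)


rows = [[1, 9], [2, 7], [3, 8], [4, 2], [5, 3], [6, 1]]  # made-up features
targets = [10, 11, 10, 20, 21, 19]
forest = fit_forest(rows, targets, n_trees=50, max_features=1)
print(predict_forest(forest, [5, 2]))
```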
Tin Kam Ho, 1995, Random decision forests
Pierre Geurts, Damien Ernst, Louis Wehenkel, 2006, Extremely randomized trees
Nicolas Vandeput, Data Science for Supply Chain Forecasting
Ensemble #2: Boosting Models
Boosting models do not average multiple sub-models. Instead, they train sub-models one after another, each on the errors of the overall model so far, so that each new sub-model (usually a tree) specializes in correcting the current errors.
Those boosting models are currently considered best in class and are used by many data scientists.
Pro: (Very) good accuracy.
Con: More challenging to optimize than bagging methods. Needs lots of data.
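The sequential idea can be sketched as follows: start from the average demand, then repeatedly fit a small sub-model to the current errors (residuals) and add a damped version of its prediction to the overall model. This is a bare-bones illustration with single-question stumps and squared error; the function names, the learning rate, and the number of rounds are assumptions, and libraries such as XGBoost or LightGBM add many refinements on top.

```python
def fit_stump(rows, targets):
    """Best single yes/no split on any feature (or a constant if none helps)."""
    mean = sum(targets) / len(targets)
    best = (sum((t - mean) ** 2 for t in targets), None, None, mean, mean)
    for feature in range(len(rows[0])):
        values = sorted(set(row[feature] for row in rows))
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [t for r, t in zip(rows, targets) if r[feature] <= threshold]
            right = [t for r, t in zip(rows, targets) if r[feature] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = sum((t - left_mean) ** 2 for t in left)
            error += sum((t - right_mean) ** 2 for t in right)
            if error < best[0]:
                best = (error, feature, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, row):
    feature, threshold, left_mean, right_mean = stump
    if feature is None:
        return left_mean
    return left_mean if row[feature] <= threshold else right_mean


def fit_boosting(rows, targets, n_rounds=20, learning_rate=0.5):
    """Each round fits a stump to what the current model still gets wrong."""
    base = sum(targets) / len(targets)
    predictions = [base] * len(targets)
    stumps = []
    for _ in range(n_rounds):
        residuals = [t - p for t, p in zip(targets, predictions)]
        stump = fit_stump(rows, residuals)
        stumps.append(stump)
        predictions = [
            p + learning_rate * predict_stump(stump, r)
            for p, r in zip(predictions, rows)
        ]
    return base, learning_rate, stumps


def predict_boosting(model, row):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)


rows = [[1], [2], [3], [4], [5], [6]]  # made-up single feature
targets = [10, 12, 11, 20, 22, 21]
model = fit_boosting(rows, targets)
print(predict_boosting(model, [5]))
```

Because every round chases the remaining errors, too many rounds on a small data set will fit the noise, which is part of why these models are harder to tune than bagging.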
They were initially created in the late 1990s with AdaBoost, then refined with Gradient Boosting (2001), Extreme Gradient Boosting (XGBoost, 2016), and Light Gradient Boosting (LightGBM, 2017).
Yoav Freund, Robert E Schapire, 1997, A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting
Jerome H. Friedman, 2001, Greedy Function Approximation: A Gradient Boosting Machine https://statweb.stanford.edu/~jhf/ftp/trebst.pdf
Tianqi Chen, Carlos Guestrin, 2016, XGBoost: A Scalable Tree Boosting System https://www.kdd.org/kdd2016/papers/files/rfp0697-chenAemb.pdf
Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, Tie-Yan Liu, 2017, LightGBM: A Highly Efficient Gradient Boosting Decision Tree https://papers.nips.cc/paper/6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree.pdf
Nicolas Vandeput, Data Science for Supply Chain Forecasting
Deep Learning
Since the mid-2010s, thanks to new optimization models and more computation power, we have seen the rise of deep learning models. The first artificial neurons were imagined in the 1940s, and the first neural network was physically implemented in the late 1950s.
Neural networks are created by stacking multiple layers of artificial neurons. Neurons are simple units: they sum up inputs (usually the outputs of previous neurons), process the sum in an activation function, and output the result.
Pro: (Very) good accuracy.
Con: Needs lots of data.
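The neuron described above can be written in a few lines. This sketch assumes the usual weighted sum with a bias term and a ReLU activation; the weights below are arbitrary numbers picked for the example (in practice they are learned during training), and the function names are hypothetical.

```python
def relu(x):
    """A common activation function: negative sums are cut to zero."""
    return max(0.0, x)


def identity(x):
    """No activation, often used for the output of a regression network."""
    return x


def neuron(inputs, weights, bias, activation=relu):
    """Sum the weighted inputs, add a bias, and apply the activation."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(total)


def feed_forward(inputs, layers):
    """Stack layers: the outputs of one layer are the inputs of the next.

    layers is a list of (weight_rows, biases, activation), with one row of
    weights and one bias per neuron in the layer.
    """
    values = inputs
    for weight_rows, biases, activation in layers:
        values = [
            neuron(values, weights, bias, activation)
            for weights, bias in zip(weight_rows, biases)
        ]
    return values


# Two inputs (for example, the demand of the last two periods), a hidden
# layer of two neurons, and one output neuron. All weights are made up.
network = [
    ([[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0], relu),
    ([[2.0, 3.0]], [1.0], identity),
]
print(feed_forward([1.0, 2.0], network))
```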
Different kinds of neural networks have been developed, notably long short-term memory (LSTM) networks, specialized in natural language processing (NLP), and convolutional neural networks (CNN), specialized in image recognition.
LSTM for demand forecasting? The international M4 forecasting competition was won by a hybrid model using LSTM and exponential smoothing. But the M4 dataset was not your usual supply chain demand dataset: it contained long, stable time series. In her master's thesis, my student Lynda Dhaeyer showed that LSTM couldn't beat 'simple' feed-forward neural networks on an actual supply chain demand dataset.
Nicolas Vandeput, Data Science for Supply Chain Forecasting, 2nd edition.
About the author
Nicolas Vandeput is a supply chain data scientist specialized in demand forecasting and inventory optimization. He founded his consultancy company SupChains in 2016 and co-founded SKU Science, a fast, simple, and affordable demand forecasting platform, in 2018. He enjoys discussing new quantitative models and how to apply them to business reality. Passionate about education, Nicolas is both an avid learner and an enthusiastic university teacher: he has taught forecasting and inventory optimization to master's students since 2014 in Brussels, Belgium. He published Data Science for Supply Chain Forecasting in 2018 and Inventory Optimization: Models and Simulations in 2020.
8 months ago: Hi Nicolas Vandeput, what do you think about random forest? Pros and cons?
Nicolas, I notice you have not recognized the important distinction between Method and Model in Demand Forecasting and Planning. Your early examples of exponential smoothing (Brown, Holt, H-W) are Methods, not Models. Subsequent developments, primarily due to Gardner and McKenzie (see IJF State of the Art papers), make the important distinction by introducing an uncertainty assumption in terms of an error distribution. In the 90s this was further embedded in a comprehensive MODELING environment called State Space Forecasting. It is all still 'exponential smoothing' as a concept. Accuracy is a separate matter as it depends on the measure you select.
4 years ago: Interesting mix of forecasting instruments! A mix of those would be the best option, since we will never get to 100%.