How Data Quality Issues Impact the Performance of Seasonal Forecasts in Demand Planning

Many demand planners create forecasts for the S&OP process, yet they and their managers tend to treat bad data as a nuisance factor rather than as an opportunity for improving forecasting performance. A typical planning forecast for inventory, budgeting, sales and logistics applications has a recognizable forecast profile, like a trending or trend/seasonal pattern for weekly, monthly and quarterly data. I will now demonstrate how data quality evaluations impact the performance of seasonal forecasts in demand planning.

The “Hauls” data are from the transportation industry. The problem is to investigate the need for data quality adjustments prior to creating useful forecasting models. Does it matter if a few data points look unusual? For trend/seasonal data, how does it impact seasonal factors, monthly bias and overall accuracy? We would like an assessment of the impact of an outlier or two, and of how we should deal with evaluating forecasting performance.

Pie chart of Hauls data showing Irregular (55%), Seasonality (44%) and Trend (1%).

Exploratory Data Analysis. Based on three years (36 monthly values), the total variation is first decomposed into trend, seasonal and other components with the STI_Classification procedure (using the Excel Data Analysis Add-in ANOVA Without Replication).

The pie chart identifies Seasonality (44%), Trend (1%) and Other or Irregular (55%) in the STI_Class IST#3. Because of the dominant Irregular component, trend/seasonal models are expected to show a significant amount of uncertainty in the prediction limits. After an outlier adjustment (shown below), the total variation breaks down to Seasonality (47%), Trend (1%) and Other (52%). This suggests that 3% of the Other component can be re-identified as seasonal variation. But how is the accuracy of the forecasts impacted by this outlier correction?
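To make the decomposition concrete, here is a minimal Python sketch of an STI-style breakdown in the spirit of Excel's ANOVA: Two-Factor Without Replication, with years as rows and months as columns. The function name and the synthetic hauls series are illustrative assumptions, not the article's actual data.

```python
import numpy as np

def sti_decomposition(x, periods_per_year=12):
    """Percent of total variation due to trend, seasonality and irregular,
    via two-way ANOVA without replication (rows = years, columns = months)."""
    grid = np.asarray(x, dtype=float).reshape(-1, periods_per_year)
    grand = grid.mean()
    ss_total = ((grid - grand) ** 2).sum()
    # Row (year) effect approximates trend; column (month) effect, seasonality.
    ss_trend = periods_per_year * ((grid.mean(axis=1) - grand) ** 2).sum()
    ss_seasonal = grid.shape[0] * ((grid.mean(axis=0) - grand) ** 2).sum()
    ss_irregular = ss_total - ss_trend - ss_seasonal
    return {name: 100.0 * ss / ss_total
            for name, ss in [("Trend", ss_trend),
                             ("Seasonality", ss_seasonal),
                             ("Irregular", ss_irregular)]}

# Illustrative 36-month series (NOT the article's data): trend + season + noise.
rng = np.random.default_rng(0)
t = np.arange(36)
hauls = 4000 + 10 * t + 400 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 300, 36)
print(sti_decomposition(hauls))
```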

An Exploratory Forecast. I first model the unadjusted data with a trend/seasonal exponential smoothing model and test for within-sample outliers. The preliminary model selected is a State Space Model (M, N, M): multiplicative error, no trend, multiplicative seasonality, consistent with an IST#3 classification. With the multiplicative error assumption, the forecast profile will have prediction limits that are not necessarily symmetrical.

The spreadsheet below shows the resulting model residuals in Column 1. The Descriptive Statistics Add-in (Source: Excel > Data Analysis) is used to determine the bounds for outlier detection in the residuals.
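The article's calculations are done in a spreadsheet; as a rough Python equivalent, a state space (M, N, M) model can be fit with statsmodels' ETSModel. The start date is an assumption (the article only says the first period is a December), and `hauls` is the illustrative series from the sketch above:

```python
import pandas as pd
from statsmodels.tsa.exponential_smoothing.ets import ETSModel

# Monthly series; the first period is a December (the year is assumed).
series = pd.Series(hauls,
                   index=pd.date_range("2015-12-01", periods=36, freq="MS"))

# ETS(M, N, M): multiplicative error, no trend, multiplicative seasonality.
fit = ETSModel(series, error="mul", trend=None,
               seasonal="mul", seasonal_periods=12).fit(disp=False)

residuals = fit.resid        # actual minus fitted, screened for outliers below
fitted = fit.fittedvalues    # used later as the outlier replacement values
```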

Outlier calculation with traditional and outlier-resistant methods.

Outlier Detection and Correction. For outlier detection, the traditional method is Mean ± 3 × St.Dev, based on conventional normal distribution assumptions. The traditional method is not outlier resistant, as both the mean and the standard deviation are distorted when the residuals contain one or more outliers. An outlier-resistant detection method is based on the calculation: upper bound = 75th percentile + 1.5 × (interquartile range); lower bound = 25th percentile − 1.5 × (interquartile range). (See Change & Chance Embraced, Chapter 2, and The Four P's in a Pod, Chapter 17, for the details.)
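A minimal sketch of both bounds calculations, applied to the model residuals from the previous sketch (variable names are mine):

```python
import numpy as np

def outlier_bounds(residuals):
    """Return (traditional, resistant) lower/upper outlier bounds."""
    r = np.asarray(residuals, dtype=float)
    # Traditional: mean +/- 3 standard deviations (not outlier resistant).
    mean, sd = r.mean(), r.std(ddof=1)
    traditional = (mean - 3.0 * sd, mean + 3.0 * sd)
    # Resistant: quartiles +/- 1.5 * interquartile range (Tukey fences).
    q1, q3 = np.percentile(r, [25, 75])
    iqr = q3 - q1
    resistant = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return traditional, resistant

traditional, resistant = outlier_bounds(residuals)
flagged = series[(residuals < resistant[0]) | (residuals > resistant[1])]
```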

Outlier replacement for October in Year 2: actual = 5265, adjusted value = 4398.

We observe that:

· Both outlier detection methods give the same conclusion (in this case): outlier in residuals = 867.

· The outlier can be corrected (conservatively) by replacing the actual (= 5265) with the upper limit or, more commonly, with the fitted model value for that period (= 4397.62).

· Before making adjustments, outliers should always be reviewed for reasonableness with a domain expert familiar with the historical data and the forecasted item.

It is essential, however, to take data quality steps before modeling, as outlier effects can be confounded with main effects like seasonal factors, special events and promotions.

· Choose a hold-out sample of twelve months and create a 12-period-ahead forecast for evaluation. The outlier (October of the second year) could have a significant impact on exponential smoothing forecasts because it occurs just before the 12-month hold-out period. The first period in the data is a December.

· Choosing the fitted value as the outlier replacement, rerun the trend/seasonal exponential smoothing model with the outlier-adjusted data. The Hauls data are shown in the third column of the spreadsheet above; a sketch of this replace-and-refit step follows below.
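A minimal sketch of the replace-and-refit step, continuing the previous sketches; the 24/12 train/hold-out split follows the article, while variable names and the use of statsmodels are my assumptions:

```python
# Replace the flagged actual(s) with the fitted value(s), hold out the last
# twelve months, and refit the trend/seasonal model on the adjusted history.
adjusted = series.copy()
adjusted.loc[flagged.index] = fitted.loc[flagged.index]

train, holdout = adjusted[:-12], series[-12:]  # evaluate against actuals

# ETS(A, N, M): additive error, no trend, multiplicative seasonality.
# (24 training observations = two full seasonal cycles, the minimum needed.)
adj_fit = ETSModel(train, error="add", trend=None,
                   seasonal="mul", seasonal_periods=12).fit(disp=False)
```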


I display a useful forecast with prediction limits as a predictive visualization of uncertainty. The trend/seasonal exponential smoothing model is a State Space Model (A, N, M): additive error, no trend, multiplicative seasonality. The additive error means that the bounds for the prediction limits are symmetrical around the forecasts.
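Continuing the sketch, the refitted model can produce the 12-period-ahead forecast with prediction limits; the 95% level is an assumption, as the article does not state one:

```python
# Forecast the hold-out months with prediction limits from the refitted model.
pred = adj_fit.get_prediction(start=holdout.index[0], end=holdout.index[-1])
frame = pred.summary_frame(alpha=0.05)  # columns: mean, pi_lower, pi_upper
print(frame)
```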

(left) Predictive visualization of outlier-adjusted Hauls data with 12-month hold-out period. (right) Predictive visualization of a preliminary forecast with outlier-adjusted Hauls data.

A Forecast Evaluation. The left chart shows a predictive visualization of an exploratory forecast with outlier-adjusted Hauls data (A, Ad, M): additive error, additive damped trend, multiplicative seasonality. This also means that the bounds for the prediction limits are symmetrical around the forecasts. The twelve actuals in the hold-out sample lie within the error bounds (not shown).

Model Evaluation. The right chart shows bias and accuracy measures for forecasts in the hold-out sample with and without outlier-adjusted Hauls data. The Mean Error (ME) and Mean Absolute Percentage Error (MAPE) are not outlier resistant, unlike their median counterparts. The ME is severely distorted by the October outlier, while the Median Error (MdE) gives a more reliable result for the bias. The effect of an outlier also appears to be greater with the MAPE than with the MdAPE.

· In practice, both measures should always be calculated and compared. If the MAPE and MdAPE appear to be quite different (from an “impact” evaluation perspective), then investigate the underlying numbers for stray values. If comparable, then report the MAPE, as this measure is typically better known. Here a MAPE of 6% would be a credible measure of performance based on the comparison in this one case.

· The table below shows the bias and accuracy measures along with a comparison of the seasonal factors from the analysis with and without outlier adjustment (a sketch of these calculations follows below). The impact is greatest for October, but some effect also shows up in December/January.

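A sketch of the conventional and resistant bias and accuracy measures over the hold-out sample, continuing the previous sketches; the error sign convention (actual minus forecast) is an assumption:

```python
import numpy as np

def bias_accuracy(actual, forecast):
    """Mean-based measures and their outlier-resistant median counterparts."""
    a = np.asarray(actual, dtype=float)
    f = np.asarray(forecast, dtype=float)
    e = a - f                       # forecast errors (bias)
    ape = np.abs(100.0 * e / a)     # absolute percentage errors
    return {"ME": e.mean(), "MdE": np.median(e),
            "MAPE": ape.mean(), "MdAPE": np.median(ape)}

# Compare measures for the hold-out period; a large ME-vs-MdE or
# MAPE-vs-MdAPE gap is a signal to look for stray values.
print(bias_accuracy(holdout, frame["mean"]))
```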


We arrive at a forecast evaluation procedure as follows:

1. Calculate the ME, MPE, MAPE, MAE and the RMSE. Contrast these measures of accuracy with the resistant measures MdE, MdPE, MdAPE and MdAE.

2. Create a Naive_12 (12-period-ahead) forecast for a 12-month hold-out sample, by starting with January forecast = January (previous year) actual, February forecast = February (previous year) actual, and so on. Construct a relative error measure, like the RAE, comparing the forecasts with the Naive_12 forecasts. Verify that you do a better job with a model than with the Naive_12 method.

3. On a month-to-month basis these accuracy measures might not be meaningful to management (perceived to be too large, perhaps), so you can suggest looking at the results on a quarterly basis. Aggregate the original data and forecasts into quarterly periods (or buckets). Repeat evaluation steps 1 and 2 using a Naive_4 as the benchmark forecast to beat. A sketch of steps 2 and 3 follows below.
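A sketch of steps 2 and 3, continuing the previous sketches; the calendar-quarter buckets from pandas resampling are an assumption, and edge quarters may be partial with this short history:

```python
import numpy as np

# Step 2: Naive_12 benchmark - each month's forecast is last year's actual.
naive_12 = series.shift(12)[-12:]

model_ae = np.abs(holdout.values - frame["mean"].values)
naive_ae = np.abs(holdout.values - naive_12.values)

# Relative Absolute Error: values below 1 mean the model beats Naive_12.
rae = model_ae / naive_ae
print("Median RAE:", np.median(rae))

# Step 3: aggregate actuals and forecasts into quarterly buckets and repeat
# with a Naive_4 benchmark (previous year's quarterly actuals).
q_actual = holdout.resample("QS").sum()
q_forecast = frame["mean"].resample("QS").sum()
q_naive_4 = series.resample("QS").sum().shift(4).reindex(q_actual.index)
q_rae = (q_actual - q_forecast).abs() / (q_actual - q_naive_4).abs()
print(q_rae)
```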

Takeaways

· Before making adjustments, outliers should always be reviewed for reasonableness with a domain expert familiar with the historical data and the forecasted item. It is essential, however, to take data quality steps before modeling, as outlier effects can be confounded with main effects like seasonal factors, special events and promotions.

· Choose a hold-out sample and create lead-time forecast profiles for evaluation. The outlier (October of the second year) could have a significant impact on a forecasting method when it occurs just prior to the forecast origin.

· Choosing the fitted value as the outlier replacement, rerun the forecasting model with the outlier-adjusted data.


These three books, all available online, can enhance your learning experience and improve best practices (and help you avoid worst practices). The Four P's in a Pod contains practical examples that can be easily replicated with your own data using Excel, R or Python. No proprietary software is required. The Change & Chance book is a comprehensive account of a modern data-oriented demand forecasting function, which can be used alongside the Smarter Forecasting and Planning Workshop manual for professional development.

