Forecasters: Why More Data is Not Always Better
Varun Gupta
Supply Chain, Logistics, Nearshoring and Analytics SME | Professor | Ph.D., MBA, and MS (University of Texas, Dallas) | BTech, IIT Kanpur | Logistics Lab Coordinator
Many #demand #forecasters, especially in the #retail sector, believe that more historical data always leads to more accurate predictions. Data does play a critical role, but my years of #datascience, #forecasting, and #supply #chain #management research, spent building pricing and demand models, have shown me the limits of this "more data is better" mindset. Let's unpack how it can cloud judgment and undermine effective decision-making.
The "More Data is Always Better" Trap
Let's look at an example. Say you're a footwear retailer forecasting demand for a trendy sneaker:
If you feed all 3 years of data into your model, it will give weight to past stockout periods and underestimate the impact of recent marketing, stockouts, and competition. You'll likely under-order, miss sales opportunities, and potentially create dissatisfied customers. Recent factors such as targeted marketing campaigns or supply chain disruptions can shift demand patterns significantly, so blindly incorporating all historical data distorts predictions and can lead to stockouts or excess inventory.
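To make the under-ordering risk concrete, here is a small sketch with made-up numbers. The sales series and the depressed stockout months are illustrative assumptions, not data from a real retailer. It compares a naive average over the full history with an average over only the most recent six months.

```python
from statistics import mean

# Hypothetical monthly unit sales for a sneaker whose demand took off recently.
# The first three months of year 2 are assumed stockout months, so recorded
# sales there understate true demand.
year1 = [100, 105, 110, 108, 112, 115, 118, 120, 125, 123, 130, 128]
year2 = [50, 35, 45, 200, 230, 250, 290, 320, 350, 390, 420, 460]
year3 = [480, 510, 540, 580, 610, 640, 670, 700, 720, 760, 800, 830]
sales = year1 + year2 + year3

full_history_avg = mean(sales)  # treats every month as equally relevant
recent_avg = mean(sales[-6:])   # uses only the last six months

print(f"Full-history average:   {full_history_avg:.0f} units/month")
print(f"Recent 6-month average: {recent_avg:.0f} units/month")
print(f"Gap:                    {recent_avg - full_history_avg:.0f} units/month")
```

An order quantity based on the full-history figure would sit far below what recent months suggest, which is the under-ordering pattern described above.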
Issues with just using Sample Data:
Strategies for Enhanced Forecasting
Let's explore a data-driven approach that incorporates a nuanced understanding of context:
Data Segmentation & Cleaning: Dissect sales history into distinct periods reflecting major variations in demand drivers. Eliminate anomalies or periods heavily impacted by non-recurring events.
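A minimal sketch of this step with pandas might look like the following. The column names (`stockout`, `one_off_event`), the campaign start date, and all values are hypothetical. In practice the flags would come from your own inventory and event records.

```python
import pandas as pd

# Hypothetical monthly history; the flags mark months known to be distorted.
history = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=12, freq="MS"),
    "sales": [100, 110, 105, 40, 35, 120, 130, 400, 140, 150, 155, 160],
    "stockout": [False, False, False, True, True, False,
                 False, False, False, False, False, False],
    "one_off_event": [False, False, False, False, False, False,
                      False, True, False, False, False, False],
})

# Segment the history at a known change in demand drivers
# (assumed here: a marketing campaign that started in July 2023).
campaign_start = pd.Timestamp("2023-07-01")
history["segment"] = history["date"].ge(campaign_start).map(
    {False: "pre_campaign", True: "post_campaign"}
)

# Drop months distorted by stockouts or non-recurring events.
clean = history.loc[~(history["stockout"] | history["one_off_event"])]

# Compare typical demand per segment before and after cleaning.
print(history.groupby("segment")["sales"].mean())
print(clean.groupby("segment")["sales"].mean())
```

Comparing the two printouts shows how much the flagged months pull each segment's average away from typical demand.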
Time-Aware Modeling: Consider techniques that adjust to demand trends and seasonality:
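import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
# Fictitious 'Trendy Kicks' Sneaker Data (monthly)
# Note: '...' marks elided values; the full list must contain 36 monthly figures
df = pd.DataFrame({
    'date': pd.date_range('2021-01-01', periods=36, freq='MS'),
    'sales': [120, 115, 130, 125, ..., 750, 820, 840]
})
# Regress on a numeric month index, since LinearRegression cannot use raw datetimes
df['t'] = np.arange(len(df))
# Weight recent 12 months more heavily: the older 24 months ramp from 0.8 up to 1
weights = np.concatenate([np.linspace(0.8, 1, 24), np.ones(12)])
model = LinearRegression()
model.fit(df[['t']], df['sales'], sample_weight=weights)
# Forecast for the next few months...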
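A classical time-series model such as ARIMA is another option, since it models trend (through differencing) and autocorrelation directly:
from statsmodels.tsa.arima.model import ARIMA
# Prepare data, ensuring a datetime-indexed time series (reuses df from above)
df['date'] = pd.to_datetime(df['date'])
df = df.set_index('date')
model = ARIMA(df['sales'], order=(2, 1, 1))  # Example order; tune it for your data
model_fit = model.fit()
# Forecast the next 6 months
forecast = model_fit.forecast(steps=6)
print(forecast)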
Incorporating External Signals: Move beyond internal sales figures. Integrate factors like:
- Market intelligence
- Competitor activities
- Social media sentiment analysis
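One common way to fold such signals into a model is to add them as extra regressors. The sketch below uses ordinary least squares from NumPy on synthetic data. The feature names (`marketing_spend`, `competitor_price`, `sentiment`), their ranges, and the coefficients used to generate sales are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_months = 24

# Synthetic, illustrative signals (not real data).
t = np.arange(n_months)                            # time index
marketing_spend = rng.uniform(5, 20, n_months)     # e.g. thousands of dollars
competitor_price = rng.uniform(80, 120, n_months)  # competitor's shelf price
sentiment = rng.uniform(-1, 1, n_months)           # social sentiment score

# Synthetic sales generated from the signals plus noise, so the truth is known.
sales = (200 + 10 * t + 8 * marketing_spend + 1.5 * competitor_price
         + 30 * sentiment + rng.normal(0, 10, n_months))

def fit_and_score(features):
    """Fit ordinary least squares with an intercept; return in-sample MAE."""
    X = np.column_stack([np.ones(n_months)] + features)
    coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
    return np.mean(np.abs(sales - X @ coef))

mae_internal = fit_and_score([t])
mae_external = fit_and_score([t, marketing_spend, competitor_price, sentiment])

print(f"MAE, time trend only:       {mae_internal:.1f}")
print(f"MAE, with external signals: {mae_external:.1f}")
```

In-sample error always flatters the richer model, so on real data the comparison should be repeated on held-out months before trusting any added signal.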
Rigorous Evaluation: Quantify forecast accuracy using measures like Mean Absolute Error (MAE) and Mean Squared Error (MSE). This facilitates the comparison of model alternatives.
from sklearn.metrics import mean_absolute_error, mean_squared_error
# Assume 'y_true' (actual sales) and 'y_pred' (predictions) exist
mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
print("Mean Absolute Error (MAE):", mae)
print("Mean Squared Error (MSE):", mse)
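To compare alternatives fairly, the metrics should be computed on months the models never saw. Here is a minimal holdout sketch, assuming a synthetic sales series and two deliberately simple candidate forecasts (a carry-forward of the last value, and a straight-line trend). Neither is meant as a recommended production model.

```python
import numpy as np

# Synthetic monthly sales with an upward trend (illustrative only).
sales = np.array([100, 104, 110, 113, 120, 126, 131, 138, 142, 150,
                  155, 161, 168, 172, 180, 185, 191, 198], dtype=float)

# Hold out the last 6 months; fit only on what came before.
horizon = 6
train, test = sales[:-horizon], sales[-horizon:]
x_train = np.arange(len(train))
x_test = np.arange(len(train), len(sales))

# Candidate A: naive forecast that carries the last observed value forward.
pred_naive = np.full(horizon, train[-1])

# Candidate B: straight-line trend fitted to the training months.
slope, intercept = np.polyfit(x_train, train, 1)
pred_trend = slope * x_test + intercept

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

for name, pred in [("naive last value", pred_naive), ("linear trend", pred_trend)]:
    print(f"{name}: MAE={mae(test, pred):.1f}, MSE={mse(test, pred):.1f}")
```

Whichever candidate scores lower on the held-out months is the better choice for this series. With real data, repeating the split over several cut-off dates gives a more stable picture.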
Key Takeaways for Forecasters
Conclusion
In the fast-paced retail environment, demand forecasting is both an art and a science. By moving beyond simple reliance on historical data and embracing a combination of data insights, model refinement, and contextual awareness, forecasters can equip their organizations with the foresight to make profitable, customer-centric supply chain decisions.
How to reach out?
Always happy to discuss your forecasting accuracy needs. Reach out to me via LinkedIn.
#demandforecasting #datascience #supplychainmanagement #retailanalytics
Key Account Manager | DSV
(1 year ago) Enjoyed your thoughts, Varun. I'm reminded of when Hanjin declared bankruptcy in 2017, but they're publicly traded so the financials were available to all. I remember articles about some of their boats outside of port at anchor being refused a terminal because of the liability. We were very tuned in with the carriers, saw the signs, and made the call to pull back on bookings, ultimately avoiding that cliffhanger for us and our customers. My point is: it's a daunting thing for any one person, one team, or one company, but know that some of us are helping behind the scenes. Success in this industry is a TEAM effort made up of PEOPLE.
Supply Chain, Logistics, Nearshoring and Analytics SME | Professor | Ph.D., MBA, and MS (University of Texas, Dallas) | BTech, IIT Kanpur | Logistics Lab Coordinator
(1 year ago) Thank you, Lee Blackstone. I have truly enjoyed your informative Thursday AI webinars, which are now in Season 2.
From Analytics to AI - Guiding Companies to Responsible Innovation Grounded in Ethics and Data Science - Keynote Speaker
(1 year ago) As always, Varun, you are spot on!