Forecasting Through the Ages: From Naive to Zero-Shot Gen AI Magic
Image generated using Microsoft Designer

Authors: Monica Ravipudi & Nirmal Venkatachalam

Forecasting, especially for supply chain management across large datasets, is a chaotic dance that balances interpretability, scalability, and computational efficiency. When we started out, we juggled complex, manually built models alongside clunky processes, hoping to predict demand with some semblance of accuracy. It was a constant struggle, a far cry from the streamlined world we dream of. The good news is that the forecasting landscape has undergone a dramatic transformation. This article chronicles that evolution: how we have moved from complex modeling processes toward a future where forecasting takes a few simple steps guided by powerful models that are easier to manage, use less compute, and leave teams more time to focus on business problems rather than on the intricate details of a model. We'll look at the limitations of the past and the rise of deep learning solutions, and finally give an overview of zero-shot forecasting, where a single pretrained model can produce thousands of forecasts without being trained on the specific dataset. Along the way we offer a sneak peek at the pretrained models from Salesforce (Moirai) and Amazon (Chronos) released last week.

At the heart of any smooth-running supply chain lies the ability to plan for the future, which means accurately forecasting demand for a multitude of products (SKUs) across customers. It's a delicate balancing act: underestimating demand leads to stockouts and frustrated customers, while overestimating ties up valuable resources in excess inventory. This complexity, combined with seasonal trends, promotions, and ever-changing customer preferences, is what makes multiple time series forecasting such a crucial weapon in our supply chain arsenal.

Struggles with traditional forecasting methods:

Classic techniques like exponential smoothing and ARIMA came with assumptions we had to check first. ARIMA in particular expects stationarity, meaning the data has a stable mean and variance over time, usually after differencing. This added an extra layer of complexity, and things got even trickier when data points were missing. These models expect a regularly spaced series with no gaps, so missing values had to be imputed before fitting, which made real-world application challenging.
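To make the missing-data problem concrete, here is a minimal sketch of simple exponential smoothing written from scratch. It is not taken from any particular library; the function names and the forward-fill repair are our own illustrative assumptions. A single missing value, represented as NaN, contaminates every later smoothed level unless the gap is filled first.

```python
import math


def simple_exp_smoothing(series, alpha=0.3):
    """Return the smoothed level after each observation.

    level_t = alpha * x_t + (1 - alpha) * level_{t-1}
    """
    levels = []
    level = series[0]  # initialise the level with the first observation
    for x in series:
        level = alpha * x + (1 - alpha) * level
        levels.append(level)
    return levels


def forward_fill(series):
    """Replace each NaN with the most recent observed value (one simple repair)."""
    filled, last = [], None
    for x in series:
        if math.isnan(x):
            if last is None:
                raise ValueError("series starts with a missing value")
            x = last
        filled.append(x)
        last = x
    return filled


# Hypothetical daily sales with one missing day.
sales = [100.0, 102.0, float("nan"), 98.0, 105.0]

# The NaN at position 2 propagates: every level from that point on is NaN.
raw_levels = simple_exp_smoothing(sales)
contaminated = [math.isnan(v) for v in raw_levels]

# After filling the gap, the last level is a usable one-step-ahead forecast.
clean_levels = simple_exp_smoothing(forward_fill(sales))
forecast = clean_levels[-1]
```

Forward-filling is only one of several possible repairs; the point is that the repair is an extra, manual step that sits outside the model.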

Models like Prophet (originally released as FBProphet) brought some relief. Prophet handles missing values gracefully and doesn't require tedious scaling of the data. However, a new hurdle emerged when dealing with the vast number of products (Stock Keeping Units or SKUs) we manage in today's supply chains. Imagine forecasting for thousands of SKUs, each requiring its own model!

Parallel processing with tools like Spark and multiprocessing (across the cores of a single machine) became necessary, but this introduced complexities in managing training pipelines, exception handling, and the sheer volume of hyperparameters (tuning knobs) for each model. Don't get us wrong, these traditional models are still powerful tools, especially for smaller datasets. But in the intricate world of multi-series forecasting with thousands of products, managing them all becomes a logistical nightmare. This created the need for a new, simpler approach: enter deep learning models, at the cost of some explainability.
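The one-model-per-SKU pattern can be sketched roughly as follows. This is an illustration under our own assumptions, not a production pipeline: `seasonal_naive` is a placeholder standing in for whatever per-SKU model is actually fitted, and the sketch uses threads so that it runs anywhere. For CPU-bound model fitting, `concurrent.futures.ProcessPoolExecutor` offers the same `map` interface.

```python
from concurrent.futures import ThreadPoolExecutor


def seasonal_naive(history, horizon, season=7):
    """Placeholder per-SKU model: repeat the last full season."""
    if len(history) < season:
        raise ValueError("not enough history for one full season")
    last_season = history[-season:]
    return [last_season[i % season] for i in range(horizon)]


def forecast_one(item):
    """Forecast a single SKU, capturing failures instead of crashing the run."""
    sku, history = item
    try:
        return sku, seasonal_naive(history, horizon=3), None
    except Exception as exc:  # record the error so the batch keeps going
        return sku, None, str(exc)


def forecast_all(series_by_sku, max_workers=4):
    """Run one model per SKU in parallel and separate successes from failures."""
    forecasts, failures = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for sku, forecast, error in pool.map(forecast_one, series_by_sku.items()):
            if error is None:
                forecasts[sku] = forecast
            else:
                failures[sku] = error
    return forecasts, failures


# Hypothetical data: one healthy SKU and one with too little history.
demo = {
    "SKU-A": [5, 6, 7, 8, 9, 10, 11],
    "SKU-B": [1, 2, 3],
}
forecasts, failures = forecast_all(demo)
```

Even in this toy version, most of the code is orchestration and error bookkeeping rather than forecasting, which is the overhead described above.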

A Step Forward: Deep Learning Simplifies, But Not Without Costs

The emergence of deep learning models offered a ray of hope. Models based on recurrent neural networks (RNNs), such as LSTMs and DeepAR, brought a wave of simplification and improved accuracy on large datasets.

Here's the game changer: a single deep learning model could be trained to handle forecasts for thousands of SKUs simultaneously. This eliminated the cumbersome process of managing individual models for each product, saving significant time and resources.

However, this newfound efficiency came with its own set of challenges. Training these deep learning models is computationally expensive, often requiring specialized hardware like Graphics Processing Units (GPUs) to handle the heavy lifting. Furthermore, achieving optimal performance involved hyperparameter tuning: adjusting settings such as learning rates and network architecture parameters to find the configuration that delivers the most accurate forecasts in backtesting. While deep learning offered a significant leap forward in managing multi-series forecasting, it wasn't a perfect solution. The resource intensity and the need for specialized tuning expertise (even with cloud providers simplifying things through prebuilt container images) limited its adoption for some organizations.
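The tuning loop has the same shape whatever the model: try each candidate configuration, backtest it on held-out history, and keep the best. The sketch below illustrates that loop with a deliberately tiny stand-in model (simple exponential smoothing with a single hyperparameter, `alpha`) rather than a neural network; the data and candidate values are made up for illustration.

```python
def one_step_forecasts(series, alpha):
    """One-step-ahead forecasts from simple exponential smoothing."""
    level = series[0]
    preds = []
    for x in series[1:]:
        preds.append(level)  # forecast is made before x is observed
        level = alpha * x + (1 - alpha) * level
    return preds


def backtest_mae(series, alpha, holdout):
    """Mean absolute error of one-step forecasts over the last `holdout` points."""
    preds = one_step_forecasts(series, alpha)
    actuals = series[1:]
    errors = [abs(a - p) for a, p in zip(actuals[-holdout:], preds[-holdout:])]
    return sum(errors) / len(errors)


def grid_search(series, candidates, holdout):
    """Score every candidate by backtest error and return the best one."""
    scores = {alpha: backtest_mae(series, alpha, holdout) for alpha in candidates}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical steadily trending demand.
series = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
best_alpha, scores = grid_search(series, candidates=[0.1, 0.5, 0.9], holdout=4)
```

With a deep learning model, each call to the scoring function is a full training run, often on a GPU, which is where the cost comes from.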

Exogenous Variables: The Wrench in the Forecasting Machine

So far, we've discussed forecasting based solely on historical data for each product. But the real world is rarely that simple. External factors, also known as exogenous variables, can significantly impact demand. Think about a sudden weather change affecting clothing sales, or a major sporting event boosting demand.

Incorporating these external factors into forecasts has always been a challenge, regardless of the forecasting method used. Plain exponential smoothing and ARIMA model a series from its own history and struggle to account for these external influences, which is one reason the VARMA family of models was born. Deep learning models, while more flexible, can also struggle if the external factors are not explicitly included in the training data. This adds another layer of complexity, requiring data scientists to figure out how to integrate them effectively into the models.
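As a toy illustration of what "including an exogenous variable" means, the sketch below fits demand against a single promotion flag with ordinary least squares. The data is invented and the one-regressor setup is an assumption chosen for brevity; real models combine such regressors with the series' own history.

```python
def fit_linear(x, y):
    """Ordinary least squares for y = intercept + slope * x (single regressor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical history: 1 marks a promotion day, 0 an ordinary day.
promo = [0, 0, 1, 0, 1, 0, 0, 1]
units = [100, 104, 150, 98, 146, 102, 96, 154]

baseline, promo_lift = fit_linear(promo, units)

# Forecasting requires knowing the exogenous variable for the future dates too.
future_promo = [0, 1]
forecast = [baseline + promo_lift * p for p in future_promo]
```

The last step is the practical catch: an exogenous variable only helps if its future values are known (a planned promotion) or can themselves be forecast (the weather).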

Enter Zero-Shot Forecasting:

The limitations of traditional and deep learning methods pave the way for the exciting world of zero-shot forecasting where Gen AI intersects with forecasting applications. This approach promises to dramatically simplify the forecasting process, eliminating the need for individual model training for each SKU. It would be a game-changer for managing multi-series forecasts in today's complex supply chains.

Salesforce's recently released Moirai model tackles the challenges of multi-series zero-shot forecasting with four key innovations:


  1. One model/architecture for any frequency: Traditional models need training on a specific dataset and work only at that dataset's frequency (e.g., daily vs. hourly sales). Moirai employs multiple patch size projection layers. These layers let the model process data at different granularities within the same architecture, so it can handle the diverse frequencies present in real-world forecasting problems.
  2. Accepts any number of exogenous variables: The Any-variate Attention mechanism enables the model to accept an arbitrary number of variates as input, whether it's historical sales data, weather patterns, or upcoming holidays.
  3. Provides probabilistic predictions through a mixture distribution: Predicting a single point estimate for future demand can be limiting. Moirai addresses this by employing a mixture distribution, which lets the model capture the inherent uncertainty in forecasting by generating a range of probable future values.
  4. LOTSA (Large-scale Open Time Series Archive): the massive and diverse dataset Moirai is trained on. This rich training ground exposes the model to a wide range of time series patterns, frequencies, and variabilities, with the aim of helping it generalize to new, unseen data.
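To give a feel for the third point, here is a small sketch of what a mixture distribution buys you. It uses two Gaussian components purely for illustration (this is not the component set Moirai uses, and the numbers are invented): one for an ordinary day and one for an occasional demand spike. Sampling from the mixture and reading off quantiles yields a forecast range instead of a single number.

```python
import random


def sample_mixture(components, weights, n_samples, seed=0):
    """Draw samples from a weighted mixture of Gaussian (mean, std) components."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        # First pick which component generates this sample, then draw from it.
        mean, std = rng.choices(components, weights=weights, k=1)[0]
        samples.append(rng.gauss(mean, std))
    return samples


def empirical_quantile(samples, q):
    """Return the q-th quantile (0 <= q <= 1) of the samples."""
    ordered = sorted(samples)
    index = min(int(q * len(ordered)), len(ordered) - 1)
    return ordered[index]


# Hypothetical demand: usually around 100 units, sometimes a spike near 160.
components = [(100.0, 5.0), (160.0, 10.0)]
weights = [0.8, 0.2]

samples = sample_mixture(components, weights, n_samples=2000)
p10 = empirical_quantile(samples, 0.10)
p50 = empirical_quantile(samples, 0.50)
p90 = empirical_quantile(samples, 0.90)
```

The median stays close to the ordinary-day level while the 90th percentile sits up in the spike region, an asymmetry that a single point estimate (or a single symmetric distribution) would hide.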


Amazon's Chronos is another strong contender in zero-shot forecasting. It uses a transformer architecture for time series, offers probabilistic predictions, and is aimed at large-scale datasets, which makes it a candidate for organizations with vast amounts of data.

Sample Code Implementation of Salesforce's Moirai model:

Sales data from a Greek dairy production company is used for this demonstration. The dataset contains daily sales of 7 products over a span of 3 years. The following is a basic code snippet demonstrating Moirai usage.

We start by declaring the model settings. Moirai is available in three sizes (small, base, and large) with 14M, 91M, and 311M parameters respectively. The patch size setting ties back to the multiple patch size projection layers described above, which let a single model capture temporal patterns across various frequencies. Here's the link to the GitHub repo: https://github.com/nirmal-venkat/zero-shot-forecasting

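import torch
from gluonts.dataset.pandas import PandasDataset
from gluonts.dataset.split import split
from huggingface_hub import hf_hub_download
from uni2ts.model.moirai import MoiraiForecast

SIZE = "small"  # model size: choose from {'small', 'base', 'large'}
PDT = 20  # prediction length: any positive integer
CTX = 200  # context length: any positive integer
PSZ = "auto"  # patch size: choose from {"auto", 8, 16, 32, 64, 128}
BSZ = 32  # batch size: any positive integer
TEST = 100  # test set length: any positive integer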
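To make the PSZ setting more concrete, the sketch below splits a context window into non-overlapping patches, the units a patch-based transformer attends over. It is an illustration, not Moirai's actual code: the function name is ours, and left-padding with zeros is an assumption made so that every patch is full.

```python
def to_patches(context, patch_size, pad_value=0.0):
    """Split a context window into non-overlapping patches of equal size.

    The series is left-padded (an assumption of this sketch) so that its
    length becomes a multiple of patch_size.
    """
    remainder = len(context) % patch_size
    pad = (patch_size - remainder) % patch_size
    padded = [pad_value] * pad + list(context)
    return [padded[i:i + patch_size] for i in range(0, len(padded), patch_size)]


CTX = 200  # same context length as in the settings above
context = [float(t) for t in range(CTX)]

# Number of patches (tokens) the model would see for each allowed patch size.
patch_counts = {psz: len(to_patches(context, psz)) for psz in (8, 16, 32, 64, 128)}
```

With a context of 200 points, a patch size of 8 produces 25 patches while a patch size of 128 produces only 2. Smaller patches give the model a finer-grained view at the cost of more tokens to attend over.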

We load the data from a long-format pandas DataFrame (df in the snippet below) into a GluonTS dataset and split it into train and test sets. We then prepare the pre-trained model by downloading its weights from Hugging Face.

# Convert into GluonTS dataset
ds = PandasDataset.from_long_dataframe(df, target="daily_unit_sales", item_id="Product")

# Split into train/test set
train, test_template = split(
    ds, offset=-TEST
)  # assign last TEST time steps as test set

# Construct rolling window evaluation
test_data = test_template.generate_instances(
    prediction_length=PDT,  # number of time steps for each prediction
    windows=TEST // PDT,  # number of windows in rolling window evaluation
    distance=PDT,  # number of time steps between each window - distance=PDT for non-overlapping windows
)

# Prepare pre-trained model by downloading model weights from huggingface
model = MoiraiForecast.load_from_checkpoint(
    checkpoint_path=hf_hub_download(
        repo_id=f"Salesforce/moirai-1.0-R-{SIZE}", filename="model.ckpt"
    ),
    prediction_length=PDT,
    context_length=CTX,
    patch_size=PSZ,
    num_samples=100,
    target_dim=1,
    feat_dynamic_real_dim=ds.num_feat_dynamic_real,
    past_feat_dynamic_real_dim=ds.num_past_feat_dynamic_real,
    map_location="cuda:0" if torch.cuda.is_available() else "cpu",
)

predictor = model.create_predictor(batch_size=BSZ)
forecasts = predictor.predict(test_data.input)

# Iterators over inputs, labels and forecasts, to be consumed in step (e.g. for plotting)
input_it = iter(test_data.input)
label_it = iter(test_data.label)
forecast_it = iter(forecasts)
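The rolling-window settings are easier to reason about with the index arithmetic written out. The sketch below is our own plain-Python illustration of the layout implied by offset=-TEST, windows=TEST // PDT and distance=PDT; it does not call GluonTS, and the series length of 1000 is a made-up number. It also assumes the model reads only the last CTX points before each forecast start.

```python
def rolling_windows(series_length, test_length, prediction_length, context_length):
    """Return (context_start, label_start, label_end) for non-overlapping windows.

    Indices are half-open: the context is [context_start, label_start) and the
    values to be predicted are [label_start, label_end).
    """
    first_label_start = series_length - test_length  # effect of offset=-TEST
    n_windows = test_length // prediction_length     # windows=TEST // PDT
    windows = []
    for w in range(n_windows):
        label_start = first_label_start + w * prediction_length  # distance=PDT
        label_end = label_start + prediction_length
        context_start = max(0, label_start - context_length)
        windows.append((context_start, label_start, label_end))
    return windows


# Hypothetical series of 1000 daily points with TEST=100, PDT=20, CTX=200.
windows = rolling_windows(
    series_length=1000, test_length=100, prediction_length=20, context_length=200
)
```

Under these assumptions there are five windows per product: the first forecasts points 900 to 919 from points 700 to 899, and the last forecasts points 980 to 999 from points 780 to 979. Every point in the test set is predicted exactly once.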

Here are the visualizations of zero-shot forecasts from Moirai on the dairy sales dataset. The plots indicate that Moirai captures the seasonal and trend patterns in the data well.

Figure 1: Visualizations of zero-shot forecasts from Moirai on Dairy Sales Dataset

Conclusion: The Dawn of Usable Generative AI for Structured Data:

There are still a lot of things to be perfected and built, but the developments of the past few weeks have opened up new avenues and exciting possibilities. Gen AI is coming to structured data (which is where 90% of data is in enterprises) in usable forms, moving beyond unstructured data (text and images) and offering a glimpse of complex forecasting tasks handled without the need for constant, individual training and tuning.

We expect to see domain-specific models that predict better than a one-model-fits-all approach, similar to what we have been seeing with LLMs, where smaller models tuned for a particular task outperform the large ones. This would be further accentuated by tuning with an enterprise's proprietary data, making the model better at understanding in-house patterns. The future holds even more promise with advancements like Google AI's AutoBNN, which strives to bridge the gap between the interpretability of traditional models and the scalability and flexibility of neural networks.

Disclaimer: The views expressed in this article are solely those of the authors and do not necessarily reflect the opinions or policies of the organizations with which the authors are affiliated.

References:


  • https://blog.salesforceairesearch.com/moirai/
  • Gerald Woo, Chenghao Liu, Akshat Kumar, Caiming Xiong, Silvio Savarese, and Doyen Sahoo. Unified training of universal time series forecasting transformers. arXiv preprint arXiv:2402.02592, 2024.
  • I. Siniosoglou, K. Xouveroudis, V. Argyriou, T. Lagkas, S. K. Goudos, K. E. Psannis and P. Sarigiannidis, "Evaluating the Effect of Volatile Federated Timeseries on Modern DNNs: Attention over Long/Short Memory," in the 12th International Conference on Circuits and Systems Technologies (MOCAST 2023), April 2023, Accepted
