
Forecasting Stock Prices and Realized Volatility: A Hybrid Approach Using LSTM, SARIMAX, and Topological Data Analysis

Larry Liang

Business Intelligence Engineer @ Costco Wholesale | Data Visualization Expert

Published: October 29, 2024


Author: Larry Liang

Date: 2024/10/29

Abstract

This paper presents a hybrid approach to stock price and volatility forecasting that integrates machine learning models, traditional statistical techniques, and Topological Data Analysis (TDA). Specifically, we use a Long Short-Term Memory (LSTM) network to predict stock closing prices, a SARIMAX model to forecast realized volatility, and the Wasserstein Distance (WD) from TDA to capture topological patterns in price changes. In addition, SHAP (SHapley Additive exPlanations) is used to make the LSTM predictions more transparent. Our findings illustrate how this hybrid framework can support both price prediction and volatility forecasting, providing insights for traders and portfolio managers.

Introduction

Financial markets are inherently complex, and forecasting stock prices and volatility calls for sophisticated models. Traditional models such as ARIMA and SARIMA capture linear autoregressive and moving-average structure (plus seasonality, in SARIMA's case), while machine learning approaches can capture non-linear dependencies. This study aims to combine the strengths of both paradigms.

Additionally, we introduce Topological Data Analysis (TDA) to quantify persistent homology in price changes, enriching the feature set with Wasserstein Distance (WD) values. WD is included for two reasons. First, it can capture hidden market patterns: structural dependencies in market returns that traditional metrics miss. Second, it is intended to improve prediction during volatile periods, when it may help the model handle nonlinear market behavior. This hybrid framework offers a toolset for price forecasting, risk management, and decision-making.

Methodology

Data Collection and Preprocessing

We obtained historical data for Pinduoduo Inc. (PDD) over a period of 5 years from Yahoo Finance. The primary features used include:

Closing Price (scaled): The target variable for price prediction.
Percentage Change in Close Price: To model realized volatility.
Wasserstein Distance (WD): Derived from TDA to capture topological patterns.

We applied scikit-learn's MinMaxScaler to rescale the Close prices to the [0, 1] range, which keeps the LSTM inputs on a consistent scale.
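The preprocessing step can be sketched without scikit-learn. This numpy version mirrors MinMaxScaler's default `feature_range=(0, 1)`, its inverse transform (used later to report predictions in price units), and the percentage-change feature. The function names are hypothetical, and fitting the scaler on the training portion only is an assumption made to avoid look-ahead leakage.

```python
import numpy as np

def minmax_fit(x):
    """Learn (min, max) from the training data, as MinMaxScaler does."""
    x = np.asarray(x, dtype=float)
    return x.min(), x.max()

def minmax_transform(x, lo, hi):
    # Maps lo -> 0 and hi -> 1; assumes (lo, hi) came from the training split only
    return (np.asarray(x, dtype=float) - lo) / (hi - lo)

def minmax_inverse(z, lo, hi):
    # Undo the scaling so model outputs are expressed in price units again
    return np.asarray(z, dtype=float) * (hi - lo) + lo

def pct_change(x):
    # Same values as pandas Series.pct_change(), minus the leading NaN
    x = np.asarray(x, dtype=float)
    return x[1:] / x[:-1] - 1.0

closes = [100.0, 102.0, 99.0, 105.0]  # hypothetical closing prices
lo, hi = minmax_fit(closes)
scaled = minmax_transform(closes, lo, hi)
restored = minmax_inverse(scaled, lo, hi)
```

Because the inverse transform reuses the same `(lo, hi)` pair, predictions made in scaled space map back to prices exactly.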

Wasserstein Distance (WD) for Topological Insights

We applied persistent homology, computed with the ripser library, to derive Wasserstein Distances (WD) between persistence diagrams built from consecutive days' price changes. The WD values capture topological structure in the data and are used as features in the LSTM model.
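The WD computation can be illustrated with a dependency-free sketch. It is not the author's ripser-based code. It assumes each day is represented by a trailing window of returns, turned into a 2-D point cloud by delay embedding (a hypothetical choice), and it restricts to H0 (connected components), whose Vietoris-Rips persistence diagram can be read off a minimum spanning tree. The 1-Wasserstein distance uses the L-infinity ground metric, one common convention for persistence diagrams. With ripser installed, its diagrams (which also include H1 loops) would replace `h0_diagram`; all function names here are hypothetical.

```python
import math

def delay_embed(series, dim=2):
    """Turn a 1-D series into points (x_t, ..., x_{t+dim-1})."""
    return [tuple(series[i:i + dim]) for i in range(len(series) - dim + 1)]

def h0_diagram(points):
    """Finite H0 death times of the Vietoris-Rips filtration of a point cloud.

    All H0 classes are born at 0; the finite bars die at the edge lengths of a
    Euclidean minimum spanning tree, built here with Prim's algorithm.
    The single infinite bar is dropped.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # shortest edge from each vertex to the growing tree
    best[0] = 0.0
    deaths = []
    for step in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:
            deaths.append(best[u])  # adding u merges two components at this scale
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return sorted(deaths)

def wasserstein_h0(deaths_a, deaths_b):
    """1-Wasserstein distance between two H0 diagrams whose births are all 0.

    L-infinity ground metric: matching (0, a) with (0, b) costs |a - b| and
    sending (0, d) to the diagonal costs d / 2. All points lie on one line, so
    an optimal matching can be taken order-preserving and this DP is exact.
    """
    a, b = sorted(deaths_a), sorted(deaths_b)
    dp = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        dp[i][0] = dp[i - 1][0] + a[i - 1] / 2
    for j in range(1, len(b) + 1):
        dp[0][j] = dp[0][j - 1] + b[j - 1] / 2
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            dp[i][j] = min(dp[i - 1][j] + a[i - 1] / 2,                  # a[i-1] to diagonal
                           dp[i][j - 1] + b[j - 1] / 2,                  # b[j-1] to diagonal
                           dp[i - 1][j - 1] + abs(a[i - 1] - b[j - 1]))  # match the pair
    return dp[-1][-1]

# Hypothetical daily returns: two trailing windows shifted by one day
returns = [0.012, -0.004, 0.007, 0.002, -0.011, 0.005, 0.009, -0.006]
prev_window, curr_window = returns[0:7], returns[1:8]
wd_t = wasserstein_h0(h0_diagram(delay_embed(prev_window)),
                      h0_diagram(delay_embed(curr_window)))
```

Repeating this for every day yields one WD value per date, which is the shape of feature the LSTM consumes.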

LSTM for Price Prediction

The LSTM network was trained on 3-day sliding windows of scaled prices and WD values. LSTM was chosen for its ability to capture long-term dependencies in sequential data. The predictions were inverse-transformed to the original price scale so that forecasts are reported in price units.
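The windowing step can be sketched as follows. It assumes each day contributes a [scaled close, WD] pair aligned on the same date, and that the target is the scaled close on the day after each window; the function name is hypothetical. The resulting array has shape (samples, 3, 2), matching the `input_shape` used by the LSTM layer. Flattening each window row by row yields six columns ordered close, WD, close, WD, close, WD, which is the Feature 1 to 6 layout discussed with the SHAP results.

```python
import numpy as np

def make_windows(scaled_close, wd, window=3):
    """Cut [scaled close, WD] rows into overlapping windows.

    Returns X with shape (samples, window, 2) and y holding the scaled close
    on the day right after each window (an assumed target definition).
    """
    features = np.column_stack([scaled_close, wd])  # shape (T, 2)
    X = np.stack([features[t:t + window] for t in range(len(features) - window)])
    y = np.asarray(scaled_close, dtype=float)[window:]
    return X, y

# Hypothetical aligned series
X, y = make_windows([0, 1, 2, 3, 4], [10, 11, 12, 13, 14])
X_flat = X.reshape(len(X), -1)  # 2-D layout used for SHAP's KernelExplainer
```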

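```python
# LSTM Model Definition
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

# X_train: (samples, 3, 2) windows of [scaled close, WD]; y_train: the scaled close after each window
model = Sequential()
model.add(LSTM(50, activation='relu', input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(Dropout(0.2))  # randomly zeroes 20% of the LSTM outputs during training
model.add(Dense(1))      # single output: the next scaled closing price

model.compile(optimizer='adam', loss='mse')
model.fit(X_train, y_train, epochs=10, batch_size=32)
```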
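To show what the 50-unit LSTM layer computes at each time step, the sketch below runs one LSTM cell over a window in numpy. It follows the standard gated formulation (input, forget, and output gates plus a candidate update). The weight layout and function names are hypothetical, and the random weights stand in for trained ones. In Keras, the `activation` argument (set to `'relu'` in the model above) is applied to the candidate and to the cell state before the output gate, which is what the `act` parameter models here.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def lstm_step(x_t, h_prev, c_prev, W, U, b, act=np.tanh):
    """One LSTM time step.

    W: (n_features, 4*units), U: (units, 4*units), b: (4*units,), with the four
    blocks ordered input gate, forget gate, candidate, output gate.
    """
    z = x_t @ W + h_prev @ U + b
    i, f, g, o = np.split(z, 4)
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)  # gates in (0, 1)
    c = f * c_prev + i * act(g)   # keep part of the old memory, write new content
    h = o * act(c)                # hidden state passed to the next step
    return h, c

def lstm_forward(window, W, U, b, units, act=np.tanh):
    """Run the cell over a (timesteps, n_features) window and return the last h."""
    h, c = np.zeros(units), np.zeros(units)
    for x_t in window:
        h, c = lstm_step(x_t, h, c, W, U, b, act)
    return h  # Dense(1) would map this vector to the price prediction

rng = np.random.default_rng(0)
units, n_features = 50, 2
W = rng.normal(scale=0.1, size=(n_features, 4 * units))
U = rng.normal(scale=0.1, size=(units, 4 * units))
b = np.zeros(4 * units)
window = np.array([[0.61, 0.03], [0.63, 0.01], [0.64, 0.02]])  # hypothetical [scaled close, WD]
h_last = lstm_forward(window, W, U, b, units, act=lambda v: np.maximum(v, 0.0))  # relu
```

The forget gate `f` is what lets the cell carry information across steps; with only three steps per window, that memory spans at most three days here.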

Auto-Tuned SARIMAX for Realized Volatility Forecasting

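We used auto_arima from the pmdarima library to search for the SARIMAX orders, fitting the models to the daily percentage change in the close price (data['pct_change']). Candidate specifications were ranked by AIC, which rewards goodness of fit while penalizing the number of parameters, and the lowest-AIC model was selected. Because the search is stepwise, the selected model is the best one found among the candidates explored rather than a guaranteed global optimum.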
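auto_arima ranks candidates by AIC by default (its `information_criterion` argument defaults to `'aic'`). With $k$ estimated parameters and maximized likelihood $\hat{L}$, AIC is defined in the first line below. The two candidate models in the worked example are hypothetical: model B fits slightly better but spends three more parameters, so model A has the lower AIC and would be preferred.

```latex
\begin{aligned}
\mathrm{AIC} &= 2k - 2\ln\hat{L} \\
\mathrm{AIC}_A &= 2(4) - 2(1500) = -2992 \\
\mathrm{AIC}_B &= 2(7) - 2(1502) = -2990
\end{aligned}
```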

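```python
from pmdarima import auto_arima
from statsmodels.tsa.statespace.sarimax import SARIMAX

# If data['pct_change'] came from pandas' pct_change(), its first value is NaN; drop it
returns = data['pct_change'].dropna()

# Stepwise search over (p, q) and seasonal (P, Q), with one seasonal difference at period m=6
auto_model = auto_arima(
    returns, start_p=1, max_p=3, start_q=1, max_q=3,
    seasonal=True, m=6, start_P=0, max_P=2, start_Q=0, max_Q=2, D=1,
    trace=True, stepwise=True
)
best_order = auto_model.order
best_seasonal_order = auto_model.seasonal_order

# Refit the selected specification with statsmodels and forecast the six-day horizon
sarimax_model = SARIMAX(returns, order=best_order, seasonal_order=best_seasonal_order)
sarimax_results = sarimax_model.fit(disp=False)
forecast = sarimax_results.forecast(steps=6)
```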
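The `D=1, m=6` arguments in the auto_arima call mean the seasonal part of the model works on a lag-6 differenced series. The helper below (a hypothetical name) shows that transformation on plain lists; pmdarima and statsmodels handle it internally, so it is for illustration only. Note that `m=6` assumes a six-observation cycle, whereas a trading week has five sessions.

```python
def seasonal_difference(series, m=6, D=1):
    """Apply D rounds of lag-m differencing: y_t - y_{t-m}."""
    out = list(series)
    for _ in range(D):
        out = [out[t] - out[t - m] for t in range(m, len(out))]
    return out

# Each differencing round shortens the series by m observations
diffed = seasonal_difference([1, 2, 3, 4, 5, 6, 7, 9, 11], m=6)
```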

SHAP Interpretation for Model Transparency

To make the LSTM model more interpretable, we used SHAP values. SHAP attributes each prediction to the input features: every feature receives a Shapley value, and these values sum to the difference between the model's prediction and its average (baseline) prediction.
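For intuition about what KernelExplainer estimates, exact Shapley values can be computed by brute force for a small model. In this sketch a single baseline vector stands in for "missing" features, whereas KernelExplainer approximates the same quantities by averaging over the background sample drawn with `shap.sample`. The toy linear model, its weights, and the input values are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; absent features take their baseline value.

    Enumerates every coalition of the other features, so it is only practical
    for small inputs (six features means 32 coalitions per feature).
    """
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):  # coalition size, excluding feature i
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical toy model over [close d1, WD d1, close d2, WD d2, close d3, WD d3]
weights = [0.2, 0.01, 0.3, 0.01, 0.45, 0.01]
toy_model = lambda z: sum(w * v for w, v in zip(weights, z))
x = [0.62, 0.05, 0.64, 0.02, 0.66, 0.04]
baseline = [0.5] * 6
phi = shapley_values(toy_model, x, baseline)
```

For a linear model each value reduces to `weight * (x - baseline)`, and the values always sum to `toy_model(x) - toy_model(baseline)`.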

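```python
import numpy as np
import shap
# model_predict and the *_reshaped arrays (3-day x 2-feature windows flattened to 6 columns) are defined beforehand
explainer = shap.KernelExplainer(model_predict, shap.sample(X_train_reshaped, 100))
shap_values = explainer.shap_values(X_test_reshaped)
# Single-output results may be a one-element list or an (n, 6, 1) array depending on the shap version;
# reshaping to (n, 6) handles both
shap.summary_plot(np.array(shap_values).reshape(len(X_test_reshaped), -1), X_test_reshaped, feature_names=feature_names)
```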

Results


Predicted Close Prices (Original Scale) for the Next 6 Days

The LSTM predictions show minor fluctuations in the stock price over the forecast period, with a slight downward trend toward Day 6. Such short-horizon forecasts may help short-term investors plan entry and exit points.

Forecasted Realized Volatility Using SARIMAX

Realized Volatility predicted by SARIMAX

The SARIMAX forecast shows volatility swings, with a pronounced dip on Day 4 and a recovery by Day 6. Forecasts like this are useful for risk management, helping investors prepare for potential market instability.

The SHAP summary plot visualizes the impact of the six input features on the LSTM model's predictions. Each dot represents the SHAP value of one feature for one sample, showing how much that feature pushes the predicted value up or down.

Features and Their Interpretations:

Features 1, 3, and 5: Scaled closing prices for days 1, 2, and 3 of the input window.
Features 2, 4, and 6: Wasserstein Distance values for the same days.

Key Insights from the SHAP Plot:

Closing Prices Drive the Predictions: The scaled closing prices from days 1, 2, and 3 (Features 1, 3, and 5) are the most impactful features.
Minimal Impact of WD (Wasserstein Distance): The Wasserstein Distance values for the same days (Features 2, 4, and 6) are the least impactful features.
Color Gradient and Impact: Each dot's color encodes that feature's value for the sample (red for high, blue for low), so the plot shows whether high or low values of a feature push the prediction up or down.

Interpretation: The model relies primarily on historical price trends, while the topological information from WD contributes minimally. This suggests two ways to refine the model: exploring better topological metrics, or focusing on other financial indicators.

More broadly, SHAP has gained wide recognition in the machine learning literature for providing transparent, comprehensible explanations of model predictions, helping bridge the gap between advanced machine learning techniques and practical decision-making. [5]

This SHAP analysis provides a transparent view of the LSTM model's predictions, showing which features matter most and in which direction they push the forecast.
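The "most" and "least" impactful labels can be made quantitative. shap.summary_plot orders features by their total absolute SHAP value, which gives the same ranking as the mean absolute SHAP value computed below. The sketch also reports the share of that total carried by the price columns versus the WD columns, assuming the flattened column order Close d1, WD d1, Close d2, WD d2, Close d3, WD d3 (Features 1 to 6). The short labels, function names, and example matrix are hypothetical.

```python
import numpy as np

FEATURES = ["Close d1", "WD d1", "Close d2", "WD d2", "Close d3", "WD d3"]

def rank_features(shap_matrix, names=FEATURES):
    """Rank features by mean absolute SHAP value, largest first."""
    importance = np.abs(np.asarray(shap_matrix, dtype=float)).mean(axis=0)
    order = np.argsort(importance)[::-1]
    return [(names[i], float(importance[i])) for i in order]

def share_by_group(shap_matrix):
    """Fraction of total mean |SHAP| carried by price (even) vs WD (odd) columns."""
    importance = np.abs(np.asarray(shap_matrix, dtype=float)).mean(axis=0)
    total = importance.sum()
    return {"price": importance[0::2].sum() / total, "wd": importance[1::2].sum() / total}

# Hypothetical SHAP matrix: 2 samples x 6 features
example = [[0.1, 0.001, -0.2, 0.002, 0.3, -0.001],
           [-0.1, 0.001, 0.2, -0.002, -0.3, 0.001]]
ranking = rank_features(example)
shares = share_by_group(example)
```

Taking absolute values before averaging matters: a feature that pushes some predictions up and others down would otherwise look unimportant.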

Discussion and Insights

The combination of LSTM for price prediction, SARIMAX for volatility forecasting, and Wasserstein Distance from TDA offers a comprehensive framework for financial forecasting.

LSTM captures non-linear patterns in stock prices, while SARIMAX models volatility trends.
SHAP values provide transparency, making the LSTM model more interpretable for traders.
The inclusion of Wasserstein Distance adds topological information to the feature set, although the SHAP analysis shows that its contribution to the LSTM predictions is minimal in this setup.

These results illustrate how hybrid models can capture both price trends and volatility dynamics.

Conclusion

This study presents a hybrid approach to forecasting stock prices and realized volatility using LSTM, SARIMAX, and Topological Data Analysis. The results suggest that this framework can produce short-horizon forecasts together with transparent interpretations, making it a potentially valuable tool for traders and portfolio managers.

Future work could explore additional technical indicators (e.g., RSI, MACD) and incorporate external factors (e.g., macroeconomic variables) to further enhance prediction accuracy.

A deeper follow-up study could extend the HAR (Heterogeneous Autoregressive) model of realized volatility by incorporating Wasserstein Distance (WD) and control variables such as VIX and DXY. Such a HAR-WD model would aim to capture the complex, multi-dimensional influences on stock volatility, offering a tool for forecasting and risk management.
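The standard HAR specification regresses next-period realized volatility on its latest daily value and its weekly (5-day) and monthly (22-day) averages. A HAR-WD variant as described above could append WD and controls such as VIX and DXY as extra regressors. That specification, the function names, and the plain OLS fit below are assumptions for illustration, not the author's model.

```python
import numpy as np

def har_design(rv, extra=None, weekly=5, monthly=22):
    """Build HAR regressors at time t for predicting rv[t + 1].

    Columns: intercept, daily RV, weekly mean, monthly mean, then any extra
    regressors observed at t (hypothetical inputs such as WD, VIX, DXY).
    """
    rv = np.asarray(rv, dtype=float)
    if extra is not None:
        extra = np.asarray(extra, dtype=float).reshape(len(rv), -1)
    rows, targets = [], []
    for t in range(monthly - 1, len(rv) - 1):
        row = [1.0, rv[t], rv[t - weekly + 1:t + 1].mean(), rv[t - monthly + 1:t + 1].mean()]
        if extra is not None:
            row.extend(extra[t])
        rows.append(row)
        targets.append(rv[t + 1])
    return np.array(rows), np.array(targets)

def fit_har(rv, extra=None):
    """Ordinary least squares estimate of the HAR(-WD) coefficients."""
    X, y = har_design(rv, extra)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef
```

A regularized or robust regression could replace OLS; plain least squares keeps the HAR structure easy to read.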

Related work: A generalization of the Topological Tail Dependence theory: From indices to individual stocks [5]

This article summarizes an LSTM-, SARIMAX-, SHAP-, and TDA-based financial forecasting exercise, showcasing the hybrid framework's ability to capture both price movements and volatility dynamics.

References

[1] Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory.
[2] Hyndman, R.J., & Athanasopoulos, G. (2018). Forecasting: Principles and Practice.
[3] Ripser: Efficient Persistent Homology.
[4] Shapley, L.S. (1953). A Value for n-Person Games.
[5] Souto, H.G., & Moradi, A. (2024). A generalization of the Topological Tail Dependence theory: From indices to individual stocks.

What is Realized Volatility?

Realized volatility is a statistical measure of the actual or historical variability in the returns of a financial asset, such as a stock, over a specific period. It is computed using observed returns, typically from intra-day or daily prices, and provides insight into the extent to which the asset's price fluctuates in reality.

Realized volatility differs from implied volatility, which reflects the market’s expectations of future volatility.

How is Realized Volatility Calculated?

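The most common way to calculate realized volatility is to take the square root of the sum of squared returns over a period. When mean returns are close to zero, this equals the standard deviation of returns scaled by the square root of the number of observations. If we assume daily returns are available, the formula is: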

$$RV_t = \sqrt{\sum_{i=1}^{n} r_i^2}$$

Where:

$RV_t$: Realized volatility for period $t$.
$r_i$: Log return of the asset on day $i$ (approximately equal to the percentage change for small moves).
$n$: Number of observations in the period (e.g., roughly 21 trading days for monthly RV).
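The formula can be applied directly to daily closes. This sketch (function names hypothetical) uses log returns as in the definitions above, computes RV over each trailing window of n returns, and does not annualize the result.

```python
import math

def log_returns(prices):
    """Daily log returns r_i = ln(P_i / P_{i-1})."""
    return [math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

def realized_volatility(returns):
    """RV = sqrt(sum of squared returns) over the period."""
    return math.sqrt(sum(r * r for r in returns))

def rolling_rv(prices, n):
    """RV for each trailing window of n daily log returns."""
    r = log_returns(prices)
    return [realized_volatility(r[t - n:t]) for t in range(n, len(r) + 1)]

# Hypothetical closes: three days give two returns and one 2-day RV value
rv_values = rolling_rv([100.0, 110.0, 99.0], 2)
```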

In practice, higher-frequency data (such as 5-minute or 15-minute prices) are sometimes used to compute more accurate measures of realized volatility.
