Time Series Analysis with SARIMAX, LSTM, and FB Prophet in Python: Commodity Price Forecasting 2023-2024 for Crude Oil and Gold Price
Muhammad Aftab Ahmed
Sr. Data Engineer, Applied AI @ Afiniti | Sr. Business Analyst @ xByco | MBA | PGD - Big Data Analytics | BE
Commodity price forecasting is a challenging yet captivating task that has grabbed the attention of many professionals. However, with the advancement of artificial intelligence, Python's advanced machine learning models can now provide more accurate and insightful predictions. In this article, we focus on forecasting gold prices based on the FED rate and WTI crude oil prices based on the Dollar index for the years 2023-2024 using SARIMAX, Facebook Prophet, and LSTM deep learning time series analysis. This article serves as a comprehensive guide for investors, data scientists, and market analysts who are keen to leverage data-driven insights in the commodity markets.
Topics Covered:
Prerequisites for Time Series Forecasting
Importance of Stationarity in Time Series Forecasting
In time series analysis, stationarity is a fundamental concept, primarily because the underlying assumption of many statistical models is that the statistical properties of the data do not change over time. This implies that parameters such as mean, variance, and autocorrelation remain constant regardless of the time frame.
For non-stationary data, the behavior of the data changes over time. This time-dependent structure introduces complexities that make it challenging for a model to learn the underlying patterns accurately. The accuracy of the model can be severely impacted if these underlying assumptions of stationarity are violated, leading to unreliable and unstable forecasts.
That's why the first step in time series analysis is often to make the series stationary. Once the series is transformed into a stationary series, we can then use statistical methods to analyze it and create models to predict future values.
Decomposing Time Series Data into Trend, Seasonality, and Residuals
Decomposition in time series analysis is the process of breaking down a time series into three constituent parts: the trend, the seasonality, and the residuals (the random variation left once the trend and seasonal pattern are removed).
Additive and Multiplicative Seasonality
Seasonality is an important characteristic of time series data, where patterns repeat over regular intervals. There are two main types: additive and multiplicative. It's crucial to understand these types and their distinctions to choose the right model for analyzing time series data.
In an additive time series, the components of the time series (level, trend, seasonality, and noise) add together. The mathematical model for an additive time series is:
y(t) = Level + Trend + Seasonality + Noise
Here:
Level refers to the average value in the series.
Trend represents any measurable pattern of the data over time.
Seasonality corresponds to the repeating short-term cycle in the series.
Noise is the random variation in the series.
A key attribute of additive seasonality is that the magnitude of the seasonal fluctuations, i.e., the difference between the peaks and the troughs, does not vary with the time series' level. In other words, the variation around the trend does not change over different time periods. This feature makes additive seasonality particularly suited to time series data where the magnitude of the seasonal pattern does not depend on the level of the time series.
In the above example, the trend is upward while the magnitude of the seasonal fluctuations remains constant; therefore, this is additive seasonality.
Contrastingly, in a multiplicative time series, the components of the time series multiply together. The mathematical model for a multiplicative time series is:
y(t) = Level * Trend * Seasonality * Noise
Multiplicative seasonality is characterized by an increasing or decreasing seasonal amplitude over time, dependent on the time series' level. It is especially useful when the variation in the seasonal pattern grows or shrinks as the data values increase or decrease. A critical point to note is that in multiplicative seasonality, the seasonal swings appear to "widen" as time progresses.
In the above example, the trend is upward and the magnitude of the seasonal fluctuations increases with it; therefore, this is multiplicative seasonality.
A notable attribute of multiplicative time series data is that they can often be converted into additive time series through transformations such as taking logarithms. This transformation can be beneficial in certain scenarios.
If a seasonal time series shows multiplicative seasonality, it may often be beneficial to transform it into an additive series by taking the log. By taking the logarithm of the data, the multiplicative components convert to additive components, and the variance stabilizes.
This log transformation is particularly useful when the amplitude of the seasonal component is proportional to the level of the series, which leads to an increasing or decreasing seasonal amplitude. The transformation thus helps to stabilize the increasing or decreasing seasonal amplitude, allowing for more straightforward forecasting.
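Below is a minimal sketch of this idea on a synthetic series (the dates and values are illustrative assumptions, not taken from the article's dataset): a series whose seasonal swings grow with its level becomes roughly constant-amplitude after taking the log.
import numpy as np
import pandas as pd

# A toy series whose seasonal swings grow with its level (multiplicative seasonality)
idx = pd.date_range('2000-01-01', periods=120, freq='MS')
level = pd.Series(np.linspace(100, 400, 120), index=idx)
seasonal = 1 + 0.2 * np.sin(2 * np.pi * np.asarray(idx.month) / 12)
y = level * seasonal

# log(y) = log(level) + log(seasonal): the multiplicative structure becomes additive
y_log = np.log(y)

# The yearly spread keeps growing on the original scale but is roughly stable after the log
print(y.groupby(y.index.year).std().head())
print(y_log.groupby(y_log.index.year).std().head())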
Decomposition of Gold Price Data from 1970 onwards
In Python, time series decomposition can be done using the seasonal_decompose function provided by the statsmodels library as shown below
from statsmodels.tsa.seasonal import seasonal_decompose
import matplotlib.pyplot as plt

# Decompose 'Gold Price (oz/$)'
decomp_results_gold = seasonal_decompose(df['Gold Price (oz/$)'], period=12)

# Plot decomposed data
plt.rcParams["figure.figsize"] = (10, 10)
fig = decomp_results_gold.plot()
fig.suptitle('Decomposition of Gold Price', y=1.02)
plt.show()
The above decomposition of gold prices from 1970 onwards can be elaborated as follows:
Thus, gold prices are non-stationary due to the presence of an upward trend and additive seasonality.
Decomposition of WTI Crude Oil Prices from 1983 onwards
from statsmodels.tsa.seasonal import seasonal_decompose
import matplotlib.pyplot as plt

# Decompose 'Crude_Oil_WTI_($/bbl)'
decomp_results_oil = seasonal_decompose(df['Crude_Oil_WTI_($/bbl)'], period=12)

# Plot decomposed data
plt.rcParams["figure.figsize"] = (10, 10)
fig = decomp_results_oil.plot()
plt.subplots_adjust(hspace=0.5)  # Adjust the spacing between subplots
plt.suptitle('Decomposition of Crude_Oil_WTI', y=1.02)  # Adjust the main title position
plt.show()
The above decomposition of WTI crude oil prices from 1983 onwards can be elaborated as follows:
Thus, WTI crude oil prices are non-stationary due to the presence of a weaker upward trend and additive seasonality.
Checking Stationarity with KPSS and Augmented Dickey-Fuller Tests
While decomposing data into trend, seasonality, and residuals is one method of checking stationarity, it is not the only approach. Evaluating stationarity requires experience that develops over time.
Therefore, another way to assess stationarity is through statistical tests. These tests are designed to analyze the statistical properties of the data and determine if they remain constant over time. Utilizing statistical tests provides a rigorous and objective means of determining whether the data exhibits stationarity.
Augmented Dickey-Fuller (ADF) Test
The ADF test is a type of unit root test. The "unit root" refers to a scenario where a time series is non-stationary and behaves in a way that's statistically similar to a random walk. This is problematic because a series that resembles a random walk is unpredictable, with sharp shifts and changes that don't adhere to a particular pattern.
In the context of the ADF test, the null hypothesis assumes the presence of a unit root, i.e., the series is non-stationary. If the p-value obtained from the test is less than the significance level (commonly 0.05), we reject the null hypothesis, inferring that the time series is stationary.
Implementing the ADF test in Python is straightforward, using the adfuller function from the statsmodels library
from statsmodels.tsa.stattools import adfuller

print("adfuller test results shown below")
result = adfuller(df['Gold Price (oz/$)'])
print(result)
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
if result[1] < 0.05:
    print("Null hypothesis that data is non-stationary is rejected. Thus, data is stationary.")
else:
    print("Null hypothesis that data is non-stationary cannot be rejected. Thus, data is non-stationary.")
for key, value in result[4].items():
    print('\t%s: %.3f' % (key, value))
As the p-value is 0.96 > 0.05, the gold price data is non-stationary as per the ADF test.
from statsmodels.tsa.stattools import adfuller

print("adfuller test results shown below")
result = adfuller(df['Crude_Oil_WTI_($/bbl)'])
print(result)
print('ADF Statistic: %f' % result[0])
print('p-value: %f' % result[1])
if result[1] < 0.05:
    print("Null hypothesis that data is non-stationary is rejected. Thus, data is stationary.")
else:
    print("Null hypothesis that data is non-stationary cannot be rejected. Thus, data is non-stationary.")
for key, value in result[4].items():
    print('\t%s: %.3f' % (key, value))
As the p-value is 0.18 > 0.05, the WTI crude oil price data is non-stationary as per the ADF test.
Kwiatkowski-Phillips-Schmidt-Shin (KPSS) Test
The KPSS test, in contrast to the ADF test, examines the null hypothesis that a time series is stationary around a deterministic trend (trend-stationarity), as opposed to being non-stationary due to a unit root. This distinction is crucial. A process could be stationary around a deterministic trend (where fluctuations around the trend will always revert to the trend and the variations are relatively stable), which is an entirely different form of non-stationarity than having a unit root.
In the KPSS test, the null hypothesis is that the time series data is stationary (or trend-stationary). If the test statistic is greater than the critical value, we reject the null hypothesis, which means the series is not stationary.
To perform a KPSS test in Python, you can use the kpss function from the statsmodels library
from statsmodels.tsa.stattools import kpss

def kpss_test(series, **kw):
    statistic, p_value, n_lags, critical_values = kpss(series, **kw)

    # Format Output
    print(f'KPSS Statistic: {statistic}')
    print(f'p-value: {p_value}')
    print(f'num lags: {n_lags}')
    print('Critical Values:')
    for key, value in critical_values.items():
        print(f'   {key} : {value}')

    if p_value < 0.05:
        print("Result: The series is not stationary")
        print("Null hypothesis that data is stationary is rejected. Thus, data is non-stationary.")
    else:
        print("Result: The series is stationary")
        print("Null hypothesis that data is stationary cannot be rejected. Thus, data is stationary.")

kpss_test(df['Gold Price (oz/$)'])
As the p-value is 0.01 < 0.05, the gold price data is non-stationary as per the KPSS test as well.
from statsmodels.tsa.stattools import kpss

def kpss_test(series, **kw):
    statistic, p_value, n_lags, critical_values = kpss(series, **kw)

    # Format Output
    print(f'KPSS Statistic: {statistic}')
    print(f'p-value: {p_value}')
    print(f'num lags: {n_lags}')
    print('Critical Values:')
    for key, value in critical_values.items():
        print(f'   {key} : {value}')

    if p_value < 0.05:
        print("Result: The series is not stationary")
        print("Null hypothesis that data is stationary is rejected. Thus, data is non-stationary.")
    else:
        print("Result: The series is stationary")
        print("Null hypothesis that data is stationary cannot be rejected. Thus, data is stationary.")

kpss_test(df['Crude_Oil_WTI_($/bbl)'])
As the p-value is 0.01 < 0.05, the WTI crude oil data is non-stationary as per the KPSS test as well.
Making Data Stationary | Plotting ACF and PACF Plots
There are several ways to make your time series data stationary, including differencing, seasonal differencing, and transformation.
When it comes to mixed differencing, some good rules of thumb are that you should never use more than one order of seasonal differencing, and never more than two orders of differencing in total.
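As a quick sketch of these options in pandas (assuming df is the monthly gold price DataFrame used later in this article, with a 'Gold Price (oz/$)' column):
import pandas as pd

series = df['Gold Price (oz/$)']

first_diff = series.diff().dropna()           # first-order differencing (d = 1)
seasonal_diff = series.diff(12).dropna()      # seasonal differencing with period 12 (D = 1)
mixed_diff = series.diff(12).diff().dropna()  # one seasonal plus one regular difference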
Plotting ACF and PACF Plots | Finding the Values of p and q
ACF and PACF plots provide a visual representation of the correlation between points in a time series, based on the time separation, or "lag" between the points. They are essential tools for understanding the properties of the time series data and to identify patterns that can assist in the development of predictive models.
Specifically, these plots can help determine the parameters of an AutoRegressive Integrated Moving Average (ARIMA) model or its seasonal counterpart (SARIMA), which are commonly used models for forecasting in time series analysis
The Autocorrelation Function plot, also known as a Correlogram, gives us the correlation between the time series and its lagged values. It shows us the strength and direction of the linear relationship between observations at different time steps. The ACF plot is used to identify the 'MA(q)' component of an ARIMA model, where 'q' is the highest lag in the data where the ACF plot crosses the significance level.
In Python, the ACF plot can be generated using the plot_acf function from the statsmodels.graphics.tsaplots module. You will need a time series dataset to pass into this function, which will then generate and display the ACF plot.
In the above example, the ACF plot cuts off after lag 3; therefore q = 3, i.e., MA(3).
On the other hand, the Partial Autocorrelation Function plot provides the partial correlation of a time series with its own lags. It controls for all other lags; the PACF at lag 'k' is the correlation that results after removing the effect of any correlations due to the terms at shorter lags. The PACF plot is used to identify the 'AR(p)' component of an ARIMA model, where 'p' is the highest lag in the data where the PACF plot crosses the significance level.
In Python, the PACF plot can be generated using the plot_pacf function from the statsmodels.graphics.tsaplots module. As with the ACF plot, you will need a time series dataset to pass into this function, which will then generate and display the PACF plot.
In the above example, the PACF plot cuts off after lag 2; therefore p = 2, i.e., AR(2).
Refer to the rules of thumb below to read candidate p and q values from the ACF and PACF plots:
AR(p): the ACF tails off gradually, while the PACF cuts off after lag p.
MA(q): the ACF cuts off after lag q, while the PACF tails off gradually.
ARMA(p, q): both the ACF and the PACF tail off gradually.
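As a minimal illustration of generating both plots side by side (assuming stationary_series is a differenced, stationary series such as the first-order differenced gold price):
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

fig, axes = plt.subplots(2, 1, figsize=(10, 8))
plot_acf(stationary_series, lags=36, ax=axes[0])   # read q from where the ACF cuts off
plot_pacf(stationary_series, lags=36, ax=axes[1])  # read p from where the PACF cuts off
plt.tight_layout()
plt.show()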
Making Gold Price Data Stationary | Plotting ACF and PACF
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import acf, pacf, adfuller, kpss
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from pmdarima.arima import auto_arima
def apply_differencing(df, column='value', seasonality=None):
    """
    Apply different differencing methods on the given DataFrame.

    Parameters:
        - df: Pandas DataFrame
            The input DataFrame containing the time series data.
        - column: str, optional (default='value')
            The name of the column to analyze.
        - seasonality: int, optional (default=None)
            The seasonality period for seasonal differencing.

    Returns:
        - differencing_results: dict
            A dictionary containing the differencing methods and their results.
    """
    differencing_results = {}

    # Convert the column to numeric type
    df[column] = pd.to_numeric(df[column], errors='coerce')

    # Original data
    differencing_results['Original'] = df[column]

    # First-order differencing
    differencing_results['First Order'] = df[column].diff().dropna()

    # Seasonal differencing
    if seasonality is not None:
        differencing_results['Seasonal'] = df[column].diff(periods=seasonality).dropna()

    # Logarithmic differencing
    differencing_results['Log'] = np.log(df[column]).diff().dropna()

    # Plot differencing results
    fig, axes = plt.subplots(len(differencing_results), 1, figsize=(15, 5 * len(differencing_results)))
    for i, (method, series) in enumerate(differencing_results.items()):
        axes[i].plot(series)
        axes[i].set_ylabel(column)
        axes[i].set_xlabel('Date')
        axes[i].set_title(f'Differencing Method: {method}')
    plt.tight_layout()
    plt.show()

    # Perform ADF and KPSS tests
    for method, series in differencing_results.items():
        print(f'\n##### {method} Differencing #####')
        print('\nADF Test:')
        perform_adf_test(series)
        print('\nKPSS Test:')
        perform_kpss_test(series)
        print('\nOptimal ARMA Model:')
        find_optimal_arma(series, method)

    return differencing_results

def perform_adf_test(series):
    result = adfuller(series)
    print('ADF Statistic: %f' % result[0])
    print('p-value: %f' % result[1])
    if result[1] < 0.05:
        print("ADF Test: The series appears to be stationary")
    else:
        print("ADF Test: The series appears to be non-stationary")

def perform_kpss_test(series):
    statistic, p_value, n_lags, critical_values = kpss(series)
    print(f'KPSS Statistic: {statistic}')
    print(f'p-value: {p_value}')
    print(f'num lags: {n_lags}')
    if p_value < 0.05:
        print("KPSS Test: The series appears to be non-stationary")
    else:
        print("KPSS Test: The series appears to be stationary")

def find_optimal_arma(series, method):
    stepwise_model = auto_arima(series, start_p=0, start_q=0,
                                max_p=3, max_q=3, m=1,
                                seasonal=False,
                                suppress_warnings=True,
                                trace=False,
                                error_action='ignore',
                                stepwise=True)
    best_order = stepwise_model.order
    print(f'{method} Differencing - Optimal ARMA(p,q): {best_order}')
# Load the dataset
df = pd.read_csv('Historical_Data_XAU_USD.csv', index_col='Date', parse_dates=True)
# Remove commas and convert Gold Price to float
df['Gold Price (oz/$)'] = df['Gold Price (oz/$)'].str.replace(',', '').astype(float)
# Apply differencing methods
differencing_results = apply_differencing(df, column='Gold Price (oz/$)', seasonality=12)
Making WTI Crude Oil Data Stationary | Plotting ACF and PACF
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import acf, pacf, adfuller, kpss
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from pmdarima.arima import auto_arima
def apply_differencing(df, column='value', seasonality=None):
    """
    Apply different differencing methods on the given DataFrame.

    Parameters:
        - df: Pandas DataFrame
            The input DataFrame containing the time series data.
        - column: str, optional (default='value')
            The name of the column to analyze.
        - seasonality: int, optional (default=None)
            The seasonality period for seasonal differencing.

    Returns:
        - differencing_results: dict
            A dictionary containing the differencing methods and their results.
    """
    differencing_results = {}

    # Convert the column to numeric type
    df[column] = pd.to_numeric(df[column], errors='coerce')

    # Original data
    differencing_results['Original'] = df[column]

    # First-order differencing
    differencing_results['First Order'] = df[column].diff().dropna()

    # Seasonal differencing
    if seasonality is not None:
        differencing_results['Seasonal'] = df[column].diff(periods=seasonality).dropna()

    # Logarithmic differencing
    differencing_results['Log'] = np.log(df[column]).diff().dropna()

    # Plot differencing results
    fig, axes = plt.subplots(len(differencing_results), 1, figsize=(15, 5 * len(differencing_results)))
    for i, (method, series) in enumerate(differencing_results.items()):
        axes[i].plot(series)
        axes[i].set_ylabel(column)
        axes[i].set_xlabel('Date')
        axes[i].set_title(f'Differencing Method: {method}')
    plt.tight_layout()
    plt.show()

    # Perform ADF and KPSS tests
    for method, series in differencing_results.items():
        print(f'\n##### {method} Differencing #####')
        print('\nADF Test:')
        perform_adf_test(series)
        print('\nKPSS Test:')
        perform_kpss_test(series)
        print('\nOptimal ARMA Model:')
        find_optimal_arma(series, method)

    return differencing_results

def perform_adf_test(series):
    result = adfuller(series)
    print('ADF Statistic: %f' % result[0])
    print('p-value: %f' % result[1])
    if result[1] < 0.05:
        print("ADF Test: The series appears to be stationary")
    else:
        print("ADF Test: The series appears to be non-stationary")

def perform_kpss_test(series):
    statistic, p_value, n_lags, critical_values = kpss(series)
    print(f'KPSS Statistic: {statistic}')
    print(f'p-value: {p_value}')
    print(f'num lags: {n_lags}')
    if p_value < 0.05:
        print("KPSS Test: The series appears to be non-stationary")
    else:
        print("KPSS Test: The series appears to be stationary")

def find_optimal_arma(series, method):
    stepwise_model = auto_arima(series, start_p=0, start_q=0,
                                max_p=3, max_q=3, m=1,
                                seasonal=False,
                                suppress_warnings=True,
                                trace=False,
                                error_action='ignore',
                                stepwise=True)
    best_order = stepwise_model.order
    print(f'{method} Differencing - Optimal ARMA(p,q): {best_order}')
# Load the dataset
df = pd.read_csv('Crude_Oil_WTI.csv', index_col='Date', parse_dates=True)
# Apply differencing methods
differencing_results = apply_differencing(df, column='Crude_Oil_WTI_($/bbl)', seasonality=12)
After applying appropriate differencing to make the dataset stationary, the next step in our time series analysis is to plot the Autocorrelation Function (ACF) and Partial Autocorrelation Function (PACF) plots. These plots help us identify the optimal values for the parameters 'p' and 'q' in the ARIMA model.
To make this part interactive and engaging, I encourage you to participate by posting your observations and thoughts on the ACF and PACF plots in the comment section below. Together, we can confirm the most suitable values for 'p' and 'q' and continue our exciting journey into time series forecasting.
Exploring the ARIMA Spectrum: AR | MA | ARMA | ARIMA | SARIMA and SARIMAX
ARIMA modeling is a key technique in time series analysis. Founded on mid-20th-century statistical theory, it's designed to predict future data points by understanding the past. ARIMA models are built on ideas of stationarity, differencing, and autocorrelation, aiming to identify and understand any trends or patterns in the time series data
This makes them great for working on data with clear trends or seasonality, like stock prices or weather patterns. Because of their adaptability, ARIMA models have found use in a wide range of fields, from finance to engineering, making them an important tool in data analysis. This versatile and dependable modeling approach continues to be popular in the data science community
Autoregressive (AR) Models
AR models are based on the concept of autocorrelation. Autocorrelation is a statistical property where a variable is correlated with a lagged version of itself. In AR models, we assume that the current value of our time series can be predicted from its previous values, or lags. The number of lags used as predictors is called the order of the AR model. For example, an AR(1) model would predict the current value based on the immediately preceding value, while an AR(2) model would use the last two values, and so forth.
Moving Average (MA) Models
MA models, on the other hand, do not use past values of the time series as predictors. Instead, they use a series of past forecast errors in a "moving window" as input variables. The idea is that if a random disturbance causes a fluctuation at one point, it's likely to create similar disturbances in the near future. A model order is again defined for an MA model - for instance, MA(1) uses the error at the immediately preceding time step to predict the current value.
Autoregressive Moving Average (ARMA) Models
ARMA models are simply a combination of AR and MA models. They use both past observations of the time series and past forecast errors to predict future values. In other words, they blend the idea that the present is a function of the past (AR) and that errors made in the past propagate into the future (MA).
Autoregressive Integrated Moving Average (ARIMA) Models
ARIMA models extend ARMA models by considering the time series' order of differencing to make it stationary (i.e., removing trends and seasonality from the data). The "integrated" in ARIMA refers to this differencing process, indicating that the model uses differences of the time series at different lags to predict future values. An ARIMA model is usually noted as ARIMA(p,d,q), where p is the order of the AR part, d is the order of differencing, and q is the order of the MA part.
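As a minimal sketch of how these model families map onto orders in statsmodels (y is assumed to be a monthly price series; the orders shown are purely illustrative, not tuned for the datasets in this article):
from statsmodels.tsa.arima.model import ARIMA

ar_model    = ARIMA(y, order=(2, 0, 0)).fit()   # AR(2): two autoregressive lags
ma_model    = ARIMA(y, order=(0, 0, 3)).fit()   # MA(3): three lagged forecast errors
arma_model  = ARIMA(y, order=(2, 0, 3)).fit()   # ARMA(2,3): both components, no differencing
arima_model = ARIMA(y, order=(2, 1, 3)).fit()   # ARIMA(2,1,3): one order of differencing added
print(arima_model.summary())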
Seasonal Autoregressive Integrated Moving Average (SARIMA) Models
SARIMA models extend ARIMA by adding a seasonal difference to handle seasonal components in the time series. In other words, besides considering dependencies at previous time steps and past forecast errors (like ARIMA), they also account for patterns that repeat seasonally. These models are often used for series that show a clear seasonal pattern, such as electricity consumption, tourist arrivals, and monthly sales data.
Seasonal Autoregressive Integrated Moving Average with Explanatory Variable (SARIMAX) Models
Finally, SARIMAX extends SARIMA by incorporating an external or exogenous variable, which might improve the model's performance. Exogenous variables are those that are not part of the time series but have a relationship with it. For example, in predicting the sales of a soda company, the average temperature can be an exogenous variable.
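A minimal sketch of a SARIMA versus SARIMAX specification in statsmodels, assuming a monthly DataFrame df with the gold price and FED rate columns used later in this article (the orders are illustrative, not tuned):
from statsmodels.tsa.statespace.sarimax import SARIMAX

sarima = SARIMAX(df['Gold Price (oz/$)'],
                 order=(1, 1, 1),               # non-seasonal (p, d, q)
                 seasonal_order=(1, 1, 1, 12))  # seasonal (P, D, Q, m) with m = 12 months

sarimax = SARIMAX(df['Gold Price (oz/$)'],
                  exog=df['FED Rate'],          # exogenous regressor turns SARIMA into SARIMAX
                  order=(1, 1, 1),
                  seasonal_order=(1, 1, 1, 12))
sarimax_fit = sarimax.fit(disp=False)
print(sarimax_fit.summary())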
Facebook's Prophet Model | Open Source Library for Time Series Forecasting
Facebook Prophet was developed by Facebook's Core Data Science team and open-sourced in 2017. It is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. The model was designed to handle the common features seen in business time series, such as seasonality, trends, and holidays.
Prophet is robust to missing data and shifts in the trend, and typically handles outliers well, making it more flexible in handling various kinds of 'real-world' data. Its flexibility and ease of use — it doesn't require complex configuration and has automatic feature engineering for time-based components — have led to its increasing popularity in time series forecasting.
Components | Trend, Seasonality, and Holidays in Prophet Architecture
The Prophet model uses an additive regression model, incorporating four primary components: a piecewise linear or logistic growth curve trend, a yearly seasonal component modeled using Fourier series, a weekly seasonal component using dummy variables, and a user-provided list of important holidays.
Changes in trends are automatically detected and defined as change points in the data, while the seasonal components are modeled with Fourier series to allow complex seasonal patterns. Holidays are modeled with individual dummy variables, allowing for versatile modeling of different holiday effects.
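A minimal sketch of wiring these components together in Prophet; the holiday dates and settings below are illustrative assumptions, not values taken from the article's models:
import pandas as pd
from prophet import Prophet

# Hypothetical holiday/event calendar with an effect window of one extra day
holidays = pd.DataFrame({
    'holiday': 'fomc_meeting',
    'ds': pd.to_datetime(['2022-03-16', '2022-06-15', '2022-09-21', '2022-12-14']),
    'lower_window': 0,
    'upper_window': 1,
})

m = Prophet(growth='linear',               # piecewise linear trend with automatic changepoints
            changepoint_prior_scale=0.05,  # how flexible the trend is at the changepoints
            yearly_seasonality=10,         # Fourier order of the yearly seasonal component
            weekly_seasonality=False,
            holidays=holidays)
m.add_regressor('FED Rate')                # extra regressor, as used later in this article
# m.fit(df_prophet)                        # df_prophet must contain 'ds', 'y' and 'FED Rate'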
Optimization Techniques in Prophet: Newton-Raphson and L-BFGS
Optimization in Prophet is executed to fit the model parameters, including the trend and seasonality components. Two significant optimizers employed in Prophet are the Newton-Raphson and Limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) methods.
Newton-Raphson: a second-order method that uses both gradient and curvature (Hessian) information to locate the optimum, typically converging in very few iterations when the number of parameters is modest.
L-BFGS: Complementing the Newton-Raphson method in Prophet is the Limited-memory Broyden–Fletcher–Goldfarb–Shanno (L-BFGS) optimizer. This method is a memory-efficient variant of the standard BFGS algorithm, using gradient information to identify the minimum of the function. L-BFGS becomes particularly beneficial when dealing with a large number of parameters, as it significantly reduces both computational complexity and memory requirements, thereby enhancing the model's efficiency and scalability.
Data Preprocessing: Handling Outliers, Missing Data and Scaling in Prophet
Prophet has been designed to handle missing data and outliers. For missing data, as long as there are not large gaps, Prophet will provide a forecast by fitting the trend and seasonality.
Outliers can be removed from the training data, or you can adjust their values before fitting the model. However, due to the robustness of the Prophet model, often these steps might not be necessary.
In terms of scaling, Prophet does not require the input data to be stationary, and it also does not require any specific scaling of the input data. This makes Prophet easy to use on many types of time series data with minimal preprocessing.
Data Forecasting: Intervals, Horizon and Model Evaluation in Prophet
In Prophet, you can set the forecasting horizon by specifying the number of periods and the frequency (e.g., days, weeks) of the forecasts. This provides flexibility in generating forecasts for different future periods.
Prophet also provides functionality to include uncertainty intervals in the forecasts, which provide a range for the future values instead of a single prediction. These intervals are based on the uncertainty in the trend and seasonality, and do not include any uncertainty from additional regressors.
To evaluate the forecasting performance, Prophet provides functionality to perform cross-validation to assess the accuracy of the model's forecasts. This is done by selecting cutoff points in the history, and for each of them, fitting the model using data only up to that cutoff point, and then comparing the forecast to the actual values.
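A minimal sketch of setting the horizon and running Prophet's built-in cross-validation, assuming m is an already fitted Prophet model without extra regressors (models with added regressors also need those regressor columns in the future dataframe):
from prophet.diagnostics import cross_validation, performance_metrics

future = m.make_future_dataframe(periods=18, freq='MS')  # 18 monthly steps ahead
forecast = m.predict(future)                             # yhat, yhat_lower, yhat_upper

# Rolling-origin evaluation: fit up to each cutoff, then forecast one year ahead
cv_results = cross_validation(m, initial='3650 days', period='180 days', horizon='365 days')
print(performance_metrics(cv_results)[['horizon', 'mae', 'rmse']].head())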
Long Short-Term Memory (LSTM) | Deep Learning Neural Network
Long Short-Term Memory (LSTM) was introduced by Sepp Hochreiter and Jurgen Schmidhuber in 1997 as a solution to the vanishing gradient problem in training traditional Recurrent Neural Networks (RNNs). The vanishing gradient problem occurs during training when gradients, computed during the backpropagation, get smaller and smaller as they are passed backwards through time. As a result, the weights in the earlier layers of the network are updated to a lesser extent, hindering the model's learning process.
Traditional RNNs excel in tasks where input and output sequences are synchronous and of the same length, such as labelling each element of a sequence. However, when dealing with tasks that involve longer sequences with possible long-term dependencies, the performance of traditional RNNs significantly degrades, mainly due to the aforementioned vanishing gradient problem.
LSTM was designed to remember information for long periods, thus overcoming the limitations of traditional RNNs, making it more suitable for tasks involving longer sequences, like time series forecasting, text generation, and more.
Cell State | Forget, Input and Output Gates of LSTM Architecture
LSTMs contain information outside the normal flow of the recurrent network in a 'cell state'. Much like computer memory, the cell state stores and recalls information. It can read, write, and erase information, with the amount of information to retain or forget determined by structures called 'gates'.
Gates are a way to optionally let information through. They are composed of a sigmoid activation layer and a pointwise multiplication operation. The sigmoid layer outputs numbers between zero and one, describing how much of each component should be let through.
An LSTM has three of these gates: a forget gate, which decides what information to discard from the cell state; an input gate, which decides what new information to write into the cell state; and an output gate, which decides what part of the cell state to expose as the hidden state.
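As a rough NumPy sketch of the standard LSTM cell equations (a conceptual illustration only, not the implementation used by Keras later in this article):
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step. W and b are dicts holding the weights/biases of the
    forget ('f'), input ('i'), candidate ('c') and output ('o') layers, each
    acting on the concatenation of the current input and the previous hidden state."""
    z = np.concatenate([x_t, h_prev])
    f_t = sigmoid(W['f'] @ z + b['f'])    # forget gate: how much of the old cell state to keep
    i_t = sigmoid(W['i'] @ z + b['i'])    # input gate: how much new information to write
    c_hat = np.tanh(W['c'] @ z + b['c'])  # candidate values for the cell state
    c_t = f_t * c_prev + i_t * c_hat      # updated cell state (the LSTM's long-term memory)
    o_t = sigmoid(W['o'] @ z + b['o'])    # output gate: how much of the cell state to expose
    h_t = o_t * np.tanh(c_t)              # new hidden state
    return h_t, c_t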
Anatomy of a Neuron: Structure and Function
Neurons are fundamental units of computation in artificial neural networks, including LSTM (Long Short-Term Memory) models used for time series forecasting. Understanding the components of a neuron and their role in LSTM can provide valuable insights into how these models process sequential data.
In LSTM models, the dendrites receive sequential input data, the cell body processes and integrates these inputs, and the axon carries the output signals. This sequential processing capability of LSTM networks makes them well-suited for capturing and analyzing temporal patterns in time series data. By effectively leveraging the interactions between dendrites, the cell body, and the axon, LSTM models can make accurate predictions and forecasts in time series forecasting tasks.
Optimizers in Neural Networks: Adam and RMSprop
The optimizer's role is crucial as it shapes and molds your model into its most accurate possible form by manipulating the weights. The loss function is the guide to the terrain, telling the optimizer when it’s moving in the right or wrong direction.
In terms of when to use which, it usually depends on the specific problem and the nature of the dataset. Adam is often a good starting point as it combines the advantages of other optimizers.
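A minimal Keras sketch of switching between the two optimizers, assuming model is a Sequential network like the one built later in this article (the learning rates shown are the Keras defaults, not tuned values):
from tensorflow.keras.optimizers import Adam, RMSprop

model.compile(optimizer=Adam(learning_rate=0.001), loss='mean_squared_error')
# or, to try RMSprop instead:
# model.compile(optimizer=RMSprop(learning_rate=0.001), loss='mean_squared_error')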
Dropout Rate, Batch Size and Epochs in LSTM Modelling
Dropout is a technique used to prevent overfitting. Dropout works by probabilistically removing, or "dropping out," inputs to a layer, which may be input variables in the data sample or activations from a previous layer. It has the effect of simulating a large number of networks with very different network structures and, in turn, making nodes in the network generally more robust to the inputs.
The batch size determines the number of examples you look at before making a weight update. The lower it is, the noisier the training signal is going to be; the higher it is, the longer it will take to compute the gradient for each step.
An epoch is a term used in machine learning and indicates the number of passes of the entire training dataset the machine learning algorithm has completed. If the batch size is the whole dataset, then each epoch is equivalent to one iteration. The number of epochs is a hyperparameter you can tune. If you set it too small, your model might still have room to improve. If you set it too large, then you're just wasting computational resources because your model has already converged.
Decoding the Correlation: Gold Prices and the Federal Reserve Rate
Gold, a prominent symbol of wealth and prosperity, has been at the heart of global economics for centuries. Its price, which has seen various fluctuations over time, is influenced by a myriad of factors, one being the Federal Reserve (Fed) rate. The association between gold prices and the Fed rate has morphed over time, particularly after the United States decided to sever the link between the dollar and gold.
The U.S. dollar was tied to gold under the gold standard system. In this framework, the dollar's value was directly connected to a specified amount of gold. The gold standard had a critical influence on gold prices, essentially stabilizing them.
However, in 1971, President Richard Nixon made a historic decision to end the gold standard. This event, known as the 'Nixon Shock,' marked the decoupling of the dollar from gold. The primary reason for this shift was to curb the outflow of gold and stabilize the U.S. economy, which was grappling with increasing inflation and unemployment.
The decoupling spurred significant changes in the global economic landscape. For gold, it allowed prices to float freely in the market, influenced by supply and demand. Consequently, gold transitioned from a relatively stable asset to a more volatile one. The event ignited a bullish market for gold that lasted for almost a decade, with gold prices soaring from $35 an ounce to a peak of $850 in 1980.
In times of economic stability and growth, when the Fed raises rates, the returns on bonds and other fixed-income investments tend to rise. This scenario makes gold, a non-interest bearing asset, less attractive, leading to a drop in its price. Conversely, in an economic downturn, when the Fed slashes rates, gold becomes an appealing investment, driving up its price.
However, other factors, such as geopolitical events, inflation expectations, and global market volatility, can also significantly impact gold prices. For instance, even in a high-interest-rate environment, gold prices might surge due to geopolitical tensions, as investors flock to gold as a 'safe-haven' asset.
Although gold prices currently depend on many factors, primarily geopolitical events and economic uncertainty, from a Machine Learning Modeling perspective the single most suitable explanatory parameter is still the Federal Reserve (Fed) rate, ceteris paribus.
Petro-Dollar and WTI Crude Oil Prices
The interplay between WTI crude oil prices and the Dollar Index, an indicator of the U.S. dollar's value against major currencies, carries profound implications in the global energy market. Understanding the historical context and the significance of the petro-dollar system sheds light on the complex relationship between these two influential factors.
The petro-dollar system, born in the 1970s, solidified the link between WTI crude oil prices and the U.S. dollar. Following the oil crisis, major oil-producing nations agreed to price oil exclusively in U.S. dollars, cementing the dollar's dominance as the preferred currency for global oil transactions. This arrangement created a symbiotic relationship, as countries needing oil had to maintain a steady supply of dollars, strengthening the currency's value.
One key consequence of the petro-dollar system is the inverse relationship between WTI crude oil prices and the Dollar Index. When the Dollar Index strengthens, reflecting a robust U.S. economy, WTI crude oil prices tend to face downward pressure. This occurs because a stronger dollar makes oil more expensive for buyers using other currencies, dampening demand and contributing to lower prices.
While the petro-dollar system and the Dollar Index play a significant role in WTI crude oil price movements, it is crucial to acknowledge that other factors influence oil prices as well. Geopolitical events, global supply and demand dynamics, weather disruptions, and production fluctuations are just a few examples of additional factors that can exert independent effects on WTI crude oil prices.
In the realm of Machine Learning Modeling, the Dollar Index remains a vital parameter for analyzing and forecasting WTI crude oil prices. By incorporating the Dollar Index as a feature in predictive models, analysts can enhance their ability to capture the nuanced relationship between these variables and improve the accuracy of oil price predictions.
Gold Price Forecasting 2023-2024 | Python SARIMAX , Facebook Prophet and LSTM Deep Learning Modelling
Python SARIMAX ML Code, Gold Price Forecasting 2023-2024
Let's first visualise the input dataframe
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.statespace.sarimax import SARIMAX
import warnings
from sklearn.model_selection import TimeSeriesSplit
from sklearn.model_selection import ParameterSampler
import matplotlib.cm as cm
import matplotlib.gridspec as gridspec
warnings.filterwarnings('ignore')  # Ignore warning messages
# Load and preprocess the data
df = pd.read_csv('Historical_Data_XAU_USD.csv')
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)
df['Gold Price (oz/$)'] = df['Gold Price (oz/$)'].str.replace(',', '').astype(float)
df['FED Rate'] = df['FED Rate'].astype(float)
# Define a range of hyperparameters to test
param_dist = {"order": [(p, d, q) for p in range(3) for d in [0, 1, 2] for q in range(3)],
? ? "seasonal_order": [(P, D, Q, 12) for P in range(3) for D in [0, 1, 2] for Q in range(3)]}
# Initialize a random parameter sampler
param_list = list(ParameterSampler(param_dist, n_iter=50))
# Define Time Series cross-validator
tscv = TimeSeriesSplit(n_splits=5)
best_aic = np.inf
best_bic = np.inf
best_params = None
# Cross-validation and hyperparameter tuning
for params in param_list:
    aic_scores = []
    bic_scores = []

    # Cross-validation loop
    for train_index, test_index in tscv.split(df):
        cv_train, cv_test = df.iloc[train_index], df.iloc[test_index]

        # Fit SARIMAX model
        sarima = SARIMAX(cv_train['Gold Price (oz/$)'], exog=cv_train['FED Rate'],
                         order=params['order'], seasonal_order=params['seasonal_order'])
        sarima_results = sarima.fit()

        # Save AIC and BIC scores
        aic_scores.append(sarima_results.aic)
        bic_scores.append(sarima_results.bic)

    # Average AIC and BIC scores
    mean_aic = np.mean(aic_scores)
    mean_bic = np.mean(bic_scores)

    # Update best parameters if current model has a lower AIC and BIC
    if mean_aic < best_aic and mean_bic < best_bic:
        best_aic = mean_aic
        best_bic = mean_bic
        best_params = params
# Print the best parameters and the corresponding AIC and BIC
print('Best parameters:', best_params)
print('Best AIC:', best_aic)
print('Best BIC:', best_bic)
# Fit the best model
try:
    best_model = SARIMAX(df['Gold Price (oz/$)'], exog=df['FED Rate'],
                         order=best_params['order'], seasonal_order=best_params['seasonal_order'])
    best_model_fit = best_model.fit()
except np.linalg.LinAlgError:
    print(f"Best model parameters: {best_params} cause a LinAlgError")
# Define future FED Rates
future_fed_rates = [df['FED Rate'].iloc[-1]] + [5.3]*6 + [5.6]*6 + [5.3]*3 + [5.06]*2
# Create forecast object
forecast_object = best_model_fit.get_forecast(steps=18, exog=np.array(future_fed_rates).reshape(-1, 1))
# Extract prediction mean
mean = forecast_object.predicted_mean
# Extract the forecast dates
datescf = pd.date_range(start=df.index[-1], periods=18, freq='MS')
# Predictions on historical data
in_sample_forecast = best_model_fit.get_prediction(start=pd.to_datetime('1970-03-01'),
                                                   end=df.index[-1],
                                                   dynamic=False,
                                                   exog=df['FED Rate'][df.index >= pd.to_datetime('1970-03-01')])
in_sample_mean = in_sample_forecast.predicted_mean
# Confidence intervals for the in-sample data
in_sample_conf_int = in_sample_forecast.conf_int()
# Correct any negative lower confidence levels in the in-sample data
in_sample_conf_int[in_sample_conf_int.iloc[:, 0] < 0] = 0
# Confidence intervals for the out-of-sample data
conf_int = best_model_fit.get_forecast(steps=len(future_fed_rates), exog=future_fed_rates).conf_int()
# Correct any negative lower confidence levels in the out-of-sample data
conf_int[conf_int.iloc[:, 0] < 0] = 0
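The snippet above computes the in-sample and out-of-sample predictions but stops short of plotting them; a minimal plotting sketch using the objects defined above (df, in_sample_mean, mean, datescf, conf_int) might look like the following, with all styling choices purely illustrative:
plt.figure(figsize=(14, 6))
plt.plot(df.index, df['Gold Price (oz/$)'], label='Actual Gold Price')
plt.plot(in_sample_mean.index, in_sample_mean, label='In-sample prediction', alpha=0.7)
plt.plot(datescf, mean, label='Forecast 2023-2024', color='red')
plt.fill_between(datescf, conf_int.iloc[:, 0], conf_int.iloc[:, 1],
                 color='red', alpha=0.2, label='Forecast confidence interval')
plt.title('Gold Price Forecast with SARIMAX')
plt.xlabel('Date')
plt.ylabel('Gold Price (oz/$)')
plt.legend()
plt.show()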
Python SARIMAX ML Result, Gold Price Forecasting 2023-2024
Python FB Prophet Code, Gold Price Forecasting 2023-2024
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import ParameterSampler
from prophet import Prophet
from prophet.diagnostics import cross_validation, performance_metrics
import matplotlib.pyplot as plt
from scipy.stats import uniform
df = pd.read_csv('Historical_Data_XAU_USD.csv', index_col='Date', parse_dates=True)
# Reset index to have Date as a column
df.reset_index(inplace=True)
# Convert data types
df['Gold Price (oz/$)'] = df['Gold Price (oz/$)'].str.replace(',', '').astype(float)
df['FED Rate'] = df['FED Rate'].astype(float)
# Make Gold Price stationary
df['Gold Price (oz/$)_diff'] = df['Gold Price (oz/$)'].diff()
df.dropna(inplace=True)
# Create a scaler for Gold Price
price_scaler = MinMaxScaler()
df['Gold Price (oz/$)_diff_scaled'] = price_scaler.fit_transform(df[['Gold Price (oz/$)_diff']])
# Normalize the FED Rate
scaler = MinMaxScaler()
df['FED Rate'] = scaler.fit_transform(df[['FED Rate']])
# Prepare dataframe for Prophet
df_prophet = pd.DataFrame({
    'ds': df['Date'],
    'y': df['Gold Price (oz/$)_diff_scaled'].values,
    'FED Rate': df['FED Rate'].values
})
# Define the parameter space
param_space = {
    'changepoint_prior_scale': uniform(0.01, 0.5),
    'seasonality_prior_scale': uniform(0.1, 10.0),
    'changepoint_range': uniform(0.8, 0.1),
    'seasonality_mode': ['additive', 'multiplicative'],
    'daily_seasonality': [True, False],
    'weekly_seasonality': [True, False],
    'yearly_seasonality': [True, False]
}
# Define number of iterations for random search
n_iter = 50
# Perform random search
tuning_results = []
for params in ParameterSampler(param_space, n_iter=n_iter):
    model = Prophet(**params)
    model.add_regressor('FED Rate')
    model.fit(df_prophet)

    # Perform cross-validation
    cv_results = cross_validation(model, initial='3650 days', period='180 days', horizon='365 days')
    df_p = performance_metrics(cv_results)
    metrics = {}
    try:
        metrics['mape'] = df_p['mape'].mean()
    except KeyError:
        pass
    try:
        metrics['mae'] = df_p['mae'].mean()
    except KeyError:
        pass
    try:
        metrics['rmse'] = df_p['rmse'].mean()
    except KeyError:
        pass
    tuning_results.append({
        'params': params,
        'metrics': metrics,
    })
# Find the best parameters
best_params = min(tuning_results, key=lambda x: x['metrics'].get('mape', float('inf')))['params']
# Print the best parameters
print('Best Parameters:')
print(best_params)
# Use best parameters to predict future values
model = Prophet(**best_params)
model.add_regressor('FED Rate')
model.fit(df_prophet)
# Prepare future data
future_data = pd.DataFrame({
    'ds': pd.date_range(start='2023-08-01', periods=17, freq='M'),
    'FED Rate': [5.3, 5.3, 5.3, 5.3, 5.3, 5.3, 5.6, 5.6, 5.6, 5.6, 5.6, 5.6, 5.3, 5.3, 5.3, 5.06, 5.06]})
# Normalize FED Rate
future_data['FED Rate'] = scaler.transform(future_data[['FED Rate']])
# Predict
forecast = model.predict(future_data)
# Denormalize 'yhat'
forecast['yhat'] = price_scaler.inverse_transform(forecast[['yhat']])
forecast['yhat_lower'] = price_scaler.inverse_transform(forecast[['yhat_lower']])
forecast['yhat_upper'] = price_scaler.inverse_transform(forecast[['yhat_upper']])
# Undifference the predictions
last_value = df.iloc[-1]['Gold Price (oz/$)']
forecast['yhat'] = np.r_[last_value, forecast['yhat'].values].cumsum()[1:]
forecast['yhat_lower'] = np.r_[last_value, forecast['yhat_lower'].values].cumsum()[1:]
forecast['yhat_upper'] = np.r_[last_value, forecast['yhat_upper'].values].cumsum()[1:]
# Print predicted gold prices
print("Predicted Gold Prices:")
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']])
Python FB Prophet Result, Gold Price Forecasting 2023-2024
Python LSTM DL Code, Gold Price Forecasting 2023-2024
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import mean_squared_error
df = pd.read_csv('Historical_Data_XAU_USD.csv', index_col='Date', parse_dates=True)
# Reset index to have Date as a column
df.reset_index(inplace=True)
# Engineer features from 'Date'
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
# Convert data types
df['Gold Price (oz/$)'] = df['Gold Price (oz/$)'].str.replace(',', '').astype(float)
df['FED Rate'] = df['FED Rate'].astype(float)
# Create a separate scaler for Gold Price
price_scaler = MinMaxScaler()
df['Gold Price (oz/$)_diff'] = df['Gold Price (oz/$)'].diff()
df.dropna(inplace=True)
df['Gold Price (oz/$)_diff_scaled'] = price_scaler.fit_transform(df[['Gold Price (oz/$)_diff']])
# Normalize the other data
scaler = MinMaxScaler()
df[['Year', 'Month', 'FED Rate']] = scaler.fit_transform(df[['Year', 'Month', 'FED Rate']])
# Prepare the data for LSTM
window_size = 10
features_count = df[['Year', 'Month', 'FED Rate']].shape[1]
X = []
y = []
for i in range(len(df) - window_size):
    X.append(df.iloc[i:i+window_size, [3, 4, 5]].values)
    y.append(df.iloc[i+window_size]['Gold Price (oz/$)_diff_scaled'])
X = np.array(X)
y = np.array(y)
# Split the data into train and test sets
train_size = int(0.8 * len(X))
train_X, test_X = X[:train_size], X[train_size:]
train_y, test_y = y[:train_size], y[train_size:]
# Define the LSTM model
def create_model(neurons=50, optimizer='adam', dropout_rate=0.0):
    model = Sequential()
    model.add(LSTM(neurons, activation='relu', return_sequences=True, input_shape=(window_size, features_count)))
    model.add(Dropout(dropout_rate))
    model.add(LSTM(neurons, activation='relu'))
    model.add(Dropout(dropout_rate))
    model.add(Dense(1))
    model.compile(optimizer=optimizer, loss='mean_squared_error')
    return model
# Create KerasRegressor
model = KerasRegressor(build_fn=create_model, verbose=0)
# Define hyperparameters to tune
param_dist = {
    'neurons': [50, 100, 150, 200],
    'optimizer': ['adam', 'rmsprop'],
    'dropout_rate': [0.0, 0.1, 0.2, 0.3],
    'batch_size': [16, 32, 64, 128],
    'epochs': [10, 20, 30]
}
# Perform random search
random = RandomizedSearchCV(estimator=model, param_distributions=param_dist, scoring='neg_mean_squared_error', n_iter=10)
random_result = random.fit(train_X, train_y)
# Get the best model
best_model = random_result.best_estimator_
best_params = random_result.best_params_
print("Best Model Parameters:")
print(best_params)
# Predictions on test set and MSE calculation
test_y_pred = random_result.predict(test_X)
print('Test MSE:', mean_squared_error(test_y, test_y_pred))
# denormalize the 'FED Rate'
df['FED Rate'] = scaler.inverse_transform(df[['Year', 'Month', 'FED Rate']])[:, 2]
# Prepare future data
future_data = np.array([
    [2023, 8, 5.3],
    [2023, 9, 5.3],
    [2023, 10, 5.3],
    [2023, 11, 5.3],
    [2023, 12, 5.3],
    [2024, 1, 5.6],
    [2024, 2, 5.6],
    [2024, 3, 5.6],
    [2024, 4, 5.6],
    [2024, 5, 5.6],
    [2024, 6, 5.6],
    [2024, 7, 5.3],
    [2024, 8, 5.3],
    [2024, 9, 5.3],
    [2024, 10, 5.06],
    [2024, 11, 5.06],
    [2024, 12, 5.06]
])
from datetime import timedelta
# Assuming that last_date is the last date in your original data
last_date = df['Date'].iloc[-1]
# Generate future dates
future_dates = pd.date_range(last_date, periods=len(future_data[:, 2]) + 1, freq='MS')[1:]
# Normalize future data with the same scaler used for training data
future_data_scaled = scaler.transform(future_data)
# Prepare sequences for prediction
future_X = []
for i in range(len(future_data_scaled)):
    sequence = np.concatenate((df.iloc[-window_size + i:, [3, 4, 5]].values, future_data_scaled[:i + 1, :]), axis=0)
    future_X.append(sequence[-window_size:])
future_X = np.array(future_X)
# Make predictions on the future data
future_y_pred = best_model.predict(future_X)
# Transform future_y_pred to original scale
future_y_pred = price_scaler.inverse_transform(future_y_pred.reshape(-1, 1))
# Undifference the predictions
last_value = df.iloc[-1]['Gold Price (oz/$)']
future_y_pred = np.r_[last_value, future_y_pred.flatten()].cumsum()[1:]
# Print predicted gold prices
print("Predicted Gold Prices:")
print(future_y_pred)
Python LSTM DL Result, Gold Price Forecasting 2023-2024
WTI Crude Oil Price Forecasting 2023-2024 | Python SARIMAX , Facebook Prophet and LSTM Deep Learning Modelling
Python SARIMAX ML Code, Crude Oil Forecasting 2023-2024
Let's first visualise the input dataframe
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.statespace.sarimax import SARIMAX
import warnings
from sklearn.model_selection import TimeSeriesSplit
from sklearn.model_selection import ParameterSampler
import matplotlib.cm as cm
import matplotlib.gridspec as gridspec
warnings.filterwarnings('ignore')  # Ignore warning messages
# Load and preprocess the data
df = pd.read_csv('Crude_Oil_WTI.csv')
df['Date'] = pd.to_datetime(df['Date'])
df.set_index('Date', inplace=True)
# Define a range of hyperparameters to test
param_dist = {"order": [(p, d, q) for p in range(3) for d in [0, 1, 2] for q in range(3)],
? ? "seasonal_order": [(P, D, Q, 12) for P in range(3) for D in [0, 1, 2] for Q in range(3)]}
# Initialize a random parameter sampler
param_list = list(ParameterSampler(param_dist, n_iter=50))
# Define Time Series cross-validator
tscv = TimeSeriesSplit(n_splits=5)
best_aic = np.inf
best_bic = np.inf
best_params = None
# Cross-validation and hyperparameter tuning
for params in param_list:
    aic_scores = []
    bic_scores = []

    # Cross-validation loop
    for train_index, test_index in tscv.split(df):
        cv_train, cv_test = df.iloc[train_index], df.iloc[test_index]

        # Fit SARIMAX model
        sarima = SARIMAX(cv_train['Crude_Oil_WTI_($/bbl)'], exog=cv_train['US Dollar Index'],
                         order=params['order'], seasonal_order=params['seasonal_order'])
        sarima_results = sarima.fit()

        # Save AIC and BIC scores
        aic_scores.append(sarima_results.aic)
        bic_scores.append(sarima_results.bic)

    # Average AIC and BIC scores
    mean_aic = np.mean(aic_scores)
    mean_bic = np.mean(bic_scores)

    # Update best parameters if current model has a lower AIC and BIC
    if mean_aic < best_aic and mean_bic < best_bic:
        best_aic = mean_aic
        best_bic = mean_bic
        best_params = params
# Print the best parameters and the corresponding AIC and BIC
print('Best parameters:', best_params)
print('Best AIC:', best_aic)
print('Best BIC:', best_bic)
# Fit the best model
best_model = SARIMAX(df['Crude_Oil_WTI_($/bbl)'], exog=df['US Dollar Index'],
                     order=best_params['order'], seasonal_order=best_params['seasonal_order'])
best_model_fit = best_model.fit()
# Define future US Dollar Index
future_us_dollar_index = [101.64, 103.06, 103.66, 104.33, 103.57, 102.59, 99.42, 99.48, 101.97, 102.85, 103.33, 102.4,
                          101.23, 100.08, 99.78, 98.72, 97.62, 98.34]
# Create forecast object
forecast_object = best_model_fit.get_forecast(steps=18, exog=np.array(future_us_dollar_index).reshape(-1, 1),alpha=0.2)
# Extract prediction mean
mean = forecast_object.predicted_mean
# Extract the forecast dates
datescf = pd.date_range(start=df.index[-1], periods=18, freq='MS')
# Predictions on historical data
in_sample_forecast = best_model_fit.get_prediction(start=pd.to_datetime('1983-05-01'),
                                                   end=df.index[-1],
                                                   dynamic=False,
                                                   exog=df['US Dollar Index'][df.index >= pd.to_datetime('1983-04-01')])
in_sample_mean = in_sample_forecast.predicted_mean
# Replace negative predicted values with 0
in_sample_mean[in_sample_mean < 0] = 0
# Confidence intervals for the in-sample data
in_sample_conf_int = in_sample_forecast.conf_int()
# Correct any negative lower confidence levels in the in-sample data
in_sample_conf_int[in_sample_conf_int.iloc[:, 0] < 0] = 0
# Confidence intervals for the out-of-sample data
conf_int = best_model_fit.get_forecast(steps=len(future_us_dollar_index), exog=future_us_dollar_index).conf_int()
# Correct any negative lower confidence levels in the out-of-sample data
conf_int[conf_int.iloc[:, 0] < 0] = 0
Python SARIMAX ML Result, Crude Oil Forecasting 2023-2024
Python FB Prophet Code, Crude Oil Forecasting 2023-2024
import pandas as pd
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import ParameterSampler
from prophet import Prophet
from prophet.diagnostics import cross_validation, performance_metrics
import matplotlib.pyplot as plt
from scipy.stats import uniform
df = pd.read_csv('Crude_Oil_WTI.csv', index_col='Date', parse_dates=True)
# Reset index to have Date as a column
df.reset_index(inplace=True)
# Make Crude Oil Price stationary
df['Crude_Oil_WTI_($/bbl)_diff'] = df['Crude_Oil_WTI_($/bbl)'].diff()
df.dropna(inplace=True)
# Create a scaler for Crude Oil Price
price_scaler = MinMaxScaler()
df['Crude_Oil_WTI_($/bbl)_diff_scaled'] = price_scaler.fit_transform(df[['Crude_Oil_WTI_($/bbl)_diff']])
# Normalize the US Dollar Index
scaler = MinMaxScaler()
df['US Dollar Index'] = scaler.fit_transform(df[['US Dollar Index']])
# Prepare dataframe for Prophet
df_prophet = pd.DataFrame({
    'ds': df['Date'],
    'y': df['Crude_Oil_WTI_($/bbl)_diff_scaled'].values,
    'US Dollar Index': df['US Dollar Index'].values
})
# Define the parameter space
param_space = {
    'changepoint_prior_scale': uniform(0.01, 0.5),
    'seasonality_prior_scale': uniform(0.1, 10.0),
    'changepoint_range': uniform(0.8, 0.1),
    'seasonality_mode': ['additive', 'multiplicative'],
    'daily_seasonality': [True, False],
    'weekly_seasonality': [True, False],
    'yearly_seasonality': [True, False]
}
# Define number of iterations for random search
n_iter = 50
# Perform random search
tuning_results = []
for params in ParameterSampler(param_space, n_iter=n_iter):
    model = Prophet(**params)
    model.add_regressor('US Dollar Index')
    model.fit(df_prophet)

    # Perform cross-validation
    cv_results = cross_validation(model, initial='3650 days', period='180 days', horizon='365 days')
    df_p = performance_metrics(cv_results)
    metrics = {}
    try:
        metrics['mape'] = df_p['mape'].mean()
    except KeyError:
        pass
    try:
        metrics['mae'] = df_p['mae'].mean()
    except KeyError:
        pass
    try:
        metrics['rmse'] = df_p['rmse'].mean()
    except KeyError:
        pass
    tuning_results.append({
        'params': params,
        'metrics': metrics,
    })
# Find the best parameters
best_params = min(tuning_results, key=lambda x: x['metrics'].get('mape', float('inf')))['params']
# Print the best parameters
print('Best Parameters:')
print(best_params)
# Use best parameters to predict future values
model = Prophet(**best_params)
model.add_regressor('US Dollar Index')
model.fit(df_prophet)
# Prepare future data
future_data = pd.DataFrame({
    'ds': pd.date_range(start='2023-08-01', periods=17, freq='M'),
    'US Dollar Index': [103.06, 103.66, 104.33, 103.57, 102.59, 99.42, 99.48, 101.97, 102.85, 103.33, 102.4,
                        101.23, 100.08, 99.78, 98.72, 97.62, 98.34]})
# Normalize US Dollar Index
future_data['US Dollar Index'] = scaler.transform(future_data[['US Dollar Index']])
# Predict
forecast = model.predict(future_data)
# Denormalize 'yhat'
forecast['yhat'] = price_scaler.inverse_transform(forecast[['yhat']])
forecast['yhat_lower'] = price_scaler.inverse_transform(forecast[['yhat_lower']])
forecast['yhat_upper'] = price_scaler.inverse_transform(forecast[['yhat_upper']])
# Undifference the predictions
last_value = df.iloc[-1]['Crude_Oil_WTI_($/bbl)']
forecast['yhat'] = np.r_[last_value, forecast['yhat'].values].cumsum()[1:]
forecast['yhat_lower'] = np.r_[last_value, forecast['yhat_lower'].values].cumsum()[1:]
forecast['yhat_upper'] = np.r_[last_value, forecast['yhat_upper'].values].cumsum()[1:]
# Print predicted Crude Oil prices
print("Predicted Crude Oil Prices:")
print(forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']])
Python FB Prophet Result, Crude Oil Forecasting 2023-2024
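As in the SARIMAX section, the result chart is embedded as an image. A minimal plotting sketch (not in the original article), assuming the df and forecast dataframes produced by the code above and the matplotlib.pyplot import already made there, could be:
fig, ax = plt.subplots(figsize=(14, 6))
# Observed prices
ax.plot(df['Date'], df['Crude_Oil_WTI_($/bbl)'], label='Observed WTI price')
# Undifferenced Prophet forecast with its uncertainty band
ax.plot(forecast['ds'], forecast['yhat'], label='Prophet forecast 2023-2024', color='green')
ax.fill_between(forecast['ds'], forecast['yhat_lower'], forecast['yhat_upper'],
                color='green', alpha=0.2)
ax.set_xlabel('Date')
ax.set_ylabel('Crude Oil WTI ($/bbl)')
ax.legend()
plt.show()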
Python LSTM DL Code, Crude Oil Forecasting 2023-2024
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor  # note: removed in newer TensorFlow releases; the scikeras package provides a compatible KerasRegressor
from sklearn.model_selection import RandomizedSearchCV
from sklearn.metrics import mean_squared_error
df = pd.read_csv('Crude_Oil_WTI.csv', index_col='Date', parse_dates=True)
# Reset index to have Date as a column
df.reset_index(inplace=True)
# Engineer features from 'Date'
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month
# Create a separate scaler for Crude Oil Price
price_scaler = MinMaxScaler()
df['Crude_Oil_WTI_($/bbl)_diff'] = df['Crude_Oil_WTI_($/bbl)'].diff()
df.dropna(inplace=True)
df['Crude_Oil_WTI_($/bbl)_diff_scaled'] = price_scaler.fit_transform(df[['Crude_Oil_WTI_($/bbl)_diff']])
# Normalize the other data
scaler = MinMaxScaler()
df[['Year', 'Month', 'US Dollar Index']] = scaler.fit_transform(df[['Year', 'Month', 'US Dollar Index']])
# Prepare the data for LSTM
window_size = 10
features_count = df[['Year', 'Month', 'US Dollar Index']].shape[1]
X = []
y = []
for i in range(len(df) - window_size):
    # Select the feature columns by name; positional indices [3, 4, 5] would pick up the diff column instead of the US Dollar Index
    X.append(df[['Year', 'Month', 'US Dollar Index']].iloc[i:i+window_size].values)
    y.append(df.iloc[i+window_size]['Crude_Oil_WTI_($/bbl)_diff_scaled'])
X = np.array(X)
y = np.array(y)
# Split the data into train and test sets
train_size = int(0.8 * len(X))
train_X, test_X = X[:train_size], X[train_size:]
train_y, test_y = y[:train_size], y[train_size:]
# Define the LSTM model
def create_model(neurons=50, optimizer='adam', dropout_rate=0.0):
    model = Sequential()
    model.add(LSTM(neurons, activation='relu', return_sequences=True, input_shape=(window_size, features_count)))
    model.add(Dropout(dropout_rate))
    model.add(LSTM(neurons, activation='relu'))
    model.add(Dropout(dropout_rate))
    model.add(Dense(1))
    model.compile(optimizer=optimizer, loss='mean_squared_error')
    return model
# Create KerasRegressor
model = KerasRegressor(build_fn=create_model, verbose=0)
# Define hyperparameters to tune
param_dist = {
    'neurons': [50, 100, 150, 200],
    'optimizer': ['adam', 'rmsprop'],
    'dropout_rate': [0.0, 0.1, 0.2, 0.3],
    'batch_size': [16, 32, 64, 128],
    'epochs': [10, 20, 30]
}
# Perform random search
random = RandomizedSearchCV(estimator=model, param_distributions=param_dist, scoring='neg_mean_squared_error', n_iter=10)
random_result = random.fit(train_X, train_y)
# Get the best model
best_model = random_result.best_estimator_
best_params = random_result.best_params_
print("Best Model Parameters:")
print(best_params)
# Predictions on test set and MSE calculation
test_y_pred = random_result.predict(test_X)
print('Test MSE:', mean_squared_error(test_y, test_y_pred))
# Keep a denormalized copy of the US Dollar Index for reference; the scaled column is still needed to build the prediction sequences below
df['US Dollar Index (raw)'] = scaler.inverse_transform(df[['Year', 'Month', 'US Dollar Index']])[:, 2]
# Prepare future data (Year, Month, placeholder third column that is replaced with the US Dollar Index below)
future_data = np.array([
    [2023, 8, 5.3],
    [2023, 9, 5.3],
    [2023, 10, 5.3],
    [2023, 11, 5.3],
    [2023, 12, 5.3],
    [2024, 1, 5.6],
    [2024, 2, 5.6],
    [2024, 3, 5.6],
    [2024, 4, 5.6],
    [2024, 5, 5.6],
    [2024, 6, 5.6],
    [2024, 7, 5.3],
    [2024, 8, 5.3],
    [2024, 9, 5.3],
    [2024, 10, 5.06],
    [2024, 11, 5.06],
    [2024, 12, 5.06]
])
future_us_dollar_index = np.array([103.06, 103.66, 104.33, 103.57, 102.59, 99.42, 99.48, 101.97, 102.85, 103.33, 102.4,
                                   101.23, 100.08, 99.78, 98.72, 97.62, 98.34])
# Replace the third column with the future_us_dollar_index
future_data[:, 2] = future_us_dollar_index
from datetime import timedelta
# Assuming that last_date is the last date in your original data
last_date = df['Date'].iloc[-1]
# Generate future dates
future_dates = pd.date_range(last_date, periods=len(future_data[:, 2]) + 1, freq='MS')[1:]
# Normalize future data with the same scaler used for training data
future_data_scaled = scaler.transform(future_data)
# Prepare sequences for prediction
future_X = []
for i in range(len(future_data_scaled)):
    # Use the named (still scaled) feature columns so historical and future rows are on the same scale
    sequence = np.concatenate((df[['Year', 'Month', 'US Dollar Index']].iloc[-window_size + i:].values,
                               future_data_scaled[:i + 1, :]), axis=0)
    future_X.append(sequence[-window_size:])
future_X = np.array(future_X)
# Make predictions on the future data
future_y_pred = best_model.predict(future_X)
# Transform future_y_pred to original scale
future_y_pred = price_scaler.inverse_transform(future_y_pred.reshape(-1, 1))
# Undifference the predictions
last_value = df.iloc[-1]['Crude_Oil_WTI_($/bbl)']
future_y_pred = np.r_[last_value, future_y_pred.flatten()].cumsum()[1:]
# Print predicted Crude Oil prices
print("Predicted Crude Oil Prices:")
print(future_y_pred)
# Concatenate the historical and predicted dates
total_dates = np.concatenate((df['Date'].values, future_dates))
Python LSTM DL Result, Crude Oil Forecasting 2023-2024
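The LSTM result is likewise shown as an image, and total_dates computed at the end of the code above gives the natural x-axis range for such a chart. A minimal sketch (not in the original article), assuming plt, df, future_dates, future_y_pred, and total_dates from the code above:
fig, ax = plt.subplots(figsize=(14, 6))
# Observed prices
ax.plot(df['Date'], df['Crude_Oil_WTI_($/bbl)'], label='Observed WTI price')
# Undifferenced LSTM predictions for 2023-2024
ax.plot(future_dates, future_y_pred, label='LSTM forecast 2023-2024', color='orange')
# Show the full historical plus forecast horizon
ax.set_xlim(pd.Timestamp(total_dates[0]), pd.Timestamp(total_dates[-1]))
ax.set_xlabel('Date')
ax.set_ylabel('Crude Oil WTI ($/bbl)')
ax.legend()
plt.show()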
Author Notes: Beyond the Text