Forecasting Time Series (Stock Prices): ARIMA Model Using Python
Vaidyanathan Ravichandran
Professor of Practice (Finance) - Business Schools, Bangalore
Time Series Forecasting – ARIMA Model - Introduction
Time series forecasting is a method used to predict future values based on previously observed values in a time series data set. A time series is a sequence of data points typically measured at successive points in time, often at uniform intervals.
Time series forecasting involves using statistical models and techniques to analyze past data and make informed predictions about future trends. It is widely used in various fields such as finance, economics, supply chain management, and meteorology for predicting future events and making data-driven decisions.
Traditional and Advanced Methods
Advanced Machine Learning Methods:
Steps in Time Series Forecasting
This article will discuss time series forecasting using the ARIMA model.
ARIMA Model Overview
ARIMA stands for Auto-Regressive Integrated Moving Average and is a cornerstone of time series forecasting. It is a statistical method that has gained wide popularity because it handles a variety of standard temporal structures in time series data effectively. ARIMA models are based on the idea that the information in the past values of a time series alone can be used to predict its future values.
Exponential smoothing and ARIMA models are the two most widely used approaches to time series forecasting. Exponential smoothing models are based on a description of the trend and seasonality in the data, while ARIMA models aim to describe the autocorrelations in the data.
Assumptions and Parameters of the ARIMA Model
Major Assumption:
Stationarity: The time series (after differencing d times) has statistical properties, such as its mean, variance, and autocorrelation, that remain constant over time.
Components/Parameters of ARIMA Model:
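AR (Autoregression): The dependency between an observation and a number of its preceding (lagged) observations. p: the lag order, i.e., the number of lagged observations included in the model.
I (Integrated): Differencing of raw observations to achieve stationarity. d: the degree of differencing, i.e., the number of times the raw observations are differenced.
MA (Moving Average): The dependency between an observation and the residual errors from previous time steps. q: the order of the moving average, i.e., the number of lagged error terms included in the model.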
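For reference, the three components above are conventionally combined into a single ARIMA(p, d, q) equation using the lag (backshift) operator L, where L y_t = y_{t-1}. The notation (φ_i for AR coefficients, θ_j for MA coefficients, c for a constant, ε_t for white-noise errors) is the standard textbook form and is added here for illustration; it is not taken from the original article:

```latex
\left(1 - \sum_{i=1}^{p} \phi_i L^i\right)(1 - L)^d \, y_t
  = c + \left(1 + \sum_{j=1}^{q} \theta_j L^j\right)\varepsilon_t
```

With d = 1, (1 - L) y_t = y_t - y_{t-1} is the first difference, so an ARIMA(1, 1, 0) model is simply an AR(1) model fitted to the differenced series: Δy_t = c + φ_1 Δy_{t-1} + ε_t.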
Non-Seasonal Data
Non-seasonal time series data do not exhibit regular and predictable patterns that repeat over a specific period. Examples include stock prices, which may show trends or cycles but do not typically follow a seasonal pattern.
Characteristics:
No Regular Repetition
Trends and Cycles
Stationarity
Example of Non-Seasonal Data:
White Noise
Autocorrelation: A key feature of white noise is the absence of autocorrelation. This means there's no correlation between a data point and its past or future values. The autocorrelation function (ACF) of a white noise series should be close to zero at all lags (time differences) except for lag 0 (correlation with itself).
White noise is a fundamental concept in time series analysis, representing purely random fluctuations. It is crucial for diagnosing the adequacy of time series models and ensuring that only the random noise remains in the residuals after fitting a model. Understanding and identifying white noise helps in building more accurate and reliable time series models.
In essence, white noise in time series represents a purely random component, providing a benchmark for how well a model explains the data's variability.
Below is an illustration of white noise in a time series.
White noise generation: we generate 1,000 samples from a normal distribution with mean 0 and standard deviation 1.
import numpy as np
import matplotlib.pyplot as plt
# Generate white noise
np.random.seed(0)
white_noise = np.random.normal(0, 1, 1000)
# Plot the white noise
plt.figure(figsize=(10, 6))
plt.plot(white_noise)
plt.title('White Noise')
plt.xlabel('Time')
plt.ylabel('Value')
plt.show()
# Plot the autocorrelation function (ACF)
from statsmodels.graphics.tsaplots import plot_acf
plot_acf(white_noise, lags=40)
plt.title('Autocorrelation Function (ACF) of White Noise')
plt.show()
ARIMA Model Steps
Step 1: Python code to download stock data from Yahoo Finance
import sys
from datetime import datetime

import matplotlib.pyplot as plt
import pandas as pd
import yfinance as yf

# Function to fetch and save stock data
def fetch_stock_data(ticker, start_date, end_date, filename):
    # Fetch data from Yahoo Finance
    stock_data = yf.download(ticker, start=start_date, end=end_date)
    # Recent yfinance versions may return two-level column headers (field, ticker);
    # flatten them so the CSV has a single header row ('Close', 'Open', ...)
    if isinstance(stock_data.columns, pd.MultiIndex):
        stock_data.columns = stock_data.columns.get_level_values(0)
    # Save data to CSV
    stock_data.to_csv(filename)
    print(f"Data saved to {filename}")

# Get user inputs
ticker = input("Enter the ticker symbol of the Nifty 50 stock (e.g., 'RELIANCE.NS'): ")
start_date = input("Enter the start date (YYYY-MM-DD): ")
end_date = input("Enter the end date (YYYY-MM-DD): ")
filename = input("Enter the filename to save the data (e.g., 'stock_data.csv'): ")

# Validate date format
try:
    datetime.strptime(start_date, '%Y-%m-%d')
    datetime.strptime(end_date, '%Y-%m-%d')
except ValueError:
    print("Invalid date format. Please enter the date in YYYY-MM-DD format.")
    sys.exit(1)

# Fetch and save stock data
fetch_stock_data(ticker, start_date, end_date, filename)

# Read the saved CSV file
stock_data = pd.read_csv(filename, index_col='Date', parse_dates=True)

# Plot the closing prices
plt.figure(figsize=(12, 6))
plt.plot(stock_data['Close'], label=f'Closing Prices of {ticker}')
plt.title(f'Closing Prices of {ticker} from {start_date} to {end_date}')
plt.xlabel('Date')
plt.ylabel('Closing Price')
plt.legend()
plt.grid(True)
plt.show()
Step 2: Test for Stationarity – ADF
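Method 1: Visual inspection. By observing the plot, we can clearly say that the series above is non-stationary: its level wanders over time instead of fluctuating around a constant mean.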
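Visual inspection can be made a little more concrete with rolling statistics: for a stationary series, the rolling mean and rolling standard deviation should stay roughly flat. The sketch below uses a simulated random walk as a stand-in for a price series (an assumption, so it runs without downloading data); the 50-observation window and the helper name rolling_stats are arbitrary, hypothetical choices.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def rolling_stats(series, window=50):
    """Hypothetical helper: rolling mean and rolling standard deviation of a series."""
    return series.rolling(window).mean(), series.rolling(window).std()


rng = np.random.default_rng(42)
dates = pd.bdate_range("2020-01-01", periods=1000)
# Simulated random walk: cumulative sum of white-noise shocks (non-stationary)
random_walk = pd.Series(100 + rng.normal(0, 1, 1000).cumsum(), index=dates)
# Its first difference is white noise again (stationary)
differenced = random_walk.diff().dropna()

fig, axes = plt.subplots(2, 1, figsize=(10, 8), sharex=True)
for ax, series, name in [(axes[0], random_walk, "Random walk"),
                         (axes[1], differenced, "First difference")]:
    roll_mean, roll_std = rolling_stats(series)
    ax.plot(series, label=name, alpha=0.6)
    ax.plot(roll_mean, label="Rolling mean")
    ax.plot(roll_std, label="Rolling std")
    ax.set_title(f"{name} with 50-observation rolling statistics")
    ax.legend()
plt.tight_layout()
plt.show()
```

On real data, the same helper can be applied to stock_data['Close'] from Step 1.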
Method 2: Augmented Dickey-Fuller (ADF) Test
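# adf_test() is defined in the full code listing at the end of this article
# Perform the ADF test on the closing prices downloaded in Step 1
print("\nADF Test Result for Closing Prices:")
adf_test(stock_data['Close'])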
Analysis:
The p-value is greater than 0.05, so we fail to reject the null hypothesis of the ADF test (that the series has a unit root). The time series is non-stationary.
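Step 3: Transforming a Non-Stationary Series into a Stationary One
Some common approaches for making a time series stationary are:
Differencing: subtract the previous observation from each value (y_t - y_{t-1}) to remove a trend.
Log Transformation: take the natural logarithm to stabilize a variance that grows with the level of the series.
Seasonal Decomposition: estimate the seasonal component and subtract it from the series (the code below uses an additive model with a 30-observation period).
Log Difference: difference the log-transformed series, which approximates percentage returns.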
Remember that the choice of method depends on your specific dataset and the patterns you observe. Experiment with these approaches to achieve stationarity!
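The log difference is popular for prices because it equals the continuously compounded return, log(P_t / P_{t-1}), which is close to the simple percentage return when price changes are small. A small worked example with made-up prices (purely illustrative, not market data):

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, for illustration only
prices = pd.Series([100.0, 101.0, 99.5, 102.0, 102.5])

log_difference = np.log(prices).diff().dropna()  # log(P_t) - log(P_{t-1})
simple_return = prices.pct_change().dropna()      # (P_t - P_{t-1}) / P_{t-1}

print(pd.DataFrame({"log_difference": log_difference.round(5),
                    "simple_return": simple_return.round(5)}))
```

The two columns agree to roughly three decimal places here; the gap widens for larger daily moves.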
The following Python code applies each transformation and tests whether the result is stationary:
# Excerpt: the imports and the helpers adf_test() and plot_series() are in the full code below
# Function to read the CSV file, apply transformations, and fit ARIMA model
def transform_and_test_stationarity(filename):
    # Read the CSV file
    df = pd.read_csv(filename)
    # Parse the date column (assuming the date column is named 'Date')
    df['Date'] = pd.to_datetime(df['Date'])
    df.set_index('Date', inplace=True)
    # Extract the closing prices
    closing_prices = df['Close']

    # Original Series
    plot_series(closing_prices, 'Original Closing Prices')
    print("\nADF Test Result for Original Closing Prices:")
    adf_test(closing_prices)

    # 1. Differencing
    first_difference = closing_prices.diff().dropna()
    plot_series(first_difference, 'First Difference of Closing Prices')
    print("\nADF Test Result for First Difference of Closing Prices:")
    adf_test(first_difference)

    # 2. Log Transformation
    log_transformation = np.log(closing_prices).dropna()
    plot_series(log_transformation, 'Log Transformation of Closing Prices')
    print("\nADF Test Result for Log Transformation of Closing Prices:")
    adf_test(log_transformation)

    # 3. Seasonal Decomposition
    result = seasonal_decompose(closing_prices, model='additive', period=30)
    seasonal_adjusted = closing_prices - result.seasonal
    plot_series(seasonal_adjusted.dropna(), 'Seasonally Adjusted Closing Prices')
    print("\nADF Test Result for Seasonally Adjusted Closing Prices:")
    adf_test(seasonal_adjusted.dropna())

    # 4. Log Difference
    log_difference = log_transformation.diff().dropna()
    plot_series(log_difference, 'Log Difference of Closing Prices')
    print("\nADF Test Result for Log Difference of Closing Prices:")
    adf_test(log_difference)
Next Step: Fit the ARIMA Model
Use the auto_arima() function from pmdarima, as in the code below, to automatically select the model orders and fit the ARIMA model.
# Auto-fit ARIMA model on log difference (or any stationary series)
print("\nFitting ARIMA model on Log Difference of Closing Prices:")
model = auto_arima(log_difference, seasonal=False, trace=True, error_action='ignore', suppress_warnings=True)
print(model.summary())
The final step is to forecast future values and plot the original series against the forecasted values:
# Forecast future values
forecast_periods = 30
# np.asarray() keeps only the values; pmdarima may return a Series with its own (integer) index
forecast = np.asarray(model.predict(n_periods=forecast_periods))
# Create a series for the forecast on the business days after the last observation
# (exchange holidays are not excluded)
forecast_index = pd.bdate_range(start=log_difference.index[-1] + pd.offsets.BDay(1), periods=forecast_periods)
forecast_series = pd.Series(forecast, index=forecast_index)
# Plot the original and forecasted values
plt.figure(figsize=(10, 5))
plt.plot(log_difference, label='Log Difference of Closing Prices')
plt.plot(forecast_series, label='Forecasted Values', color='red')
plt.title('Log Difference of Closing Prices with Forecasted Values')
plt.xlabel('Date')
plt.ylabel('Log Difference of Closing Price')
plt.legend()
plt.grid(True)
plt.show()
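The model above forecasts log differences, not prices. Because log differences add up over time, they can be mapped back to a price path by cumulatively summing them and exponentiating from the last observed close. The sketch below shows only that arithmetic; last_close and the forecast values are made-up numbers and the helper name is hypothetical. Exponentiating a forecast of log values gives a median-type point forecast and ignores the small correction needed for a mean forecast.

```python
import numpy as np
import pandas as pd


def log_diff_forecast_to_prices(last_close, log_diff_forecast):
    """Hypothetical helper: P_{T+h} = P_T * exp(sum of the first h forecast log differences)."""
    return last_close * np.exp(np.cumsum(np.asarray(log_diff_forecast, dtype=float)))


# Hypothetical inputs for illustration only
last_close = 2500.0
log_diff_forecast = [0.001, -0.002, 0.0005]
price_path = log_diff_forecast_to_prices(last_close, log_diff_forecast)
print(pd.Series(price_path, name="forecast_close").round(2))
```

Inside the article's function, the equivalent call would pass closing_prices.iloc[-1] and forecast.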
The residual errors look fine, with a near-zero mean and roughly uniform variance.
Full code for the ARIMA model for time series forecasting using live stock prices from NSE India
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.seasonal import seasonal_decompose
from pmdarima import auto_arima

# Function to perform ADF test and print the result with analysis
def adf_test(timeseries):
    result = adfuller(timeseries)
    adf_statistic = result[0]
    p_value = result[1]
    critical_values = result[4]
    print('ADF Statistic: %f' % adf_statistic)
    print('p-value: %f' % p_value)
    print('Critical Values:')
    for key, value in critical_values.items():
        print('\t%s: %.3f' % (key, value))
    if p_value < 0.05:
        print("The p-value is less than 0.05, so we reject the null hypothesis. The time series is stationary.")
    else:
        print("The p-value is greater than 0.05, so we fail to reject the null hypothesis. The time series is non-stationary.")
    for key, value in critical_values.items():
        if adf_statistic < value:
            print(f"The ADF statistic is less than the {key} critical value. We reject the null hypothesis at the {key} level. The time series is stationary.")
        else:
            print(f"The ADF statistic is greater than the {key} critical value. We fail to reject the null hypothesis at the {key} level. The time series is non-stationary.")

# Function to plot the time series
def plot_series(timeseries, title):
    plt.figure(figsize=(10, 5))
    plt.plot(timeseries, label=title)
    plt.title(title)
    plt.xlabel('Date')
    plt.ylabel('Value')
    plt.legend()
    plt.grid(True)
    plt.show()

# Function to read the CSV file, apply transformations, and fit ARIMA model
def transform_and_test_stationarity(filename):
    # Read the CSV file
    df = pd.read_csv(filename)
    # Parse the date column (assuming the date column is named 'Date')
    df['Date'] = pd.to_datetime(df['Date'])
    df.set_index('Date', inplace=True)
    # Extract the closing prices
    closing_prices = df['Close']

    # Original Series
    plot_series(closing_prices, 'Original Closing Prices')
    print("\nADF Test Result for Original Closing Prices:")
    adf_test(closing_prices)

    # 1. Differencing
    first_difference = closing_prices.diff().dropna()
    plot_series(first_difference, 'First Difference of Closing Prices')
    print("\nADF Test Result for First Difference of Closing Prices:")
    adf_test(first_difference)

    # 2. Log Transformation
    log_transformation = np.log(closing_prices).dropna()
    plot_series(log_transformation, 'Log Transformation of Closing Prices')
    print("\nADF Test Result for Log Transformation of Closing Prices:")
    adf_test(log_transformation)

    # 3. Seasonal Decomposition
    result = seasonal_decompose(closing_prices, model='additive', period=30)
    seasonal_adjusted = closing_prices - result.seasonal
    plot_series(seasonal_adjusted.dropna(), 'Seasonally Adjusted Closing Prices')
    print("\nADF Test Result for Seasonally Adjusted Closing Prices:")
    adf_test(seasonal_adjusted.dropna())

    # 4. Log Difference
    log_difference = log_transformation.diff().dropna()
    plot_series(log_difference, 'Log Difference of Closing Prices')
    print("\nADF Test Result for Log Difference of Closing Prices:")
    adf_test(log_difference)

    # Auto-fit ARIMA model on log difference (or any stationary series)
    print("\nFitting ARIMA model on Log Difference of Closing Prices:")
    model = auto_arima(log_difference, seasonal=False, trace=True, error_action='ignore', suppress_warnings=True)
    print(model.summary())

    # Forecast future values (np.asarray drops any index pmdarima attaches to the forecast)
    forecast_periods = 30
    forecast = np.asarray(model.predict(n_periods=forecast_periods))
    # Create a series for the forecast on the business days after the last observation
    # (exchange holidays are not excluded)
    forecast_index = pd.bdate_range(start=log_difference.index[-1] + pd.offsets.BDay(1), periods=forecast_periods)
    forecast_series = pd.Series(forecast, index=forecast_index)

    # Plot the original and forecasted values
    plt.figure(figsize=(10, 5))
    plt.plot(log_difference, label='Log Difference of Closing Prices')
    plt.plot(forecast_series, label='Forecasted Values', color='red')
    plt.title('Log Difference of Closing Prices with Forecasted Values')
    plt.xlabel('Date')
    plt.ylabel('Log Difference of Closing Price')
    plt.legend()
    plt.grid(True)
    plt.show()

# Example usage
filename = input("Enter the filename of the CSV file to read (e.g., 'stock_data.csv'): ")
transform_and_test_stationarity(filename)