Forecasting Time Series (stock price)- ARIMA Model Using Python
USING LIVE STOCK DATA

Time Series Forecasting – ARIMA Model - Introduction

Time series forecasting is a method used to predict future values based on previously observed values in a time series data set. A time series is a sequence of data points typically measured at successive points in time, often at uniform intervals.

Time series forecasting involves using statistical models and techniques to analyze past data and make informed predictions about future trends. It is widely used in various fields such as finance, economics, supply chain management, and meteorology for predicting future events and making data-driven decisions.

Traditional and Advanced Methods

Traditional Methods:

  • Simple Moving Average
  • Weighted Moving Average
  • Exponential Smoothing
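
As a rough illustration of the three traditional methods listed above, the following sketch computes each one with pandas. The price values, the window length of 3, the linear weights, and the smoothing factor alpha = 0.5 are all made-up assumptions chosen for readability, not recommendations.

```python
import numpy as np
import pandas as pd

# Made-up values, used only for illustration
prices = pd.Series([10.0, 12.0, 11.0, 13.0, 14.0])
window = 3

# Simple moving average: unweighted mean of the last `window` observations
sma = prices.rolling(window).mean()

# Weighted moving average: linearly increasing weights, newest observation weighted most
weights = np.arange(1, window + 1)  # [1, 2, 3]
wma = prices.rolling(window).apply(
    lambda x: np.dot(x, weights) / weights.sum(), raw=True
)

# Simple exponential smoothing: s_t = alpha * y_t + (1 - alpha) * s_{t-1}
alpha = 0.5
ses = prices.ewm(alpha=alpha, adjust=False).mean()

print(pd.DataFrame({"price": prices, "SMA": sma, "WMA": wma, "SES": ses}))
```

The first two rows of the SMA and WMA columns are NaN because a full window of three observations is not yet available.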

Advanced Machine Learning Methods:

  • Linear Regression Models: Can be used for forecasting by incorporating lagged values and other predictors.
  • Tree-based Methods: Random forests, gradient boosting machines, etc., for capturing complex patterns.
  • Neural Networks: Models like Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks designed to handle sequential data.
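
To make the "lagged values as predictors" idea concrete, here is a minimal sketch that fits a linear regression of a series on its own previous value using NumPy's least-squares solver. The synthetic AR(1) data, the coefficient 0.7, and the variable names are assumptions for illustration only; a real application would use actual data and probably several lags.

```python
import numpy as np

# Synthetic series where each value depends on the previous one (assumed for the demo)
rng = np.random.default_rng(0)
n = 2000
phi_true = 0.7
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi_true * y[t - 1] + rng.normal()

# Design matrix: a column of ones (intercept) and the series lagged by one step
X = np.column_stack([np.ones(n - 1), y[:-1]])
target = y[1:]

# Ordinary least squares fit of y_t on y_{t-1}
coef, *_ = np.linalg.lstsq(X, target, rcond=None)
intercept, phi_hat = coef

# One-step-ahead forecast from the last observed value
next_value = intercept + phi_hat * y[-1]
print(f"estimated lag coefficient: {phi_hat:.3f}, next value forecast: {next_value:.3f}")
```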

Steps in Time Series Forecasting

  1. Data Preparation: Collecting and cleaning the data, handling missing values, and transforming variables if necessary.
  2. Exploratory Data Analysis (EDA): Visualizing the data to identify trends, seasonality, and other patterns.
  3. Model Selection: Choosing appropriate forecasting methods based on the data characteristics.
  4. Model Fitting: Training the model on historical data.
  5. Model Evaluation: Assessing the model’s performance using metrics like Mean Absolute Error (MAE), Mean Squared Error (MSE), etc.
  6. Forecasting: Making predictions using the trained model and assessing the forecast accuracy.
  7. Updating the Model: Regularly updating the model as new data becomes available.
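
The evaluation metrics mentioned in step 5 can be computed in a few lines. The sketch below uses made-up actual and predicted values purely to show the arithmetic.

```python
import numpy as np

actual = np.array([100.0, 102.0, 101.0, 105.0])    # hypothetical held-out values
predicted = np.array([98.0, 103.0, 100.0, 107.0])  # hypothetical forecasts

errors = actual - predicted        # [2, -1, 1, -2]
mae = np.mean(np.abs(errors))      # Mean Absolute Error
mse = np.mean(errors ** 2)         # Mean Squared Error
rmse = np.sqrt(mse)                # Root Mean Squared Error, in the units of the data

print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}")
# prints: MAE=1.500  MSE=2.500  RMSE=1.581
```

For time series, these metrics should be computed on observations that come after the training period, so the split is chronological rather than random.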

This article will discuss time series forecasting using the ARIMA model.

ARIMA Model Overview

ARIMA stands for Auto-Regressive Integrated Moving Average and is a cornerstone of time series forecasting. It is a statistical method that has become popular because it handles many of the standard temporal structures found in time series data. ARIMA models are based on the idea that the information in the past values of a time series alone can be used to predict its future values.

Exponential smoothing and ARIMA models are the two most widely used approaches to time series forecasting. Exponential smoothing models are based on a description of the trend and seasonality in the data, while ARIMA models aim to describe the autocorrelations in the data.

Assumptions and Parameters of the ARIMA Model

Major Assumption:

Stationarity: The time series has statistical properties (such as mean, variance, and autocorrelation) that remain constant across time.

Components/Parameters of ARIMA Model:

AR (Autoregression): The dependent relationship between an observation and its preceding observations. p: The lag order.

I (Integrated): Differencing of raw observations to achieve stationarity. d: Degree of differencing.

MA (Moving Average): The relationship between an observation and a residual error from a moving average model. q: Order of the moving average.

Non-Seasonal Data

Non-seasonal time series data do not exhibit regular and predictable patterns that repeat over a specific period. Examples include stock prices, which may show trends or cycles but do not typically follow a seasonal pattern.

Characteristics:

  • No regular repetition
  • Trends and cycles
  • Stationarity

Example of Non-Seasonal Data: [figure not included]

White Noise

Autocorrelation: A key feature of white noise is the absence of autocorrelation. This means there's no correlation between a data point and its past or future values. The autocorrelation function (ACF) of a white noise series should be close to zero at all lags (time differences) except for lag 0 (correlation with itself).

White noise is a fundamental concept in time series analysis, representing purely random fluctuations. It is crucial for diagnosing the adequacy of time series models: after a model is fitted, only random noise should remain in the residuals. Understanding and identifying white noise helps in building more accurate and reliable time series models.

In essence, white noise in time series represents a purely random component, providing a benchmark for how well a model explains the data's variability.

Below is an illustration of white noise in a time series.

White Noise Generation: We will generate 1000 samples from a normal distribution with mean 0 and standard deviation 1.

import numpy as np
import matplotlib.pyplot as plt

# Generate white noise
np.random.seed(0)
white_noise = np.random.normal(0, 1, 1000)

# Plot the white noise
plt.figure(figsize=(10, 6))
plt.plot(white_noise)
plt.title('White Noise')
plt.xlabel('Time')
plt.ylabel('Value')
plt.show()

# Plot the autocorrelation function (ACF)
from statsmodels.graphics.tsaplots import plot_acf
plot_acf(white_noise, lags=40)
plt.title('Autocorrelation Function (ACF) of White Noise')
plt.show()
        


ARIMA Model Steps

Step 1: Download stock data from Yahoo Finance using Python.

import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime

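# Function to fetch and save stock data
def fetch_stock_data(ticker, start_date, end_date, filename):
    # Fetch data from Yahoo Finance
    stock_data = yf.download(ticker, start=start_date, end=end_date)

    # Some yfinance versions return two-level column labels (field, ticker);
    # keep only the field names so the CSV has a single header row
    if isinstance(stock_data.columns, pd.MultiIndex):
        stock_data.columns = stock_data.columns.get_level_values(0)

    # Save data to CSV
    stock_data.to_csv(filename)
    print(f"Data saved to {filename}")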

# Get user inputs
ticker = input("Enter the ticker symbol of the Nifty 50 stock (e.g., 'RELIANCE.NS'): ")
start_date = input("Enter the start date (YYYY-MM-DD): ")
end_date = input("Enter the end date (YYYY-MM-DD): ")
filename = input("Enter the filename to save the data (e.g., 'stock_data.csv'): ")

# Validate date format
try:
    datetime.strptime(start_date, '%Y-%m-%d')
    datetime.strptime(end_date, '%Y-%m-%d')
except ValueError:
    print("Invalid date format. Please enter the date in YYYY-MM-DD format.")
    raise SystemExit(1)

# Fetch and save stock data
fetch_stock_data(ticker, start_date, end_date, filename)

# Read the saved CSV file
stock_data = pd.read_csv(filename, index_col='Date', parse_dates=True)

# Plot the closing prices
plt.figure(figsize=(12, 6))
plt.plot(stock_data['Close'], label=f'Closing Prices of {ticker}')
plt.title(f'Closing Prices of {ticker} from {start_date} to {end_date}')
plt.xlabel('Date')
plt.ylabel('Closing Price')
plt.legend()
plt.grid(True)
plt.show()
        


Step 2: Test for Stationarity – ADF

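Method 1: Visual inspection. By inspecting the plot above, we can see that the series is clearly non-stationary.

Method 2: ADF (Augmented Dickey-Fuller) test

# Perform the ADF test on the closing prices
# (adf_test is defined in the full code listing at the end of this article)
print("\nADF Test Result for Closing Prices:")
adf_test(stock_data['Close'])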

Analysis:

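The null hypothesis of the ADF test is that the series has a unit root, that is, it is non-stationary. The p-value is greater than 0.05, so we fail to reject the null hypothesis. The time series is non-stationary.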

Step 3: Transforming the non-stationary series into a stationary one

Some common approaches for making a time series stationary:

Differencing:

  • Differencing involves computing the difference between consecutive observations. First-order differencing (Yt - Yt-1) helps remove trends; seasonal patterns call for seasonal differencing (Yt - Yt-m, where m is the seasonal period). By subtracting adjacent values, you create a new series where the trend component is minimized. Differencing can be performed multiple times if needed.

Log Transformation:

  • Taking the natural logarithm of the data can stabilize variance. It is useful when the data exhibits exponential growth or decay. A log transformation reduces the impact of extreme values, but on its own it does not remove a trend.

Seasonal Decomposition:

  • Decompose the time series into seasonal, trend, and residual components. The seasonal component captures periodic patterns (e.g., daily, weekly, or yearly). Subtracting the seasonal component from the original data yields a seasonally adjusted series.

Log Difference:

  • Combine log transformation and differencing. First, take the natural logarithm of the data. Then, compute the difference between consecutive log-transformed values. This approach addresses both trend and non-constant variance; for prices, the log difference approximates the period-over-period return.

Remember that the choice of method depends on your specific dataset and the patterns you observe. Experiment with these approaches to achieve stationarity!
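
A tiny worked example may help show what the log difference measures. With made-up prices that grow by exactly 10% per step, the simple return is 0.10 at every step, while the log difference is ln(1.10), roughly 0.0953. The two are close for small changes and drift apart as changes get larger.

```python
import numpy as np

prices = np.array([100.0, 110.0, 121.0])  # made-up prices, +10% per step

# Simple (percentage) return: (P_t - P_{t-1}) / P_{t-1}
simple_returns = np.diff(prices) / prices[:-1]

# Log difference: ln(P_t) - ln(P_{t-1}) = ln(P_t / P_{t-1})
log_diff = np.diff(np.log(prices))

print("simple returns: ", simple_returns)
print("log differences:", log_diff)
```

Log differences also add up over time: their sum over several periods equals the log of the total price ratio, which is convenient when converting forecasts back to price levels.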

The following Python code applies these transformations and tests each resulting series for stationarity.

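# Function to read the CSV file, apply transformations, and test each one for stationarity
# (adf_test and plot_series are defined in the full code listing at the end of this article)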
def transform_and_test_stationarity(filename):
    # Read the CSV file
    df = pd.read_csv(filename)
    
    # Parse the date column (assuming the date column is named 'Date')
    df['Date'] = pd.to_datetime(df['Date'])
    df.set_index('Date', inplace=True)
    
    # Extract the closing prices
    closing_prices = df['Close']
    
    # Original Series
    plot_series(closing_prices, 'Original Closing Prices')
    print("\nADF Test Result for Original Closing Prices:")
    adf_test(closing_prices)

    # 1. Differencing
    first_difference = closing_prices.diff().dropna()
    plot_series(first_difference, 'First Difference of Closing Prices')
    print("\nADF Test Result for First Difference of Closing Prices:")
    adf_test(first_difference)

    # 2. Log Transformation
    log_transformation = np.log(closing_prices).dropna()
    plot_series(log_transformation, 'Log Transformation of Closing Prices')
    print("\nADF Test Result for Log Transformation of Closing Prices:")
    adf_test(log_transformation)

    # 3. Seasonal Decomposition
    result = seasonal_decompose(closing_prices, model='additive', period=30)
    seasonal_adjusted = closing_prices - result.seasonal
    plot_series(seasonal_adjusted.dropna(), 'Seasonally Adjusted Closing Prices')
    print("\nADF Test Result for Seasonally Adjusted Closing Prices:")
    adf_test(seasonal_adjusted.dropna())

    # 4. Log Difference
    log_difference = log_transformation.diff().dropna()
    plot_series(log_difference, 'Log Difference of Closing Prices')
    print("\nADF Test Result for Log Difference of Closing Prices:")
    adf_test(log_difference)        

The next step is to fit the ARIMA model:

Use the auto_arima() function from the pmdarima package, as in the code below, to auto-fit the ARIMA model.

    # Auto-fit ARIMA model on log difference (or any stationary series)
    print("\nFitting ARIMA model on Log Difference of Closing Prices:")
    model = auto_arima(log_difference, seasonal=False, trace=True, error_action='ignore', suppress_warnings=True)
    print(model.summary())

The final step is to forecast values and plot the original and forecasted values together.

    # Forecast future values
    forecast_periods = 30
    forecast = model.predict(n_periods=forecast_periods)

    # Create a series for the forecast, indexed by the business days that
    # follow the last observation (exchange holidays are not excluded)
    first_forecast_day = log_difference.index[-1] + pd.offsets.BDay(1)
    forecast_index = pd.date_range(start=first_forecast_day, periods=forecast_periods, freq='B')
    forecast_series = pd.Series(np.asarray(forecast), index=forecast_index)

    # Plot the original and forecasted values
    plt.figure(figsize=(10, 5))
    plt.plot(log_difference, label='Log Difference of Closing Prices')
    plt.plot(forecast_series, label='Forecasted Values', color='red')
    plt.title('Log Difference of Closing Prices with Forecasted Values')
    plt.xlabel('Date')
    plt.ylabel('Log Difference of Closing Price')
    plt.legend()
    plt.grid(True)
    plt.show()
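
Because the model is fitted on log differences, its forecasts are also log differences, not prices. One way to convert them back to price levels is to accumulate the forecasted log differences and apply them to the last observed price. The numbers below are hypothetical and stand in for the last closing price and the model output.

```python
import numpy as np

last_price = 100.0                                  # hypothetical last observed close
forecast_log_diff = np.array([0.01, -0.02, 0.015])  # hypothetical forecasted log differences

# The cumulative sum of log differences equals log(P_{T+h}) - log(P_T),
# so exponentiating and scaling by P_T recovers the price path
forecast_prices = last_price * np.exp(np.cumsum(forecast_log_diff))

print(forecast_prices)
```

With these inputs the first forecasted price is about 101.005, since exp(0.01) is roughly 1.01005.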

The residual errors seem fine, with near-zero mean and uniform variance.


Full code for the ARIMA model for time series forecasting using live stock prices from NSE India

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.seasonal import seasonal_decompose
from pmdarima import auto_arima

 
# Function to perform ADF test and print the result with analysis
def adf_test(timeseries):
    result = adfuller(timeseries)
    adf_statistic = result[0]
    p_value = result[1]
    critical_values = result[4]

    print('ADF Statistic: %f' % adf_statistic)
    print('p-value: %f' % p_value)
    print('Critical Values:')
    for key, value in critical_values.items():
        print('\t%s: %.3f' % (key, value))

    # Decision based on the p-value
    if p_value < 0.05:
        print("The p-value is less than 0.05, so we reject the null hypothesis. The time series is stationary.")
    else:
        print("The p-value is greater than 0.05, so we fail to reject the null hypothesis. The time series is non-stationary.")

    # Decision based on each critical value
    for key, value in critical_values.items():
        if adf_statistic < value:
            print(f"The ADF statistic is less than the {key} critical value. We reject the null hypothesis at the {key} level. The time series is stationary.")
        else:
            print(f"The ADF statistic is greater than the {key} critical value. We fail to reject the null hypothesis at the {key} level. The time series is non-stationary.")


# Function to plot the time series
def plot_series(timeseries, title):
    plt.figure(figsize=(10, 5))
    plt.plot(timeseries, label=title)
    plt.title(title)
    plt.xlabel('Date')
    plt.ylabel('Value')
    plt.legend()
    plt.grid(True)
    plt.show()

 

# Function to read the CSV file, apply transformations, and fit ARIMA model
def transform_and_test_stationarity(filename):
    # Read the CSV file
    df = pd.read_csv(filename)

    # Parse the date column (assuming the date column is named 'Date')
    df['Date'] = pd.to_datetime(df['Date'])
    df.set_index('Date', inplace=True)

    # Extract the closing prices
    closing_prices = df['Close']

    # Original Series
    plot_series(closing_prices, 'Original Closing Prices')
    print("\nADF Test Result for Original Closing Prices:")
    adf_test(closing_prices)

    # 1. Differencing
    first_difference = closing_prices.diff().dropna()
    plot_series(first_difference, 'First Difference of Closing Prices')
    print("\nADF Test Result for First Difference of Closing Prices:")
    adf_test(first_difference)

    # 2. Log Transformation
    log_transformation = np.log(closing_prices).dropna()
    plot_series(log_transformation, 'Log Transformation of Closing Prices')
    print("\nADF Test Result for Log Transformation of Closing Prices:")
    adf_test(log_transformation)

    # 3. Seasonal Decomposition
    result = seasonal_decompose(closing_prices, model='additive', period=30)
    seasonal_adjusted = closing_prices - result.seasonal
    plot_series(seasonal_adjusted.dropna(), 'Seasonally Adjusted Closing Prices')
    print("\nADF Test Result for Seasonally Adjusted Closing Prices:")
    adf_test(seasonal_adjusted.dropna())

    # 4. Log Difference
    log_difference = log_transformation.diff().dropna()
    plot_series(log_difference, 'Log Difference of Closing Prices')
    print("\nADF Test Result for Log Difference of Closing Prices:")
    adf_test(log_difference)

    # Auto-fit ARIMA model on log difference (or any stationary series)
    print("\nFitting ARIMA model on Log Difference of Closing Prices:")
    model = auto_arima(log_difference, seasonal=False, trace=True, error_action='ignore', suppress_warnings=True)
    print(model.summary())

    # Forecast future values
    forecast_periods = 30
    forecast = model.predict(n_periods=forecast_periods)

    # Create a series for the forecast, indexed by the business days that
    # follow the last observation (exchange holidays are not excluded)
    first_forecast_day = log_difference.index[-1] + pd.offsets.BDay(1)
    forecast_index = pd.date_range(start=first_forecast_day, periods=forecast_periods, freq='B')
    forecast_series = pd.Series(np.asarray(forecast), index=forecast_index)

    # Plot the original and forecasted values
    plt.figure(figsize=(10, 5))
    plt.plot(log_difference, label='Log Difference of Closing Prices')
    plt.plot(forecast_series, label='Forecasted Values', color='red')
    plt.title('Log Difference of Closing Prices with Forecasted Values')
    plt.xlabel('Date')
    plt.ylabel('Log Difference of Closing Price')
    plt.legend()
    plt.grid(True)
    plt.show()

# Example usage
filename = input("Enter the filename of the CSV file to read (e.g., 'stock_data.csv'): ")
transform_and_test_stationarity(filename)
        




Abhishek Naik

Business Intelligence Analyst | Ex ABFRL | Process optimization | MBA candidate

10 months ago

Awesome explanation sir, and when it comes to accuracy advanced models like LSTM, Transformer models, and NeuralProphet tend to offer higher accuracy, but they need more data and computational power. For time series with changing volatility, such as financial data, GARCH models are particularly effective. If ease of use is a priority, tools like Prophet and NeuralProphet are designed to be user-friendly and require minimal tuning.

Dr. Dileep S

Narsee Monjee Institute of Management Studies (NMIMS)- Bengaluru

10 months ago

Dear Sir, you have explained the concept very simply and beautifully. I am really happy to have read this article, and it is useful to many of us! Thank you!

Tanvir Sayyad

Interned at Dell Technologies | Vice President | Insignia - The Alumni Committee at SVKM's NMIMS, Bangalore (NMIMS) | Ex - Infosys

10 months ago

Very useful information! Forecasting using time series model is amazing! ARIMA is great way to do so. To add on we also have a FB prophet Model which automatically detects the trends and seasonality on daily, weekly and yearly basis and mitigates the impact of outliers. It also allows for the inclusion of holidays and important events that can spike the stock price!
