
Forecasting Time Series (stock price)- ARIMA Model Using Python

Vaidyanathan Ravichandran

Professor of Practice (Finance) - Business Schools, Bangalore

Published: May 21, 2024

Time Series Forecasting – ARIMA Model - Introduction

Time series forecasting is a method used to predict future values based on previously observed values in a time series data set. A time series is a sequence of data points typically measured at successive points in time, often at uniform intervals.

Time series forecasting involves using statistical models and techniques to analyze past data and make informed predictions about future trends. It is widely used in various fields such as finance, economics, supply chain management, and meteorology for predicting future events and making data-driven decisions.

Traditional and Advanced Methods

Traditional Methods:
Simple Moving Average
Weighted Moving Average
Exponential Smoothing
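
As a rough illustration of the three traditional methods listed above, the sketch below computes each of them with pandas. The price values, the window length of 3, the linearly increasing weights and the smoothing factor of 0.5 are all assumptions chosen for the example, not values from this article.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices, used only for illustration
prices = pd.Series([100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0])
window = 3

# Simple moving average: equal weight on the last `window` observations
sma = prices.rolling(window=window).mean()

# Weighted moving average: linearly increasing weights, newest observation weighs most
weights = np.arange(1, window + 1)  # [1, 2, 3]
wma = prices.rolling(window=window).apply(
    lambda x: np.dot(x, weights) / weights.sum(), raw=True
)

# Simple exponential smoothing: s_t = alpha * y_t + (1 - alpha) * s_(t-1)
alpha = 0.5
ses = prices.ewm(alpha=alpha, adjust=False).mean()

# For all three methods the one-step-ahead forecast is the last smoothed value
print("SMA forecast:", sma.iloc[-1])
print("WMA forecast:", wma.iloc[-1])
print("SES forecast:", ses.iloc[-1])
```

All three smooth the history in slightly different ways; they differ only in how quickly older observations lose influence.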

Advanced Machine Learning Methods:

Linear Regression Models: Can be used for forecasting by incorporating lagged values and other predictors.
Tree-based Methods: Random forests, gradient boosting machines, etc., for capturing complex patterns.
Neural Networks: Models like Recurrent Neural Networks (RNNs) and Long Short-Term Memory (LSTM) networks designed to handle sequential data.
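
To make the idea of "incorporating lagged values" concrete, here is a minimal sketch that turns a series into a lagged feature matrix and fits a linear regression by ordinary least squares. The helper name `make_lagged` and the toy series are hypothetical; the same feature matrix could equally be passed to a tree-based model.

```python
import numpy as np

def make_lagged(series, n_lags):
    """Return (X, y) where column k of X holds the value lagged by k + 1 steps."""
    s = np.asarray(series, dtype=float)
    X = np.column_stack(
        [s[n_lags - k - 1 : len(s) - k - 1] for k in range(n_lags)]
    )
    y = s[n_lags:]
    return X, y

# Toy series generated exactly by y_t = 2 + 0.5 * y_(t-1), starting at 10
series = [10.0]
for _ in range(8):
    series.append(2.0 + 0.5 * series[-1])

X, y = make_lagged(series, n_lags=1)

# Add an intercept column and solve the least-squares problem
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
intercept, slope = coef

# One-step-ahead forecast from the last observed value
next_value = intercept + slope * series[-1]
print(intercept, slope, next_value)
```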

Steps in Time Series Forecasting

Data Preparation: Collecting and cleaning the data, handling missing values, and transforming variables if necessary.
Exploratory Data Analysis (EDA): Visualizing the data to identify trends, seasonality, and other patterns.
Model Selection: Choosing appropriate forecasting methods based on the data characteristics.
Model Fitting: Training the model on historical data.
Model Evaluation: Assessing the model’s performance using metrics like Mean Absolute Error (MAE), Mean Squared Error (MSE), etc.
Forecasting: Making predictions using the trained model and assessing the forecast accuracy.
Updating the Model: Regularly updating the model as new data becomes available.
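
The evaluation step can be sketched as follows, assuming a small made-up series and a naive "repeat the last value" forecast as the baseline. The key point is that the split is chronological: the model only ever sees data that precedes the test period.

```python
import numpy as np

def mae(actual, predicted):
    """Mean Absolute Error."""
    return float(np.mean(np.abs(np.asarray(actual) - np.asarray(predicted))))

def mse(actual, predicted):
    """Mean Squared Error."""
    return float(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2))

# Hypothetical series; the last 3 points are held out as the test set
series = np.array([10.0, 12.0, 11.0, 13.0, 14.0, 13.0, 15.0, 16.0])
train, test = series[:-3], series[-3:]

# Naive baseline: every future value equals the last training value
naive_forecast = np.full(len(test), train[-1])

print("MAE:", mae(test, naive_forecast))
print("MSE:", mse(test, naive_forecast))
```

Any candidate model, ARIMA included, should beat this baseline on the held-out data before it is trusted.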

This article will discuss time series forecasting using the ARIMA model.

ARIMA Model Overview

ARIMA stands for Auto-Regressive Integrated Moving Average and represents a cornerstone in time series forecasting. It is a statistical method that has gained immense popularity due to its efficacy in handling various standard temporal structures present in time series data. ARIMA models are based on the idea that the information in past values of the time series can alone be used to predict future values.

Exponential smoothing and ARIMA models are the two most widely used approaches to time series forecasting. Exponential smoothing models are based on a description of the trend and seasonality in the data, while ARIMA models aim to describe the autocorrelations in the data.

Assumptions and Parameters of the ARIMA Model

Major Assumption:

Stationarity: After differencing (the "I" step), the time series has statistical properties (mean, variance and autocorrelation) that remain constant over time.

Components/Parameters of ARIMA Model:

AR (Autoregression): The dependence of an observation on its preceding observations. p: the lag order, i.e. the number of lagged observations in the model.

I (Integrated): Differencing of raw observations to achieve stationarity. d: the degree of differencing, i.e. the number of times the series is differenced.

MA (Moving Average): The dependence of an observation on past forecast errors (residuals). q: the order of the moving average, i.e. the number of lagged error terms in the model.

Non-Seasonal Data

Non-seasonal time series data do not exhibit regular and predictable patterns that repeat over a specific period. Examples include stock prices, which may show trends or cycles but do not typically follow a seasonal pattern.

Characteristics:

No Regular Repetition

Trends and Cycles

Stationarity

Example of Non-Seasonal Data:

White Noise

Autocorrelation: A key feature of white noise is the absence of autocorrelation. This means there's no correlation between a data point and its past or future values. The autocorrelation function (ACF) of a white noise series should be close to zero at all lags (time differences) except for lag 0 (correlation with itself).

White noise is a fundamental concept in time series analysis, representing purely random fluctuations. It is crucial for diagnosing the adequacy of time series models and ensuring that only the random noise remains in the residuals after fitting a model. Understanding and identifying white noise helps in building more accurate and reliable time series models.

In essence, white noise in time series represents a purely random component, providing a benchmark for how well a model explains the data's variability.

Below is an illustration of white noise in a time series:

White Noise Generation: We will generate 1000 samples from a normal distribution with mean 0 and standard deviation 1.

import numpy as np
import matplotlib.pyplot as plt

# Generate white noise
np.random.seed(0)
white_noise = np.random.normal(0, 1, 1000)

# Plot the white noise
plt.figure(figsize=(10, 6))
plt.plot(white_noise)
plt.title('White Noise')
plt.xlabel('Time')
plt.ylabel('Value')
plt.show()

# Plot the autocorrelation function (ACF)
from statsmodels.graphics.tsaplots import plot_acf
plot_acf(white_noise, lags=40)
plt.title('Autocorrelation Function (ACF) of White Noise')
plt.show()

ARIMA Model Steps


Step 1: Download stock data from Yahoo Finance using Python.

import yfinance as yf
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime

# Function to fetch and save stock data
def fetch_stock_data(ticker, start_date, end_date, filename):
    # Fetch data from Yahoo Finance
    stock_data = yf.download(ticker, start=start_date, end=end_date)
    
    # Save data to CSV
    stock_data.to_csv(filename)
    print(f"Data saved to {filename}")

# Get user inputs
ticker = input("Enter the ticker symbol of the Nifty 50 stock (e.g., 'RELIANCE.NS'): ")
start_date = input("Enter the start date (YYYY-MM-DD): ")
end_date = input("Enter the end date (YYYY-MM-DD): ")
filename = input("Enter the filename to save the data (e.g., 'stock_data.csv'): ")

# Validate date format
try:
    datetime.strptime(start_date, '%Y-%m-%d')
    datetime.strptime(end_date, '%Y-%m-%d')
except ValueError:
    print("Invalid date format. Please enter the date in YYYY-MM-DD format.")
    exit()

# Fetch and save stock data
fetch_stock_data(ticker, start_date, end_date, filename)

# Read the saved CSV file
stock_data = pd.read_csv(filename, index_col='Date', parse_dates=True)

# Plot the closing prices
plt.figure(figsize=(12, 6))
plt.plot(stock_data['Close'], label=f'Closing Prices of {ticker}')
plt.title(f'Closing Prices of {ticker} from {start_date} to {end_date}')
plt.xlabel('Date')
plt.ylabel('Closing Price')
plt.legend()
plt.grid(True)
plt.show()

Step 2: Test for Stationarity – ADF

Method 1: Visual inspection. The plot of closing prices above shows that the series is clearly non-stationary.

Method 2: ADF (Augmented Dickey-Fuller) test

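# Perform ADF test on the closing prices
# (df is the DataFrame read from the saved CSV; adf_test is defined in the full code at the end of this article)
print("\nADF Test Result for Closing Prices:")
adf_test(df['Close'])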

Analysis:

The p-value is greater than 0.05, so we fail to reject the null hypothesis that the series has a unit root. The time series is non-stationary.


Step 3: Transforming a non-stationary series into a stationary one

Some common approaches for making a time series stationary:

Differencing:

Differencing involves computing the difference between consecutive observations. First-order differencing (Yt - Yt-1) helps remove a trend; removing seasonality requires seasonal differencing (Yt - Yt-m, where m is the seasonal period). By subtracting adjacent values, you create a new series in which the trend component is minimized. Differencing can be performed multiple times if needed.
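
A tiny worked example, using made-up prices, shows what differencing does and how it is undone. The inverse matters later, because a model fitted on differences forecasts differences, not price levels.

```python
import pandas as pd

# Hypothetical prices, used only for illustration
prices = pd.Series([100.0, 103.0, 101.0, 106.0, 110.0])

# First-order differencing: Y_t - Y_(t-1); the first value has no predecessor
diff1 = prices.diff().dropna()

# Second-order differencing is simply the difference of the differences
diff2 = diff1.diff().dropna()

# Inverting the first difference: starting value plus the cumulative sum of changes
reconstructed = prices.iloc[0] + diff1.cumsum()

print(diff1.tolist())          # changes between consecutive prices
print(diff2.tolist())          # changes of the changes
print(reconstructed.tolist())  # matches prices from the second observation on
```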

Log Transformation:

Taking the natural logarithm of the data can stabilize variance. It’s useful when the data exhibits exponential growth or decay. Log transformation reduces the impact of extreme values and makes the data more stationary.

Seasonal Decomposition:

Decompose the time series into seasonal, trend, and residual components. The seasonal component captures periodic patterns (e.g., daily, weekly, or yearly). Subtracting the seasonal component from the original data yields a seasonally adjusted series.

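Log Difference:

Combine log transformation and differencing. First, take the natural logarithm of the data. Then, compute the difference between consecutive log-transformed values. This approach addresses both the trend (through differencing) and growing variance (through the logarithm). For prices, the log difference is the continuously compounded return.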
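
The relationship between log differences and ordinary percentage returns can be checked numerically. The prices below are made up; the identity being illustrated is ln(P_t) - ln(P_(t-1)) = ln(1 + r_t), where r_t is the simple return.

```python
import numpy as np
import pandas as pd

# Hypothetical prices, used only for illustration
prices = pd.Series([100.0, 105.0, 102.9, 108.045])

# Log difference: ln(P_t) - ln(P_(t-1))
log_diff = np.log(prices).diff().dropna()

# Simple return: (P_t - P_(t-1)) / P_(t-1)
simple_ret = prices.pct_change().dropna()

# The two are linked by log_diff = ln(1 + simple_ret)
print(np.allclose(log_diff, np.log1p(simple_ret)))

# Log differences add up over time, so prices are recovered by exponentiating the running sum
recovered = prices.iloc[0] * np.exp(log_diff.cumsum())
print(recovered.tolist())
```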

Remember that the choice of method depends on your specific dataset and the patterns you observe. Experiment with these approaches to achieve stationarity!

The following Python code transforms the data into a stationary series:

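import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose

# adf_test and plot_series are helper functions defined in the full code at the end of this article
# Function to read the CSV file, apply each transformation, and test the result for stationarity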
def transform_and_test_stationarity(filename):
    # Read the CSV file
    df = pd.read_csv(filename)
    
    # Parse the date column (assuming the date column is named 'Date')
    df['Date'] = pd.to_datetime(df['Date'])
    df.set_index('Date', inplace=True)
    
    # Extract the closing prices
    closing_prices = df['Close']
    
    # Original Series
    plot_series(closing_prices, 'Original Closing Prices')
    print("\nADF Test Result for Original Closing Prices:")
    adf_test(closing_prices)

    # 1. Differencing
    first_difference = closing_prices.diff().dropna()
    plot_series(first_difference, 'First Difference of Closing Prices')
    print("\nADF Test Result for First Difference of Closing Prices:")
    adf_test(first_difference)

    # 2. Log Transformation
    log_transformation = np.log(closing_prices).dropna()
    plot_series(log_transformation, 'Log Transformation of Closing Prices')
    print("\nADF Test Result for Log Transformation of Closing Prices:")
    adf_test(log_transformation)

    # 3. Seasonal Decomposition
    result = seasonal_decompose(closing_prices, model='additive', period=30)
    seasonal_adjusted = closing_prices - result.seasonal
    plot_series(seasonal_adjusted.dropna(), 'Seasonally Adjusted Closing Prices')
    print("\nADF Test Result for Seasonally Adjusted Closing Prices:")
    adf_test(seasonal_adjusted.dropna())

    # 4. Log Difference
    log_difference = log_transformation.diff().dropna()
    plot_series(log_difference, 'Log Difference of Closing Prices')
    print("\nADF Test Result for Log Difference of Closing Prices:")
    adf_test(log_difference)

The next step is to fit the ARIMA model:

Use the auto_arima() Python code below to auto-fit the ARIMA model.

    # Auto-fit ARIMA model on log difference (or any stationary series)
    print("\nFitting ARIMA model on Log Difference of Closing Prices:")
    model = auto_arima(log_difference, seasonal=False, trace=True, error_action='ignore', suppress_warnings=True)
    print(model.summary())

The final step is to forecast values and plot the original versus forecasted values.

    # Forecast future values
    forecast_periods = 30
    forecast = model.predict(n_periods=forecast_periods)

    # Create a series for the forecast, indexed by the business days that follow the last observation
    forecast_index = pd.date_range(start=log_difference.index[-1], periods=forecast_periods + 1, freq='B')[1:]
    # np.asarray drops any index attached to the forecast so the values line up with forecast_index
    forecast_series = pd.Series(np.asarray(forecast), index=forecast_index)

    # Plot the original and forecasted values
    plt.figure(figsize=(10, 5))
    plt.plot(log_difference, label='Log Difference of Closing Prices')
    plt.plot(forecast_series, label='Forecasted Values', color='red')
    plt.title('Log Difference of Closing Prices with Forecasted Values')
    plt.xlabel('Date')
    plt.ylabel('Log Difference of Closing Price')
    plt.legend()
    plt.grid(True)
    plt.show()
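
Because the model is fitted on log differences, its forecasts are log differences too. A minimal sketch of converting them back into price levels follows; the function name `log_diff_forecast_to_prices`, the last closing price and the forecast values are all hypothetical.

```python
import numpy as np

def log_diff_forecast_to_prices(last_price, log_diff_forecast):
    """Convert forecasted log differences into forecasted price levels."""
    # Log differences accumulate additively; exponentiating gives the growth factor
    cumulative = np.cumsum(np.asarray(log_diff_forecast, dtype=float))
    return last_price * np.exp(cumulative)

last_close = 2500.0                       # hypothetical last observed closing price
forecast_log_diff = [0.01, -0.005, 0.002]  # hypothetical model output

price_path = log_diff_forecast_to_prices(last_close, forecast_log_diff)
print(price_path)
```

In the code above, `last_price` would be the final value of the closing price series and `log_diff_forecast` the output of `model.predict`.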

The residual errors look acceptable, with a near-zero mean and roughly constant variance.

Full code for the ARIMA model for time series forecasting using live stock prices from NSE India

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.seasonal import seasonal_decompose
from pmdarima import auto_arima

# Function to perform ADF test and print the result with analysis
def adf_test(timeseries):
    result = adfuller(timeseries)
    adf_statistic = result[0]
    p_value = result[1]
    critical_values = result[4]

    print('ADF Statistic: %f' % adf_statistic)
    print('p-value: %f' % p_value)
    print('Critical Values:')
    for key, value in critical_values.items():
        print('\t%s: %.3f' % (key, value))

    if p_value < 0.05:
        print("The p-value is less than 0.05, so we reject the null hypothesis. The time series is stationary.")
    else:
        print("The p-value is greater than 0.05, so we fail to reject the null hypothesis. The time series is non-stationary.")

    for key, value in critical_values.items():
        if adf_statistic < value:
            print(f"The ADF statistic is less than the {key} critical value. We reject the null hypothesis at the {key} level. The time series is stationary.")
        else:
            print(f"The ADF statistic is greater than the {key} critical value. We fail to reject the null hypothesis at the {key} level. The time series is non-stationary.")

# Function to plot the time series
def plot_series(timeseries, title):
    plt.figure(figsize=(10, 5))
    plt.plot(timeseries, label=title)
    plt.title(title)
    plt.xlabel('Date')
    plt.ylabel('Value')
    plt.legend()
    plt.grid(True)
    plt.show()

# Function to read the CSV file, apply transformations, and fit ARIMA model
def transform_and_test_stationarity(filename):
    # Read the CSV file
    df = pd.read_csv(filename)

    # Parse the date column (assuming the date column is named 'Date')
    df['Date'] = pd.to_datetime(df['Date'])
    df.set_index('Date', inplace=True)

    # Extract the closing prices
    closing_prices = df['Close']

    # Original Series
    plot_series(closing_prices, 'Original Closing Prices')
    print("\nADF Test Result for Original Closing Prices:")
    adf_test(closing_prices)

    # 1. Differencing
    first_difference = closing_prices.diff().dropna()
    plot_series(first_difference, 'First Difference of Closing Prices')
    print("\nADF Test Result for First Difference of Closing Prices:")
    adf_test(first_difference)

    # 2. Log Transformation
    log_transformation = np.log(closing_prices).dropna()
    plot_series(log_transformation, 'Log Transformation of Closing Prices')
    print("\nADF Test Result for Log Transformation of Closing Prices:")
    adf_test(log_transformation)

    # 3. Seasonal Decomposition
    result = seasonal_decompose(closing_prices, model='additive', period=30)
    seasonal_adjusted = closing_prices - result.seasonal
    plot_series(seasonal_adjusted.dropna(), 'Seasonally Adjusted Closing Prices')
    print("\nADF Test Result for Seasonally Adjusted Closing Prices:")
    adf_test(seasonal_adjusted.dropna())

    # 4. Log Difference
    log_difference = log_transformation.diff().dropna()
    plot_series(log_difference, 'Log Difference of Closing Prices')
    print("\nADF Test Result for Log Difference of Closing Prices:")
    adf_test(log_difference)

    # Auto-fit ARIMA model on log difference (or any stationary series)
    print("\nFitting ARIMA model on Log Difference of Closing Prices:")
    model = auto_arima(log_difference, seasonal=False, trace=True, error_action='ignore', suppress_warnings=True)
    print(model.summary())

    # Forecast future values
    forecast_periods = 30
    forecast = model.predict(n_periods=forecast_periods)

    # Create a series for the forecast, indexed by the business days that follow the last observation
    forecast_index = pd.date_range(start=log_difference.index[-1], periods=forecast_periods + 1, freq='B')[1:]
    # np.asarray drops any index attached to the forecast so the values line up with forecast_index
    forecast_series = pd.Series(np.asarray(forecast), index=forecast_index)

    # Plot the original and forecasted values
    plt.figure(figsize=(10, 5))
    plt.plot(log_difference, label='Log Difference of Closing Prices')
    plt.plot(forecast_series, label='Forecasted Values', color='red')
    plt.title('Log Difference of Closing Prices with Forecasted Values')
    plt.xlabel('Date')
    plt.ylabel('Log Difference of Closing Price')
    plt.legend()
    plt.grid(True)
    plt.show()

# Example usage
filename = input("Enter the filename of the CSV file to read (e.g., 'stock_data.csv'): ")
transform_and_test_stationarity(filename)


Abhishek Naik

Business Intelligence Analyst | Ex ABFRL | Process optimization | MBA candidate

10 months ago

Awesome explanation sir, and when it comes to accuracy advanced models like LSTM, Transformer models, and NeuralProphet tend to offer higher accuracy, but they need more data and computational power. For time series with changing volatility, such as financial data, GARCH models are particularly effective. If ease of use is a priority, tools like Prophet and NeuralProphet are designed to be user-friendly and require minimal tuning.


Dr. Dileep S

Narsee Monjee Institute of Management Studies (NMIMS)- Bengaluru

10 months ago

Dear Sir, you have explained the concept very simply and beautifully. I am really happy to have read this article, and it is useful to many of us! Thank you!


Tanvir Sayyad

Interned at Dell Technologies | Vice President | Insignia - The Alumni Committee at SVKM's NMIMS, Bangalore (NMIMS) | Ex - Infosys

10 months ago

Very useful information! Forecasting using time series model is amazing! ARIMA is great way to do so. To add on we also have a FB prophet Model which automatically detects the trends and seasonality on daily, weekly and yearly basis and mitigates the impact of outliers. It also allows for the inclusion of holidays and important events that can spike the stock price!
