Time Series Part 3 - Stock Price prediction using ARIMA model with Python

Welcome back readers!

Readers who are new to this blog series may want to start with the previous two parts. Part 1 introduces the concepts involved in time series, and Part 2 discusses the steps involved in modelling time series data using ARIMA models.

In this blog, we will use an ARIMA model to forecast the stock price of Reliance Ltd. using Python. I have tried to keep the read simple: the process is broken into steps that we follow in order, and for each step we look at the relevant code and what it does. The Python file that we are referring to is also attached at the end of this blog, so that one can download the file, make changes and use it.

Reliance Ltd. is a listed Indian company that does business in many areas such as energy, petrochemicals, natural gas, retail, telecommunications, mass media and textiles. One of the reasons for selecting this stock is that it has one of the highest market capitalizations in the Indian stock markets and hence offers good liquidity for trading.

Let’s get started:

1) Import the required packages

The packages that we are using are:

  • pandas – Makes data handling and data manipulation easy
  • yfinance – It is used to download dataset from Yahoo Finance
  • datetime – Used to handle date and time objects
  • matplotlib - Used for data visualization and plotting 2D graphics
  • seaborn - Used for improved data visualization and better graphics
  • pmdarima – Used for model selection in time series forecasting

2) Select asset and time period

For this step, we are using the ticker for Reliance stock i.e. ‘RELIANCE.NS’. This value is stored in the variable named ticker.

The dataset is split into two parts: a training dataset and a testing dataset. We will train our model on the training dataset and then forecast the stock price over the testing period. We will visually compare the actual stock price with the model's prediction in Step 8.

The period selected for training data is from 1st January 2010 to 31st December 2020. This is a total of 11 years of training data. We store these dates in the variables start_training and end_training.

For the testing data, we will use data from 1st January 2021 to the current date. We store these dates in the variables start_testing and end_testing.

Note: One can download the file and change the stock name and dates for self-use.
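Since the original code snippet is not reproduced here, the following is a minimal sketch of what this step could look like. The variable names come from the text above; using datetime.date objects (rather than strings) is an assumption.

```python
from datetime import date

# Yahoo Finance ticker for Reliance, as named in the text
ticker = "RELIANCE.NS"

# Training window: 1st January 2010 to 31st December 2020 (11 calendar years)
start_training = date(2010, 1, 1)
end_training = date(2020, 12, 31)

# Testing window: 1st January 2021 up to the day the script is run
start_testing = date(2021, 1, 1)
end_testing = date.today()
```

To try another stock or period, only these five values need to change.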

3) Download training data

Here, we download the training dataset using the yfinance package (abbreviated as yf). As parameters, we pass the ticker name ‘RELIANCE.NS’ that we saved in the variable ticker in the previous step, along with start_training as the start date and end_training as the end date.
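A possible sketch of the download step is shown below. The helper names download_prices and flatten_columns are hypothetical, not taken from the original file. Two assumptions are worth flagging: auto_adjust=False is passed so that the ‘Adj Close’ column used later is present, and the column-flattening helper is only needed on yfinance releases that return two-level (field, ticker) columns for a single ticker.

```python
import pandas as pd


def flatten_columns(df):
    """Keep only the field names if the columns are a (field, ticker) MultiIndex."""
    if isinstance(df.columns, pd.MultiIndex):
        df = df.copy()
        df.columns = df.columns.get_level_values(0)
    return df


def download_prices(ticker, start, end):
    """Download daily prices from Yahoo Finance (requires network access)."""
    # Imported lazily so flatten_columns can be used without yfinance installed
    import yfinance as yf

    raw = yf.download(ticker, start=start, end=end, auto_adjust=False)
    return flatten_columns(raw)


# Example call (hypothetical variable name df_training):
# df_training = download_prices("RELIANCE.NS", "2010-01-01", "2020-12-31")
```

Note that yfinance treats the end date as exclusive, so the last row returned may be the trading day before end_training.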

4) Convert daily dataset to weekly dataset

In this step, we resample the dataset to a weekly timeframe. To do so, we use the resample function from the pandas package with the parameter value ‘W’, which stands for weekly resampling. We also use the agg function, where ‘Open’ is the first value of the week, ‘High’ is the maximum value for the week, ‘Low’ is the minimum value for the week, and ‘Close’ and ‘Adj Close’ are the last values of their respective columns.

The reason for converting to a weekly timeframe is that a higher timeframe smooths out some of the day-to-day noise, which may make the series easier to model. One can also convert the data to a monthly timeframe and check the predictability of the model.
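The resampling described above can be sketched as follows. The function name to_weekly and the constant WEEKLY_RULES are hypothetical; the aggregation rules are the ones listed in the text.

```python
import pandas as pd

# One aggregation rule per price column, as described in the text
WEEKLY_RULES = {
    "Open": "first",
    "High": "max",
    "Low": "min",
    "Close": "last",
    "Adj Close": "last",
}


def to_weekly(daily):
    """Resample a daily price DataFrame (DatetimeIndex) to weekly bars."""
    weekly = daily.resample("W").agg(WEEKLY_RULES)
    # Drop weeks with no trading days at all
    return weekly.dropna(how="all")
```

Changing "W" to "M" (or "ME" on recent pandas versions) would give the monthly variant mentioned above.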

5) Model Selection

This step is the most crucial one and was discussed in detail in Part 2 of this blog. We can take any combination of values for the ARIMA order, but it is recommended to select the values for which the AIC score is minimized, since a lower AIC indicates a better trade-off between fit and model complexity.

In this step, we select the order of the ARIMA model using the auto_arima function from the pmdarima package, which searches for the order with the lowest AIC score.

The selected model is stored in a variable named arima_fit and will be used in next steps.
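A rough sketch of this step is given below. The helper names aic and select_model are hypothetical, and the arguments passed to auto_arima (non-seasonal, stepwise search) are assumptions, since the original call is not reproduced here. The small aic function only illustrates the criterion being compared: AIC = 2k - 2·ln(L), where lower is better.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: lower values indicate a better model."""
    return 2 * n_params - 2 * log_likelihood


def select_model(train_series):
    """Search ARIMA orders with pmdarima and return the lowest-AIC model."""
    # Imported lazily so the aic helper works without pmdarima installed
    import pmdarima as pm

    return pm.auto_arima(
        train_series,
        seasonal=False,               # assumption: plain (non-seasonal) ARIMA
        information_criterion="aic",  # rank candidate orders by AIC
        stepwise=True,                # stepwise search instead of a full grid
        suppress_warnings=True,
        error_action="ignore",        # skip orders that fail to fit
    )


# Example usage (series name is hypothetical):
# arima_fit = select_model(weekly_training["Adj Close"])
# print(arima_fit.summary())
```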

Given below is the summary generated from the model selected.

Snapshot from the summary of Step 5

Observations:

  • As highlighted above, the selected model order is (3, 1, 2). This means the AR part uses lagged values up to lag 3, the I part applies first-order differencing, and the MA part uses lagged errors up to lag 2.
  • The final equation for the suggested model is:

dY(t) = 7.13 - 1.27*dY(t-1) - 0.57*dY(t-2) + 0.14*dY(t-3) + 1.29*e(t-1) + 0.73*e(t-2)
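To make the equation concrete, here is a small sketch that evaluates it for one step. It assumes dY(t) = Y(t) - Y(t-1) (first-order differencing) and that e(t-1), e(t-2) are the model's past errors; the function names are hypothetical and the coefficients are simply copied from the equation above.

```python
def next_diff(dy_lags, e_lags):
    """Evaluate the equation above for dY(t).

    dy_lags: (dY(t-1), dY(t-2), dY(t-3))
    e_lags:  (e(t-1), e(t-2))
    """
    dy1, dy2, dy3 = dy_lags
    e1, e2 = e_lags
    return 7.13 - 1.27 * dy1 - 0.57 * dy2 + 0.14 * dy3 + 1.29 * e1 + 0.73 * e2


def next_price(last_price, dy_lags, e_lags):
    """Undo the differencing: Y(t) = Y(t-1) + dY(t)."""
    return last_price + next_diff(dy_lags, e_lags)
```

For example, with all lagged differences and errors equal to zero, the predicted weekly change is just the constant 7.13.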

6) Testing Data

Till now we were working on training data and in this step we will start working with the testing data. In the previous steps, we downloaded the training data, manipulated the data and did model selection. Now, in this step we will be using the chosen model on some unseen data to see how the model performs on new data. Following are the steps performed here:

  • Download the data from 1st January 2021 till current date
  • Resample the data from daily to weekly
  • Select only adjusted close data for further steps
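The last two of these steps could be sketched as below, assuming the daily testing data has already been downloaded into a DataFrame in the same way as the training data. The function name prepare_test_series is hypothetical.

```python
import pandas as pd


def prepare_test_series(daily):
    """Resample daily test data to weekly bars and keep only 'Adj Close'."""
    weekly = daily.resample("W").agg({
        "Open": "first",
        "High": "max",
        "Low": "min",
        "Close": "last",
        "Adj Close": "last",
    })
    # Keep the adjusted close only, dropping weeks with no trading days
    return weekly["Adj Close"].dropna()
```

The result is a pandas Series indexed by week-ending dates, which is what the forecast is compared against later.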

7) Forecast using testing data

In step 5, we had stored the selected model in a variable called arima_fit and in this step we are using the model stored in arima_fit to predict the price for the selected testing period (i.e. 1st January 2021 to current date).

The parameters in the function are used for the following purposes:

  • n_periods – To input the number of periods we want to predict
  • return_conf_int – To specify whether we want the confidence intervals for the forecast to be returned
  • alpha – To set the confidence level of the intervals, which is (1 – alpha) × 100%; for example, alpha = 0.05 gives 95% intervals

Next, we store the forecast values in a pandas DataFrame named arima_fcast. Given below is the snapshot of the result generated:

Snapshot of the arima_fcast
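A minimal sketch of the forecasting step follows. It relies on the predict method of the fitted pmdarima model with the three parameters described above; the function name make_forecast_frame and the column names prediction, lower and upper are hypothetical, as is the default alpha of 0.05.

```python
import numpy as np
import pandas as pd


def make_forecast_frame(model, index, alpha=0.05):
    """Forecast len(index) periods and collect the result in a DataFrame."""
    forecast, conf_int = model.predict(
        n_periods=len(index),   # one forecast per weekly test observation
        return_conf_int=True,   # also return the lower/upper bounds
        alpha=alpha,            # 0.05 -> 95% confidence intervals
    )
    conf_int = np.asarray(conf_int)  # shape (n_periods, 2)
    return pd.DataFrame(
        {
            "prediction": np.asarray(forecast),
            "lower": conf_int[:, 0],
            "upper": conf_int[:, 1],
        },
        index=index,
    )


# Example usage (test series name is hypothetical):
# arima_fcast = make_forecast_frame(arima_fit, weekly_test.index)
```

Passing the index of the weekly test series keeps the forecast aligned with the actual prices for plotting.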

8) Plot the results

Using the above code snippet, we are plotting values calculated and stored in variable arima_fcast. Following is the plot generated:

Visual representation of forecasted values

The black line indicates the actual adjusted closing price of the stock Reliance for the time period 1st January 2021 till date.

The red line indicates the predicted value for price of the stock.

The band around the red line shows the confidence band for the forecast price. As we move further away from the date of prediction (in our case 1st Jan 2021), the band widens because forecast uncertainty accumulates with every additional step. It is easier to predict the price of an asset for tomorrow than for a year from now.

In the plot, we can see the price stays within the predicted band for most of the period and that suggests that the selected model might be a good model for price prediction.
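A plot like the one described above could be produced with the sketch below. It assumes the actual prices are a pandas Series and the forecast is a DataFrame with hypothetical columns prediction, lower and upper; the function name plot_forecast is also hypothetical, and the colours follow the description in the text.

```python
import matplotlib.pyplot as plt


def plot_forecast(actual, fcast):
    """Plot actual prices, the forecast and its confidence band."""
    fig, ax = plt.subplots(figsize=(10, 5))
    # Black line: actual adjusted closing price
    ax.plot(actual.index, actual.values, color="black", label="Actual adjusted close")
    # Red line: forecast from the ARIMA model
    ax.plot(fcast.index, fcast["prediction"], color="red", label="ARIMA forecast")
    # Shaded band: confidence interval around the forecast
    ax.fill_between(fcast.index, fcast["lower"], fcast["upper"],
                    color="red", alpha=0.2, label="Confidence band")
    ax.set_xlabel("Date")
    ax.set_ylabel("Price")
    ax.legend()
    return ax


# Example usage (variable names are hypothetical):
# plot_forecast(weekly_test, arima_fcast)
# plt.show()
```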

Conclusion

Key takeaways from this blog are:

  • ARIMA models are one of the statistical techniques that can be used to forecast time series data.
  • Time series data have trend, seasonality/cyclicality and noise components.
  • In ARIMA models, one has to decide on the order in such a way that error is reduced.
  • Model selection can be automated in tools like Python and R, for example with the pmdarima package in Python.

More articles by Divyant Agarwal, CFA