
Stock Price Predictor using Python

Vishwajeet Singh Rana

Senior Software Engineer (AI, Computer Vision and Video Analytics) at Collins Aerospace, Raytheon Technologies

Published: August 24, 2021

1. Project Overview

Financial institutions around the world trade billions of dollars every day. Investment firms, hedge funds and even individuals use financial models to better understand market behavior and make profitable investments and trades. A wealth of information is available in the form of historical stock prices and company performance data, suitable for machine learning algorithms to process.

2. Problem Statement

In this project, I will use Yahoo Finance data to build a stock price predictor that takes daily trading data over a certain date range as input and outputs projected estimates for given query dates. Note that the inputs contain multiple metrics, such as the opening price (Open), the highest price the stock traded at (High), the number of shares traded (Volume) and the closing price adjusted for stock splits and dividends (Adjusted Close).

Some questions I would like to answer from this project are the following:

Can one really predict stock prices, or do prices depend on other factors, such as economic conditions?
Are there good models in practice that one can use to predict stock prices?

In this project, I will try out different models and compare their performance. However, I will only predict the Adjusted Close price of stocks.

3. Metrics

I will be using the mean absolute error to compare three different models and their performance. The formula is given as follows:
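In its standard form, mean absolute error is the average absolute difference between actual and predicted values. The symbols below are chosen here for illustration: $y_i$ is the actual value, $\hat{y}_i$ the predicted value and $n$ the number of predictions.

```latex
\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|
```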

A model with a lower mean absolute error performs better than one with a higher value, so the goal is to select the model with the lowest error.
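As a minimal sketch of this metric, assuming the actual and predicted values are available as plain sequences or NumPy arrays (the function name and the sample numbers are illustrative, not taken from the notebook):

```python
import numpy as np


def mean_absolute_error(actual, predicted) -> float:
    """Average of the absolute differences between actual and predicted values."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(actual - predicted)))


# Made-up values: the absolute errors are 0.5, 0.0 and 1.0.
actual = [1.0, 2.0, 3.0]
predicted = [1.5, 2.0, 2.0]
print(mean_absolute_error(actual, predicted))  # prints 0.5
```

Because the metric is an average of absolute errors, it is expressed in the same units as the values being compared, which makes it easy to read when comparing models on the same data.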

4. Exploratory Data Analysis and Visualization

Before I begin the modelling process, I will take some time to view the data and compute some statistics and plots to understand it better. This is what the first 5 rows of the data look like for the Apple, Amazon, Ford, Google, Johnson & Johnson, Pfizer and S&P 500 stocks that I use in my analysis.

From the above, we can see that Google has some missing values, which I handle with the pandas fillna method. More details can be found in my Jupyter notebook. We need to normalize the data, but before that, let us look at some statistics for the adjusted closing price.
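The text only states that fillna was used, so the fill strategy in the sketch below is an assumption: a forward fill followed by a backward fill, written with the pandas ffill and bfill methods. The tickers and prices are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical adjusted-close prices with gaps (values are made up).
dates = pd.date_range("2021-01-04", periods=5, freq="B")
prices = pd.DataFrame(
    {
        "GOOG": [np.nan, 1740.0, np.nan, 1787.0, 1807.0],
        "AAPL": [129.4, 131.0, 126.6, 130.9, 132.1],
    },
    index=dates,
)

# Forward fill carries the last known price forward; the backward fill
# then covers any gap at the very start of the series.
filled = prices.ffill().bfill()
print(filled)
```

Forward filling first avoids peeking at future prices wherever an earlier value exists; the backward fill is only a fallback for leading gaps.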

Normalizing the Data

We want to know how the different stocks moved relative to one another. To do this, we normalize the data by dividing the values in each column by the value on day one, so that each stock starts at $1.

From the above cumulative return plot, we can see that Apple has the highest return over the years, followed by Amazon, Google and Microsoft. The growth of Google and Microsoft looks much more stable than that of Apple and Amazon. Taking a closer look at the plot, we can see that Apple's stock is highly volatile, and therefore riskier, especially in recent years.
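The normalization step can be sketched as follows, assuming the prices sit in a pandas DataFrame with one column per stock. The function name and the sample values are illustrative.

```python
import pandas as pd


def normalize(prices: pd.DataFrame) -> pd.DataFrame:
    """Divide every column by its first row so each stock starts at 1.0."""
    return prices / prices.iloc[0]


# Made-up prices for two tickers over three days.
prices = pd.DataFrame({"AAPL": [100.0, 110.0, 121.0], "F": [10.0, 9.0, 12.0]})
normalized = normalize(prices)
print(normalized)
```

After this step a value of 1.21 means the stock is worth 21% more than on day one, regardless of its absolute price, which is what makes the curves comparable on one plot.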

Cumulative Returns

I will compute cumulative returns to see how the pandemic affected stock prices for these companies.

From the above plots, let’s take note of the following:

2019: Before the pandemic, most of the companies' stocks were doing relatively well, with Apple and Microsoft taking the lead and Pfizer trailing behind.
2020: At the onset of the pandemic in the spring, stock prices fell for all the companies, but afterwards technology companies such as Amazon, Apple, Microsoft and Google started to grow again. Pfizer, Ford and the S&P 500 did not do as well, particularly Ford.
2021: As the vaccine rollout began and lockdowns began to be lifted, we can see significant growth in Ford's stock price in particular, given how low it was in 2020 due to the pandemic. Google, Microsoft and the S&P 500 also grew. In general, stock prices improved for all the companies we considered.

Rolling mean and Bollinger Bands

The rolling mean may give us some idea about the true underlying price of a stock. A significant deviation below or above the rolling mean may signal a potential buying or selling opportunity. Bollinger Bands are a charting tool that characterizes the volatility of a financial instrument over time. Bollinger's observation was that we should take the stock's recent volatility into account: if the stock is very volatile, we might want to disregard movements above and below the mean, but if it is not very volatile, such movements may deserve attention.

From the above plots, we can see that the initial values of the rolling mean are missing. This is because I used a 20-day window, so the first days do not have enough preceding values to compute a mean. We can also observe that the rolling mean follows the movement of the raw stock prices but is less spiky. We can also see that Ford has lower stock prices than Microsoft in 2020, as expected.
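A minimal sketch of the rolling mean and Bollinger Bands with pandas, using the 20-day window mentioned above. The band width of two rolling standard deviations is the conventional choice and an assumption here, as are the function name and the synthetic price series.

```python
import numpy as np
import pandas as pd


def bollinger_bands(prices: pd.Series, window: int = 20, num_std: float = 2.0) -> pd.DataFrame:
    """Rolling mean plus upper and lower bands at num_std rolling standard deviations."""
    rolling_mean = prices.rolling(window=window).mean()
    rolling_std = prices.rolling(window=window).std()
    return pd.DataFrame(
        {
            "rolling_mean": rolling_mean,
            "upper_band": rolling_mean + num_std * rolling_std,
            "lower_band": rolling_mean - num_std * rolling_std,
        }
    )


# Synthetic random-walk prices, used only to exercise the function.
rng = np.random.default_rng(0)
prices = pd.Series(100 + rng.normal(0, 1, 60).cumsum())

bands = bollinger_bands(prices)
# The first window - 1 rows are NaN because a full window is not yet available.
print(bands.iloc[17:22])
```

The leading NaN rows correspond to the missing initial values of the rolling mean noted above: with a 20-day window, the first 19 days have no complete window to average over.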

Daily Returns

Daily returns tell us how much the stock price goes up or down on a particular day. We can compute them using the following function:

where price(t) is today's stock price and price(t-1) is yesterday's stock price.
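With price(t) and price(t-1) defined this way, the usual definition of the daily return is price(t) / price(t-1) - 1. The sketch below assumes that definition and a pandas Series of prices; the function name and sample values are illustrative.

```python
import pandas as pd


def daily_returns(prices: pd.Series) -> pd.Series:
    """Return price(t) / price(t-1) - 1 for every day after the first."""
    returns = prices / prices.shift(1) - 1
    # The first day has no previous price, so drop it.
    return returns.iloc[1:]


# Made-up prices: up 2% on the second day, down 2% on the third.
prices = pd.Series([100.0, 102.0, 99.96])
returns = daily_returns(prices)
print(returns)
```

A return of 0.02 means the price rose by 2% relative to the previous day, so the spread of these values gives a direct view of how volatile a stock is.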

From the above plots, we can see that Ford's daily returns span a wider range than Microsoft's, meaning that Ford is more volatile. This could be because technology companies like Microsoft bounced back faster during the pandemic.

5. Modelling Methodology and Results

In this section, I will try out some models to predict the Adjusted Close price of a stock. Before modelling, I used the pandas fillna method to handle missing data. More details can be seen in my Jupyter notebook.

Prediction using Long Short-Term Memory (LSTM):

LSTM is a recurrent neural network (RNN) architecture used in deep learning that is capable of learning long-term dependencies. It has a chain-like structure and passes information along as it propagates forward through the sequence. I used the Adam optimizer for my model and the mean squared error as my loss function. Below is my LSTM model summary.


For my initial model, I used a batch size of 1 and 5 epochs, which gave me a mean absolute error of 0.0942. This isn’t so bad, but there is room for improving the model by tuning the parameters to hopefully get better predictions.

Refinement

I will now tune a few of my model parameters to see how the model performs. Below is a table of the different parameters I tuned for Microsoft stock and their corresponding mean absolute errors.

From the above table, we can see that as the batch size and number of epochs increased, the model performed better (i.e. it had a lower mean absolute error). Also, including an activation function (ReLU) did not improve the model performance.

The results of my final LSTM model (the 5th trial in the refinement table above), which uses a batch size of 800 and 50 epochs, are given below:

From the above, we can see that the plots of predicted and actual adjusted closing prices are relatively similar, with little variation, and the mean absolute error of 0.0591 isn’t too bad. We can also conclude that spending more time tuning the parameters does improve the model, as shown in the above table. However, there is still room for improvement, and other models are worth trying for comparison.

Prediction using Linear Regression

Linear Regression attempts to model the relationship between a response and one or more explanatory variables by fitting a linear equation to the observed data. The results of my linear regression prediction are given below:

From the above, we can see that the plots of predicted and actual adjusted closing prices show some variation, with a mean absolute error of 0.215, which is a bit worse than the LSTM model. There is still room for improvement, so let’s try one more model and see how it performs.

Prediction using Random Forest Regression

Random Forest Regression is a supervised learning algorithm that uses ensemble learning for regression. A Random Forest operates by constructing a multitude of decision trees at training time and, for regression tasks, outputting the average prediction of the individual trees. For classification tasks, it outputs the class selected by most trees.
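A rough sketch of this idea with scikit-learn's RandomForestRegressor follows. The feature set (the previous three closes), the 80/20 chronological split and the synthetic price series are all assumptions made for illustration; the article does not state which features or split were used.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

# Synthetic stand-in for an adjusted-close series (not real market data).
rng = np.random.default_rng(42)
prices = pd.Series(100 + rng.normal(0, 1, 300).cumsum(), name="adj_close")

# Hypothetical features: the previous three closes predict today's close.
features = ["lag_1", "lag_2", "lag_3"]
frame = pd.DataFrame({f"lag_{k}": prices.shift(k) for k in (1, 2, 3)})
frame["target"] = prices
frame = frame.dropna()

# Chronological split without shuffling, so the test period follows training.
split = int(len(frame) * 0.8)
train, test = frame.iloc[:split], frame.iloc[split:]

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(train[features], train["target"])

# Each prediction is the average of the individual trees' predictions.
predicted = model.predict(test[features])
print("MAE:", mean_absolute_error(test["target"], predicted))
```

Keeping the split chronological matters for price data: shuffling rows before splitting would let the model train on days that come after the ones it is tested on.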

Below is a table of actual and predicted values of Adjusted closing stock price for Microsoft using a Random Forest Regressor.

From the table, we can see that the Random Forest Regressor performed very well: the actual and predicted Adjusted Close values are fairly close. Let us now view the plots.

From the above, we can see that the plots of predicted and actual adjusted closing prices are relatively similar, with a mean absolute error of 0.0497, which is good. Let us see how the model performs on Google stock.

From the above, we can see that the plots of predicted and actual adjusted closing prices are very similar, with a mean absolute error of 0.000824, which is very good. Because the plots overlap, I plotted them separately so that the similarity can be seen clearly.

Model Evaluation and Results

From my investigation of three different models, I observed that Random Forest Regressor delivered a much lower mean absolute error than the LSTM or Linear Regression for both Microsoft and Google stocks (see Fig. 15 below). I also observed that taking time to tune the parameters for the LSTM model (e.g. the number of epochs and batch size) resulted in better prediction.

Justification

From my analysis, we can see that one can actually predict stock prices and that economic factors do have some effect on them. Secondly, there are several models that deliver good results in practice for predicting stock prices.

The Random Forest Regressor, an ensemble method that combines the predictions of many decision trees, is a good fit: as an ensemble, it typically predicts more accurately than any single tree, and it had the lowest error in my analysis above.

6. Conclusion

In conclusion, the Random Forest Regressor delivered a much lower mean absolute error than the LSTM or Linear Regression for both Microsoft and Google stocks. I also observed that tuning the parameters of the LSTM (e.g. the number of epochs and batch size) resulted in better predictions, although this can take some time.

When exploring the data, it was interesting to see how the stock prices of different companies changed due to the pandemic and how the technology companies' stock prices bounced back more quickly than those of the other companies considered. It was also interesting to see how Pfizer stock improved as the vaccine rollout began.

Potential Improvements

Some potential improvements to my work are the following:

Take significant time to tune the model parameters, and include more features that might be relevant for stock price prediction.
Try out more models to see if one performs better than Random Forest Regression. I only tried three models for simplicity and because of time constraints.
Explore other companies' stocks to see how well their prices can be predicted with different models.

For a more detailed analysis including the code, check out the GitHub page.
