
Unveiling the Future: A Comprehensive Analysis and Stacked LSTM Approach to Stock Price Prediction

Ihtisham Mehmood

Co-Founder @ DMC | Data Scientist | Generative AI | Agentic AI | MLOps | Data Analyst | MBA | BBA

Published: November 10, 2023

Introduction

In the fast-paced world of financial markets, the ability to accurately predict stock prices remains a holy grail for investors and traders alike. Leveraging data-driven insights and deep learning, I embarked on a project to forecast stock prices, collecting data with yfinance and applying a multi-faceted analytical approach. The whole project is done in Python. You can access the Python code here. You can also access a dashboard to check the stock performance of the companies by clicking here.

Data Collection and Analysis

The foundation of any successful stock prediction model lies in the quality of data collected and the depth of analysis performed. My journey began with yfinance, a robust data source that provided a comprehensive view of stock movements over various periods. Daily returns, pair plots, moving averages, and the relationship between risk and return were key focal points during the exploratory data analysis phase.

import pandas as pd
import yfinance as yf
from pandas_datareader import data as pdr
from datetime import datetime

# Route pandas_datareader's Yahoo calls through yfinance
# (pdr.get_data_yahoo is used later in the article).
yf.pdr_override()

# Download window: the past year, ending today.
end = datetime.now()
start = datetime(end.year - 1, end.month, end.day)
tech_list = ['AAPL', 'GOOG', 'MSFT', 'AMZN']

# Download each ticker and bind it to a module-level variable of the same name.
for stock in tech_list:
    globals()[stock] = yf.download(stock, start, end)

company_list = [AAPL, GOOG, MSFT, AMZN]
company_name = ["APPLE", "GOOGLE", "MICROSOFT", "AMAZON"]

# Tag every row with its company so the frames can be stacked.
for company, com_name in zip(company_list, company_name):
    company["company_name"] = com_name

df = pd.concat(company_list, axis=0)
df.tail(10)

In the code above, I used Yahoo Finance's yfinance module (with pandas_datareader imported for later use) to fetch historical stock data for four tech giants: Apple (AAPL), Google (GOOG), Microsoft (MSFT), and Amazon (AMZN). I set the time frame to the past year, ending on the current date. The loop downloads the stock data for each company and assigns it to a variable named after its ticker. I then collected the downloaded DataFrames in a list and added a new column, "company_name", to each one to identify the source company for each row. Finally, I combined the data into a single DataFrame with the concat function from the Pandas library.
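As a possible alternative to binding each ticker to a global variable, the same tag-and-stack step can be sketched with a dictionary. This is only an illustration: the function name combine_frames and the small hand-made price frames are assumptions, used so the example runs without a network connection.

```python
import pandas as pd


def combine_frames(frames: dict) -> pd.DataFrame:
    """Stack per-company frames and tag each row with its company name."""
    tagged = [frame.assign(company_name=name) for name, frame in frames.items()]
    return pd.concat(tagged, axis=0)


# Hypothetical stand-ins for the downloaded price data.
dates = pd.date_range("2023-01-02", periods=3, freq="D")
frames = {
    "APPLE": pd.DataFrame({"Adj Close": [125.0, 126.5, 125.8]}, index=dates),
    "GOOGLE": pd.DataFrame({"Adj Close": [89.0, 88.2, 90.1]}, index=dates),
}

combined = combine_frames(frames)
print(combined)
```

With real data, the dictionary values would be the DataFrames returned by yf.download, keyed by the display name of each company.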

Stock Movement by Periods

https://github.com/Ihtishammehmood/Python/blob/18eb5fe6f67fea68414822ca122753be7b252bef/LSTM%20Model%20For%20Stock%20Forecasting%20.ipynb

This analysis explores the connection between the risk (volatility or uncertainty) associated with an investment and the expected return.

Understanding this relationship is crucial for investors to make informed decisions based on their risk tolerance and return expectations.

import matplotlib.pyplot as plt

plt.figure(figsize=(16, 10))
# One subplot per company, arranged in a 2x2 grid.
for i, company in enumerate(company_list, 1):
    plt.subplot(2, 2, i)
    company['Adj Close'].plot()
    plt.ylabel('Adj Close')
    plt.xlabel(None)
    plt.title(f"Closing Price of {tech_list[i - 1]}")
# tight_layout sets the subplot spacing, so no manual subplots_adjust is needed.
plt.tight_layout()

The code above plots the adjusted closing price trend for the four big tech companies.

Average Daily Price

In the code below, I used the Pandas and Matplotlib libraries to compute and plot the daily returns of four major tech companies: Apple (AAPL), Google (GOOG), Microsoft (MSFT), and Amazon (AMZN).

import pandas as pd
import matplotlib.pyplot as plt


company_data = [(AAPL, 'APPLE'), (GOOG, 'GOOGLE'), (MSFT, 'MICROSOFT'), (AMZN, 'AMAZON')]


fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(15, 10))

for i, (company, name) in enumerate(company_data):

    company['Daily Return'] = company['Adj Close'].pct_change()

    
    ax = axes[i // 2, i % 2]
    company['Daily Return'].plot(ax=ax, legend=True, linestyle='--', marker='o')
    ax.set_title(name)

fig.tight_layout()

plt.show()

The code calculates the daily return for each company by applying the percentage change to the adjusted closing prices. Subsequently, it generates a 2x2 grid of subplots, each representing one of the companies, and plots their respective daily return trends over time. This visualization aids in comparing the volatility and performance of the companies' stocks. The code offers a concise and effective way to assess the daily return patterns for these tech giants using Python's data analysis and visualization capabilities.
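To make the daily return calculation concrete, here is a minimal sketch on three made-up prices. It shows that pct_change is equivalent to dividing each price by the previous one and subtracting 1; the price values are hypothetical.

```python
import pandas as pd

# Hypothetical adjusted closing prices for three consecutive days.
prices = pd.Series([100.0, 102.0, 99.96], name="Adj Close")

# Daily return as computed in the article.
daily_return = prices.pct_change()

# The same quantity written out by hand: today's price over yesterday's, minus 1.
manual = prices / prices.shift(1) - 1

print(daily_return)
```

The first value is NaN because there is no previous day to compare against, which is why later steps call dropna() on the returns.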

https://github.com/Ihtishammehmood/Python/blob/173d5d786085d9c62bb5b19355ab6fc9731395d6/LSTM%20Model%20For%20Stock%20Forecasting%20.ipynb

Creation of Pair Plot

A pair plot is a visual representation of pairwise relationships in a dataset. In the context of stock analysis, a pair plot can reveal correlations or trends between different variables, helping identify potential factors influencing stock prices.

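import seaborn as sns
# Assumption: closing_df and tech_rets are not defined in the article; here they are built from the frames downloaded earlier.
closing_df = pd.concat([c['Adj Close'] for c in company_list], axis=1, keys=tech_list)
tech_rets = closing_df.pct_change()
sns.pairplot(tech_rets, kind='reg')
plt.savefig('pairplot.png')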

Executing this code snippet produces the following result.

Correlation (Stock Returns, Stock Closing Price)

plt.figure(figsize=(10,8))

plt.subplot(2,2,1)
sns.heatmap(tech_rets.corr(),annot=True)
plt.title('Daily stock returns')

plt.subplot(2,2,2)
sns.heatmap(closing_df.corr(),annot=True)
plt.title('closing price correlation')

In this bit of Python code, I'm using Matplotlib and Seaborn to draw two heatmaps in the top row of a 2x2 subplot grid. The first (top-left) is a heatmap, which is like a colorful table, showing how strongly the daily stock returns of these tech companies are correlated with each other. The second (top-right) does the same for the closing prices of the stocks. The colors and numbers help us quickly see whether the stock prices or returns tend to move together or in opposite directions.
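One reason to look at both heatmaps is that closing prices and daily returns can tell different stories. The sketch below uses two simulated stocks (tickers AAA and BBB and all parameters are made up) whose daily moves are independent but which both drift upward, and compares the two correlations.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_days = 500

# Two hypothetical stocks: independent daily moves, but both drift upward.
moves = rng.normal(loc=0.5, scale=1.0, size=(n_days, 2))
prices = pd.DataFrame(1000.0 + moves.cumsum(axis=0), columns=["AAA", "BBB"])
returns = prices.pct_change().dropna()

# Correlation of price levels versus correlation of daily returns.
price_corr = prices.corr().loc["AAA", "BBB"]
return_corr = returns.corr().loc["AAA", "BBB"]

print(f"closing price correlation: {price_corr:.2f}")
print(f"daily return correlation:  {return_corr:.2f}")
```

In this simulation the price correlation is high simply because both series trend in the same direction, while the return correlation stays near zero. Under these assumptions, the returns heatmap is the better guide to whether stocks actually move together day to day.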

The Risk-Return Analysis

import numpy as np
import matplotlib.cm as cm


rets = tech_rets.dropna()

area = np.pi * 20


cmap = cm.viridis

plt.figure(figsize=(10, 8), facecolor='white')
plt.scatter(rets.mean(), rets.std(), s=area, c=cmap(rets.mean()), marker='o')
plt.grid(True)
plt.xlabel('Expected Return', fontsize=14)
plt.ylabel('Risk', fontsize=14)
plt.title('Risk-Return Scatter Plot for Tech Stocks')

for label, x, y in zip(rets.columns, rets.mean(), rets.std()):
    plt.annotate(label, xy=(x, y), xytext=(50, 50), textcoords='offset points', ha='right', va='bottom', 
                 arrowprops=dict(arrowstyle='-', color='blue', connectionstyle='arc3,rad=-0.3'))

plt.show()

In this piece of code, I'm running a risk-return analysis for a set of tech stocks. The code creates a scatter plot where each dot represents a tech company. The x-axis shows the expected return, measured as the average daily return, and the y-axis represents risk, measured as the standard deviation of daily returns, which indicates how much the stock's price tends to swing. The plot gives a visual sense of which stocks offer potentially higher returns for a given level of risk.

As we look at the scatter plot, we're aiming for stocks toward the bottom-right corner, since these are the ones with higher expected returns and lower risk. The label next to each dot tells us which company is which, and the blue connector lines tie each label to its dot.
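The two axes of the scatter plot can also be tabulated directly. The following sketch uses two made-up return series (tickers AAA and BBB are hypothetical) and adds a return-per-unit-of-risk column, which is an illustrative extra and not part of the original analysis.

```python
import pandas as pd

# Hypothetical daily returns for two made-up tickers.
rets = pd.DataFrame({
    "AAA": [0.01, -0.02, 0.03, 0.00],
    "BBB": [0.002, 0.001, 0.003, 0.002],
})

summary = pd.DataFrame({
    "expected_return": rets.mean(),  # x-axis of the scatter plot
    "risk": rets.std(),              # y-axis of the scatter plot
})
# Illustrative extra column: average return earned per unit of risk.
summary["return_per_risk"] = summary["expected_return"] / summary["risk"]

print(summary.sort_values("return_per_risk", ascending=False))
```

Here AAA has the higher average return but also far larger swings, so BBB ranks first on return per unit of risk.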

Value at Risk (VaR)

Value at Risk (VaR) is a statistical measure used to quantify the potential loss on an investment within a given time frame and confidence level. In simpler terms, it tells investors the loss they should not expect to exceed with a certain degree of confidence.

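rets = tech_rets.dropna()
std = rets.std()
# 1.96 is the two-sided 95% normal quantile; a one-sided 95% VaR would use 1.645.
VaR = std * 1.96 * 100  # multiply by 100 to express the result in percent
print('VaR at 95% Confidence level in %:',
      VaR)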

Let's take AAPL as an example. A VaR of 3.04% at a 95% confidence level means that, assuming daily returns are roughly normally distributed with the historical standard deviation, there is about a 95% chance that Apple's loss on any single day will not exceed 3.04%. In other words, under normal market conditions, we can reasonably expect daily losses on our investment in Apple to stay below 3.04% most of the time.
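For comparison, here is a minimal sketch of two common ways to estimate a one-day VaR: a parametric estimate that assumes zero-mean normal returns, and a historical estimate read off the empirical quantile. The function names and the simulated returns are assumptions for illustration, not the article's data.

```python
from statistics import NormalDist

import numpy as np


def parametric_var(returns, confidence=0.95):
    """One-sided VaR assuming zero-mean, normally distributed returns."""
    z = NormalDist().inv_cdf(confidence)  # about 1.645 for 95%
    return z * np.std(returns, ddof=1)


def historical_var(returns, confidence=0.95):
    """Loss at the (1 - confidence) quantile of the observed returns."""
    return -np.quantile(returns, 1 - confidence)


# Simulated daily returns with a 1.5% standard deviation (hypothetical).
rng = np.random.default_rng(42)
returns = rng.normal(loc=0.0, scale=0.015, size=5000)

print(f"parametric VaR: {parametric_var(returns):.4f}")
print(f"historical VaR: {historical_var(returns):.4f}")
```

The multiplier 1.96 corresponds to inv_cdf(0.975), so using it gives a one-sided confidence level of 97.5% and a somewhat more conservative loss figure than 1.645.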

Stacked LSTM Approach (Deep Learning)

Long Short-Term Memory (LSTM) is a type of recurrent neural network (RNN) designed for processing sequential data. Stacking multiple LSTM layers enhances the model's ability to capture complex patterns in time-series data, making it well-suited for predicting stock prices. The Stacked LSTM approach allows the model to learn both short-term and long-term dependencies in the data, improving its predictive capabilities.

df = pdr.get_data_yahoo('GOOG', start = '2012-01-01',end = datetime.now())

plt.figure(figsize=(13, 6))
plt.plot(df['Close'])
plt.xlabel('Date')
plt.ylabel('Close price in USD')
plt.title('Google Stock movement')
# plt.savefig('goog.png')
plt.show()

We are using Google stock data to predict future price movement.

This graph shows the movement of the stock in the historical data from 2012 to 2023. The trend line summarizes how the stock moved over the years.

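import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Keep only the closing price and convert it to a NumPy array.
data = df.filter(['Close'])
dataset = data.values
# Use the first 95% of the rows for training.
training_data_len = int(np.ceil(len(dataset) * .95))

# Scale prices to the range [0, 1].
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(dataset)

train_data = scaled_data[0:int(training_data_len), :]
x_train = []
y_train = []

# Each sample is 60 consecutive prices; the target is the price that follows.
for i in range(60, len(train_data)):
    x_train.append(train_data[i-60:i, 0])
    y_train.append(train_data[i, 0])
    # Sanity check: print the samples built so far during the first iterations.
    if i <= 61:
        print(x_train)
        print(y_train)
        print()

x_train, y_train = np.array(x_train), np.array(y_train)

# Reshape to (samples, time steps, features), the input format of the LSTM layer.
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))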

Here, I'm setting up the data for training the stock price prediction model. The main steps are:

I start by picking the closing prices from the DataFrame and put them in a new variable called data.
Then, I convert this data into a NumPy array (called dataset), that is, an array of all the closing prices.
I use 95% of the data for training the model, so I compute how many data points that is and call it training_data_len.
Before training, I scale the data using a Min-Max scaler. This maps the values to the range 0 to 1, which makes it easier for the model to learn.
I then create the actual training sets (x_train and y_train): pairs of inputs and outputs for the model to learn from. Starting from the 61st element of the training data, I take the previous 60 closing prices as the input (x_train) and the current closing price as the output (y_train).
To check that everything works, I print x_train and y_train for the first couple of iterations.
Finally, I reshape x_train into the format the LSTM layer expects. It's a three-dimensional array where the first dimension is the number of data points, the second is the number of time steps (60 in this case, representing the past 60 trading days), and the third is the number of features (1 in this case, which is the closing price).

In simple terms, this code is preparing data to teach a computer how to predict future stock prices based on past closing prices. The idea is that the patterns in the past might help the computer predict what could happen next.
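The scaling and windowing steps above can be sketched on a small made-up series. The helper names min_max_scale and make_windows are hypothetical, and the scaling is written out by hand in NumPy (in place of scikit-learn's MinMaxScaler) so the arithmetic is visible.

```python
import numpy as np


def min_max_scale(values):
    """Map values linearly onto [0, 1]; also return the bounds for inversion."""
    lo, hi = values.min(), values.max()
    return (values - lo) / (hi - lo), lo, hi


def make_windows(series, window=60):
    """Build (samples, window, 1) inputs and the value following each window."""
    x = np.array([series[i - window:i] for i in range(window, len(series))])
    y = series[window:]
    return x.reshape(-1, window, 1), y


# 100 hypothetical closing prices.
prices = np.arange(100.0, 200.0)
scaled, lo, hi = min_max_scale(prices)
x_train, y_train = make_windows(scaled, window=60)

print(x_train.shape, y_train.shape)

# Undo the scaling, as inverse_transform does for predictions.
restored = scaled * (hi - lo) + lo
```

With 100 prices and a window of 60, there are 40 samples: each input holds 60 consecutive scaled prices and each target is the scaled price on the following day.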

Create & Compile

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

model = Sequential()
model.add(LSTM(100, return_sequences=True, input_shape= (x_train.shape[1], 1)))
model.add(LSTM(50, return_sequences=False))
model.add(Dropout(0.2))

model.add(Dense(1))

model.compile(optimizer='adam',loss='mean_squared_error')
model.fit(x_train,y_train,batch_size=1,epochs=1)

In this piece of code, I'm using the deep learning library Keras to create a neural network model for predicting stock prices. The model consists of two Long Short-Term Memory (LSTM) layers, a type of neural network layer that is good at learning patterns over sequences such as time series data. The first LSTM layer has 100 units, takes in sequences of 60 past closing prices, and returns its output at every time step so that the next LSTM layer receives a full sequence. The second LSTM layer has 50 units and doesn't return sequences, which means it passes on only its final output as a summary of the whole window. I've added a dropout layer to reduce overfitting, that is, the model becoming too specific to the training data. The final layer is a Dense layer with 1 unit, which predicts the next closing price. The model is trained with the Adam optimizer and the mean squared error loss function. I run this training for one epoch, meaning it goes through the entire training set once, with a batch size of 1, so the weights are updated after every sample. This neural network is essentially learning from the past 60 days of closing prices to predict the next one.
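To get a feel for the size of this network, the number of trainable parameters can be worked out by hand. The sketch below assumes the standard LSTM layout (four gates, each with input weights, recurrent weights, and a bias); the helper names are hypothetical, and the totals should match what model.summary() reports for this architecture.

```python
def lstm_params(units, input_dim):
    """Trainable parameters of a standard LSTM layer (4 gates with biases)."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(units, input_dim):
    """Trainable parameters of a fully connected layer with a bias."""
    return units * (input_dim + 1)


layer_counts = {
    "lstm_100": lstm_params(100, 1),   # input is 1 feature per time step
    "lstm_50": lstm_params(50, 100),   # input is the 100-unit sequence
    "dense_1": dense_params(1, 50),    # Dropout adds no parameters
}
total = sum(layer_counts.values())

print(layer_counts, total)
```

Note that the window length of 60 does not appear in the count: an LSTM reuses the same weights at every time step.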

Testing the Model


test_data = scaled_data[training_data_len - 60:, :]


x_test = []
y_test = dataset[training_data_len:, :]

for i in range(60, len(test_data)):
    x_test.append(test_data[i-60:i, 0])


x_test = np.array(x_test)
x_test = x_test.reshape(x_test.shape[0], x_test.shape[1], 1)


predictions = scaler.inverse_transform(model.predict(x_test))


rmse = np.sqrt(np.mean((predictions - y_test) ** 2))
rmse

RMSE: 3.92814294427371

The Root Mean Square Error (RMSE) of 3.93 indicates how well the neural network model predicts stock prices. RMSE is the square root of the mean squared difference between the predicted values and the actual values, so it is expressed in the same unit as the prices (US dollars here) and penalizes large errors more heavily than small ones. In this context, an RMSE of 3.93 suggests that the model's predictions typically differ from the true closing prices by roughly 3.93 dollars.

To put it simply, the lower the RMSE, the better the model's predictions align with the actual stock prices. In this case, an RMSE of 3.93 indicates a reasonably accurate model, but it's important to consider the scale of the stock prices we are working with. If the average stock price is in the hundreds, a 3.93 RMSE might be acceptable.
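One way to put an RMSE in context is to divide it by the average price and to compare it with a naive baseline that simply repeats the previous day's close. The sketch below uses four made-up prices and predictions; the numbers and the baseline comparison are illustrative assumptions, not results from the article's model.

```python
import numpy as np


def rmse(predicted, actual):
    """Root mean square error between two equally sized arrays."""
    predicted = np.asarray(predicted, dtype=float).ravel()
    actual = np.asarray(actual, dtype=float).ravel()
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))


# Hypothetical actual closing prices and model predictions.
actual = np.array([130.0, 132.0, 131.0, 135.0])
predicted = np.array([128.0, 133.0, 134.0, 131.0])

model_rmse = rmse(predicted, actual)
relative_rmse = model_rmse / actual.mean()

# Naive baseline: predict that today's close equals yesterday's close.
naive_rmse = rmse(actual[:-1], actual[1:])

print(f"model RMSE: {model_rmse:.2f} ({relative_rmse:.1%} of the mean price)")
print(f"naive RMSE: {naive_rmse:.2f}")
```

If a model's RMSE is not clearly below the naive baseline on the same test period, its predictions may be doing little more than echoing the previous day's price.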

The difference between the predicted values and the actual closing prices can be observed in the table above. The graph below shows the predicted prices against the actual prices.

This visual comparison allows us to see how well the model's predictions align with the actual stock prices. If the green line closely follows the blue line, it suggests that the model is making accurate predictions.

Conclusion

In the ever-evolving landscape of financial markets, the quest for accurate stock price predictions requires a harmonious blend of meticulous data analysis and advanced modeling techniques. Through the lens of yfinance data, I navigated the complexities of stock movements, unraveling patterns, and establishing a solid foundation for predictive modeling.

The integration of the LSTM approach marked a pivotal moment in the project, introducing a sophisticated tool for forecasting stock prices. As we tread further into the future of finance, this combination of traditional analytics and deep learning proves to be a formidable force, unlocking new possibilities for investors and traders alike.

Click here to access the complete Python source code.

Click here to access the dashboard.
