Unveiling the Future: A Comprehensive Analysis and Stacked LSTM Approach to Stock Price Prediction
Ihtisham Mehmood
Co-Founder @ DMC | Data Scientist | Generative AI | Agentic AI | MLOps | Data Analyst | MBA | BBA
Introduction
In the fast-paced world of financial markets, the ability to accurately predict stock prices remains a holy grail for investors and traders alike. Leveraging data-driven insights and modern deep learning tools, I embarked on a project to forecast stock prices, collecting the data with yfinance and applying a multi-faceted analytical approach. The whole project is done in Python. You can access the Python code here. You can also access a dashboard to check the stock performance of the companies by clicking here.
Data Collection and Analysis
The foundation of any successful stock prediction model lies in the quality of data collected and the depth of analysis performed. My journey began with yfinance, a Python library that retrieves market data from Yahoo Finance and provided a comprehensive view of stock movements over various periods. Daily returns, pair plots, moving averages, and the relationship between risk and return were key focal points during the exploratory data analysis phase.
import pandas as pd
import yfinance as yf
from pandas_datareader import data as pdr
from datetime import datetime

# Route pandas_datareader's Yahoo calls (pdr.get_data_yahoo) through yfinance
yf.pdr_override()

# One-year window ending today
end = datetime.now()
start = datetime(end.year - 1, end.month, end.day)

tech_list = ['AAPL', 'GOOG', 'MSFT', 'AMZN']

# Download each ticker and expose it as a variable named after the ticker
for stock in tech_list:
    globals()[stock] = yf.download(stock, start, end)

company_list = [AAPL, GOOG, MSFT, AMZN]
company_name = ["APPLE", "GOOGLE", "MICROSOFT", "AMAZON"]

# Tag every row with its company, then stack the four frames row-wise
for company, com_name in zip(company_list, company_name):
    company["company_name"] = com_name

df = pd.concat(company_list, axis=0)
df.tail(10)
In the Python code above, I used the pandas_datareader library together with the yfinance module to fetch historical stock data for four tech giants: Apple (AAPL), Google (GOOG), Microsoft (MSFT), and Amazon (AMZN). The time frame covers the past year, ending on the current date. The loop downloads the stock data for each company and assigns it to a variable named after its ticker. I then collected the downloaded frames in a list, added a "company_name" column to each one so that every row identifies its source company, and combined them into a single DataFrame with the concat function from the Pandas library. The result is one structured table of stock data for the four companies.
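The `globals()` approach works well in a notebook, but a plain dictionary keyed by ticker is often easier to reuse in a script. The following is a minimal sketch of that alternative, not the code used in the project. It assumes the frames have already been downloaded; the `frames` and `names` dictionaries and the tiny hand-made prices are illustrative stand-ins, not real market data.

```python
import pandas as pd

# Hypothetical stand-ins for the frames returned by yf.download (not real prices).
dates = pd.date_range("2024-01-01", periods=3, freq="D")
frames = {
    "AAPL": pd.DataFrame({"Adj Close": [10.0, 11.0, 12.0]}, index=dates),
    "GOOG": pd.DataFrame({"Adj Close": [20.0, 19.0, 21.0]}, index=dates),
}
names = {"AAPL": "APPLE", "GOOG": "GOOGLE"}

# Tag each frame with its company name, then stack the frames row-wise.
tagged = [frame.assign(company_name=names[ticker]) for ticker, frame in frames.items()]
combined = pd.concat(tagged, axis=0)

# Average adjusted close per company, to confirm the tagging worked.
mean_close = combined.groupby("company_name")["Adj Close"].mean()
print(mean_close)
```

Because `assign` returns a copy, the original per-ticker frames stay untouched, which can help when the same frames feed several later analyses.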
Stock Movement by Periods
This analysis explores the connection between the risk (volatility or uncertainty) associated with an investment and the expected return.
Understanding this relationship is crucial for investors to make informed decisions based on their risk tolerance and return expectations.
import matplotlib.pyplot as plt

plt.figure(figsize=(16, 10))

# One panel per company in a 2x2 grid
for i, company in enumerate(company_list, 1):
    plt.subplot(2, 2, i)
    company['Adj Close'].plot()
    plt.ylabel('Adj Close')
    plt.xlabel(None)
    plt.title(f"Closing Price of {tech_list[i - 1]}")

plt.tight_layout()
plt.show()
The code above creates the closing-price trend lines for the big four tech companies.
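Moving averages were also part of the exploratory analysis. A minimal sketch of how they can be computed with pandas is shown here; the window lengths and the tiny price series are assumptions chosen so the arithmetic is easy to follow, and real analyses typically use longer windows on the adjusted close.

```python
import pandas as pd

# Illustrative closing prices (not real market data).
close = pd.Series([10.0, 12.0, 14.0, 16.0, 18.0, 20.0], name="Adj Close")

# Rolling mean for each window length. The first (window - 1) rows are NaN
# because a full window of prices is not yet available.
ma = pd.DataFrame({f"MA {w}": close.rolling(window=w).mean() for w in (2, 3)})
print(ma)
```

A longer window smooths out more of the day-to-day noise but reacts more slowly to a change in trend.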
Average Daily Price
In this Python code, I utilized the Pandas and Matplotlib libraries to analyze the average daily price and daily returns of four major tech companies: Apple (AAPL), Google (GOOG), Microsoft (MSFT), and Amazon (AMZN).
import pandas as pd
import matplotlib.pyplot as plt

company_data = [(AAPL, 'APPLE'), (GOOG, 'GOOGLE'), (MSFT, 'MICROSOFT'), (AMZN, 'AMAZON')]
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(15, 10))

for i, (company, name) in enumerate(company_data):
    # Daily return = percentage change of the adjusted close from the previous day
    company['Daily Return'] = company['Adj Close'].pct_change()
    ax = axes[i // 2, i % 2]
    company['Daily Return'].plot(ax=ax, legend=True, linestyle='--', marker='o')
    ax.set_title(name)

fig.tight_layout()
plt.show()
The code calculates the daily return for each company by applying the percentage change to the adjusted closing prices. Subsequently, it generates a 2x2 grid of subplots, each representing one of the companies, and plots their respective daily return trends over time. This visualization aids in comparing the volatility and performance of the companies' stocks. The code offers a concise and effective way to assess the daily return patterns for these tech giants using Python's data analysis and visualization capabilities.
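To make the daily return calculation concrete, here is a small worked example. The three prices are made up for illustration; the point is only that `pct_change` is equivalent to dividing each price by the previous one and subtracting 1.

```python
import pandas as pd

# Illustrative prices (not real market data).
prices = pd.Series([100.0, 102.0, 99.96])

# Built-in percentage change from the previous row.
daily_return = prices.pct_change()

# The same calculation written out by hand.
manual = prices / prices.shift(1) - 1

print(daily_return)
```

The first value is NaN because there is no previous day to compare against, which is why later steps in the analysis drop missing values before computing statistics.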
Creation of Pair Plot
A pair plot is a visual representation of pairwise relationships in a dataset. In the context of stock analysis, a pair plot can reveal correlations or trends between different variables, helping identify potential factors influencing stock prices.
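The pair plot and the correlation heatmaps work on two tables, `closing_df` and `tech_rets`, whose construction is not shown in this article. One plausible way to build them from the per-ticker frames is sketched here; this is an assumption about their structure (one column per ticker), and the `frames` dictionary with its hand-made prices is a stand-in for the downloaded data.

```python
import pandas as pd

# Hypothetical stand-ins for the downloaded per-ticker frames (not real prices).
dates = pd.date_range("2024-01-01", periods=4, freq="D")
frames = {
    "AAPL": pd.DataFrame({"Adj Close": [100.0, 110.0, 99.0, 99.0]}, index=dates),
    "GOOG": pd.DataFrame({"Adj Close": [50.0, 55.0, 49.5, 54.45]}, index=dates),
}

# One column of adjusted closing prices per ticker, aligned on the date index.
closing_df = pd.concat({t: f["Adj Close"] for t, f in frames.items()}, axis=1)

# Daily returns for every ticker at once.
tech_rets = closing_df.pct_change()
print(tech_rets)
```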
import seaborn as sns
# tech_rets is assumed to hold the daily returns of the four stocks, one column per ticker
sns.pairplot(tech_rets, kind='reg')
plt.savefig('pairplot.png')
Executing this code snippet produces the pair plot and saves it as an image.
Correlation (Stock Returns, Stock Closing Price)
plt.figure(figsize=(10, 8))

# Top-left panel: correlation of daily returns
plt.subplot(2, 2, 1)
sns.heatmap(tech_rets.corr(), annot=True)
plt.title('Daily stock returns')

# Top-right panel: correlation of closing prices
plt.subplot(2, 2, 2)
sns.heatmap(closing_df.corr(), annot=True)
plt.title('closing price correlation')
So, in this bit of Python code, I'm using Matplotlib and Seaborn to create a 2x2 picture grid. The first picture (top-left) shows a heatmap, which is like a colorful table, indicating how much the daily stock returns of these tech companies are related to each other. The second picture (top-right) does the same thing but for the closing prices of the stocks instead. The colors and numbers help us quickly see if the stock prices or returns tend to move together or in opposite directions.
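It helps to see why the two heatmaps can tell different stories. Prices that both trend upward over time correlate strongly even when their day-to-day moves disagree. The sketch below uses two made-up price series (the ticker names `A` and `B` are hypothetical) to illustrate the effect.

```python
import pandas as pd

# Two illustrative price series that both rise overall,
# but take their larger steps on alternating days.
closing = pd.DataFrame({
    "A": [100.0, 102.0, 103.0, 106.0, 107.0, 110.0],
    "B": [50.0, 50.5, 52.5, 53.0, 55.5, 56.0],
})
returns = closing.pct_change().dropna()

price_corr = closing.corr().loc["A", "B"]
return_corr = returns.corr().loc["A", "B"]

print(f"price correlation:  {price_corr:.2f}")
print(f"return correlation: {return_corr:.2f}")
```

Here the price correlation is strongly positive while the return correlation is negative, which is why the returns heatmap is usually the more informative of the two when judging whether stocks move together from day to day.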
The Risk-Return Analysis
import numpy as np
import matplotlib.pyplot as plt

rets = tech_rets.dropna()
area = np.pi * 20

plt.figure(figsize=(10, 8), facecolor='white')
# x = mean daily return, y = standard deviation of daily returns; color follows the mean return
plt.scatter(rets.mean(), rets.std(), s=area, c=rets.mean(), cmap='viridis', marker='o')
plt.grid(True)
plt.xlabel('Expected Return', fontsize=14)
plt.ylabel('Risk', fontsize=14)
plt.title('Risk-Return Scatter Plot for Tech Stocks')

# Label each point with its ticker, joined to the point by a curved blue connector
for label, x, y in zip(rets.columns, rets.mean(), rets.std()):
    plt.annotate(label, xy=(x, y), xytext=(50, 50), textcoords='offset points', ha='right', va='bottom',
                 arrowprops=dict(arrowstyle='-', color='blue', connectionstyle='arc3,rad=-0.3'))

plt.show()
In this piece of code, I'm diving into a risk-return analysis for a set of tech stocks. The code creates a scatter plot where each dot represents a tech company. The x-axis shows the expected return, which is like the average gain or loss you might anticipate, and the y-axis represents risk, indicating how much the stock's price tends to swing. The plot gives a visual sense of which stocks offer potentially higher returns for a given level of risk.
As we look at the scatter plot, we're aiming for stocks in the bottom-right corner: these are the ones with higher expected returns and lower risk, because expected return grows to the right along the x-axis and risk grows upward along the y-axis. The labels next to each dot tell us which company is which, and the curved blue lines simply connect each label to its dot.
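The two quantities behind the scatter plot are simply the mean and the standard deviation of each stock's daily returns. A minimal sketch with two made-up return series (the names `STEADY` and `JUMPY` are hypothetical) shows how two stocks can share the same expected return while carrying very different risk.

```python
import pandas as pd

# Illustrative daily returns (not real market data).
rets = pd.DataFrame({
    "STEADY": [0.01, 0.01, 0.01, 0.01],
    "JUMPY": [0.05, -0.03, 0.04, -0.02],
})

# Expected return = mean of daily returns; risk = their standard deviation.
summary = pd.DataFrame({"expected_return": rets.mean(), "risk": rets.std()})
print(summary)
```

Both series average 1% per day, but only one of them gets there without large swings, so it would sit lower on the risk axis of the scatter plot.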
Value at Risk (VaR)
Value at Risk (VaR) is a statistical measure used to quantify the potential loss on an investment within a given time frame and confidence level. In simpler terms, it helps investors understand the maximum amount they could lose with a certain degree of confidence.
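rets = tech_rets.dropna()
std = rets.std()
# 1.96 is the two-sided 95% z-value of the normal distribution; multiply by 100 to report percent
VaR = std * 1.96 * 100
print('VaR at 95% Confidence level in %:', VaR)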
Let's take AAPL as an example. A VaR of 3.04% at a 95% confidence level means that, based on the past year of data and assuming daily returns are roughly normally distributed around zero, there is a 95% chance that Apple's loss on any single trading day will not exceed 3.04%. In other words, under normal market conditions, we can reasonably expect that daily losses on our investment in Apple will stay below 3.04% most of the time. Because 1.96 is the two-sided 95% multiplier of the normal distribution, this figure is slightly conservative for a one-sided loss estimate.
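For comparison, here is a hedged sketch of two common ways to estimate a one-day 95% VaR: a parametric estimate that uses the one-sided normal multiplier (about 1.645), and a historical estimate that reads the 5th percentile directly from the returns. This is not the calculation used above. The returns are simulated with an assumed 1.5% daily volatility purely for illustration.

```python
from statistics import NormalDist

import numpy as np

# z-values of the standard normal distribution.
z_one_sided = NormalDist().inv_cdf(0.95)   # 5% in the loss tail
z_two_sided = NormalDist().inv_cdf(0.975)  # 2.5% in each tail

# Simulated daily returns (assumption: zero mean, 1.5% daily volatility).
rng = np.random.default_rng(0)
rets = rng.normal(loc=0.0, scale=0.015, size=10_000)

# Parametric VaR: multiplier times the sample standard deviation.
var_parametric = z_one_sided * rets.std(ddof=1)

# Historical VaR: the loss at the 5th percentile of observed returns.
var_historical = -np.quantile(rets, 0.05)

print(f"one-sided z: {z_one_sided:.3f}, two-sided z: {z_two_sided:.3f}")
print(f"parametric VaR: {var_parametric:.2%}, historical VaR: {var_historical:.2%}")
```

On normally distributed data the two estimates agree closely. On real stock returns, which tend to have heavier tails, the historical estimate can differ noticeably from the parametric one.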
Stacked LSTM Approach (Deep Learning)
Long Short-Term Memory (LSTM) is a type of recurrent neural network (RNN) designed for processing sequential data. Stacking multiple LSTM layers enhances the model's ability to capture complex patterns in time-series data, making it well-suited for predicting stock prices. The Stacked LSTM approach allows the model to learn both short-term and long-term dependencies in the data, improving its predictive capabilities.
df = pdr.get_data_yahoo('GOOG', start='2012-01-01', end=datetime.now())
plt.figure(figsize=(13, 6))
plt.plot(df['Close'])
plt.xlabel('Date')
plt.ylabel('Close price in USD')
plt.title('Google Stock movement')
# plt.savefig('goog.png')
plt.show()
We are using Google stock data to predict the future price movement.
This graph shows the historical stock movement from 2012 to 2023. The trend line summarizes how the price moved over those years.
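import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Keep only the closing price and convert it to a NumPy array
data = df.filter(['Close'])
dataset = data.values

# Use the first 95% of the rows for training
training_data_len = int(np.ceil(len(dataset) * .95))
training_data_len

# Scale prices to the range [0, 1]
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(dataset)

train_data = scaled_data[0:int(training_data_len), :]
x_train = []
y_train = []

# Each sample: 60 consecutive scaled closes as input, the following close as target
for i in range(60, len(train_data)):
    x_train.append(train_data[i-60:i, 0])
    y_train.append(train_data[i, 0])
    if i <= 61:
        print(x_train)
        print(y_train)
        print()

# Convert to arrays and reshape to (samples, time steps, features) as the LSTM expects
x_train, y_train = np.array(x_train), np.array(y_train)
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))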
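Here, I'm setting up the data for training a machine learning model that predicts stock prices. The main steps are: keep only the closing price, reserve the first 95% of the rows for training, scale the prices to the range 0 to 1, build samples in which 60 consecutive closing prices are the input and the next closing price is the target, and reshape the inputs into the three-dimensional form (samples, time steps, features) that an LSTM layer expects.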
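The windowing step is the part that is easiest to get wrong, so here is a small standalone sketch of it. The helper name `make_windows` is hypothetical, and a lookback of 3 on the numbers 1 to 10 is used instead of 60 days of prices so the result can be checked by eye.

```python
import numpy as np

def make_windows(series, lookback):
    """Return (x, y): x[k] holds `lookback` consecutive values, y[k] the value right after them."""
    x = np.array([series[i - lookback:i] for i in range(lookback, len(series))])
    y = np.array(series[lookback:])
    # Add a trailing feature axis: (samples, time steps, features).
    return x.reshape(x.shape[0], lookback, 1), y

prices = np.arange(1.0, 11.0)  # illustrative values 1.0 ... 10.0
x, y = make_windows(prices, lookback=3)

print(x.shape, y.shape)       # shapes of inputs and targets
print(x[0].ravel(), y[0])     # first window and its target
```

Each input window ends one step before its target, so the model never sees the value it is asked to predict.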
Create & Compile
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout

model = Sequential()
# First LSTM layer returns the full sequence so the second LSTM layer can consume it
model.add(LSTM(100, return_sequences=True, input_shape=(x_train.shape[1], 1)))
# Second LSTM layer returns only its final output
model.add(LSTM(50, return_sequences=False))
model.add(Dropout(0.2))
model.add(Dense(1))

model.compile(optimizer='adam', loss='mean_squared_error')
model.fit(x_train, y_train, batch_size=1, epochs=1)
So, in this piece of code, I'm using a deep learning library called Keras to create a neural network model for predicting stock prices. The model consists of two Long Short-Term Memory (LSTM) layers, which are a type of neural network layer good at learning patterns over sequences, like time series data. The first LSTM layer has 100 units and takes in sequences of 60 past closing prices; it returns its output at every time step so that the next layer receives a full sequence. The second LSTM layer has 50 units and doesn't return sequences, which means it passes on only its final output, a 50-value summary of the whole 60-day window. I've added a dropout layer, which randomly switches off 20% of those values during training, to reduce overfitting, meaning the model getting too specific to the training data. The final layer is a Dense layer with 1 unit, which predicts the next closing price. The model is trained using the Adam optimizer and the mean squared error loss function, with a batch size of 1 and for one epoch, meaning it goes through the entire training set once and updates its weights after every sample. In short, this neural network learns from the past 60 days of closing prices to predict the next one.
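To get a feel for the size of this network, the number of trainable parameters can be worked out by hand. The sketch below assumes the standard Keras LSTM layout (four gates, each with input weights, recurrent weights, and one bias vector); the helper names are hypothetical.

```python
def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights, and a bias vector.
    return 4 * (units * (input_dim + units) + units)

def dense_params(input_dim, units):
    # One weight per input-output pair plus one bias per output.
    return input_dim * units + units

layer_1 = lstm_params(input_dim=1, units=100)   # first LSTM: one feature per time step
layer_2 = lstm_params(input_dim=100, units=50)  # second LSTM: fed by the 100 outputs of the first
head = dense_params(input_dim=50, units=1)      # final Dense layer; Dropout adds no parameters
total = layer_1 + layer_2 + head

print(layer_1, layer_2, head, total)
```

Under these assumptions the model has 71,051 trainable parameters, most of them in the first two layers. Calling `model.summary()` on the real model is the authoritative way to confirm the count.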
Testing the Model
# Include the last 60 training rows so the first test window is complete
test_data = scaled_data[training_data_len - 60:, :]
x_test = []
y_test = dataset[training_data_len:, :]  # actual, unscaled closing prices

for i in range(60, len(test_data)):
    x_test.append(test_data[i-60:i, 0])

x_test = np.array(x_test)
x_test = x_test.reshape(x_test.shape[0], x_test.shape[1], 1)

# Predict, then map the scaled outputs back to prices
predictions = scaler.inverse_transform(model.predict(x_test))

rmse = np.sqrt(np.mean((predictions - y_test) ** 2))
rmse
RMSE: 3.92814294427371
The Root Mean Square Error (RMSE) value of 3.93 is an indicator of how well the neural network model is performing in predicting stock prices. RMSE is the square root of the average squared difference between the predicted values and the actual values, so larger errors count more heavily than small ones. In this context, an RMSE of 3.93 suggests that the model's predictions typically differ from the true stock prices by roughly 3.93 US dollars.
To put it simply, the lower the RMSE, the better the model's predictions align with the actual stock prices. In this case, an RMSE of 3.93 indicates a reasonably accurate model, but it's important to consider the scale of the stock prices we are working with. If the average stock price is in the hundreds, a 3.93 RMSE might be acceptable.
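One way to judge whether an RMSE is "acceptable" is to express it relative to the price level and to compare it with a simple baseline. The sketch below does both on made-up numbers; the `rmse` helper, the sample prices, and the choice of a persistence baseline (predicting each day's close as the previous day's close) are assumptions for illustration, not results from this project.

```python
import numpy as np

def rmse(predicted, actual):
    # Flatten both inputs so that shapes like (n, 1) and (n,) cannot
    # silently broadcast into an (n, n) matrix of differences.
    predicted = np.asarray(predicted, dtype=float).ravel()
    actual = np.asarray(actual, dtype=float).ravel()
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

actual = np.array([100.0, 102.0, 101.0, 105.0])      # illustrative closing prices
model_pred = np.array([101.0, 101.0, 103.0, 104.0])  # hypothetical model output

model_rmse = rmse(model_pred, actual)
relative_rmse = model_rmse / actual.mean()

# Persistence baseline: yesterday's close is the forecast for today.
baseline_rmse = rmse(actual[:-1], actual[1:])

print(f"model RMSE: {model_rmse:.3f} ({relative_rmse:.2%} of the mean price)")
print(f"baseline RMSE: {baseline_rmse:.3f}")
```

A model is only adding value if its RMSE is clearly below that of the baseline over the same test period.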
The difference between the predicted values and the actual closing prices can be observed in the above table. Below is the graph that shows the predicted prices alongside the actual closing prices.
This visual comparison allows us to see how well the model's predictions align with the actual stock prices. If the green line closely follows the blue line, it suggests that the model is making accurate predictions.
Conclusion
In the ever-evolving landscape of financial markets, the quest for accurate stock price predictions requires a harmonious blend of meticulous data analysis and advanced modeling techniques. Through the lens of yfinance data, I navigated the complexities of stock movements, unraveling patterns, and establishing a solid foundation for predictive modeling.
The integration of the stacked LSTM approach marked a pivotal moment in the project, introducing a sophisticated tool to forecast stock prices with enhanced accuracy. As we tread further into the future of finance, this combination of traditional analytics and cutting-edge technology proves to be a formidable force, unlocking new possibilities for investors and traders alike.