
Stock Market Prediction: A Practical Guide for choosing models

Harsha G.

Senior Director @ NTT Data | Automotive SME | Pre Sales | SaaS |

Published: August 19, 2024

Predicting stock market trends is hard, but machine learning can still yield useful insights. In this article, we'll compare how several models fare, measured by precision score, at predicting stock market movements, and discuss what each model can contribute to your predictions.

The goal is to predict the direction of the next day's price movement from historical data. I start with RandomForestClassifier, a robust ensemble learning method, to classify whether the price will rise or fall.

The Python code below uses the yfinance module to download the S&P 500 (^GSPC) price history from Yahoo Finance for the maximum period available.

import yfinance as yf
sp500 = yf.Ticker("^GSPC")
# query the historical prices
sp500 = sp500.history(period="max")

I started with a line plot of the closing prices to check the overall trend.
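A minimal sketch of that kind of plot, assuming pandas' built-in plotting (which relies on matplotlib). The `prices` table and its values are made up for illustration and are not real S&P 500 data; with the downloaded data, the same call would be made on `sp500`.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; not needed in a notebook
import pandas as pd

# Made-up closing prices standing in for the real sp500 DataFrame
prices = pd.DataFrame(
    {"Close": [100.0, 101.5, 99.8, 102.3, 103.1]},
    index=pd.date_range("2024-01-01", periods=5, freq="B"),
)

# Line plot of the closing price against the date index
ax = prices.plot.line(y="Close", use_index=True)
ax.set_ylabel("Closing price")
```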

In the next step, I removed two columns that were not needed (Dividends and Stock Splits) and created a new column called Tomorrow, which contains the next trading day's closing price.

# remove the last 2 columns
del sp500["Dividends"]
del sp500["Stock Splits"]
# setup the target
sp500["Tomorrow"] = sp500["Close"].shift(-1)

I then created a new column called Target, which is 1 when the next day's closing price is higher than the current day's closing price and 0 otherwise.

sp500["Target"] = (sp500["Tomorrow"] > sp500["Close"]).astype(int)
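To make the two steps above concrete, here is a small worked example on a made-up table (the `toy` DataFrame and its prices are illustrative only). It shows how `shift(-1)` lines up each row with the following day's close and how the comparison turns into a 0/1 label.

```python
import pandas as pd

# Made-up closing prices for five consecutive trading days
toy = pd.DataFrame(
    {"Close": [100.0, 102.0, 101.0, 101.0, 105.0]},
    index=pd.date_range("2024-01-01", periods=5, freq="B"),
)

# shift(-1) pulls the next row's Close up onto the current row
toy["Tomorrow"] = toy["Close"].shift(-1)

# 1 if tomorrow's close is strictly higher than today's, else 0
toy["Target"] = (toy["Tomorrow"] > toy["Close"]).astype(int)

print(toy)
```

Note that the last row has no following day, so its Tomorrow is NaN and its Target falls back to 0, because any comparison with NaN is False. One option, which is an assumption and not part of the code in this article, is to drop that row with `dropna(subset=["Tomorrow"])` before training.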

I realised that there was more history than I needed, so I dropped the data before 1990.

sp500 = sp500.loc["1990-01-01":].copy()

In this section, I set up the RandomForestClassifier model: I initialized it, fit it on the training set, and then made predictions on the test set.

# train an initial model using RandomForestClassifier
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=100, min_samples_split=100, random_state=1)

# time-ordered split: hold out the last 100 rows as the test set
train = sp500.iloc[:-100]
test = sp500.iloc[-100:]

# columns used as predictors
predictors = ["Close", "Volume", "Open", "High", "Low"]

# fit the model on the predictor columns, with Target as the label
model.fit(train[predictors], train["Target"])

The model is now trained on features taken from the historical prices. Next, I measured its precision on the test set, which is the score we want to improve.

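import pandas as pd
from sklearn.metrics import precision_score

# predict on the held-out test set
preds = model.predict(test[predictors])

# turn the predictions into a pandas Series aligned with the test index
preds = pd.Series(preds, index=test.index)

# precision: the share of predicted "up" days that actually went up
precision_score(test["Target"], preds)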
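As a quick illustration of what the precision score measures, the sketch below computes it by hand on made-up labels (the `actual` and `predicted` lists are illustrative, not results from the model) and compares it with scikit-learn's `precision_score`. Precision is the number of true positives divided by the number of all positive predictions.

```python
from sklearn.metrics import precision_score

# Made-up labels: 1 = price went up, 0 = price went down
actual    = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 1, 1, 0, 0, 1, 1, 0]

# Count true positives and false positives by hand
tp = sum(1 for a, p in zip(actual, predicted) if p == 1 and a == 1)
fp = sum(1 for a, p in zip(actual, predicted) if p == 1 and a == 0)
manual_precision = tp / (tp + fp)

# scikit-learn takes the true labels first, then the predictions
sk_precision = precision_score(actual, predicted)
print(manual_precision, sk_precision)
```

In trading terms, a precision of 0.6 here would mean that on 60% of the days the model said "up", the price did go up.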

# build a backtesting system: fit on the training window, then predict on the test window

def predict(train, test, predictors, model):
    model.fit(train[predictors], train["Target"])
    preds = model.predict(test[predictors])
    preds = pd.Series(preds,
                      index=test.index,
                      name="Predictions")
    combined = pd.concat([test["Target"], preds], axis=1)
    return combined

# walk-forward backtest: train on the first i rows, predict the next `step` rows, then repeat
def backtest(data, model, predictors, start=2500, step=250):
    all_predictions = []

    for i in range(start, data.shape[0], step):
        train = data.iloc[0:i].copy()
        test = data.iloc[i:(i+step)].copy()
        predictions = predict(train, test, predictors, model)
        all_predictions.append(predictions)

    return pd.concat(all_predictions)

predictions = backtest(sp500, model, predictors)

precision_score(predictions["Target"], predictions["Predictions"])

The precision score was 52%, and in a subsequent run it improved to 57%.
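To see which rows the backtest trains and tests on, the sketch below reproduces only the windowing logic of the backtest loop. The helper name `backtest_windows` and the row count of 3100 are hypothetical, chosen for illustration; the defaults mirror start=2500 and step=250, which correspond to roughly ten years and one year of trading days.

```python
def backtest_windows(n_rows, start=2500, step=250):
    """Return ((train_start, train_end), (test_start, test_end)) row ranges per fold.

    Ranges are half-open, matching iloc slicing: rows [start, end).
    """
    windows = []
    for i in range(start, n_rows, step):
        test_end = min(i + step, n_rows)  # iloc slicing stops at the last row
        windows.append(((0, i), (i, test_end)))
    return windows


# Hypothetical dataset of 3100 rows
for train_rows, test_rows in backtest_windows(3100):
    print("train rows", train_rows, "-> test rows", test_rows)
```

Each fold trains only on rows that come before its test window, so the model never sees future prices, and the final fold is shorter when the row count is not a multiple of the step.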

I then ran the same data through three other models.

Understanding the Models

The data used in our analysis consists of historical stock prices for the S&P 500. We trained each model on this data to predict whether the stock price would rise or fall the next day.

XGBoost: Known for its speed and accuracy, XGBoost is a go-to model for structured data. It performed well in our test, achieving a precision score of 61.90%. XGBoost’s ability to handle large datasets and its regularization techniques make it a strong candidate for stock market prediction.
Support Vector Classifier (SVC): Support vector machines (SVMs) are effective in high-dimensional spaces and, with a non-linear kernel, can model non-linear data. In our test, the SVC achieved a precision score of 59.0%. While slightly lower than XGBoost, it remains a robust option, particularly when feature selection and scaling are handled well.
Gradient Boosting Classifier: This model builds trees sequentially, each one correcting the errors of the previous. The Gradient Boosting Classifier achieved a precision score of 61.11%, closely trailing XGBoost. It’s powerful but may require careful tuning to avoid overfitting.
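The code for these runs is not shown here, so the following is only a sketch of how such a comparison could be set up, not the method actually used. It runs a walk-forward evaluation over two scikit-learn models on synthetic random data; the `data` table, the helper `walk_forward_precision`, and all parameter values are assumptions for illustration, and the scores it prints say nothing about real market performance.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import precision_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for sp500: random features and a random 0/1 Target
rng = np.random.default_rng(0)
n_rows = 600
predictors = ["Close", "Volume", "Open", "High", "Low"]
data = pd.DataFrame(rng.normal(size=(n_rows, len(predictors))), columns=predictors)
data["Target"] = rng.integers(0, 2, size=n_rows)


def walk_forward_precision(data, model, predictors, start=300, step=100):
    """Train on rows [0, i), predict rows [i, i + step), and pool the results."""
    actual, predicted = [], []
    for i in range(start, data.shape[0], step):
        train, test = data.iloc[:i], data.iloc[i:i + step]
        model.fit(train[predictors], train["Target"])
        predicted.extend(model.predict(test[predictors]))
        actual.extend(test["Target"])
    # zero_division=0 avoids a warning if a model never predicts class 1
    return precision_score(actual, predicted, zero_division=0)


models = {
    "GradientBoosting": GradientBoostingClassifier(n_estimators=50, random_state=1),
    # SVC is sensitive to feature scale, so standardize inside the pipeline
    "SVC": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

scores = {name: walk_forward_precision(data, m, predictors) for name, m in models.items()}
print(scores)
```

Putting the scaler inside the pipeline means it is refit on each training window only, which keeps test-window statistics out of training. `xgboost.XGBClassifier` follows the same fit/predict interface, so it could be added to the `models` dictionary if the xgboost package is installed.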

The full code is available on GitHub.

Conclusion

Among the 4 models, XGBoost delivered the highest precision, but Gradient Boosting was a close second. SVM also provided valuable insights, particularly in complex feature spaces. By experimenting with these models and fine-tuning them, you can potentially achieve even better results in stock market prediction.

What other models have you used for building a stock price simulation or predictor tool?
