Stock Market Prediction: A Practical Guide for Choosing Models

Predicting stock market trends is hard, but machine learning can still yield useful insights. In this article, we'll compare how several models fare, measured by precision score, at predicting stock market movements, and discuss what each model brings to the task.

The goal is to predict stock price movement using historical data. The RandomForestClassifier, a robust ensemble learning method, is employed to classify whether the stock price will increase or decrease.

The Python code below uses the yfinance module to download the S&P 500 price history for the maximum period available.

import yfinance as yf
sp500 = yf.Ticker("^GSPC")
# query the historical prices
sp500 = sp500.history(period="max")        

I started with a line plot of the closing prices to check the overall trend.


In the next step, I removed two columns that were not needed and created a new column called Tomorrow, which holds the next trading day's closing price.

# remove the last 2 columns
del sp500["Dividends"]
del sp500["Stock Splits"]
# setup the target
sp500["Tomorrow"] = sp500["Close"].shift(-1)        

I then created a new column called Target, which is 1 when the next day's close is higher than the current day's close and 0 otherwise.

sp500["Target"] = (sp500["Tomorrow"] > sp500["Close"]).astype(int)        
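To make the effect of shift(-1) concrete, here is a minimal sketch on a toy table. The four closing prices are made up for illustration and are not real S&P 500 data, and the `toy` and `labeled` names are hypothetical. It also shows one detail worth knowing: the most recent row has no next-day close yet, so its Target silently comes out as 0.

```python
import pandas as pd

# Toy closing prices (made-up numbers, not real S&P 500 data)
toy = pd.DataFrame(
    {"Close": [100.0, 102.0, 101.0, 105.0]},
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)

# shift(-1) pulls the NEXT row's close up onto the current row
toy["Tomorrow"] = toy["Close"].shift(-1)

# 1 when the next close is higher than the current close, else 0
toy["Target"] = (toy["Tomorrow"] > toy["Close"]).astype(int)

# The last row has no "tomorrow" yet: Tomorrow is NaN there and the
# comparison evaluates to False, so Target is 0 without being a real label.
# Dropping that row is one way to avoid training on it.
labeled = toy.dropna(subset=["Tomorrow"])

print(toy)
print(labeled)
```

Whether to drop that final row in the real dataset is a judgment call; with decades of daily data, a single unlabeled row has a negligible effect on training.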

I realised there was more data than I needed, so I removed everything before 1990.

sp500 = sp500.loc["1990-01-01":].copy()        

In this section, I set up the RandomForestClassifier model. I initialized the model, held out the last 100 rows as a test set, and fit the model on the remaining training data.


# train an initial model using RandomForestClassifier
from sklearn.ensemble import RandomForestClassifier

model = RandomForestClassifier(n_estimators=100, min_samples_split=100, random_state=1)

# chronological train/test split for time series data: the last 100 rows are the test set
train = sp500.iloc[:-100]
test = sp500.iloc[-100:]

# columns used as predictors
predictors = ["Close", "Volume", "Open", "High", "Low"]

# fit the model on the predictor columns, with Target as the label
model.fit(train[predictors], train["Target"])

The model is trained on the raw price and volume columns from the historical data. The next step is to measure its precision score on the held-out test set, which gives a baseline to improve on.

import pandas as pd
from sklearn.metrics import precision_score

# predict on the held-out test set
preds = model.predict(test[predictors])

# turn the predictions into a pandas Series aligned with the test index
preds = pd.Series(preds, index=test.index)
precision_score(test["Target"], preds)
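For readers new to the metric: precision answers "of the days the model predicted a rise, how many actually rose?", that is, true positives divided by all predicted positives. The sketch below checks a hand calculation against scikit-learn on five made-up labels (the `y_true` and `y_pred` values are illustrative, not results from this article).

```python
from sklearn.metrics import precision_score

# Made-up labels for illustration: 1 = price rose, 0 = price fell
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 1, 1, 0, 0]

# True positives: predicted a rise and the price did rise
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
# False positives: predicted a rise but the price fell
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

manual_precision = tp / (tp + fp)
sklearn_precision = precision_score(y_true, y_pred)

print(manual_precision, sklearn_precision)
```

Precision ignores the rises the model missed (the fourth day above), which suits a use case where you only act on predicted rises.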

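Next, I built a backtesting system. It starts with the first 2,500 rows as training data, predicts the following 250 rows, then grows the training window by 250 rows and repeats.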

def predict(train, test, predictors, model):
    model.fit(train[predictors], train["Target"])
    preds = model.predict(test[predictors])
    preds = pd.Series(preds,
                      index=test.index,
                      name="Predictions")
    combined = pd.concat([test["Target"], preds], axis=1)
    return combined

# create a backtest function
def backtest(data, model, predictors, start=2500, step=250):
    all_predictions = []

    for i in range(start, data.shape[0], step):
        train = data.iloc[0:i].copy()
        test = data.iloc[i:(i+step)].copy()
        predictions = predict(train, test, predictors, model)
        all_predictions.append(predictions)

    return pd.concat(all_predictions)

predictions = backtest(sp500, model, predictors)

precision_score(predictions["Target"], predictions["Predictions"])        
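To see which rows each backtest fold trains and tests on, the loop bounds can be listed without fitting any model. This is a minimal sketch: `backtest_windows` is a hypothetical helper, not part of the original code, and the row count of 3200 is an arbitrary example.

```python
def backtest_windows(n_rows, start=2500, step=250):
    """List the (train, test) row ranges that the backtest loop visits.

    Each entry is ((train_start, train_end), (test_start, test_end)),
    with end positions exclusive, matching iloc slicing.
    """
    windows = []
    for i in range(start, n_rows, step):
        # iloc slicing clips at the end of the data, so the last test
        # window may be shorter than `step`
        test_end = min(i + step, n_rows)
        windows.append(((0, i), (i, test_end)))
    return windows


for train_range, test_range in backtest_windows(3200):
    print("train rows", train_range, "-> test rows", test_range)
```

Every fold trains only on rows that come before its test rows, so no future data leaks into training, and the training window grows with each fold.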

The precision score was 52%, and in a subsequent run it improved to 57%.

I then ran the same data through three other models.

Understanding the Models

The data used in our analysis consists of historical stock prices for the S&P 500. We trained each model on this data to predict whether the stock price would rise or fall the next day.

  1. XGBoost: Known for its speed and accuracy, XGBoost is a go-to model for structured data. It performed well in our test, achieving a precision score of 61.90%. XGBoost’s ability to handle large datasets and its regularization techniques make it a strong candidate for stock market prediction.
  2. Support Vector Classifier (SVC): Support vector machines are effective in high-dimensional spaces and can model non-linear data. In our test, the SVC achieved a precision score of 59.0%. While slightly lower than XGBoost, it remains a robust option, particularly when feature selection and scaling are handled carefully.
  3. Gradient Boosting Classifier: This model builds trees sequentially, each one correcting the errors of the previous. The Gradient Boosting Classifier achieved a precision score of 61.11%, closely trailing XGBoost. It’s powerful but may require careful tuning to avoid overfitting.
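The article does not show how these three models were run, so the following is only a sketch of one way to compare them under the same walk-forward scheme. Everything here is an assumption: the synthetic random-walk data standing in for the S&P 500 history, the `walk_forward_precision` helper, the hyperparameters, and the choice to scale features for the SVC. XGBoost is included only if the xgboost package is installed. Scores on random data will hover around chance and say nothing about the figures quoted above.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import precision_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic random-walk prices stand in for the real data (assumption)
rng = np.random.default_rng(0)
n = 600
close = 100 + np.cumsum(rng.normal(0, 1, n))
data = pd.DataFrame({
    "Close": close,
    "Open": close + rng.normal(0, 0.5, n),
    "High": close + 1.0,
    "Low": close - 1.0,
    "Volume": rng.integers(1_000, 5_000, n).astype(float),
})
data["Tomorrow"] = data["Close"].shift(-1)
data["Target"] = (data["Tomorrow"] > data["Close"]).astype(int)
data = data.dropna(subset=["Tomorrow"])  # last row has no next-day close

predictors = ["Close", "Volume", "Open", "High", "Low"]


def walk_forward_precision(data, model, predictors, start=300, step=100):
    """Refit on an expanding window and score all out-of-sample predictions."""
    y_true, y_pred = [], []
    for i in range(start, len(data), step):
        train, test = data.iloc[:i], data.iloc[i:i + step]
        model.fit(train[predictors], train["Target"])
        y_pred.extend(model.predict(test[predictors]))
        y_true.extend(test["Target"])
    # zero_division=0 returns 0.0 if the model never predicts a rise
    return precision_score(y_true, y_pred, zero_division=0)


models = {
    # SVC is sensitive to feature scale, so it is wrapped with a scaler
    "SVC": make_pipeline(StandardScaler(), SVC()),
    "GradientBoosting": GradientBoostingClassifier(n_estimators=50, random_state=1),
}
try:
    from xgboost import XGBClassifier  # optional dependency
    models["XGBoost"] = XGBClassifier(n_estimators=50, random_state=1)
except ImportError:
    pass  # skip XGBoost when the package is not available

scores = {name: walk_forward_precision(data, m, predictors) for name, m in models.items()}
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

Putting the scaler inside the pipeline means it is refit on each training window, so test-period statistics never leak into the scaling step.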

The full code is available on GitHub.

Conclusion

Among the four models, XGBoost delivered the highest precision at 61.90%, with Gradient Boosting a close second at 61.11%. The SVC followed at 59.0%, and the random forest baseline reached 52% to 57%. By experimenting with these models and fine-tuning them, you can potentially achieve even better results in stock market prediction.

What other models have you used to build a stock price simulation or prediction tool?

Leave a comment.

