Stock Price Prediction using Machine Learning
Prakhar Prakash
Motilal Oswal | SPJIMR PGDM 2023-2025 | Runner Up, SCPC-Henkel DX Hackathon | Centre for Financial Innovation | Ex-Quantitative Trader at SMC Group | DU 2018 | NUS 2019 | Stock Market Enthusiast
In this article, I'll put some Machine Learning models into action and evaluate their performance. Let's see if deploying a complex model guarantees a better result or not.
There are several Machine Learning models to choose from if you want to work on stock price prediction. Using ML to predict stock prices can be difficult, though, as stock prices are very dynamic and constantly evolving. Today I'm going to evaluate the performance of three models: Linear Regression, Decision Tree Regression, and K Nearest Neighbours (KNN). So without further ado, let's get started.
So the very first thing to do is to read the data. Once the data is read, let's quickly print the closing prices to see if there are any missing/bad values.
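As a rough sketch of this step, the snippet below reads a small table with pandas and checks the closing prices for missing or non-positive values. The inline sample data and the column names `Date`, `Open` and `Close` are assumptions for illustration only; with a real file you would point `pd.read_csv` at your own CSV.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for the real price file.
csv_text = """Date,Open,Close
2020-01-01,100.0,101.5
2020-01-02,101.5,103.0
2020-01-03,103.0,102.2
2020-01-06,102.2,104.8
"""

# With a real file this would be pd.read_csv("<your file>.csv", ...).
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])

# Print the closing prices for a quick visual check.
print(df["Close"])

# Count missing values and obviously bad (zero or negative) prices.
missing_close = int(df["Close"].isna().sum())
non_positive = int((df["Close"] <= 0).sum())
print("missing:", missing_close, "non-positive:", non_positive)
```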
Looking at the values, we can see that there are no irregular values, so we are good to go ahead. In this analysis, I will be using the open price as the independent variable and the close price as the dependent variable. So let's have a quick look at our data frame first.
Our data frame looks fine. The column containing the open prices (independent variable) will be denoted by X and the column containing the close prices (dependent variable) will be denoted by y. Now we are all set. Let's first apply Linear Regression and evaluate how well it does in predicting close prices. First, we will have to split the data into a training set and a test set. The training set is the data the model will use to learn the relationship between the dependent and independent variables, while the test set is the data on which we will apply the model to see if it is able to predict the close price given the open price. We will keep 80% of the data as training data and 20% as test data.
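A minimal sketch of the 80/20 split, assuming scikit-learn's `train_test_split`. The synthetic open and close prices and the `random_state` value are assumptions made only so the snippet runs on its own.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: close = open + some noise.
rng = np.random.default_rng(0)
open_prices = np.linspace(100, 200, 50)
close_prices = open_prices + rng.normal(0, 2, size=50)

# scikit-learn expects a 2D feature matrix, hence the reshape.
X = open_prices.reshape(-1, 1)
y = close_prices

# Keep 80% of the rows for training and 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)
```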
Once the data is split, we will fit Linear Regression to the training set. This will train the model and prepare it for prediction. Once the model is fit, we will call the predict function and pass in the open price values of the test set and obtain the predicted values of the close price.
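The fit and predict steps described above could look roughly like this with scikit-learn's `LinearRegression`. The data is synthetic and the variable name `regressor` is my own choice here, so treat it as an illustration, not the exact code behind the results.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real open/close prices.
rng = np.random.default_rng(0)
X = np.linspace(100, 200, 50).reshape(-1, 1)   # open prices
y = X.ravel() + rng.normal(0, 2, size=50)      # close prices

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit the model on the training set only.
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predict close prices from the open prices of the test set.
y_pred = regressor.predict(X_test)

print("slope:", regressor.coef_[0], "intercept:", regressor.intercept_)
```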
We will have all the predicted values of the close price in y_pred, while the actual values will be in y_test. Now let's see how well our model has performed. For that, we will plot the Predicted v/s Real values on a graph and see how smooth the curve is. The curve looks really smooth, which suggests that the model has performed well in predicting prices.
Now let's move ahead and see how well Decision Tree Regression performs in predicting close prices. The initial coding will be the same as in Linear Regression. Once the data is split into the training set and the test set, we will do feature scaling to standardise the variables. Now we are ready to fit the Decision Tree Regression model to our training data set. Once the model is fit on the training data set, we are ready to predict values of the close price. We will just call the predict function, pass in the open price values of the test set, and obtain the predicted values of the close price.
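Here is a hedged sketch of the scaling and Decision Tree steps, assuming `StandardScaler` and `DecisionTreeRegressor` from scikit-learn with default settings. The data is synthetic, and the scaler is fit on the training set only so that no information from the test set leaks into training. Since a tree splits on thresholds, scaling should not change its predictions much, but the step is kept here to mirror the description above.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the real open/close prices.
rng = np.random.default_rng(0)
X = np.linspace(100, 200, 50).reshape(-1, 1)   # open prices
y = X.ravel() + rng.normal(0, 2, size=50)      # close prices

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Standardise the feature: learn mean/std on the training set,
# then apply the same transformation to the test set.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Fit the tree on the scaled training data and predict on the test data.
tree = DecisionTreeRegressor(random_state=0)
tree.fit(X_train_scaled, y_train)
y_pred = tree.predict(X_test_scaled)

print(y_pred[:5])
```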
All the predicted values will be stored in y_pred. Like we did with Linear Regression, let's move ahead and plot a Real v/s Predicted values curve and see how well our model has performed. The curve looks smooth except for a few values. The value at (760, 920) looks way off, and the values towards the higher end of the graph look a bit more scattered than in Linear Regression. These might adversely affect the performance of the model.
Now let's move ahead and see how well KNN performs in predicting close prices. The code used in KNN will be very similar to the one used in Decision Tree Regression. After the regular splitting of data and feature scaling, we will fit the KNN model to our training set. Once the model is fit on the training data set, we are ready to predict values of the close price. We will just call the predict function, pass in the open price values of the test set, and obtain the predicted values of the close price.
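The KNN step can be sketched in the same way, assuming scikit-learn's `KNeighborsRegressor`. The number of neighbours is not stated in this article, so `n_neighbors=5` (the library default) is purely an assumption, as is the synthetic data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real open/close prices.
rng = np.random.default_rng(0)
X = np.linspace(100, 200, 50).reshape(-1, 1)   # open prices
y = X.ravel() + rng.normal(0, 2, size=50)      # close prices

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# KNN is distance based, so scaling the feature is a sensible habit.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Each prediction is the average close price of the 5 nearest
# training points (n_neighbors=5 is an assumed value).
knn = KNeighborsRegressor(n_neighbors=5)
knn.fit(X_train_scaled, y_train)
y_pred = knn.predict(X_test_scaled)

print(y_pred[:5])
```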
It is worthwhile to note that across all the models, 20% of the data has been kept for the test set and 80% of the data has been kept for the training set. Let's move ahead and plot a Real v/s Predicted values curve and see how well our model has performed. Just like in the case of Decision Tree Regression, some of the values look scattered and may affect the overall performance of the model.
So now that we have used all 3 models for stock price prediction, it is time to see which model has performed the best. To do that, we will calculate the mean absolute error between the predicted close price values and the actual close price values. We have the following values of Mean Absolute Error (MAE) for all the models used:
AND THE WINNER IS - LINEAR REGRESSION. With an MAE of 7.63, Linear Regression occupies the first position. KNN and DTR had similar curves, and hence their MAE values are more or less similar. Visually, Linear Regression also had the slimmest Real v/s Predicted curve, which is consistent with it having the lowest MAE among the three models.
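For reference, MAE is simply the average of the absolute differences between the actual and predicted values. The tiny example below uses made-up numbers (not the results from this article) and computes it both by hand with NumPy and with scikit-learn's `mean_absolute_error`.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Made-up actual and predicted close prices, for illustration only.
y_true = np.array([100.0, 102.0, 105.0, 103.0])
y_hat = np.array([101.0, 101.0, 107.0, 103.0])

# Absolute errors are 1, 1, 2 and 0, so the mean is 4 / 4 = 1.0.
mae_manual = float(np.mean(np.abs(y_true - y_hat)))
mae_sklearn = float(mean_absolute_error(y_true, y_hat))

print(mae_manual, mae_sklearn)  # prints 1.0 1.0
```

A lower MAE means the predictions are, on average, closer to the actual close prices, which is why the model with the smallest value is picked as the winner.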
I hope you've liked my work. Please let me know in the comment section if you want me to apply any other ML model for stock price prediction. Any feedback would be highly appreciated.