Ordinary Least Squares (OLS) Regression - Estimate the Relationship between Stock Average Price and SMA Value
Kamal K Chanchal
Experience in Building Trading Application (C++/C#) |Python| MYSQL
Ordinary Least Squares (OLS) is a way to figure out the line that best fits a bunch of data points. Imagine you have a scatterplot with dots all over it. You want to draw a straight line that comes as close as possible to all those dots. That’s what OLS does.
OLS looks at the vertical distance between each dot and the line. It squares these distances (so that positive and negative gaps do not cancel out) and adds them all up. It then picks the slope and intercept that make this total squared distance as small as possible. The result is the best-fitting line.
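To make the idea of "total squared distance" concrete, here is a minimal sketch. The sample points and the hand-picked "guess" line are made up for illustration, and np.polyfit is used only as a convenient way to get the least-squares line.

```python
import numpy as np

# Hypothetical sample points, roughly following y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8])


def sum_squared_error(slope, intercept):
    """Total squared vertical distance between the points and a line."""
    residuals = y - (slope * x + intercept)
    return float(np.sum(residuals ** 2))


# np.polyfit with deg=1 returns the least-squares slope and intercept
ols_slope, ols_intercept = np.polyfit(x, y, deg=1)

sse_ols = sum_squared_error(ols_slope, ols_intercept)
sse_guess = sum_squared_error(2.5, 0.0)  # an arbitrary hand-drawn line

print(f"OLS line:   SSE = {sse_ols:.4f}")
print(f"Guess line: SSE = {sse_guess:.4f}")
```

Any other line you try should give a total at least as large as the OLS line's.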
Overview:
This project analyzes Reliance stock data with linear regression, using the Ordinary Least Squares (OLS) method. The Python script calculates the best-fit line and reports its intercept and slope. The data starts on January 1, 2020 and runs up to the end date passed to the download function (2024-12-31 by default), sampled as daily bars (a one-day timeframe, not intraday).
The regression is run between the average of the Open, High, Low, and Close prices (the dependent variable) and the Simple Moving Average (SMA) trading indicator computed from that average price (the independent variable).
Understanding the Equation Behind OLS:
How to calculate Ordinary Least Squares (OLS)
3. Calculate the deviations: Subtract the mean of each variable from every data point. With $\bar{x}$ and $\bar{y}$ denoting the means of $x$ and $y$, for each data point $(x_i, y_i)$ you'll have: $x_i - \bar{x}$ and $y_i - \bar{y}$
4. Calculate the slope (m): $m = \dfrac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}$
5. Calculate the y-intercept (b): Use the formula: $b = \bar{y} - m \times \bar{x}$
6. Plug the slope and intercept into your regression equation: Once you've found $m$ and $b$, you can use them to write your regression equation:
$\hat{y} = m x + b$
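The steps above can be sketched directly in NumPy. This is an illustrative version, not the code used later in the article: the helper name ols_fit and the sample numbers are assumptions made for the example.

```python
import numpy as np


def ols_fit(x, y):
    """Return (slope, intercept) of the least-squares line through (x, y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()

    # Deviations of every point from the means
    dx = x - x_mean
    dy = y - y_mean

    # Slope: sum of products of deviations over sum of squared x deviations
    m = np.sum(dx * dy) / np.sum(dx ** 2)
    # Intercept: the line passes through the point of means
    b = y_mean - m * x_mean
    return m, b


# Hypothetical data points
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

m, b = ols_fit(x, y)
print(f"slope = {m:.2f}, intercept = {b:.2f}")  # slope = 0.60, intercept = 2.20
```

For these points the means are 3 and 4, the sum of deviation products is 6, and the sum of squared x deviations is 10, which gives the slope 0.6 and intercept 2.2.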
Let’s Code the above in Python
Data Source:
The script retrieves historical data from Yahoo Finance through the yfinance library. The data starts on January 1, 2020 and runs up to the end date passed to the function (2024-12-31 by default), with one bar per trading day (daily data, not intraday).
import pandas as pd
import yfinance as yf


class FinancialData:
    def __init__(self):
        print("Get Historical Data")
        print("Skipping major functions - Validation of Data, etc,etc")

    def get_historical_data(self, ticker="RELIANCE.NS", starting_Date='2020-01-01', last_date='2024-12-31'):
        try:
            # Download daily OHLC bars between the two dates
            data = yf.download(tickers=ticker, start=starting_Date, end=last_date)
            # Turn the Date index into a regular column
            return data.reset_index()
        except Exception as e:
            print(f"Error occurred while downloading data from API : {e}")
            # Return an empty DataFrame so callers always receive the same type
            return pd.DataFrame()
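The class above openly skips data validation. As a rough idea of what a basic check could look like, here is a sketch; the function name validate_ohlc and the specific checks are assumptions for illustration and are not part of the original project.

```python
import pandas as pd

REQUIRED_COLUMNS = ("Open", "High", "Low", "Close")


def validate_ohlc(df):
    """Return a list of problems found; an empty list means the basic checks passed."""
    if not isinstance(df, pd.DataFrame) or df.empty:
        return ["no data returned"]

    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]

    problems = []
    if df[list(REQUIRED_COLUMNS)].isna().any().any():
        problems.append("NaN values in price columns")
    if (df["High"] < df["Low"]).any():
        problems.append("High below Low on at least one row")
    return problems


# Hypothetical two-day sample in the same column layout as the downloaded data
sample = pd.DataFrame({
    "Open": [100.0, 101.0],
    "High": [102.0, 103.0],
    "Low": [99.0, 100.0],
    "Close": [101.0, 102.0],
})

print(validate_ohlc(sample))
print(validate_ohlc(sample.drop(columns=["Close"])))
```

Running a check like this before the regression step makes a failed or partial download visible early instead of surfacing later as a KeyError.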
Implementation:
Code Structure: The code is written in Python and follows a class-based structure. This structure encapsulates functions and variables, making the code organized and modular.
Importing Libraries: We import necessary libraries, primarily LinearRegression from sklearn for regression analysis and matplotlib.pyplot for visualization.
sklearn.linear_model: this scikit-learn module provides tools for fitting linear models, including LinearRegression, which fits an ordinary least squares model.
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
The Q_OLS class encapsulates the process of applying Ordinary Least Squares (OLS) regression analysis. The __init__ method stores the historical data of Reliance stock and triggers the application of the OLS regression model.
class Q_OLS:
    def __init__(self, df_data):
        print("Processing Linear Regression Model to get the Best fit line [OLS ]")
        self.data = df_data
        self.__Apply_OLS_reg_model()
Data preprocessing:
The code calculates the average price (Avg) of the stock for each day using the formula:
Avg = (Open + High + Low + Close) / 4
It then calculates the Simple Moving Average (SMA) trading indicator based on the average price. The period chosen for SMA calculation is 9 days.
We handle missing values by dropping records containing NaN values.
# Average of the four daily prices
self.data['Avg'] = (self.data['Open'] + self.data['High'] + self.data['Low'] + self.data['Close']) / 4
period = 9
sma_col_name = f"SMA_{period}"
# Simple moving average of Avg over the last `period` rows
self.data[sma_col_name] = self.data['Avg'].rolling(window=period).mean()
# Remove the records that contain NaN values (the first period - 1 rows have no full window)
self.data.dropna(inplace=True)
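To see why NaN values appear at all, here is a small sketch on made-up prices. It uses a window of 3 instead of the article's 9 purely to keep the table short; the numbers are hypothetical.

```python
import pandas as pd

# Hypothetical average prices for five days
prices = pd.DataFrame({"Avg": [100.0, 102.0, 104.0, 106.0, 108.0]})

period = 3  # shorter than the article's 9, only to keep the example small
sma_col_name = f"SMA_{period}"

# The first (period - 1) rows have no full window, so their SMA is NaN
prices[sma_col_name] = prices["Avg"].rolling(window=period).mean()
print(prices)

# dropna() removes exactly those incomplete rows
cleaned = prices.dropna()
print(f"rows before: {len(prices)}, rows after: {len(cleaned)}")  # rows before: 5, rows after: 3
```

With a 9-day window the same logic drops the first 8 rows of the dataset.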
The __Apply_OLS_reg_model method calculates the average price and Simple Moving Average (SMA) trading indicator, applies the Linear Regression model, and prints the summary of the model.
    def __Apply_OLS_reg_model(self):
        try:
            # Average of the four daily prices
            self.data['Avg'] = (self.data['Open'] + self.data['High'] + self.data['Low'] + self.data['Close']) / 4
            period = 9
            sma_col_name = f"SMA_{period}"
            self.data[sma_col_name] = self.data['Avg'].rolling(window=period).mean()
            # Remove the records that contain NaN values
            self.data.dropna(inplace=True)
            # --- end calculation

            # Feature and target data
            X = self.data[[sma_col_name]]  # independent variable
            y = self.data['Avg']  # dependent variable
            print("Variables : Test & Prepare")
            model = LinearRegression()
            result = model.fit(X, y)

            # Summary
            print("Summary of OLS Model is [Y= mX + C]:")
            print('Intercept:', result.intercept_)
            print('Slope:', result.coef_[0])
            print(" -----End----- ")
            self.__View_output(X, result.predict(X), y)
        except Exception as e:
            print(f"Error occurred while applying model : {e}")
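Linear Regression Model:
Variables: the independent variable X is the 9-day SMA column (SMA_9) and the dependent variable y is the average price (Avg).
Output: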
Getting Model Summary [Output]: We fit our data into the Linear Regression model and get some important numbers: the intercept (where our line crosses the y-axis) and the slope (how steep our line is). These numbers help us understand the relationship between the average price and the SMA.
Model 2
Get Historical Data
Skipping major functions - Validation of Data, etc,etc
[*********************100%***********************] 1 of 1 completed
number of Record Found is : 1074
Processing Linear Regression Model to get the Best fit line [OLS ]
Variables : Test & Prepare
Summary of OLS Model is [Y= mX + C]:
Intercept: 11.692441031825638
Slope: 0.9971953168155703
-----End-----
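The intercept and slope printed above define the fitted line Avg = intercept + slope × SMA_9. As a small sketch of how to read them, the snippet below plugs in an SMA value; the value 2500 is a hypothetical input chosen for illustration, not a figure from the dataset.

```python
# Coefficients copied from the output above
intercept = 11.692441031825638
slope = 0.9971953168155703


def predict_avg_price(sma_value):
    """Average price predicted by the fitted line for a given SMA_9 value."""
    return intercept + slope * sma_value


sma_example = 2500.0  # hypothetical SMA_9 value
predicted = predict_avg_price(sma_example)
print(f"Predicted Avg for SMA_9 = {sma_example}: {predicted:.2f}")  # 2504.68
```

A slope this close to 1 with a small intercept says the average price tracks its own 9-day moving average closely, which is expected because the SMA is computed from the same price series.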
Visualizing the Results:
Finally, we create a plot to visualize our results. The plot shows the actual average prices as a scatter against the SMA values, with the fitted regression line (the predicted average prices) drawn on top. This helps us see how well our model fits the data.
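A plot gives a visual impression of the fit; a number such as R² can complement it. The sketch below is not part of the original script. It uses synthetic random-walk prices (an assumption, so that it runs without a download) and the score method of LinearRegression, which returns R² for the given data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the Avg column: a random walk around 2000
rng = np.random.default_rng(seed=0)
data = pd.DataFrame({"Avg": 2000 + np.cumsum(rng.normal(0, 10, size=200))})

# Same preprocessing as in the article: 9-day SMA, then drop incomplete rows
data["SMA_9"] = data["Avg"].rolling(window=9).mean()
data = data.dropna()

X = data[["SMA_9"]]  # independent variable
y = data["Avg"]      # dependent variable

model = LinearRegression().fit(X, y)

# R^2: share of the variance in y explained by the fitted line (1.0 is a perfect fit)
r_squared = model.score(X, y)
# Residuals: actual minus predicted values
residuals = y - model.predict(X)

print(f"R^2 = {r_squared:.4f}")
print(f"Largest absolute residual = {residuals.abs().max():.2f}")
```

On the real Reliance data the same two lines (model.score and the residuals) could be added after model.fit to report the fit quality alongside the plot.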
Complete Python code of the above model:
""" Developer Details
Name : Kamal Kumar Chanchal
"""
# Step 1: Import necessary libraries
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt


class Q_OLS:
    def __init__(self, df_data):
        print("Processing Linear Regression Model to get the Best fit line [OLS ]")
        self.data = df_data
        self.__Apply_OLS_reg_model()

    def __Apply_OLS_reg_model(self):
        try:
            # Average of the four daily prices
            self.data['Avg'] = (self.data['Open'] + self.data['High'] + self.data['Low'] + self.data['Close']) / 4
            period = 9
            sma_col_name = f"SMA_{period}"
            self.data[sma_col_name] = self.data['Avg'].rolling(window=period).mean()
            # Remove the records that contain NaN values
            self.data.dropna(inplace=True)
            # --- end calculation

            # Feature and target data
            X = self.data[[sma_col_name]]  # independent variable
            y = self.data['Avg']  # dependent variable
            print("Variables : Test & Prepare")
            model = LinearRegression()
            result = model.fit(X, y)

            # Summary
            print("Summary of OLS Model is [Y= mX + C]:")
            print('Intercept:', result.intercept_)
            print('Slope:', result.coef_[0])
            print(" -----End----- ")
            self.__View_output(X, result.predict(X), y)
        except Exception as e:
            print(f"Error occurred while applying model : {e}")

    def __View_output(self, X, results, y):
        try:
            plt.scatter(X, y, color='blue', label='Actual data')
            plt.plot(X, results, color='red', label='Regression line')
            # X holds the SMA values and y the average price
            plt.xlabel('SMA Values')
            plt.ylabel('Avg Price')
            plt.title('OLS Regression Analysis')
            plt.legend()
            plt.show()
        except Exception as e:
            print(f"Error occurred while plotting summary of OLS regression model : {e}")
Get Complete Machine Learning Repository on Quant from my GitHub:
GitHub permalink: https://github.com/Coderixc/MachineLearning/blob/437b1fb88571c7123ca480aa73ee18b3848795c6/OrdinaryLeastSquares.py
Calling above code:
The project_2() function is the entry point of the script. It fetches historical stock data for Reliance through FinancialData.get_historical_data() from the getHistoricalData module, then uses the Q_OLS class from the OrdinaryLeastSquares module to perform Linear Regression analysis on the fetched data.
import OrdinaryLeastSquares
import getHistoricalData as feed


def project_2():
    print(" Model 2 ")
    # Get stock data -- Reliance
    _t = feed.FinancialData()
    df_stocks_data = _t.get_historical_data()
    print(f"number of Record Found is : {len(df_stocks_data)}")
    # Constructing Q_OLS fits the model, prints the summary, and shows the plot
    OrdinaryLeastSquares.Q_OLS(df_stocks_data)


if __name__ == '__main__':
    # Model 2: get the best-fit line (Reliance stock)
    project_2()
Thank you for taking the time to read this post. If you found it informative or interesting, please consider clapping to show your appreciation!
Reference:
LinkedIn: https://www.dhirubhai.net/in/kamalchanchal
Gmail: [email protected]
You can also read my other posts, such as: BackTesting Strategy Setup: Building a Python Trading Strategy Analyzer
Explore the full potential of this project by visiting our GitHub repository.
Subscribe for more updates on Algorithmic Trading, financial analysis, and coding adventures using C# and Python. Thanks for reading!
Let’s stay connected and continue the conversation.