Linear Regression

Today, we’re diving into the math behind one of the most fundamental models in machine learning: linear regression. This is the first model you’ll typically learn when starting out in the field.



DEFINITION

Linear regression is a way to find the straight line that best fits a set of data points. It lets you predict one value from another by modeling the relationship between them. The formula of linear regression is:

ŷ = c + mx

where, ŷ -> predicted value

m -> slope; it tells how much the predicted value changes for each unit change in the input value

c -> intercept; it tells you the predicted value when the input variable is 0

x -> input value (the data point)

[Figure: a fitted regression line through the data points]
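To make the formula concrete, here is a minimal Python sketch; the slope and intercept values below are made up for illustration, not fitted from data:

```python
# Prediction with the line y_hat = c + m*x.
# m and c are illustrative values, not fitted from data.
m, c = 2.0, 1.0

def predict(x):
    """Return the predicted value y_hat = c + m*x."""
    return c + m * x

print(predict(0))  # 1.0 -> the intercept: the prediction when x is 0
print(predict(3))  # 7.0 -> each unit of x adds m = 2.0 to the prediction
```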


In the notation commonly used in the research literature, the same line is written as the hypothesis hθ(x) = θ0 + θ1x, where θ0 is the intercept (c) and θ1 is the slope (m).

With more than one input feature, the formula generalizes to hθ(x) = θ0 + θ1x1 + θ2x2 + ... + θnxn.

Linear regression is used for:

  • Supervised Learning - a type of machine learning where the model is trained on labeled data.
  • Regression Problems - it predicts a continuous outcome, like forecasting sales or estimating prices.

IMPORTANT TERMINOLOGIES –

  1. Residual Error: It's the difference between the actual value and what the model predicts. If the residual error is 0 on the training data, it may indicate that the model is overfitting.
  2. Cost Function: It measures the error between predicted and actual values. The cost function varies depending on the model. In linear regression, the cost function used is Mean Squared Error.

Cost Function:

J(θ) = (1/(2m)) * Σ (h_θ(x^(i)) - y^(i))^2, summed over the m training samples

where J(θ) -> cost function,

m -> number of training samples,

h_θ(x^(i)) -> predicted value,

y^(i) -> actual value
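As a quick sketch, the cost function can be coded directly from the formula; the toy points below are the ones used later in the walkthrough, with the intercept fixed at 0 for simplicity:

```python
# Mean Squared Error cost J(theta1) = (1/(2m)) * sum((h - y)^2),
# with hypothesis h = theta1 * x (intercept fixed at 0 for simplicity).
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

def cost(theta1):
    m = len(xs)  # number of training samples
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

print(cost(0.0))  # 14/6 ~ 2.33
print(cost(1.0))  # 0.0 -> the line y = x fits these points exactly
```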

3. Gradient Descent ("repeat until convergence") - an iterative process that repeatedly updates the model parameters (θ) to reduce the cost function until it reaches the global minimum.

θ1 := θ1 - α * (∂/∂θ1) J(θ1)

where α is the learning rate, and (∂/∂θ1) J(θ1) is the derivative of the cost function, i.e., the slope

CASE I - when the slope is +ve:

θ1 := θ1 - α(+ve)

So, the value of θ1 decreases.

[Figure: gradient step when the slope is +ve]

CASE II - when the slope is -ve:

θ1 := θ1 - α(-ve)

θ1 := θ1 + α(+ve)

So, the value of θ1 increases.

[Figure: gradient step when the slope is -ve]

CASE III - when the slope is nearly 0:

θ1 := θ1 - α(0)

θ1 := θ1 - 0

So, the value of θ1 stays unchanged.
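The three cases above can be sketched in a few lines of Python; the slope values are made up purely to show the sign behavior:

```python
alpha = 0.1  # learning rate

def update(theta1, slope):
    # Gradient descent step: move theta1 against the sign of the slope.
    return theta1 - alpha * slope

print(update(2.0,  4.0))  # 1.6 -> positive slope: theta1 decreases
print(update(2.0, -4.0))  # 2.4 -> negative slope: theta1 increases
print(update(2.0,  0.0))  # 2.0 -> slope of 0: theta1 is unchanged
```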

4. Learning Rate - a hyperparameter that controls the step size at each iteration while moving toward a minimum of the cost function. It determines how quickly or slowly the model updates its parameters (weights) during gradient descent.

CASE I - learning rate too high: the model may overshoot the minimum, leading to divergence.

[Figure: oscillating, diverging steps when the learning rate is too high]

CASE II - learning rate too low: the model converges very slowly, taking more time to reach the optimal solution.

[Figure: many small steps when the learning rate is too low]
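Both effects can be seen on the toy points used in the walkthrough below, (1,1), (2,2), (3,3), where the optimal slope is θ1 = 1. This is a sketch; the specific alpha values are just illustrative:

```python
xs, ys = [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]

def grad(theta1):
    # dJ/dtheta1 = (1/m) * sum((theta1*x - y) * x)
    m = len(xs)
    return sum((theta1 * x - y) * x for x, y in zip(xs, ys)) / m

def run(alpha, steps=50):
    # Plain gradient descent from theta1 = 0 for a fixed number of steps.
    theta1 = 0.0
    for _ in range(steps):
        theta1 -= alpha * grad(theta1)
    return theta1

print(run(0.1))    # converges close to 1.0
print(run(0.001))  # still far from 1.0 after 50 steps: too slow
print(run(0.5))    # huge magnitude: overshoots and diverges
```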
How is the value of θ calculated?

Let’s break down the calculation of θ step by step.

To keep it simple, we'll use basic data points: (1,1), (2,2), and (3,3), with the intercept θ0 fixed at 0 and learning rate α = 0.1.

STEP 1 - Define the hypothesis function and cost function for simple linear regression (with θ0 = 0):

  • Hypothesis Function:

hθ(x) = θ1x

  • Cost Function:

J(θ1) = (1/(2m)) * Σ (θ1x^(i) - y^(i))^2

STEP 2 - Assume a random value for θ1 and calculate the cost function.

Let's assume θ1 = 0:

J(0) = (1/(2·3)) * ((0-1)^2 + (0-2)^2 + (0-3)^2) = 14/6 ≈ 2.33

STEP 3 - Minimize the cost function using gradient descent. The gradient is (∂/∂θ1) J(θ1) = (1/m) * Σ (θ1x^(i) - y^(i)) x^(i).

Iteration I - θ1 = 0:

gradient = (1/3) * ((0-1)·1 + (0-2)·2 + (0-3)·3) = -14/3 ≈ -4.667

θ1 := 0 - 0.1 · (-4.667) = 0.467

Iteration II - θ1 = 0.467:

gradient = (1/3) * ((0.467-1)·1 + (0.933-2)·2 + (1.4-3)·3) ≈ -2.489

θ1 := 0.467 - 0.1 · (-2.489) = 0.716

Iteration III - θ1 = 0.716:

gradient = (1/3) * ((0.716-1)·1 + (1.431-2)·2 + (2.147-3)·3) ≈ -1.327

θ1 := 0.716 - 0.1 · (-1.327) = 0.848

STEP 4 - Continue repeating Step 3 until the cost function hits its lowest value (here, θ1 approaches 1).
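The steps above can be reproduced in a short loop; this sketch hard-codes the toy data, the starting value θ1 = 0, and α = 0.1:

```python
xs, ys = [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]
alpha = 0.1
theta1 = 0.0  # initial guess

for i in range(1, 4):
    m = len(xs)
    # Gradient of the cost: (1/m) * sum((theta1*x - y) * x)
    g = sum((theta1 * x - y) * x for x, y in zip(xs, ys)) / m
    theta1 -= alpha * g  # gradient descent update
    print(f"Iteration {i}: theta1 = {theta1:.3f}")
# Iteration 1: theta1 = 0.467
# Iteration 2: theta1 = 0.716
# Iteration 3: theta1 = 0.848
```

Running more iterations drives θ1 toward 1, the slope that fits (1,1), (2,2), (3,3) exactly.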


Interesting Questions —

Q - Why do we divide the MSE by 2 in the cost function?

A - Let's look at both scenarios:

Situation I - cost function without the 1/2:

J(θ) = (1/m) * Σ (h_θ(x^(i)) - y^(i))^2

(∂/∂θ1) J(θ) = (2/m) * Σ (h_θ(x^(i)) - y^(i)) x^(i)

When calculating the gradient, the extra factor of 2 (which appears after taking the derivative of the square) makes the math a bit messier.

Situation II - cost function with the 1/2:

J(θ) = (1/(2m)) * Σ (h_θ(x^(i)) - y^(i))^2

(∂/∂θ1) J(θ) = (1/m) * Σ (h_θ(x^(i)) - y^(i)) x^(i)

When we divide the cost function by 2, the factor of 2 from differentiating the square cancels out, simplifying the gradient and making the calculations cleaner.
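A quick numeric check (a sketch using central finite differences) confirms that the 1/2 only rescales the gradient, and that the rescaled gradient matches the clean analytic form (1/m) * Σ (h - y) * x:

```python
xs, ys = [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]
m = len(xs)

def J_no_half(t):
    # Cost without the 1/2 factor.
    return sum((t * x - y) ** 2 for x, y in zip(xs, ys)) / m

def J_half(t):
    # Cost with the 1/2 factor.
    return sum((t * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

def num_grad(J, t, h=1e-6):
    # Central finite-difference approximation of dJ/dt.
    return (J(t + h) - J(t - h)) / (2 * h)

# Analytic gradient of the halved cost at t = 0.5: (1/m) * sum((h - y) * x)
analytic = sum((0.5 * x - y) * x for x, y in zip(xs, ys)) / m
print(num_grad(J_half, 0.5), analytic)  # these two agree
print(num_grad(J_no_half, 0.5))         # twice as large
```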


Q - Why do we take the derivative of the cost function while updating θ?

A - The derivative gives the slope of the cost function at the current θ, which tells gradient descent in which direction (and roughly how far) to adjust the parameters to reduce the error.




Finally —

I hope this blog clarifies linear regression for you!

Got a particular ML topic you’re curious about? Drop your suggestions in the comments, and I’ll do my best to cover them. Thanks for reading!

Feel free to hit me up on LinkedIn. Coffee's on me (virtually, of course)!


