Linear Regression - part one

Introduction

Linear regression is deceptively easy. Many people know it from high school as the equation of a straight line (y = mx + c), but as we see below, linear regression has many nuances and some concepts that can be confusing. In this multipart post, I hope to clarify these ideas.

Traditionally, you learn linear regression on its own.

But I think it makes more sense to consider a number of related concepts together, i.e. simple linear regression, multiple linear regression, multivariate linear regression, polynomial regression, non-linear regression and the generalised linear model.


Essentially, linear regression is a statistical method used to model and analyse the relationship between a dependent variable and one or more independent variables. The primary purpose of linear regression is to predict the value of the dependent variable based on the values of the independent variables.

Key Concepts:

Dependent Variable (Y): The outcome, target, or response variable that you are trying to predict or explain.

Independent Variable(s) (X): The input, predictor, or explanatory variable(s) that you use to predict the dependent variable.

Linear Relationship: Linear regression assumes that the relationship between the dependent variable and the independent variable(s) is linear. We discuss what we mean by a linear relationship in more detail below.

What is linear regression?

In linear regression, the observations (red) are assumed to be the result of random deviations (green) from an underlying relationship (blue) between a dependent variable (y) and an independent variable (x).



By Krishnavedala - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=15462765


Equation: The general form of a simple linear regression equation (with one independent variable) is:

$$ y = \beta_0 + \beta_1 x + \epsilon $$
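As a rough illustration of the simple case, y = β0 + β1·x + ε, the sketch below estimates the intercept and slope from a handful of points. The post does not name an estimation method, so ordinary least squares is an assumption here, and the data and the function name `fit_simple_ols` are made up for the example.

```python
# Made-up observations: y is roughly 2 + 3x plus small deviations.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.1, 4.9, 8.2, 10.9, 14.1]


def fit_simple_ols(xs, ys):
    """Return (intercept, slope) that minimise the sum of squared residuals.

    Assumes ordinary least squares; hypothetical helper for illustration.
    """
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # slope = (sum of cross-deviations) / (sum of squared x-deviations)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means.
    intercept = y_mean - slope * x_mean
    return intercept, slope


beta0, beta1 = fit_simple_ols(xs, ys)

# Residuals: the part of each observation the fitted line does not explain.
residuals = [y - (beta0 + beta1 * x) for x, y in zip(xs, ys)]

print(f"intercept={beta0:.3f}, slope={beta1:.3f}")
```

The residuals computed at the end are the fitted counterparts of the random deviations described earlier; with an intercept in the model they sum to (approximately) zero.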


Multiple linear regression

Multiple Linear Regression: When there are multiple independent variables, the model extends to:

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \epsilon $$
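To make the multiple-regression form concrete (one outcome, several predictors, one coefficient per predictor plus an intercept), here is a minimal sketch using NumPy's least-squares solver. The data are hypothetical and noise-free, generated from y = 1 + 2·x1 − 3·x2, so the fit should recover those coefficients; the use of least squares is an assumption, since the post does not specify how the coefficients are estimated.

```python
import numpy as np

# Hypothetical predictors: each row is one observation (x1, x2).
X = np.array([
    [0.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [2.0, 1.0],
])

# Prepend a column of ones so the first coefficient acts as the intercept.
design = np.column_stack([np.ones(len(X)), X])

# Noise-free outcomes generated from y = 1 + 2*x1 - 3*x2 (illustrative only).
true_beta = np.array([1.0, 2.0, -3.0])
y = design @ true_beta

# Least-squares estimate of (beta0, beta1, beta2).
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
print(beta_hat)
```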

Multivariate linear regression

Multivariate Linear Regression involves multiple dependent variables being predicted simultaneously by a set of independent variables. In other words, there are multiple outcomes, and each outcome is modelled as a linear function of the same set of independent variables.
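The difference from multiple regression shows up in the shape of the outcome: in the multivariate case it is a matrix with one column per dependent variable, and the coefficients form a matrix as well. A minimal sketch, assuming least-squares fitting and hypothetical noise-free data (the names `true_B` and `B_hat` are illustrative):

```python
import numpy as np

# Hypothetical predictors shared by every outcome: rows are observations.
X = np.array([
    [0.0, 1.0],
    [1.0, 0.0],
    [1.0, 1.0],
    [2.0, 1.0],
    [3.0, 2.0],
])
design = np.column_stack([np.ones(len(X)), X])  # shape (5, 3)

# One column of coefficients per outcome: shape (3 coefficients, 2 outcomes).
true_B = np.array([
    [1.0, 0.5],   # intercepts
    [2.0, -1.0],  # effect of x1 on each outcome
    [0.0, 3.0],   # effect of x2 on each outcome
])
Y = design @ true_B  # two dependent variables, shape (5, 2)

# lstsq accepts a 2-D right-hand side and solves for all outcomes at once.
B_hat, *_ = np.linalg.lstsq(design, Y, rcond=None)

# Fitting the first outcome on its own gives the same coefficients.
b_first, *_ = np.linalg.lstsq(design, Y[:, 0], rcond=None)
print(B_hat)
```

With plain least squares, fitting all outcomes together gives the same coefficients as fitting each outcome separately, which is what the last two lines check.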


What do we mean by a ‘linear relationship’?

In linear regression, when we say that the relationship between the dependent variable (often denoted as y) and the independent variable(s) (often denoted as x) is linear, we mean that the relationship between them can be described by a linear function.

A linear relationship means that the change in the dependent variable is proportional to the change in the independent variable(s). Mathematically, a linear relationship is one that can be expressed in the form:

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \epsilon $$

where:

  • y is the dependent variable (the outcome or response variable).
  • x1, x2, …, xn are the independent variables (the predictors or features).
  • β1, …, βn are the coefficients associated with each independent variable. Each βi represents the change in y for a one-unit change in the corresponding xi, holding all other variables constant. β0 is the intercept.
  • ε is the error term (its estimate from a fitted model is the residual), representing the variation in y that cannot be explained by the linear model. It accounts for randomness, measurement errors, or other variables not included in the model.
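The "one-unit change, holding all other variables constant" reading of a coefficient can be checked with a small sketch. The coefficient values and the function `predict` below are hypothetical, chosen only for illustration, and the error term is left out so that only the deterministic part of the model is evaluated.

```python
def predict(x1, x2, beta0=1.0, beta1=2.0, beta2=-3.0):
    """Deterministic part of the model: beta0 + beta1*x1 + beta2*x2.

    The coefficient values are made up; the error term is omitted.
    """
    return beta0 + beta1 * x1 + beta2 * x2


# Increase x1 by one unit while holding x2 fixed.
base = predict(x1=4.0, x2=5.0)
bumped = predict(x1=5.0, x2=5.0)

# The change in the prediction equals beta1, whatever the starting point.
change = bumped - base
print(change)
```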


This has some implications:

  1. In the simplest case, with one independent variable, the representation is a straight line.
  2. More generally, the relationship can still be linear in the coefficients even when the representation is not a straight line.

What do we mean by ‘Linear in coefficients’?

When a model is described as "linear in coefficients," it refers to the way the model's output depends on its parameters (coefficients), rather than on the nature of the variables themselves.

In a linear model, the dependent variable (output) is a linear combination of the independent variables (inputs) and their corresponding coefficients.

As we have seen before, that means a form of:

$$ y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \epsilon $$

When we say that the relationship between the dependent and independent variables is "linear" in the context of linear regression, we mean that the dependent variable can be expressed as a linear combination of the independent variables, so that a constant change in an independent variable corresponds to a constant change in the dependent variable. Plotted against a single independent variable, this is a straight line; more generally, the model is linear in its parameters.

Hence, in regression models, a linear combination means that the independent variables are combined using only multiplication by constants (the coefficients) and summation. The coefficients are never multiplied together or passed through non-linear operations, which is what makes the model "linear in coefficients." For example, y = β0 + β1·x + β2·x² is curved in x but still linear in the coefficients. This is a foundational concept for many types of regression models, including those that capture non-linear relationships between predictors and outcomes.
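One way to see "linear in coefficients" in practice is polynomial regression: the curve is not a straight line in x, yet the same linear least-squares machinery fits it once x² is treated as just another input column. The sketch below assumes least-squares fitting and uses a hypothetical noise-free quadratic, y = 1 − 2x + 0.5x².

```python
import numpy as np

# Hypothetical inputs and a noise-free quadratic outcome (illustrative only).
x = np.linspace(-2.0, 2.0, 9)
y = 1.0 - 2.0 * x + 0.5 * x**2

# The non-linearity lives in the columns (x, x^2), not in the coefficients:
# y is still a weighted sum of the columns 1, x and x^2.
design = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary linear least squares recovers the three coefficients.
beta_hat, *_ = np.linalg.lstsq(design, y, rcond=None)
print(beta_hat)
```

By contrast, a model such as y = β0·exp(β1·x) is not linear in its coefficients, because β1 sits inside a non-linear function, so this approach would not apply to it directly.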

Why is Linear Regression Popular?

Linear regression is a popular statistical method because:

  • It is relatively simple to understand and interpret.
  • The assumptions required for linear regression are reasonable for many practical applications.
  • The computation of estimates for the coefficients is straightforward, and the results are easily interpretable.
  • Linear regression provides a good baseline for modelling the relationship between variables before considering more complex models.

Applications

Predictive Modelling: Linear regression is widely used in predictive analytics to forecast future trends, such as predicting sales based on advertising spend.

Data Analysis: It helps in understanding the strength and nature of the relationship between variables.


In the next posts in this series, we will continue this discussion.

David Athisayam

Dental Technician | Dental Mechanic Course

3 months ago

Congratulations

Murugesan Narayanaswamy

Finance and IT Professional, Deep Learning & AI Specialist

3 months ago

You have made a distinction between multiple and multivariate linear regression; there is indeed confusion in the usage of these terms. The term 'multivariate statistics' is also widely used in the context of multiple linear regression. Though correct, it is often used in a way that suggests multiple regression is the same as multivariate regression. This paper says 80% of the usage of the term 'multivariate statistics' actually pertains to multivariable regression, which might be because multivariate statistics is also applicable to multiple linear regression: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3518362/

Very well-explained, Ajit. The basics behind statistics sometimes get short-shrifted due to ready-made statistical packages like SPSS and SAS.

Ashok R. Dinasarapu Ph.D

(Neuro data) Scientist: movement disorders

3 months ago

Dr. Ajit Jaokar Thank you for your clear explanation on regression; I look forward to more posts on data science or related topics!
