Principal Assumptions of the Linear Regression Model

There are four principal assumptions which justify the use of linear regression models for purposes of inference or prediction:


  • Linearity and additivity of the relationship between the dependent and independent variables (a quick diagnostic sketch follows this list):
      • The expected value of the dependent variable is a straight-line function of each independent variable, holding the others fixed.
      • The slope of that line does not depend on the values of the other variables.
      • The effects of different independent variables on the expected value of the dependent variable are additive.
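A minimal sketch of one way to probe linearity and additivity: fit an ordinary least squares model on made-up data, then look for curvature in the residuals and for a meaningful interaction term. The data, variable names, and choice of library (statsmodels) are illustrative assumptions, not part of this article.

```python
# Hedged sketch: probing linearity and additivity on synthetic data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.7 * x2 + rng.normal(scale=1.0, size=n)  # truly linear and additive

X = sm.add_constant(np.column_stack([x1, x2]))
model = sm.OLS(y, X).fit()
resid = model.resid

# OLS forces residuals to be uncorrelated with the predictors themselves,
# so check them against squared predictors to spot curvature (non-linearity).
print("corr(resid, x1^2):", np.corrcoef(resid, x1 ** 2)[0, 1])
print("corr(resid, x2^2):", np.corrcoef(resid, x2 ** 2)[0, 1])

# Rough additivity check: an x1*x2 interaction term should add little if effects are additive.
X_int = sm.add_constant(np.column_stack([x1, x2, x1 * x2]))
print("interaction p-value:", sm.OLS(y, X_int).fit().pvalues[3])
```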


  • Statistical independence of the errors (in particular, no correlation between consecutive errors in the case of time series data): The errors in your model should not be related to one another. The computation of standard errors relies on this independence, so if the errors are correlated, your standard errors are wrong, and you can say goodbye to trustworthy confidence intervals and significance tests.
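For time series data, one common quick check of error independence is the Durbin-Watson statistic. The sketch below generates synthetic, deliberately autocorrelated errors purely for illustration; the data and numbers are assumptions, not taken from the article.

```python
# Hedged sketch: Durbin-Watson check for autocorrelated errors on synthetic time-series data.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(1)
t = np.arange(100)

# AR(1)-style errors to illustrate a violation of independence.
e = np.zeros(100)
for i in range(1, 100):
    e[i] = 0.8 * e[i - 1] + rng.normal()
y = 3.0 + 0.5 * t + e

X = sm.add_constant(t.astype(float))
res = sm.OLS(y, X).fit()

# Values near 2 suggest no first-order autocorrelation;
# values well below 2 suggest positive autocorrelation.
print("Durbin-Watson:", durbin_watson(res.resid))
```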


  • Homoscedasticity (constant variance) of the errors: Homoscedasticity, or homogeneity of variance, means that the spread of the criterion (dependent variable) is the same at every level of the predictor. When this assumption is satisfied, your parameter estimates are optimal. When the variance of the criterion differs across levels of the predictor (i.e., when the assumption is violated), your estimates lose efficiency and your standard errors are biased; as a result, your confidence intervals and significance tests become unreliable. In practice, check for constant variance of the errors (a test sketch follows this list):
      • versus time (in the case of time series data)
      • versus the predictions
      • versus any independent variable
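One common way to test homoscedasticity is the Breusch-Pagan test on the residuals. The sketch below deliberately generates errors whose spread grows with the predictor; the data and library calls are illustrative assumptions, not the article's method.

```python
# Hedged sketch: Breusch-Pagan test for non-constant error variance on synthetic data.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=300)
# Error spread grows with x, so the homoscedasticity assumption is violated here.
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 + 0.4 * x)

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()

lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(res.resid, X)
print("Breusch-Pagan p-value:", lm_pvalue)  # small p-value -> evidence of heteroscedasticity
```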
  • Normality of the error distribution: This assumption matters most when you have a small sample size (because the central limit theorem isn't working in your favor) and when you want to construct confidence intervals or run significance tests. It manifests in three ways (a quick check is sketched after this list):
      • For confidence intervals around a parameter to be accurate, the parameter estimate must come from a normal sampling distribution.
      • For significance tests of models to be accurate, the sampling distribution of the test statistic must be normal.
      • To get the best estimates of parameters (i.e., the betas in a regression equation), the residuals in the population must be normally distributed.
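A simple way to check residual normality is a Shapiro-Wilk test (a Q-Q plot works just as well). The sketch below uses synthetic data with genuinely normal errors; the names, numbers, and library choices are illustrative assumptions.

```python
# Hedged sketch: checking residual normality with a Shapiro-Wilk test on synthetic data.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(size=150)
y = 4.0 - 1.2 * x + rng.normal(size=150)  # normal errors by construction

res = sm.OLS(y, sm.add_constant(x)).fit()

stat, p_value = stats.shapiro(res.resid)
print("Shapiro-Wilk p-value:", p_value)  # large p-value -> no strong evidence against normality
```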

Plus a bonus: no influential outliers. This isn't technically an assumption of regression, but it's best practice to check for and deal with influential outliers (a Cook's distance sketch follows).
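Influential points are commonly flagged with Cook's distance. The sketch below plants one extreme point in synthetic data and flags observations exceeding the rough 4/n rule of thumb; everything here is an illustrative assumption, not the article's prescribed procedure.

```python
# Hedged sketch: flagging influential points with Cook's distance on synthetic data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.normal(size=60)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=60)
x[0], y[0] = 6.0, -10.0  # plant one influential outlier

res = sm.OLS(y, sm.add_constant(x)).fit()
cooks_d, _ = res.get_influence().cooks_distance

# 4/n is one common rule-of-thumb cutoff for "influential".
threshold = 4 / len(x)
print("Influential indices:", np.where(cooks_d > threshold)[0])
```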
