Principal Assumptions of the Linear Regression Model

There are four principal assumptions which justify the use of linear regression models for purposes of inference or prediction:


  • Linearity and additivity of the relationship between the dependent and independent variables (a quick diagnostic sketch follows this list):
      • The expected value of the dependent variable is a straight-line function of each independent variable, holding the others fixed.
      • The slope of that line does not depend on the values of the other variables.
      • The effects of different independent variables on the expected value of the dependent variable are additive.
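A minimal sketch of one way to probe linearity and additivity: fit an ordinary least squares model on made-up data, then look for curvature in the residuals and for a meaningful interaction term. The data, variable names, and choice of library (statsmodels) are illustrative assumptions, not part of this article.

```python
# Hedged sketch: probing linearity and additivity on synthetic data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 - 0.7 * x2 + rng.normal(scale=1.0, size=n)  # truly linear and additive

X = sm.add_constant(np.column_stack([x1, x2]))
model = sm.OLS(y, X).fit()
resid = model.resid

# OLS forces residuals to be uncorrelated with the predictors themselves,
# so check them against squared predictors to spot curvature (non-linearity).
print("corr(resid, x1^2):", np.corrcoef(resid, x1 ** 2)[0, 1])
print("corr(resid, x2^2):", np.corrcoef(resid, x2 ** 2)[0, 1])

# Rough additivity check: an x1*x2 interaction term should add little if effects are additive.
X_int = sm.add_constant(np.column_stack([x1, x2, x1 * x2]))
print("interaction p-value:", sm.OLS(y, X_int).fit().pvalues[3])
```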


  • Statistical independence of the errors (in particular, no correlation between consecutive errors in the case of time series data): The errors in your model should not be related to one another. The computation of standard errors relies on this independence, so if the errors are correlated, your standard errors are wrong, and you can say goodbye to trustworthy confidence intervals and significance tests.
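For time series data, one common quick check of error independence is the Durbin-Watson statistic. The sketch below generates synthetic, deliberately autocorrelated errors purely for illustration; the data and numbers are assumptions, not taken from the article.

```python
# Hedged sketch: Durbin-Watson check for autocorrelated errors on synthetic time-series data.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(1)
t = np.arange(100)

# AR(1)-style errors to illustrate a violation of independence.
e = np.zeros(100)
for i in range(1, 100):
    e[i] = 0.8 * e[i - 1] + rng.normal()
y = 3.0 + 0.5 * t + e

X = sm.add_constant(t.astype(float))
res = sm.OLS(y, X).fit()

# Values near 2 suggest no first-order autocorrelation;
# values well below 2 suggest positive autocorrelation.
print("Durbin-Watson:", durbin_watson(res.resid))
```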


  • Homoscedasticity (constant variance) of the errors: Homoscedasticity, or homogeneity of variance, means that the spread of the criterion (dependent variable) is the same at every level of the predictor. When this assumption is satisfied, your parameter estimates are optimal. When the variance of the criterion differs across levels of the predictor (i.e., when the assumption is violated), your estimates lose efficiency and your standard errors are biased; as a result, your confidence intervals and significance tests become unreliable. In practice, check for constant variance of the errors (a test sketch follows this list):
      • versus time (in the case of time series data)
      • versus the predictions
      • versus any independent variable
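One common way to test homoscedasticity is the Breusch-Pagan test on the residuals. The sketch below deliberately generates errors whose spread grows with the predictor; the data and library calls are illustrative assumptions, not the article's method.

```python
# Hedged sketch: Breusch-Pagan test for non-constant error variance on synthetic data.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=300)
# Error spread grows with x, so the homoscedasticity assumption is violated here.
y = 1.0 + 2.0 * x + rng.normal(scale=0.5 + 0.4 * x)

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()

lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(res.resid, X)
print("Breusch-Pagan p-value:", lm_pvalue)  # small p-value -> evidence of heteroscedasticity
```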
  • Normality of the error distribution: This assumption matters most when you have a small sample size (because the central limit theorem isn't working in your favor) and when you want to construct confidence intervals or run significance tests. It manifests in three ways (a quick check is sketched after this list):
      • For confidence intervals around a parameter to be accurate, the parameter estimate must come from a normal sampling distribution.
      • For significance tests of models to be accurate, the sampling distribution of the test statistic must be normal.
      • To get the best estimates of parameters (i.e., the betas in a regression equation), the residuals in the population must be normally distributed.
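A simple way to check residual normality is a Shapiro-Wilk test (a Q-Q plot works just as well). The sketch below uses synthetic data with genuinely normal errors; the names, numbers, and library choices are illustrative assumptions.

```python
# Hedged sketch: checking residual normality with a Shapiro-Wilk test on synthetic data.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(size=150)
y = 4.0 - 1.2 * x + rng.normal(size=150)  # normal errors by construction

res = sm.OLS(y, sm.add_constant(x)).fit()

stat, p_value = stats.shapiro(res.resid)
print("Shapiro-Wilk p-value:", p_value)  # large p-value -> no strong evidence against normality
```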

Plus a bonus: no influential outliers. This isn't technically an assumption of regression, but it's best practice to check for and deal with influential outliers (a Cook's distance sketch follows).
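Influential points are commonly flagged with Cook's distance. The sketch below plants one extreme point in synthetic data and flags observations exceeding the rough 4/n rule of thumb; everything here is an illustrative assumption, not the article's prescribed procedure.

```python
# Hedged sketch: flagging influential points with Cook's distance on synthetic data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
x = rng.normal(size=60)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=60)
x[0], y[0] = 6.0, -10.0  # plant one influential outlier

res = sm.OLS(y, sm.add_constant(x)).fit()
cooks_d, _ = res.get_influence().cooks_distance

# 4/n is one common rule-of-thumb cutoff for "influential".
threshold = 4 / len(x)
print("Influential indices:", np.where(cooks_d > threshold)[0])
```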
