Anomalies of a Cross-Section data model: Multicollinearity, heteroskedasticity and residual normality
Ronaldo Teixeira
Data Analyst | Econometrician | Senior Statistician | Economic Specialist at Upwork | Agricultural and Resource Economics | Quantitative Research | Impact Evaluation | SurveyCTO Programmer | Research Consultant at OMR
Here are the provided Variance Inflation Factor (VIF) values:
A VIF of 1 indicates that there is no correlation between the independent variable in question and the other independent variables, while a VIF exceeding 5 or 10 suggests problematic multicollinearity. Therefore, with a VIF of 1.22 for both variables, we can conclude that multicollinearity is not materially affecting the coefficient estimates. This is beneficial: it means the estimated coefficients for DeliverySpeed and ProductQuality are reliable, and their standard errors are not inflated by strong correlation with other variables in the model.
The output of the Breusch-Pagan/Cook-Weisberg test for heteroscedasticity indicates that we are testing the null hypothesis (H0) that there is constant error variance (homoscedasticity) against the alternative hypothesis of non-constant variance (heteroscedasticity).
The test results are:
The chi-squared statistic measures the deviation between what is expected under the null hypothesis and what is observed. The p-value (Prob > chi2) tells us the probability of observing a test statistic at least as extreme as the one observed if the null hypothesis is true.
In the context of the classic Gauss-Markov linear regression assumptions, one assumption is that the errors have constant variance (homoscedasticity). If the errors are heteroscedastic (their variance varies with the levels of the explanatory variables), the ordinary least squares (OLS) estimates are still unbiased, but they are no longer efficient, meaning they no longer have the smallest possible variance among unbiased linear estimators. Moreover, standard hypothesis tests may not be valid because the estimated standard errors of the coefficients are biased, leading to incorrect confidence and prediction intervals.
In our case, with a p-value of 0.3536, we do not reject the null hypothesis of homoscedasticity at the conventional significance level (usually 0.05). This indicates that there is not enough evidence of heteroscedasticity in the model, and thus, the Gauss-Markov assumptions are not violated due to heteroscedasticity. Therefore, we can consider the ordinary least squares estimates efficient and the hypothesis tests on the coefficients valid.
The null hypothesis of the joint test is that the residuals are normally distributed in terms of both measures, skewness and kurtosis. A joint p-value of 0.5307, which is higher than the conventional level of 0.05, means we have no statistical evidence to reject this null hypothesis. Therefore, based on these tests, the residuals can be considered normally distributed. Strictly speaking, normality of the errors is not one of the Gauss-Markov assumptions (the theorem requires only homoscedastic, uncorrelated errors); it is the additional assumption of the classical linear regression model that makes exact t- and F-tests valid, which matters especially in small samples.
More generally, normality of the residuals is a good indication that the model is well specified and that the error term is random with mean zero. It also means that confidence interval estimates and hypothesis tests that depend on the normality of the residuals are valid.