Anomalies of a Cross-Section data model: Multicollinearity, heteroskedasticity and residual normality
Ronaldo Teixeira
Data Analyst | Econometrician | Senior Statistician | Economic Specialist at Upwork | Agricultural and Resource Economics | Quantitative Research | Impact Evaluation | SurveyCTO Programmer | Research Consultant at OMR
Here are the provided Variance Inflation Factor (VIF) values:
A VIF of 1 indicates that there is no correlation between the independent variable in question and the other independent variables, while a VIF exceeding 5 or 10 suggests problematic multicollinearity. Therefore, with a VIF of 1.22 for both variables, we can conclude that multicollinearity is not materially affecting the coefficient estimates. This is beneficial: it means the estimated coefficients for DeliverySpeed and ProductQuality are reliable, and their standard errors are not inflated by strong correlation with other variables in the model.
The output of the Breusch-Pagan/Cook-Weisberg test for heteroscedasticity indicates that we are testing the null hypothesis (H0) that there is constant error variance (homoscedasticity) against the alternative hypothesis of non-constant variance (heteroscedasticity).
The test results are:
The chi-squared statistic measures the deviation between what is expected under the null hypothesis and what is observed. The p-value (Prob > chi2) tells us the probability of observing a test statistic at least as extreme as the one observed if the null hypothesis is true.
In the context of the classic Gauss-Markov linear regression assumptions, one assumption is that the errors have constant variance (homoscedasticity). If the errors are heteroscedastic (their variance varies with the levels of the explanatory variables), the ordinary least squares (OLS) estimates are still unbiased, but they are no longer efficient, meaning they no longer have the smallest possible variance among unbiased linear estimators. Moreover, standard hypothesis tests may not be valid because the estimated standard errors of the coefficients are biased, leading to incorrect confidence and prediction intervals.
In our case, with a p-value of 0.3536, we do not reject the null hypothesis of homoscedasticity at the conventional significance level (usually 0.05). This indicates that there is not enough evidence of heteroscedasticity in the model, and thus, the Gauss-Markov assumptions are not violated due to heteroscedasticity. Therefore, we can consider the ordinary least squares estimates efficient and the hypothesis tests on the coefficients valid.
The null hypothesis of the joint test is that the residuals are normally distributed in terms of both measures, skewness and kurtosis. A joint p-value of 0.5307, which is higher than the conventional level of 0.05, means we have no statistical evidence to reject this null hypothesis. Therefore, based on these tests, the residuals can be considered normally distributed. Strictly speaking, normality of the errors is not one of the Gauss-Markov assumptions (the theorem requires only homoscedastic, uncorrelated errors); it is the additional assumption of the classical linear regression model that makes exact t- and F-tests valid, which matters especially in small samples.
More generally, normality of the residuals is a good indication that the model is well specified and that the error term is random with mean zero. It also means that confidence interval estimates and hypothesis tests that depend on the normality of the residuals are valid.