Enhancing Regression Models with Geographically Weighted Regression to Address Spatial Autocorrelation

Spatial autocorrelation (SAC) exists when observations are correlated with one another simply because they are located near each other. SAC can invalidate the hypothesis tests of Ordinary Least Squares (OLS) regression models and can make their estimates biased or inconsistent. This article describes how SAC alters the interpretation of an OLS regression model and discusses situations in which Geographically Weighted Regression (GWR) gives a more nuanced understanding of patterns and relationships in the data, and is therefore a better approach.

The Impact of Spatial Autocorrelation on OLS Regression

A key assumption underlying OLS regression is that the residuals (errors) are independently and identically distributed. Spatial data often violate this assumption, which gives rise to several problems:

  1. Inflated significance levels: SAC inflates the apparent significance of the beta coefficients, so the null hypothesis is falsely rejected (i.e., spurious relationships are found) when both the dependent and explanatory variables are spatially dependent.
  2. Model misspecification: SAC in the residuals can signal that the model is misspecified and that important spatial features are unaccounted for, leading to incorrect hypothesis tests and parameter estimates.
  3. Diminished predictive power: SAC can reduce the predictive power of the model. If environmental explanatory variables are themselves strongly spatially structured, adding spatial filters to the OLS model may exaggerate this structure, so that the model assigns more predictive importance to spatial factors than to environmental ones.
  4. Bias and inefficiency: SAC can make the point estimates of an OLS model biased or inconsistent, and it makes them inefficient, which undermines both forecasting and statistical inference.
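
A common first check for the problems listed above is to compute global Moran's I on the OLS residuals. The following is a minimal sketch in plain Python, assuming binary distance-band weights (two points are neighbours if they lie within `bandwidth` of each other). The function name `morans_i`, the 5x5 grid and both sets of "residuals" are made up for illustration and do not come from a fitted model.

```python
import math

def morans_i(values, coords, bandwidth):
    """Global Moran's I with binary distance-band weights (assumed scheme)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0   # sum over neighbour pairs of dev_i * dev_j
    w_sum = 0.0   # total weight (number of directed neighbour pairs)
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= bandwidth:
                cross += dev[i] * dev[j]
                w_sum += 1.0
    return (n / w_sum) * (cross / sum(d * d for d in dev))

# Hypothetical 5x5 grid of observation locations.
coords = [(x, y) for x in range(5) for y in range(5)]

# Made-up residuals with a smooth west-to-east trend (similar values cluster).
clustered = [x - 2.0 for x, y in coords]
# Made-up residuals that alternate like a checkerboard (neighbours differ).
alternating = [1.0 if (x + y) % 2 == 0 else -1.0 for x, y in coords]

i_clustered = morans_i(clustered, coords, bandwidth=1.0)
i_alternating = morans_i(alternating, coords, bandwidth=1.0)
print(i_clustered, i_alternating)
```

Values well above zero (the clustered case) indicate positive SAC, while values well below zero (the checkerboard case) indicate negative SAC. A real analysis would also need a significance test, for example a permutation test, which this sketch omits.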

Addressing Spatial Autocorrelation in OLS Regression

Several methods and models have been developed to mitigate the harmful effects of SAC:

  1. Spatial filtering: methods such as eigenvector spatial filtering can reduce the impact of spatial misspecification errors while improving model fit.
  2. Spatial econometric models: the spatial error model and the more general spatial Durbin model deal better with both intrinsic and extrinsic sources of SAC, yielding asymptotically unbiased estimates and asymptotically correct Type I error rates.
  3. Machine learning integration: incorporating spatial features into standard machine learning models such as random forests can reduce the SAC of residuals both globally and locally, and improve model performance.
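
As one illustration of the third approach, the sketch below builds a simple spatial feature: the mean of a covariate over each point's k nearest neighbours, alongside the raw coordinates. This is only one of many possible spatial features, and the helper name `spatial_lag`, the coordinates and the covariate values are hypothetical. The resulting rows could then be passed to any standard learner such as a random forest, which is not shown here.

```python
import math

def spatial_lag(values, coords, k=3):
    """Mean of each point's k nearest neighbours' values (row-standardised kNN weights)."""
    lags = []
    for i, ci in enumerate(coords):
        # Sort every other point by distance to point i and keep the k closest.
        by_distance = sorted(
            (math.dist(ci, cj), j) for j, cj in enumerate(coords) if j != i
        )
        neighbours = [j for _, j in by_distance[:k]]
        lags.append(sum(values[j] for j in neighbours) / len(neighbours))
    return lags

# Hypothetical locations forming two separate clusters of four points each.
coords = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (11, 11)]
# Hypothetical environmental covariate measured at each location.
x = [0.2, 0.4, 0.1, 0.3, 0.9, 0.7, 0.8, 0.6]

lag_x = spatial_lag(x, coords, k=3)

# Feature rows: raw covariate, its neighbourhood mean, and the coordinates.
features = [[xi, li, cx, cy] for xi, li, (cx, cy) in zip(x, lag_x, coords)]
print(features[0])
```

The lag is computed on the covariate rather than on the response, so that no information about the target leaks into the features.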

Geographically Weighted Regression: A Better Alternative

Rather than assuming that relationships are the same throughout a study area, GWR lets the coefficients of the regression equation vary from place to place, so the strength of a relationship can differ between, say, high-income and low-income areas. By accounting for this spatial heterogeneity, a GWR model offers a more nuanced and precise understanding of spatially varying relationships than a traditional OLS model.
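
For readers who want the idea in symbols, the usual textbook formulation of GWR can be written as follows. The notation is an assumption introduced here, not taken from the article: (u_i, v_i) are the coordinates of observation i, d_ij is the distance between observations i and j, and h is the kernel bandwidth. The Gaussian kernel is one common choice among several.

```latex
% Local model: every coefficient is a function of location (u_i, v_i)
\[
  y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i
\]

% Local estimate: weighted least squares with location-specific weights
\[
  \hat{\boldsymbol{\beta}}(u_i, v_i) = \left( X^{\top} W_i X \right)^{-1} X^{\top} W_i\, \mathbf{y},
  \qquad
  W_i = \operatorname{diag}(w_{i1}, \dots, w_{in})
\]

% Gaussian kernel: nearby observations receive more weight
\[
  w_{ij} = \exp\!\left( -\frac{d_{ij}^{2}}{2h^{2}} \right)
\]
```

When every weight equals one, W_i is the identity matrix and the estimate reduces to the ordinary global OLS solution.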

Advantages of GWR

  1. Revealing spatial non-stationarity: GWR highlights local variations that OLS models miss, showing where the relationships between variables are spatially non-stationary rather than constant across the study area.
  2. Better model performance: GWR models tend to outperform OLS models fitted to the same data, with higher R² values and lower Akaike Information Criterion (AIC) values, both of which indicate a better fit to the data and therefore more reliable (and often stronger) estimated relationships.
  3. Versatile application: GWR has been applied in environmental studies, urban research, crime analysis and transportation research, which attests to its versatility as a tool for modelling relationships that shift across space.
  4. More sophisticated versions: extensions include GWRBoost, which integrates gradient boosting, and nonparametric variants such as Geographically Weighted Nonparametric Regression. These retain the detailed local interpretability of GWR while improving its ability to handle more complicated data.
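
The first advantage can be made concrete with a toy example. The sketch below is a deliberately simplified GWR with a single predictor, a Gaussian kernel and a hand-picked fixed bandwidth. In practice the bandwidth is usually chosen by cross-validation or an information criterion, and a dedicated GWR library would be used. The function names and the noise-free synthetic data, in which the true slope rises from 1 in the west to 3 in the east, are assumptions made for illustration.

```python
import math

def weighted_fit(x, y, w):
    """Weighted least squares for y = a + b*x; returns (intercept, slope)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return my - slope * mx, slope

def gwr_slopes(coords, x, y, bandwidth):
    """Local slope at every observation, using Gaussian kernel weights."""
    slopes = []
    for ci in coords:
        w = [
            math.exp(-math.dist(ci, cj) ** 2 / (2 * bandwidth ** 2))
            for cj in coords
        ]
        slopes.append(weighted_fit(x, y, w)[1])
    return slopes

# Synthetic transect of 20 locations running west to east.
n = 20
coords = [(float(i), 0.0) for i in range(n)]
x = [float((7 * i) % 5) for i in range(n)]            # predictor cycling through 0..4
true_slope = [1.0 + 2.0 * i / (n - 1) for i in range(n)]
y = [s * xi for s, xi in zip(true_slope, x)]          # noise-free response

# Global OLS is the special case in which every weight equals one.
global_slope = weighted_fit(x, y, [1.0] * n)[1]
local_slopes = gwr_slopes(coords, x, y, bandwidth=2.0)

print("global slope:", round(global_slope, 3))
print("west-end local slope:", round(local_slopes[0], 3))
print("east-end local slope:", round(local_slopes[-1], 3))
```

The single global slope sits near the middle of the true range, while the local slopes are lower at the western end and higher at the eastern end. That variation is the spatial non-stationarity a global model averages away.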

Practical Applications of GWR

Empirical studies have demonstrated the added value of GWR in a range of applications. In urban studies, for instance, GWR has shown that relationships involving factors such as elevation, pipeline density and road/square ratio vary across space and hold only in specific geographic locations. In public health, GWR has helped explain why relationships are complex at the neighbourhood level by exposing spatial non-stationarity.

Final Thoughts

Spatial autocorrelation can dramatically change results. Because the OLS regression model assumes that observations are independent and fits a single global relationship, it effectively throws away valuable information about local spatial variation. GWR, by contrast, can detect and account for spatial heterogeneity, allowing the user to determine whether and where a relationship changes. Because it yields a separate set of coefficient estimates for every location rather than one global set, GWR provides far more local information than an OLS model can, a distinct advantage for those analyzing spatial data.
