Statistical issues in this paper studying the relation between air quality and land use/land cover (LULC)

A paper was published in Environmental Monitoring and Assessment. It studied the relation between land-use classes (Urban, Shrubland, Forest, Water, Green areas) and air quality (O3, NO2, SO2 concentrations). Basically, it compared the concentration values across the classes using the Kruskal-Wallis H test.

Link to the paper


Issues with the use of Kruskal-Wallis H Test

The Kruskal-Wallis H test is sensitive to sample size. When the groups being compared are large, the test can flag even very small differences as statistically significant, whether or not they matter in practice. That seems to be the case in this paper: a difference in median ozone concentration between 0.1131 mol/m² (Urban) and 0.1130 mol/m² (Shrubland) is identified as statistically significant. Is it practically significant? An atmospheric chemist should answer that. For more information on this, see "Is there a maximum sample size for a Kruskal Wallis test?" on Mathematics Stack Exchange.
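To see the sample-size effect for yourself, here is a minimal sketch on made-up data (not the paper's). I assume two normally distributed groups centred on the two medians quoted above, with a spread of 0.005 that I picked purely for illustration. Next to the p-value it reports epsilon-squared, H / (n - 1), a commonly used effect-size estimate for Kruskal-Wallis. The helper name kw_with_effect_size is my own.

```python
import numpy as np
from scipy.stats import kruskal


def kw_with_effect_size(*groups):
    """Run Kruskal-Wallis and return (H, p-value, epsilon-squared)."""
    h, p = kruskal(*groups)
    n_total = sum(len(g) for g in groups)
    # Epsilon-squared: H rescaled by the total sample size (ranges 0 to 1).
    eps_sq = h / (n_total - 1)
    return h, p, eps_sq


rng = np.random.default_rng(0)
results = {}
for n in (100, 10_000, 500_000):
    # Hypothetical data: same tiny shift in location at every sample size.
    urban = rng.normal(0.1131, 0.005, n)
    shrubland = rng.normal(0.1130, 0.005, n)
    results[n] = kw_with_effect_size(urban, shrubland)
    h, p, eps_sq = results[n]
    print(f"n per group = {n:>7}: H = {h:.2f}, p = {p:.3g}, epsilon^2 = {eps_sq:.2e}")
```

The shift between the groups is identical in every run. Only the sample size changes, so a falling p-value with a persistently tiny epsilon-squared is a sign of "statistically significant but maybe not practically significant".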

The paper did not mention the exact sample size of each land cover class. To its merit, it did mention that the analysis used a stratified sample of 0.01% from each land cover class. Nevertheless, it seems they picked just one stratified sample and performed a single Kruskal-Wallis test. A better approach would be a Monte Carlo simulation: draw 1000 stratified samples with the same approach, conduct 1000 Kruskal-Wallis tests, and report the number of times the null hypothesis is rejected. That would have made the analysis stronger. For reference, check this paper: Link. They performed 1000 K-S tests and reported the median p-value.
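The repeated-sampling idea can be sketched roughly like this. Everything here is an assumption for illustration: the pixel populations are synthetic, the function name repeated_kruskal is mine, and I use a 1% fraction and 200 repetitions (instead of 0.01% and 1000) only to keep the toy example small and fast. I also gave all three classes the same distribution on purpose, so the output shows what "no real difference" looks like.

```python
import numpy as np
from scipy.stats import kruskal


def repeated_kruskal(population, frac, n_rep=1000, alpha=0.05, seed=0):
    """Repeat stratified sampling + Kruskal-Wallis n_rep times.

    population: dict mapping class name -> 1-D array of pixel values.
    frac: fraction of each class drawn (without replacement) per repetition.
    Returns (rejection rate at alpha, median p-value, all p-values).
    """
    rng = np.random.default_rng(seed)
    p_values = np.empty(n_rep)
    for i in range(n_rep):
        # One stratified sample: the same fraction from every class.
        samples = [
            rng.choice(values, size=max(2, round(frac * len(values))), replace=False)
            for values in population.values()
        ]
        p_values[i] = kruskal(*samples).pvalue
    return (p_values < alpha).mean(), np.median(p_values), p_values


# Hypothetical pixel populations; identical distributions, so the null is true.
rng = np.random.default_rng(42)
population = {
    name: rng.normal(0.1130, 0.005, 20_000)
    for name in ("Urban", "Shrubland", "Forest")
}

reject_rate, median_p, p_values = repeated_kruskal(population, frac=0.01, n_rep=200)
print(f"Null rejected in {reject_rate:.1%} of runs; median p = {median_p:.3f}")
```

With real data you would swap in the actual per-class pixel values. A rejection rate close to 100% would back up the paper's conclusion far better than a single p-value from a single draw.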

Assumption of Independence

The paper justified the use of the Kruskal-Wallis test by assuming that the concentration values within a group (land cover class) are independent.

First of all, independence between groups is also needed.

Secondly, I don’t think this is an easy assumption to make, even within a group. I suspect pollution concentrations are spatially correlated: if the pixel next to mine has high pollution, mine probably does too! A Moran’s I test could have confirmed or ruled out such spatial autocorrelation.
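For a rough idea of what such a check involves, here is a small NumPy-only sketch of Moran's I on a regular grid. The assumptions are mine: rook (edge-sharing) neighbours with binary weights, a one-sided permutation test, and two synthetic 30 x 30 fields (one with a spatial gradient, one pure noise). The function names are hypothetical, and a real analysis would more likely use a dedicated spatial statistics library with weights suited to the data.

```python
import numpy as np


def morans_i(grid):
    """Moran's I for a 2-D grid with binary rook-neighbour weights."""
    z = grid - grid.mean()
    # Sum of products over horizontally and vertically adjacent pixel pairs.
    cross = (z[:, :-1] * z[:, 1:]).sum() + (z[:-1, :] * z[1:, :]).sum()
    n_pairs = z[:, :-1].size + z[:-1, :].size
    # Each pair is counted once in both cross and n_pairs, so the factor of 2
    # from the symmetric weights cancels out.
    return (z.size / n_pairs) * cross / (z ** 2).sum()


def morans_i_permutation_test(grid, n_perm=999, seed=0):
    """Return (observed I, one-sided p-value for positive autocorrelation)."""
    rng = np.random.default_rng(seed)
    observed = morans_i(grid)
    # Shuffle pixel values across the grid to break any spatial structure.
    permuted = np.array([
        morans_i(rng.permutation(grid.ravel()).reshape(grid.shape))
        for _ in range(n_perm)
    ])
    p_value = (1 + (permuted >= observed).sum()) / (1 + n_perm)
    return observed, p_value


rng = np.random.default_rng(1)
# Hypothetical field with a smooth gradient plus noise: neighbours look alike.
gradient = 0.1 * np.add.outer(np.arange(30), np.arange(30))
clustered = gradient + rng.normal(size=(30, 30))
# Hypothetical field of pure noise: no spatial structure.
noise = rng.normal(size=(30, 30))

i_clustered, p_clustered = morans_i_permutation_test(clustered)
i_noise, p_noise = morans_i_permutation_test(noise)
print(f"clustered field: I = {i_clustered:.3f}, p = {p_clustered:.3f}")
print(f"noise field:     I = {i_noise:.3f}, p = {p_noise:.3f}")
```

A clearly positive I with a small p-value means neighbouring pixels resemble each other, which is exactly the situation where treating pixels as independent observations is risky.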
