P-values in forest science

[Image: John Cena meme about "significant" versus "not significant" p-values]

What I love about this meme is that it expresses the realities of how people interpret p-values. We think of a statistical test as being “significant” (we have results to share and publish; John Cena is the champion) or “not significant” (we don’t have anything interesting to share despite all our hard work; John Cena faces misery and discontent).

One of the most common topics taught in introductory statistics courses is hypothesis testing. However, much of the data community is moving away from using hypothesis testing, and specifically, the use of p-values.

The “p” in p-value stands for probability: specifically, the probability under a specified statistical model that a statistical summary of the data would be equal to or more extreme than its observed value. As the American Statistical Association highlighted in 2016, there are several caveats to using and interpreting p-values.
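To make that definition concrete, here is a minimal Python sketch of a permutation test, one simple way to compute a p-value. The model assumed here is "stand labels don't matter," the summary is the absolute difference in mean diameter, and the p-value is the share of label shuffles that give a difference at least as extreme as the observed one. The function name and the diameter values are made up for illustration and don't come from any study.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=5_000, seed=42):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        # Under the null model the group labels are interchangeable,
        # so reshuffle the pooled values and split them again.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:  # "equal to or more extreme than" the observed value
            extreme += 1
    return extreme / n_permutations


# Hypothetical tree diameters (cm) from two stands, for illustration only.
stand_a = [24.1, 26.3, 22.8, 27.5, 25.0, 23.9]
stand_b = [28.4, 30.1, 27.9, 29.6, 31.2, 28.8]

p = permutation_p_value(stand_a, stand_b)
print(f"Permutation p-value: {p:.4f}")
```

A small p-value here only says that a difference this large would be unusual if the stand labels were interchangeable. It doesn't measure how big or how important the difference is.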

I was interested to see how often p-values are used in forestry research, so I set up a small experiment: count how many p-values were reported in the Journal of Forestry in 2021 (find the data here).

What counted as a p-value was more complicated than I first thought. Often authors report the exact p-value from a statistical test, either in the text or, most commonly, in a table of results. Other times, no doubt as a result of the software being used, authors report an asterisk or double asterisk to show significance at a specific level. For example, * and ** were written to denote significance at the 0.10 and 0.05 levels, respectively. In still other cases p-values are embedded within figures, which makes finding them with a text string search problematic.
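As a rough illustration of what a text string search could look like, here is a short Python sketch using a regular expression. It isn't the procedure used for the counts in this post. The pattern, the function name, and the sample sentence are all assumptions made for the example. It only catches p-values written out in the text, such as "p = 0.03", so it misses asterisks in tables and anything embedded in a figure, which is exactly the difficulty described above.

```python
import re

# Matches reports such as "p = 0.03", "P > 0.10", or "p-value < .001".
P_VALUE_PATTERN = re.compile(
    r"\bp(?:-value)?\s*(=|<|>|≤|≥)\s*(0?\.\d+)",
    flags=re.IGNORECASE,
)


def find_p_values(text):
    """Return (comparison operator, value) pairs for each p-value found in text."""
    return [(op, float(value)) for op, value in P_VALUE_PATTERN.findall(text)]


# Made-up sentence, for illustration only.
sample = (
    "Height differed between stands (p = 0.03), but density did not "
    "(P > 0.10). Growth responded strongly (p-value < .001)."
)

matches = find_p_values(sample)
print(f"Found {len(matches)} p-values: {matches}")
```

In practice a search like this would still need a manual check of every table and figure, since significance stars carry no "p" to match on.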

Despite the challenges, out of the 36 research and review articles published in the Journal of Forestry in 2021, 61% did not present any p-values. Here is the distribution of p-values from the articles:

[Figure: distribution of p-values from the articles]

Of those that did present p-values, the median number of p-values per article was 15. There was one paper that presented 191 different p-values (which made excellent use of the asterisk approach).

I have no idea if the use of p-values in forestry has increased or decreased since they began to be used widely in the mid-1900s. It strikes me that as long as we continue to teach hypothesis testing to students, p-values will continue to be used in the future.

It’s important to reflect on what hypothesis tests and p-values are and what they aren’t. People analyzing forestry data should have these in their toolbox, but also be aware of their proper use.

Note: I present a few practical tips on using p-values in the full blog post on this topic. Hat tip to Matt Dancho for the idea for the meme.

Yes, the p-value is very often a big hurdle for PhD students. But there are several ways to twig it out.
