Understanding Common Statistical Pitfalls

Overview

Under ideal circumstances, the explanatory power afforded by quantitative studies expands human knowledge through the addition of generalizable findings. The scientific community vets this knowledge by applying inferential statistics, which estimate how likely it is that an effect or relationship would occur by chance rather than due to treatments or actual associations. Unfortunately, a series of recent replication “crises,” with the one in psychology receiving the most press [1], has emerged as researchers find themselves unable to duplicate findings published in prestigious and niche journals alike. Although the reasons for these crises are manifold, certain statistical pitfalls, which have been shown to occur across disciplines as varied as education, medicine, and engineering [2,3], may be at least partially responsible. In this article, we describe issues associated with three of these practices: p-hacking, cherry-picking, and the overfitting of models to sample data [4].

p-Hacking

Terms such as results fishing and data dredging fall under the umbrella of probability-hacking (i.e., “p-hacking”), the deliberate search for statistically significant results (i.e., those with a probability, or “p,” of less than 0.05) regardless of the starting hypotheses [5]. p-Hacking can take many forms, including preferential rounding (e.g., reporting a p-value of 0.053 as <0.05), selective reporting of dependent and/or independent variables, inappropriate segmentation and analysis of favorable subgroups, scale redefinition, data imputation, and the use of multiple statistical tests until one provides the desired outcome (e.g., disregarding the assumptions of the tests and running a Mann-Whitney U test after finding a non-significant result with a Student’s t-test) [6].
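
To see why this inflates false positives, consider the brief simulation below. It is an illustrative sketch rather than an analysis from any cited study: the sample sizes, the number of subgroup comparisons, and the use of NumPy and SciPy are assumptions chosen purely for demonstration.

```python
# Sketch: repeatedly testing subgroups of null data until something "works."
# All values are drawn from the SAME distribution, so every significant
# result below is a false positive by construction.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_studies = 1000        # simulated studies with no true effect
n_per_group = 30        # observations per group
n_subgroup_tests = 10   # post hoc comparisons attempted per study

studies_with_false_positive = 0
for _ in range(n_studies):
    for _ in range(n_subgroup_tests):
        treatment = rng.normal(loc=0, scale=1, size=n_per_group)
        control = rng.normal(loc=0, scale=1, size=n_per_group)
        _, p_value = stats.ttest_ind(treatment, control)
        if p_value < 0.05:
            # The p-hacker stops at the first "significant" test and reports it.
            studies_with_false_positive += 1
            break

print(f"Proportion of null studies reporting p < 0.05: "
      f"{studies_with_false_positive / n_studies:.0%}")
```

With ten looks at data containing no real effect, roughly 1 − 0.95^10 ≈ 40% of the simulated studies end up reporting a “significant” result, which is part of why p-hacked findings are so hard to replicate.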

Cherry-Picking Results

Whereas p-hacking involves manipulating data and statistics to produce a favorable outcome, cherry-picking involves selecting and reporting only those results that align with the desired hypotheses and conceptual frameworks. Sometimes classified as a form of hypothesizing after the results are known (shortened to “HARKing” [7]), cherry-picking, like p-hacking, is found across scientific disciplines. It should be noted that cherry-picking can extend beyond results to citing only those studies that align with and validate obtained results while burying those that conflict [8].

Overfitting

Researchers often use data to model generalizable phenomena in order to make predictions and better understand relationships. If, however, the “best-fitting” model is one that is overly complex or fits the sample data too closely, then the model may be overfit to the data. For example, in their discussion of modeling psychopathological networks, Fried and Cramer make a compelling case for the issues caused by overfitting through their graphic illustration of how a complex polynomial function can minimize residuals for the relationship between neuroticism and depression even though the relationship is better modeled (albeit with greater residuals) by a simple (zero-order) linear regression line. The end result is, yes, a well-fitting model, but one that comes with a pronounced loss in explanatory power [9]. Overfitting can also occur when too little data is collected or when models are fit to too many variables simultaneously. As with the other two statistical pitfalls, this practice seems to be pervasive, occurring not just in psychology but in machine learning as well [4].
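
The toy example below makes the same point numerically. It is a minimal sketch with invented data (a truly linear relationship, with the polynomial degrees and noise level chosen arbitrarily), not a reconstruction of the neuroticism and depression illustration.

```python
# Sketch: an overly flexible polynomial fits the sample almost perfectly
# but generalizes worse than the simple linear model that matches the
# process that actually generated the data.
import numpy as np

rng = np.random.default_rng(0)

def sample_data(n=15):
    """Noisy observations of a truly linear relationship: y = 2x + noise."""
    x = np.linspace(0, 1, n)
    y = 2 * x + rng.normal(0, 0.3, size=n)
    return x, y

x_train, y_train = sample_data()
x_test, y_test = sample_data()  # fresh data from the same process

for degree in (1, 9):
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    print(f"degree {degree}: train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```

The degree-9 fit typically drives the training error toward zero while its error on new data exceeds that of the degree-1 model, which is exactly the loss of explanatory power described above.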

Summary

The public places a great deal of trust in the scientific community to report what is in the best interest of society as a whole [10]. Given this responsibility, it is imperative that researchers eschew statistical manipulations and techniques that produce questionable findings. The techniques covered here—p-hacking, cherry-picking, and the overfitting of models to data—all contribute to the replication issues in science [1]. Their use benefits only the researchers who employ them (through publication) while diluting the corpus of usable knowledge. There is already considerable uncertainty in results even under strict adherence to a p-value threshold of 0.05, and publication bias [11] compounds the problem, making it all the more imperative that researchers across disciplines adopt ethical and rigorous analytical techniques.

To combat these issues, some researchers have reiterated the need to identify instruments and analyses a priori, before conducting studies, as a means of avoiding the biases that may otherwise arise [8]. Additionally, the routine reporting of effect sizes has long been touted as more meaningful than reporting p-values alone [12]. Finally, it has been suggested that changing the criterion for statistical significance from a p-value representing a chance of 1 in 20 (p=0.05) to a more stringent 1 in 200 (p=0.005) could obviate some of the issues discussed here. When this shift was modeled, it resulted in a decline in the published false-positive rate and was touted as an advantageous remedy because of its ease of both implementation and oversight [5].
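
As a small, hypothetical illustration of why effect sizes add meaning, the snippet below reports Cohen's d (one common effect-size measure) next to the p-value; the group sizes and the tiny mean difference are invented for this sketch, not drawn from any cited study.

```python
# Sketch: with large samples, a practically trivial difference can still be
# "statistically significant," which is why an effect size is worth reporting.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group_a = rng.normal(loc=0.00, scale=1, size=5000)   # hypothetical control
group_b = rng.normal(loc=0.07, scale=1, size=5000)   # hypothetical treatment

t_stat, p_value = stats.ttest_ind(group_a, group_b)

def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * a.var(ddof=1) + (n_b - 1) * b.var(ddof=1)) / (n_a + n_b - 2)
    return (b.mean() - a.mean()) / np.sqrt(pooled_var)

print(f"p = {p_value:.4f}, Cohen's d = {cohens_d(group_a, group_b):.2f}")
```

A p-value below 0.05 paired with d ≈ 0.07 signals a difference that is detectable but of little practical importance, information the p-value alone cannot convey.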

Regardless of how these pitfalls may be avoided, their existence is likely due in part to a fundamental misunderstanding of what the p-value does and does not represent [13]. To this end, it is imperative that researchers understand not only how to use inferential statistics but also the underlying concepts and logic. In this regard, one would hope that, through collaboration and open dialogue, scientists would not only review their peers' work but also contribute actively to their continued professional development, in this case around statistics and their use.

View the article on our blog for a complete list of references.


Newly Updated Course: Essentials of Statistical Analysis (EOSA)

Designed for researchers, research personnel, coordinators, administrators, postdocs, and students, the EOSA course aims to build your statistical literacy and enhance your research capabilities.

Receive a demo link to the course once it is launched.


This course was authored by:







