Fixing the Problem of P-Values in Scientific Research
There has been a growing trend among scientists and researchers to rely on p-values, a measure of statistical significance. A p-value below 0.05 has become a de facto requirement for publication in academic journals, motivating researchers to design and report studies that yield such results.
This trend was evident in a study recently published in The Journal of the American Medical Association (JAMA) [1] by a team of researchers including John P.A. Ioannidis, MD, DSc, professor of disease prevention and of health research and policy and co-director of the Meta-Research Innovation Center at Stanford, with lead author David Chavalarias, PhD, director of the Complex Systems Institute in France. According to their research, 96 percent of the more than 1.6 million biomedical research papers studied from 1990 to 2015 reported statistical significance with a p-value of 0.05 or lower.
Why is this a problem? The p-value has serious limitations; it is not an indicator of how likely a result is to be true or false. It is not a replacement for scientific reasoning.
Research is being published and touted as “statistically significant” that, in actuality, may not replicate, lowering the quality of scientific findings. In a study published in Science [2], researchers replicated 100 studies that had appeared in 2008 in three leading psychology journals. Although 97 percent of the original studies reported a p-value of 0.05 or less, only 36 percent of the replications produced statistically significant results.
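To see how a “significant” finding can fail to replicate, here is a minimal simulation sketch. The effect size, sample sizes, and number of studies are illustrative assumptions, not figures from either paper: when studies are run with modest statistical power, results that clear p < 0.05 once often fail to clear it again on an independent sample of the same size.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Illustrative assumptions: a modest true effect (Cohen's d = 0.3)
# studied with n = 30 per group, i.e., a fairly low-powered design.
true_effect, n_per_group, n_studies = 0.3, 30, 20_000

def run_study():
    """Simulate one two-group study and return its two-sided p-value."""
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(true_effect, 1.0, n_per_group)
    return stats.ttest_ind(treatment, control).pvalue

# Run many "original" studies and keep only those that reached p < 0.05.
original_p = np.array([run_study() for _ in range(n_studies)])
significant = original_p < 0.05

# Replicate each "published" (significant) study once, with the same design.
replication_p = np.array([run_study() for _ in range(int(significant.sum()))])

print(f"original studies significant:  {significant.mean():.0%}")
print(f"replications also significant: {(replication_p < 0.05).mean():.0%}")
```

Under these assumptions only a minority of the replications reach significance again, even though the effect is real, simply because the design is underpowered; the exact percentages depend on the assumed effect size and sample size.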
For the first time in its history, on March 7, 2016, the American Statistical Association (ASA) released a "Statement on Statistical Significance and P-Values" in an effort to “improve the conduct and interpretation of quantitative science and inform the growing emphasis on reproducibility of science research.” The statement’s six principles, followed below by a brief numerical illustration, are:
- P-values can indicate how incompatible the data are with a specified statistical model.
- P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.
- Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold.
- Proper inference requires full reporting and transparency.
- A p-value, or statistical significance, does not measure the size of an effect or the importance of a result.
- By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis.
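To make the second and sixth principles concrete, here is a small back-of-the-envelope sketch. The base rate, power, and threshold below are purely illustrative assumptions, not figures from the ASA statement: even when every test is run correctly at p < 0.05, the share of “significant” findings that are false positives depends heavily on how many of the tested hypotheses were true to begin with.

```python
# Illustrative assumptions: 10% of tested hypotheses are true effects,
# studies have 80% power, and the significance threshold is 0.05.
base_rate, power, alpha = 0.10, 0.80, 0.05

true_positives = base_rate * power          # real effects that reach p < alpha
false_positives = (1 - base_rate) * alpha   # null effects that reach p < alpha by chance
share_false = false_positives / (true_positives + false_positives)

print(f"significant results that are false positives: {share_false:.0%}")
# With these assumptions roughly a third of "significant" findings are false,
# which is why p < 0.05 cannot be read as "95% probability the hypothesis is true".
```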
The new guidelines aim to provide greater transparency, increase the reproducibility of study findings, and improve scientific rigor across all areas of scientific research and development. They raise the standard for what will be published in academic journals going forward.
Copyright © 2016 Cami Rosso. All rights reserved.
For more articles, visit: https://www.psychologytoday.com/us/blog/the-future-brain
References
- [1] JAMA, Vol. 315, No. 11, March 15, 2016.
- [2] Science, Vol. 349, No. 6251, August 28, 2015.
Sr. Design Engineer, Murata Power Solutions
You can't just blindly push the enter key on your stats software and expect it to do your thinking for you. This is a big problem in some of the ISO/Six Sigma shops I have been in. Everybody gets the quality-101 stats intro and learns how to fly Minitab or SPSS, but a lot of the subtle stuff gets left out: like how, if you have thousands of samples, almost any difference will give you a low p-value on a t-test, and the difference between statistical and useful 'significance'.
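The commenter's point about large samples is easy to demonstrate with a short sketch. The group sizes and the 0.01-standard-deviation difference below are made-up numbers, not drawn from any of the studies discussed: a practically negligible difference becomes "highly significant" once the sample is big enough.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Illustrative assumption: a mean difference of 0.01 standard deviations
# (far too small to matter in most practical settings), measured on a
# million observations per group.
n = 1_000_000
group_a = rng.normal(0.00, 1.0, n)
group_b = rng.normal(0.01, 1.0, n)

t_stat, p_value = stats.ttest_ind(group_a, group_b)
effect_size = (group_b.mean() - group_a.mean()) / group_a.std()

print(f"p-value:                 {p_value:.2e}")      # typically vanishingly small
print(f"effect size (Cohen's d): {effect_size:.3f}")  # still roughly 0.01, i.e., negligible
```

Whether a 0.01-standard-deviation shift is worth acting on is a judgment about the subject matter, not something the p-value can answer.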
Senior Research Fellow at Simulations Plus, Inc.
One of the problems not mentioned by the ASA in practical uses of null hypothesis testing is the employment of Student's t-statistic as the test statistic (https://en.wikipedia.org/wiki/Student's_t-distribution). Well, Student's t-distribution tracks "the mean of a normally distributed population". Thus, implicit in the calculation of the p-value is the big assumption of an underlying normal (Gaussian) distribution. Assumed but never verified! How do the researchers _know_ that, e.g., adverse effects of their new drug are normally distributed? They like it, of course, since the "tails" of the normal distribution vanish very quickly, making the adverse effects appear to be highly unlikely. Well, the net results of this "hypothesis testing" can be found here: https://www.fda.gov/Drugs/DevelopmentApprovalProcess/DevelopmentResources/DrugInteractionsLabeling/ucm110632.htm#ADRs:%20Prevalence%20and%20Incidence To learn that the normal distribution, the proverbial "bell curve", does not always apply to this world, I highly recommend an excellent book by Nassim Taleb, "The Black Swan: The Impact of the Highly Improbable" (https://en.wikipedia.org/wiki/The_Black_Swan_%28Taleb_book%29).
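The commenter's point about vanishing tails can also be illustrated with a short sketch. The choice of a Student-t distribution with 3 degrees of freedom as the heavy-tailed stand-in is an arbitrary assumption for illustration: events that a normal model calls vanishingly rare can be far more common under a heavy-tailed distribution with the same overall spread.

```python
from scipy import stats

# Illustrative assumption: compare a standard normal model with a heavy-tailed
# Student-t distribution (3 degrees of freedom), rescaled to unit variance so
# both models have the same standard deviation.
df = 3
scale = (df / (df - 2)) ** -0.5   # t(df) has variance df/(df-2); rescale to variance 1

threshold = 4.0  # a "4-sigma" adverse event
p_normal = 2 * stats.norm.sf(threshold)
p_heavy = 2 * stats.t.sf(threshold / scale, df)  # same event under the heavy-tailed model

print(f"P(|X| > 4 sd), normal model:       {p_normal:.1e}")
print(f"P(|X| > 4 sd), heavy-tailed model: {p_heavy:.1e}")
print(f"ratio: about {p_heavy / p_normal:.0f}x more likely under heavy tails")
```

The code prints the two tail probabilities; the contrast is the commenter's point that a normal model, assumed but not verified, can make rare adverse events look far less likely than they are.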
Board Member
Hey Cami, grrrr, I have to agree with you here. Stats can be doing more bad than good. The issue is rather what factors you do your stats on. Stats can even be misleading, as you pinpoint. We all know the popular joke about a bed being the most dangerous place, since over 95% of people die in a bed. Interpreting issues calls for stats. Still, they are to be weighed carefully. Quantitative research can be the tree that hides the forest. If molecule xyz works on 50% of the cases tested, then it works and is effective on half of the population. This is what counts. Have a great day.