p-hacking
p-hacking isn't a single technique for manipulating data. It is a collection of practices that all aim to push a result to statistical significance. This often involves performing multiple tests, adjusting variables, or selectively reporting results until the p-value falls below 0.05.
Let's look at what constitutes p-hacking with an example. Say a researcher is conducting a study to investigate whether listening to classical music improves cognitive performance. The primary hypothesis is that participants who listen to classical music will perform better on a memory test than those who do not listen to any music.
Original Analysis
The researcher runs a memory test on two groups: one listens to classical music before the test, and the other does not listen to any music. The results show no significant difference between the two groups (p > 0.05).
p-hacking techniques:
1. Post Hoc Subgroup Analysis
2. Selective Reporting
3. Re-defining Variables
The researcher redefines the success criterion for the memory test. Initially, the measure was the number of correctly recalled items. They change it to the percentage improvement from pre-test to post-test and find that this redefined measure shows a significant improvement (p < 0.05).
4. Data Exclusion
After examining the data, the researcher notices that some participants had very low scores, which could be considered outliers. They decide to exclude these participants from the analysis, and after this exclusion the results show a significant effect (p < 0.05).
5. Stopping Data Collection
The researcher collects data in phases and checks the results periodically. At one point, they observe a significant result (p < 0.05), stop collecting further data, and report the findings without mentioning the interim analyses.
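To see why stopping as soon as the result looks significant is a problem, here is a minimal simulation sketch. It assumes there is no real effect (both groups are drawn from the same normal distribution) and uses a simple z-test with a known standard deviation of 1. The function names, batch size, and number of interim looks are illustrative choices, not details from the study described above.

```python
import math
import random
from statistics import NormalDist


def two_sample_z_pvalue(a, b, sigma=1.0):
    """Two-sided p-value for a difference in means, assuming a known sigma."""
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = (sum(a) / len(a) - sum(b) / len(b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(peek, n_studies=2000, batch=10, max_batches=10,
                        alpha=0.05, seed=0):
    """Share of null studies that end up reporting a significant result.

    peek=True  : test after every batch and stop at the first p < alpha.
    peek=False : test only once, after all the data is collected.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        music, silence = [], []
        for i in range(max_batches):
            # Both groups come from the same distribution: no true effect.
            music += [rng.gauss(0, 1) for _ in range(batch)]
            silence += [rng.gauss(0, 1) for _ in range(batch)]
            is_last = i == max_batches - 1
            if (peek or is_last) and two_sample_z_pvalue(music, silence) < alpha:
                hits += 1
                break
    return hits / n_studies


print("single final test :", false_positive_rate(peek=False))
print("peek every batch  :", false_positive_rate(peek=True))
```

Under these assumptions, the single final test should stay close to the nominal 5% false positive rate, while peeking after every batch should land noticeably above it.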
Outcome:
The final published study claims that listening to classical music significantly improves cognitive performance, specifically in attention and among younger adults. These findings are a result of p-hacking, not a true effect.
How and why does p-hacking work?
The probability of getting a significant result increases with the number of tests performed. When you conduct multiple independent tests, the chance that at least one comes out significant purely by chance grows, even if there is no true effect behind any of them.
Mathematical Explanation:
Suppose you are testing a hypothesis at the 0.05 significance level. This means there is a 5% chance (0.05 probability) of obtaining a significant result purely by chance for any single test, assuming the null hypothesis is true.
For a single test, the probability of not finding a significant result is 1 - 0.05 = 0.95.
If you conduct n independent tests, the probability that none of them will be significant is:
(1 - 0.05)^n = 0.95^n
The probability of finding at least one significant result among these n tests is:
1 - 0.95^n
As n increases, 0.95^n decreases, and thus 1 - 0.95^n increases. This demonstrates that the probability of obtaining at least one significant result by chance increases with the number of tests.
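The formula can be checked with a small simulation. The sketch below relies on one assumption: when the null hypothesis is true, the p-value of a continuous test is uniformly distributed between 0 and 1, so each test is modeled as a single uniform random draw. The function name and the number of simulated studies are illustrative.

```python
import random


def simulate_familywise_rate(n_tests, alpha=0.05, n_studies=20_000, seed=0):
    """Fraction of simulated studies with at least one p-value below alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Each uniform draw stands in for the p-value of one null test.
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_studies


for n in (1, 5, 20):
    simulated = simulate_familywise_rate(n)
    exact = 1 - 0.95 ** n
    print(f"n={n:>2}  simulated={simulated:.3f}  exact={exact:.3f}")
```

The simulated fraction should track 1 - 0.95^n closely, with small differences due to random sampling.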
Example Calculation:
Let's calculate the probability of finding at least one significant result for different numbers of tests:
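A short sketch of this calculation, applying 1 - 0.95^n directly. The function name and the specific test counts are illustrative choices, not values taken from the article.

```python
def prob_at_least_one_significant(n_tests, alpha=0.05):
    """Chance of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests


# Illustrative test counts; any positive integers work here.
for n in (1, 5, 10, 20, 50, 100):
    print(f"{n:>3} tests: {prob_at_least_one_significant(n):.3f}")
```

At the 0.05 level this gives about 0.05 for a single test, about 0.40 for 10 tests, and about 0.64 for 20 tests.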
As seen from the calculations, the probability of obtaining at least one significant result by chance increases substantially as the number of tests increases.
Why Should you do it?