Demystifying p<.05: A Balanced Approach to Significance Testing (or Avoiding it Altogether)
John Neuhoff
Associate Director of UX Research | AI Strategist | PhD Psychology | Retired
Feeling shamed for not adhering to a p<.05 statistical significance rule in your UX research? Don’t.
The p<.05 standard is a benchmark set by Ronald Fisher in the 1920s, before modern computers were available. The p-value is the probability of observing results at least as extreme as yours if there were truly no effect (the null hypothesis). For example, if five users prefer Design A and four prefer Design B, would you be confident that the larger population prefers Design A? Of course not. Why? Because if you reran the test with new users, you could easily get four who prefer A and five who prefer B. There’s no evidence that your designs differ in preference, because results like these are entirely consistent with chance. In Fisher’s terms, the p-value is “greater than 5%.”
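The nine-user example above can be checked with an exact binomial test. This is a minimal sketch using only Python's standard library; the null hypothesis is that preference is a 50/50 coin flip.

```python
# Exact two-sided binomial p-value for 5 of 9 users preferring Design A,
# under the null hypothesis of no real preference (p = 0.5).
from math import comb

n, k = 9, 5      # 9 users, 5 prefer Design A
expected = n * 0.5

# Probability of each possible count under the null (binomial distribution)
pmf = [comb(n, i) * 0.5**n for i in range(n + 1)]

# Two-sided p-value: total probability of outcomes at least as far from
# the expected count (4.5) as the observed one
observed_dev = abs(k - expected)
p_value = sum(p for i, p in enumerate(pmf) if abs(i - expected) >= observed_dev)

print(round(p_value, 3))  # → 1.0: the data are entirely consistent with chance
```

With 9 users, a 5-to-4 split is the least extreme outcome possible, so every outcome is "at least as extreme" and the p-value is 1.0, which is exactly the intuition in the example above.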
So, how did we end up with the .05 standard? Early statisticians thought it was a reasonable cutoff, and scientific journals picked it up and made it gospel. Besides, calculating the exact probability of your results by hand could have taken months! So statisticians published tables of “critical values” that you could compare your test statistic against, to see whether it fell over or under the threshold corresponding to 5%. For its time, it was a useful convention.
Old traditions die hard. Even though we can now calculate exact p-values in a flash, many people (and journals) still cling religiously to the old notion of p<.05.
But think about it. What if your p-value is .06? Fisher would say your results are not statistically significant. But if you’re in business, is there an appreciable difference in your decision-making when the false-positive risk is 6% versus 5%? What about 9%?
The answer, of course, is “It depends.” What are the costs of a false positive? A p<.20 threshold might be a reasonable standard if the costs are relatively small. If the stakes are life and death, p<.05 seems woefully inadequate. Would you take an experimental treatment if there were “only” a 5% chance it would kill you?
Statistical significance testing also often ignores the importance of “effect size.” Let’s say you have a very large sample, and your new design is preferred over the old one with a statistical significance level of p<.01. Great, right? Fisher would be proud. Now let’s say the mean preference on a 1-10 scale for the new design is 7.6, and the mean preference for the old design is 7.5. It’s a reliable, statistically significant difference that would almost certainly replicate time and time again. But is it worth implementing given the associated costs? No, because although the difference is statistically significant, the effect size is too small to matter in practice.
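The effect-size trap above is easy to demonstrate. This sketch uses the article's means (7.6 vs. 7.5); the standard deviation (1.5) and per-group sample size (10,000) are illustrative assumptions, chosen to show how a huge sample makes a trivial difference "highly significant."

```python
# Two-sample z-test plus Cohen's d, standard library only.
from math import sqrt, erf

mean_new, mean_old = 7.6, 7.5
sd, n = 1.5, 10_000          # assumed common SD and per-group sample size

# z-test for the difference in means
se = sqrt(sd**2 / n + sd**2 / n)                   # standard error of the difference
z = (mean_new - mean_old) / se
p_value = 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))   # two-sided p-value

# Cohen's d: the difference in units of standard deviation
cohens_d = (mean_new - mean_old) / sd

print(f"z = {z:.2f}, p = {p_value:.6f}")   # p far below .01
print(f"Cohen's d = {cohens_d:.3f}")       # ~0.07: a negligible effect
```

The test comes back wildly significant, yet Cohen's d is about 0.07, well under the conventional 0.2 cutoff for even a "small" effect. Significance tells you the difference is real; effect size tells you whether anyone should care.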
Is there a better way? Enter Bayesian analysis. Bayesian methods shift the focus from rigid, binary "significant or not" decisions to probabilistic reasoning. Think of it as a nuanced conversation with your data. Instead of asking, "Is this result statistically significant at the p<.05 level?" Bayesian analysis prompts a more relevant question: "Given the data and our prior knowledge, what is the probability that one design is genuinely better than the other?" This approach is particularly advantageous when dealing with complex or uncertain scenarios common in UX research. It allows for incorporating prior knowledge and expertise into the analysis, yielding contextually richer insights and often more directly applicable to business decisions.
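Here is what that Bayesian question looks like in practice, as a minimal stdlib-only sketch. The counts (Design A preferred by 30 of 50 users, Design B by 38 of 50) and the uniform Beta(1, 1) prior are illustrative assumptions, not data from the article.

```python
# Beta-Binomial model: with a Beta(1, 1) prior, the posterior for each
# design's preference rate is Beta(1 + successes, 1 + failures).
# Monte Carlo draws from the two posteriors estimate P(B's rate > A's rate).
import random

random.seed(42)

a_succ, a_fail = 30, 20   # Design A: 30 of 50 users preferred it (assumed)
b_succ, b_fail = 38, 12   # Design B: 38 of 50 users preferred it (assumed)

draws = 100_000
wins = sum(
    random.betavariate(1 + b_succ, 1 + b_fail)
    > random.betavariate(1 + a_succ, 1 + a_fail)
    for _ in range(draws)
)
prob_b_better = wins / draws

# Instead of a binary significant/not verdict, we answer the business
# question directly: how likely is it that B is genuinely preferred?
print(f"P(B better than A) ≈ {prob_b_better:.2f}")
```

The output is a probability statement stakeholders can act on ("B is very likely better"), and the prior gives you a principled place to encode previous research or domain expertise.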
Let's be clear: advocating for a more nuanced approach than the p<.05 standard is not a call to abandon hypothesis evaluation; far from it. Statistical analysis remains a cornerstone of robust UX research. But it's time to rethink our adherence to the p<.05 dogma in UX research and embrace a more flexible, nuanced approach.
It's crucial to consider the real-world implications of our findings, the magnitude of effect sizes, and the consequences and practicality of decision-making thresholds. With their probabilistic and contextual richness, Bayesian methods offer a compelling alternative. So, let's break free from the shackles of p<.05 and step into a more informed and adaptable era of data analysis, where the true goal is insightful, actionable conclusions, not just statistical victories.
#UXResearchInsights #BeyondP05 #StatisticalSignificance #BayesianAnalysisUX #DataDrivenDesign
#RethinkStatistics