A Lay Interpretation of Statistical Significance and p-values: Slow Burn, Love at First Sight or Enduring Love?

It's rare that a lay interpretation of a technical matter is more accurate or less misleading than the technical one, but in the case of statistical significance and the metric used to quantify it, the p-value, this is probably true. This post goes out to non-technical folks and Statisticians/Data Scientists alike, because what people have studied in rigorous Statistics courses has been flawed all along - surprising even to many Statisticians/Data Scientists.

First, what is the usual technical (but also couched as 'lay,' at least in technical circles) interpretation of a p-value? Say you're trying to see whether there's a statistically significant difference in the means of an outcome like height between two groups. The p-value is then the probability of seeing a result (a difference in mean heights between the two groups) at least as large as the one observed in your samples if there were truly no difference in means between the two groups in the entire population. The importance of this lies in the intuition that it is very unlikely (the p-value is very small) that the mean heights of the two samples would differ so vastly if the population means were truly the same. It is therefore likely that the mean heights in these two populations truly differ from each other.

In Statistics, the phrase 'truly differ' is not used; instead, we say the means 'are statistically significantly different.' Does this necessarily mean that there is a large, meaningful difference between the two groups? No, and that's probably why terms like 'significantly different' (without 'statistically' preceding it), 'truly different' and all manner of more concise, 'human' terms are avoided in hypothesis testing (what the non-statistical community calls 'A/B testing') and in statistical regression modeling that outputs p-values. The term 'statistically significant' has a very specific interpretation. Unfortunately, the probability-based definition I gave above has long been debunked as quantitatively inaccurate, though it is still the standard definition taught in school and used at work. For those interested in its very complex and esoteric quantitative definition, you know how to find it. Nevertheless, understanding the definition above will partly lead us to a very good understanding of the only two things you really need to know about statistical significance:
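The textbook definition above can be made concrete with a small simulation. This is purely illustrative and not from the article: all numbers (the observed gap, group size, and standard deviation of height) are hypothetical. We ask how often pure chance, under a 'no true difference' null, produces a gap in mean heights at least as large as the one observed.

```python
# Illustrative only: estimating the textbook p-value by simulation.
# Under the null (no true difference in population means), how often does
# random sampling alone produce a mean-height gap >= the observed gap?
import numpy as np

rng = np.random.default_rng(1)
observed_gap = 1.5   # cm, the gap seen in our (hypothetical) samples
n = 50               # people per group
sd = 7.0             # assumed within-group standard deviation of height

sims = 20_000
# Draw many pairs of null samples and record the absolute gap in means.
gaps = np.abs(rng.normal(0, sd, (sims, n)).mean(axis=1)
              - rng.normal(0, sd, (sims, n)).mean(axis=1))
p_value = (gaps >= observed_gap).mean()
print(f"simulated p-value: {p_value:.3f}")
```

With these made-up numbers the simulated p-value lands well above 0.05, so a 1.5 cm gap between groups of 50 would not, on its own, be declared statistically significant.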

  1. It is affected by sample size. The larger the sample, the greater the likelihood that your p-value will be below 0.05, which is the standard criterion for declaring statistical significance.
  2. It is affected by the magnitude of the 'effect size.' The larger the difference in means or proportions between your two groups, the greater the likelihood of attaining statistical significance.

Often, people without a background in Statistics (I don't necessarily mean an academic one) are unaware of, or forget about, the two factors above that determine statistical significance, and thus jump to wrong conclusions when they view p-values. You can't really blame them, given that the media often use terms like 'significant factors' (not 'statistically significant factors') with meanings that lay people conflate with the meaning of 'statistically significant.' It's not exactly the fault of these media in cases where they never meant to convey any statistical perspective; besides, the use of 'significant' to mean 'substantial' is so prevalent in everyday language that it's easy for most people to assume 'statistically significant' means 'substantial' in whatever way they take 'substantial' to mean.

If someone who's not statistically trained sees that a p-value is >0.05, they may simply assume that an input variable in a regression model has no intrinsic association with the outcome variable. It is right, however, only to conclude that there is no statistically significant association - yet. The sample size may simply not have been large enough, and statistical significance may emerge in the future once a greater sample has been gathered. On the other hand, just because an input variable is statistically significant doesn't necessarily mean it has much business value; it could be statistically significant merely because the sample size is huge, such that even a small 'effect size' on the outcome variable allows the variable to be statistically significant. I put the term 'effect size' in quotes simply to avoid any implication that the variable has a causal effect on the outcome variable. This 'effect size' (called a coefficient in regression modeling) is the magnitude of the association with the outcome variable, quantifying how much the outcome variable changes when the input variable increases by one unit, holding all other variables constant.

From points 1 and 2 above, here is a simple lay aphorism to help you remember what affects statistical significance: 'slow burn, love at first sight or enduring love?' (Frankly, the term 'enduring passion' fits better.) Say you've been visiting a city, and over repeated visits across all seasons the views have been consistently pleasant. After all these visits, you'd move there with a declaration to yourself that it's statistically significantly more beautiful than the average city, or than the one you're moving from - but you probably wouldn't move there after just one or two visits. That is statistical significance derived from a slow burn. Now, suppose you visit a city so beautiful to you that you wish heaven were modeled after it; you'd move there after just one visit. That, of course, is statistical significance from love at first sight. (I would not recommend this. Seasons change, municipal budgets to clean up the streets change, etc.) Of course, statistical significance can be derived from 'enduring passion' too, and I think that is the statistical significance most people solving business problems would like to see - a huge sample and a meaningfully large 'effect size' jointly driving the statistical significance.

I like your analogy of "slow burn" and "love at first sight" for small and large effect sizes! To some extent, when an effect is that significant you don't even need math or stats to know it (though you could use math and stats to prove it), because you've learned it through everyday life.


Effect size is so important, because statistical significance can be completely useless without a sufficient effect size!


More articles by Alice SH Wong

  • Logistic Regression: Basics, Obscurities and its Membership as a Classifier
  • Boundaries: Consistency over Levels?
  • Making Remote Social Dynamics Work
  • Self-Made in Data Science: A Good Idea?
  • 3 1-Minute Hacks to Improve Your Models
  • Top 10 Most Annoying Data Science Topics on LinkedIn
  • An Evaluation of pycaret's 'Regression - Level Beginner'
  • Help Fund BidnBuddy's Clients!
  • Buddy Up or Bid Up: An App for Enhanced Matched Funding
  • Love, Actually
