What to Test For: Statistical Significance or Data Stability

Throughout my career, I have championed basing our sample size on the confidence intervals we want in our data rather than on statistical significance, unless we are trying to make a product claim. Here is why.

It is safe to assume most products in a given market will have fewer than 1,000,000 customers. With a population of 1,000,000 customers over a target time frame, as noted in this sample size reference (https://www.hardwickresearch.com/wp-content/uploads/resources-sample-size.pdf), roughly 400 responses gets you a +/-5% confidence interval and roughly 100 responses gets you a +/-10% confidence interval. For less established products, which often have closer to 10,000 customers over the target time frame, modeling the population is easier simply because it is so much smaller.

Conversely, if you want to grow your business, a tight confidence interval around your existing customers may give you a limited understanding of how to grow beyond them: they are only a fraction of the available population that might consider your product, and they may not reflect the values of the customers you do not yet have. In that case, targeting 100 responses for a less ubiquitous product that is looking to grow its market can make sense, specifically because we want to retain some latitude in what may be recommended. Said another way, if we are not confident our existing 10,000 customers buy our product for the same reasons we would expect 1,000,000 people to (perhaps because we need to improve the product for the other 990,000), we do not force the same tight confidence intervals onto studies with smaller populations; we intentionally retain more latitude and uncertainty in our models.
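If you want to see where those 400 and 100 figures come from, here is a minimal sketch in Python. It assumes the standard Cochran sample size formula with a finite population correction, a 95% confidence level, and the worst-case proportion of 0.5; the function name and rounding choices are mine, not from the reference table.

```python
# Sample-size sketch: Cochran's formula with a finite-population correction.
# Assumes a 95% confidence level (z = 1.96) and worst-case proportion p = 0.5.
import math

def sample_size(population: int, margin: float, z: float = 1.96, p: float = 0.5) -> int:
    """Respondents needed for a +/- `margin` interval around a proportion."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2   # infinite-population estimate
    n = n0 / (1 + (n0 - 1) / population)        # finite-population correction
    return math.ceil(n)

for pop in (1_000_000, 10_000):
    for moe in (0.05, 0.10):
        print(f"N={pop:>9,}  +/-{moe:.0%}: {sample_size(pop, moe)} responses")
```

Running this gives roughly 385 and 97 responses for a population of 1,000,000, which the reference table rounds to the 400 and 100 figures above.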

I should clarify that unless you are going for claims support or projective modeling, I generally do not worry as much about statistical significance in my studies. Folks hear me say all the time that I target data stability, not statistical significance. The reason is that statistical significance depends on the degree of difference between the stimuli, and that degree of difference is what defines your sampling requirements.

Take paired preference, a very simple and not very powerful test in probabilistic terms, since a respondent choosing at random has a 50/50 chance of picking either product. If a difference is large and obvious, statistical significance can come very early. For instance, if I offered 10 people a choice between $1 and 1 cent and all 10 chose the $1, there is only about a p=0.001 (0.1%) probability that result is a chance occurrence. If just 8 out of 10 chose the $1, there is still only a p=0.0547 (5.47%) probability it happened by chance. Now consider a case where preferences are not so obvious. Say 51% of people prefer Product 1 and 49% prefer Product 2, and this is a real and true preference (in this example), not a statistical fluke. If I got that result testing 100 people with paired preference, the probability of seeing it by chance alone is about p=0.4602 (46.02%), nowhere near significant. To feel confident the difference is real at the 5% level, I would have to test well over 5,000 people (closer to 7,000), and even testing just over 20,000 people, that same 51/49 split only gets me to roughly p=0.002 (0.2%). (If you are interested, you can run the binomial stats for yourself at this link: https://www.quantitativeskills.com/sisa/distributions/binomial.php?Exp=00.5&Obs=10200&N=20000)
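For anyone who wants to reproduce those p-values, here is a small sketch using the one-sided exact binomial tail under a 50/50 null. It assumes scipy is installed; the one_sided_p helper is mine and simply wraps the binomial survival function.

```python
# Paired-preference p-values: probability of k or more "wins" out of n trials
# if respondents were choosing at random (p = 0.5).
from scipy.stats import binom

def one_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """One-sided exact binomial tail P(X >= k)."""
    return binom.sf(k - 1, n, p)

print(one_sided_p(10, 10))          # 10/10 choose the $1        -> ~0.001
print(one_sided_p(8, 10))           # 8/10 choose the $1         -> ~0.0547
print(one_sided_p(51, 100))         # 51 of 100 prefer Product 1 -> ~0.4602
print(one_sided_p(10_200, 20_000))  # 51% of 20,000              -> ~0.002
```

The same numbers fall out of the calculator linked above; the point is how quickly the required sample size grows as the true difference shrinks.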

So this shows how statistical significance is driven by the degree of difference between your products or concepts. If you are pursuing claims support or predictive models, you will need to target statistical significance. If you don't have to pursue that, I generally recommend you test for data stability: the level at which you believe your data is representative and reproducible within the larger population from which you draw your respondent base. This is the same as targeting the most powerful confidence interval (i.e., the lowest +/- range you can get) in your studies. Generally you can target 400 responses for high confidence (a +/-5% confidence interval) and 100 responses for reasonable confidence (a +/-10% confidence interval) that your results are representative of the larger population as a whole. Often that is enough.
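The inverse calculation is just as quick: given a sample size, what margin of error should you expect? A minimal sketch, assuming a 95% confidence level and the worst-case 0.5 proportion, and ignoring the finite-population correction (which would only tighten the interval); the margin_of_error helper is mine.

```python
# Margin of error for a proportion at a 95% confidence level (z = 1.96),
# worst-case p = 0.5, no finite-population correction.
import math

def margin_of_error(n: int, z: float = 1.96, p: float = 0.5) -> float:
    return z * math.sqrt(p * (1 - p) / n)

for n in (400, 100):
    print(f"n={n}: +/-{margin_of_error(n):.1%}")
```

This prints roughly +/-4.9% for 400 responses and +/-9.8% for 100, which is where the +/-5% and +/-10% shorthand comes from.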
