What to Test For: Statistical Significance or Data Stability
Throughout my career, I have championed basing our sample size on the confidence intervals we want in our data rather than on statistical significance, unless we are trying to make a product claim. This is why.
It is safe to assume most products in a given market will have fewer than 1,000,000 customers. With a population of 1,000,000 customers over a target time frame, as noted here https://www.hardwickresearch.com/wp-content/uploads/resources-sample-size.pdf, you reach a +/-5% confidence interval at roughly 400 responses and a +/-10% confidence interval at roughly 100 responses. For less established products, often having 10,000 customers over our target time frame, modeling the population is easier because it is so much smaller.

Conversely, if you want to grow your business, a tighter confidence interval may give you a limited understanding of how to grow your base beyond your existing customers: they are a fraction of the available population that might consider your product, and they may not reflect the values of the customers you do not yet have. In this case, targeting 100 responses for less ubiquitous products that are looking to grow their market may make sense, specifically because we want to retain some latitude in what may be recommended. Said another way: if we are not confident our existing 10,000 customers buy our product for the same reasons we would expect 1,000,000 people to (perhaps because we need to improve our product for the other 990,000), we do not force the same confidence intervals on studies with smaller populations; we intentionally retain more latitude and uncertainty in our models.
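The response targets in the linked table come from the standard sample-size formula with a finite population correction. A minimal sketch of that calculation, assuming the conventional 95% z-score of 1.96 and the worst-case proportion p = 0.5 (both assumptions mine, not stated in the table):

```python
import math

def sample_size(population: int, margin: float, z: float = 1.96, p: float = 0.5) -> int:
    """Responses needed for a given margin of error (+/- margin),
    using the standard formula with a finite population correction."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2   # infinite-population sample size
    n = n0 / (1 + (n0 - 1) / population)        # finite population correction
    return math.ceil(n)

print(sample_size(1_000_000, 0.05))  # about 385, commonly rounded to 400
print(sample_size(1_000_000, 0.10))  # about 97, commonly rounded to 100
print(sample_size(10_000, 0.05))     # about 370 -- smaller populations need slightly fewer
```

Note how little the population size matters once it is large: 10,000 customers and 1,000,000 customers call for nearly the same sample at the same confidence interval.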
I should clarify that unless you are going for claims support or projective modeling, I generally do not worry as much about statistical significance in my studies. Folks hear me say all the time that I generally target data stability, not statistical significance. The reason is that statistical significance depends on the degree of difference between the stimuli, and that degree of difference is what will define your sampling requirements.

Take the example of paired preference, a very simple and not very powerful test in terms of probabilistic modeling, given it is just a 50/50 chance of someone selecting a product at random. If a difference is large and obvious, statistical significance can come very early. For instance, if I offered 10 people $1 vs. 1 cent and all 10 chose the $1, there is just a p ≈ 0.001 (0.1%) probability this is a chance occurrence. If just 8 out of 10 people chose the $1, there would still be only a p ≈ 0.0547 (5.47%) probability this is a chance occurrence.

Now consider a case where people weren't as obvious in their preferences. Say 51% of people prefer Product 1 and 49% of people prefer Product 2, but this is a real and true preference (in this example) and not a statistical fluke. If I got that result testing 100 people with paired preference, there is a p ≈ 0.4602 (46.02%) probability it happened by chance. To feel confident the difference is real at the 5% level, I'd have to test roughly 6,800 people, and to feel confident at the 0.01% level I'd need roughly 34,600; even testing 20,000 people, with 10,200 choosing Product 1, only gets you to p ≈ 0.002 (0.2%). (FYI, if you are interested, you can run the binomial stats for yourself at this link: https://www.quantitativeskills.com/sisa/distributions/binomial.php?Exp=00.5&Obs=10200&N=20000)
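If you'd rather not use the web calculator, the one-sided binomial p-values in the examples above can be reproduced with a few lines of Python. A sketch using only the standard library, computing the tail probability P(X ≥ k) under the 50/50 random-choice null:

```python
from math import comb

def binom_tail(k: int, n: int, p: float = 0.5) -> float:
    """One-sided p-value: probability of k or more successes in n trials
    when each respondent truly picks at random with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(binom_tail(10, 10))   # 0.000977 -- all 10 of 10 picking the $1
print(binom_tail(8, 10))    # 0.0547   -- 8 of 10 picking the $1
print(binom_tail(51, 100))  # 0.4602   -- a 51/49 split across 100 responses
```

The exact sum becomes slow for very large n; for the 20,000-person case a normal approximation (or scipy) is the practical route, but the logic is the same.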
So this shows how statistical significance is driven by the degree of difference between your products or concepts. If you are pursuing claims support or predictive models, you will need to target statistical significance. If you don't have to pursue that, I generally recommend you instead test for data stability: the level at which you believe your data is representative and reproducible in the larger population from which you draw your respondent base. This is the same as targeting the most powerful confidence interval (i.e., the lowest +/- range we can get) in your studies. Generally, you can target 400 responses for high confidence (+/-5% confidence interval) and 100 responses for reasonable confidence (+/-10% confidence interval) that your results are representative of the larger population as a whole. Often that is enough.