Statistics Fundamentals for Testing
Ndifrekeabasi Essien. DipFA. MSc Data Science (In view)
BI Intelligence || Data Analytics & Science || Stakeholder Relationship Management || Document Control || Growth Enthusiast || Business analysis and Business Intelligence Specialist.
Hello there, I hope your last week was amazing and you’re getting ready for the new week ahead. In today’s article, I'll be talking about some basic concepts: sampling, populations, parameters and statistics, and common statistical traps.
Let me start with what a population is. A population is all the potential users, people, or things in a group that we want to measure. There is a distinction between the true population and a sample population. Let’s say you want to figure out whether one coffee shop serves hotter coffee than another. Here, we're comparing two populations: all the coffee cups served at each shop. The parameter of interest is the mean coffee temperature at each shop, and we want to know if there's a difference between the two. But since we can't measure every single cup (unless we had a gadget attached to the coffee machine), we have to work with a sample rather than the true population. That’s the difference between the true population and a sample population.
Population parameters such as the mean and standard deviation are commonly represented by the Greek letters μ and σ. Statistics from a sample are commonly represented by Latin symbols: x̄ for the mean and ‘s’ (sometimes written ‘sd’) for the standard deviation. So, when we have a sample, we use the sample statistics to make inferences about the population parameters.
The mean is the most common measure of central tendency. The shape of the data and how spread out it is are what we call variability. The most common measure of variability in statistics is the standard deviation. You most likely have some understanding of what standard deviation is, but truly understanding how it's calculated, how it relates to the shape of the data, and how it varies depending on what the data looks like is really important for understanding why certain sample sizes are needed and how confidence levels and intervals work together. Take note that the standard deviation is a measure of variance: it tells us how spread out our data is.
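To make the coffee-shop example concrete, here is a minimal sketch using Python's standard library and made-up temperature readings (the numbers are hypothetical, purely for illustration): the sample mean x̄ and sample standard deviation s are our estimates of the unknown population parameters μ and σ.

```python
import statistics

# Hypothetical temperature samples (°F) from two coffee shops
shop_a = [152.1, 149.8, 151.4, 153.0, 150.5, 152.7, 148.9, 151.8]
shop_b = [147.2, 149.0, 146.5, 148.1, 147.8, 149.4, 146.9, 148.3]

# Sample statistics: x̄ estimates μ, s estimates σ
mean_a = statistics.mean(shop_a)
sd_a = statistics.stdev(shop_a)   # sample standard deviation (n - 1 denominator)

mean_b = statistics.mean(shop_b)
sd_b = statistics.stdev(shop_b)

print(f"Shop A: mean = {mean_a:.2f}, sd = {sd_a:.2f}")
print(f"Shop B: mean = {mean_b:.2f}, sd = {sd_b:.2f}")
```

Note that `statistics.stdev` divides by n − 1, which is the right choice when a sample is used to estimate the population standard deviation.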
Confidence Intervals
A confidence interval is a range of values defined so that there is a specified probability that the value of the parameter lies within it. Statistics is inferential; we use the confidence interval to understand our risk of sampling error. In A/B testing, the confidence interval describes the amount of error allowed around an estimate. The four main ingredients for calculating a confidence interval are:
- Mean: what sits in the middle, and what the confidence interval wraps around.
- Sample size: what determines how wide the confidence interval is.
- Variability: the spread, or shape, of our data.
- Confidence level: how confident we want to be that our estimate of the parameter falls within the interval.
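The four ingredients above can be combined in a short sketch. This uses the same hypothetical coffee temperatures as before and the standard formula mean ± z × s/√n (a z-value of 1.96 for a 95% confidence level; for a sample this small a t-value would be more precise, but the structure is the same):

```python
import math
import statistics

# Hypothetical sample of coffee temperatures (°F)
sample = [152.1, 149.8, 151.4, 153.0, 150.5, 152.7, 148.9, 151.8]

n = len(sample)                 # ingredient 2: sample size
mean = statistics.mean(sample)  # ingredient 1: the mean
sd = statistics.stdev(sample)   # ingredient 3: variability

z = 1.96                        # ingredient 4: 95% confidence level
margin = z * sd / math.sqrt(n)

lower, upper = mean - margin, mean + margin
print(f"95% CI for the mean temperature: ({lower:.2f}, {upper:.2f})")
```

You can see each ingredient at work: a larger n shrinks the margin, more variability widens it, and a higher confidence level (a bigger z) widens it too.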
Statistical significance and the P-value
Statistical significance helps us quantify whether a result is likely due to chance. When a finding is significant, we can feel confident that it's real, not that we just got lucky in choosing the samples. In other words, statistical significance means a result is unlikely to be due to chance alone. When talking about statistical significance, the P-value often comes into play. The P-value is the probability of obtaining a difference as large as the one we saw in our sample if there really were no difference for all users; a significant result under that scenario would be a false positive. The conventional threshold for declaring statistical significance is a P-value of less than 0.05. What that means is that there's a less than 5% chance of a false positive, of seeing the result purely because of chance. Remember that the P-value does not tell us the probability that B is better than A, and it doesn't tell us the probability that we are making a mistake in selecting B over A. The P-value is what you get after a test is run and tells you the probability of obtaining a false positive, while the confidence level is what you set before the test, and it affects the confidence interval and the difference you can detect.
Statistical Power
Statistical power is the likelihood that a study will detect an effect when there is an effect to be detected. It's mainly determined by the size of the effect you want to be able to detect, say the size of the lift in conversion rate, and the size of the sample used to detect it. Bigger effects are easier to detect than smaller effects, while larger samples offer greater test sensitivity than smaller samples. Power is an important concept when it comes to interpreting your tests and determining sample size.
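To illustrate how effect size drives sample size, here is a rough sketch of the standard normal-approximation sample-size formula for a two-proportion test, assuming a 0.05 two-sided significance level and 80% power (the function and the conversion rates are hypothetical):

```python
import math

def sample_size_per_group(p_base, p_variant):
    """Approximate visitors needed per variant for a two-proportion test
    at alpha = 0.05 (two-sided) and 80% power, via the normal approximation."""
    z_alpha = 1.96   # z-score for alpha = 0.05, two-sided
    z_beta = 0.84    # z-score for 80% power
    p_bar = (p_base + p_variant) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_base * (1 - p_base)
                                      + p_variant * (1 - p_variant))) ** 2
    return math.ceil(numerator / (p_variant - p_base) ** 2)

# A small lift (5% -> 5.5%) needs far more traffic than a big one (5% -> 7.5%)
n_small_lift = sample_size_per_group(0.05, 0.055)
n_big_lift = sample_size_per_group(0.05, 0.075)
print(n_small_lift, n_big_lift)
```

The output makes the point in the text concrete: detecting the smaller lift requires an order of magnitude more visitors per variant than detecting the larger one.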
There are a number of statistics traps when doing testing and a few of them are:
Trap 1: Regression to the mean and sampling error. Sampling error exists because we don't know the true conversion rate; we're sampling from a bigger population. Regression to the mean might look like this: with time on the X axis and conversion rate on the Y axis, the conversion rate swings all over the place when the test starts and then, over time, settles around the central mean. Early extreme results tend to look less extreme as more data accumulates, so don't stop a test on an exciting early swing.
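A tiny simulation (hypothetical traffic numbers, using only Python's standard library) shows this settling behavior: the cumulative conversion rate swings widely in the first days and stabilizes around the true rate as visitors accumulate.

```python
import random

random.seed(42)

true_rate = 0.05   # assumed true conversion rate for the simulation
cumulative_conversions = 0
cumulative_visitors = 0
observed_rates = []

# Simulate 100 days of 100 visitors each; track the cumulative rate
for day in range(100):
    visitors = 100
    conversions = sum(1 for _ in range(visitors) if random.random() < true_rate)
    cumulative_visitors += visitors
    cumulative_conversions += conversions
    observed_rates.append(cumulative_conversions / cumulative_visitors)

# Early cumulative rates swing much more than late ones
early_spread = max(observed_rates[:10]) - min(observed_rates[:10])
late_spread = max(observed_rates[-10:]) - min(observed_rates[-10:])
print(f"Spread over first 10 days: {early_spread:.4f}")
print(f"Spread over last 10 days:  {late_spread:.4f}")
```

The early spread dwarfs the late spread, which is exactly why calling a winner after a few days is a trap.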
Trap 2: Too many variants. The optimization process should always be hypothesis-driven, so testing too many variants at once is not a good idea.
Trap 3: Click rates and conversion rates. Do not overstate the importance of click rates relative to conversion rates just because a variant increases visits to a product page or gets visitors to place items in the shopping cart more often. Select and prioritize a main KPI before you start testing.
Trap 4: Frequentist vs. Bayesian test procedures. The difference is that from a Bayesian point of view, a hypothesis is assigned a probability, whereas from a Frequentist point of view, the test is run without assigning the hypothesis a probability. Learn the technical differences between these two dominant schools of probability.
That’s it for this article, I will be posting more articles in the following weeks with more knowledge and insight about Growth Marketing.
If you want to learn more in detail about growth marketing or any other marketing course, feel free to visit the CXL Institute website. They have a wide range of marketing courses, and top 1% professionals in different fields of marketing who impart first-class knowledge. You can also apply for their mini-degree scholarship programs, just like I did.
Catch you later!