Real-life example of A/B test sample size calculation


Availability bias is a well-known cognitive bias: the tendency to overestimate the importance of information that is easily available or memorable, while underestimating the importance of information that is harder to recall or obtain. We wanted to check whether availability bias can change user behavior in a 1:1 sales pitch. In particular, we wanted to test whether users who receive an email about an account optimization are more likely to implement the optimization after a 1:1 phone call than users who did not receive the email and were only pitched the optimization on the call.


Typically, users implement the optimization 30% of the time after a 1:1 sales pitch. We wanted to check whether users who received an email explaining the benefits of the optimization (e.g., how much additional revenue it would generate) are more likely to implement it, due to availability bias from reading the email. In theory, users who receive the email will be more familiar with the optimization and hence more likely to implement it after the 1:1 call. To test this theory, we ran an experiment.


Experiment setup:

  • Metric: Pitch Implementation Rate
  • Ho: Null hypothesis: Users who receive the email have an implementation rate of 30% (no change)
  • Ha: Alternate hypothesis: Users who receive the email have an implementation rate higher than 30%
  • MDE: 10 percentage point increase in implementation rate (from 30% to 40%)
  • 5% significance level (95% confidence level) and 80% statistical power (a power-analysis sketch follows this list)
  • Target Population: Users eligible for the optimization pitch
  • Randomization Unit: User level
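
As a quick sanity check on this setup, a standard power-analysis library can estimate the required sample size directly from these inputs. Below is a minimal sketch in Python, assuming statsmodels is available; note that it uses Cohen's h (an arcsine-transformed effect size) rather than the pooled-variance formula worked through below, so the number it returns can differ from the hand calculation by a few users.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Baseline implementation rate and the rate we hope to see after the email
p_control, p_treatment = 0.30, 0.40

# Cohen's h effect size for the two proportions (arcsine transformation)
effect_size = proportion_effectsize(p_treatment, p_control)

# Solve for the per-group sample size at a 5% significance level (one-sided)
# and 80% power, with equally sized control and treatment groups
n_per_group = NormalIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,
    power=0.80,
    ratio=1.0,
    alternative="larger",   # Ha: treatment rate > control rate
)
print(round(n_per_group))
```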


Based on the above experiment setup, we can calculate the number of users needed to reject the null hypothesis as follows:

n = ((Z1-α + Z1-β)² * σ²) / Δ²

Let's calculate the parameters in the above equation:

σ² = p1*(1-p1) + p2*(1-p2)*r = (0.3*0.7) + (0.4*0.6)*1 = 0.45 (p1 and p2 are the implementation probabilities for the control and treatment groups; r is the ratio of treatment to control group sizes, here 1)

α = 0.05 (5% significance level, i.e., 95% confidence level)

β = 0.2 (i.e., 80% statistical power)

Z1-α = 1.64 (from the z table, one-sided test)

Z1-β = 0.84 (from the z table)

Δ = p' - p = 0.40 - 0.30 = 0.1 (p' is the expected implementation rate after the email and p is the current implementation rate)


n = ((1.64 + 0.84)² * 0.45) / (0.1²) ≈ 278


So we needed at least 278 users in each group (the treatment group receiving the account optimization email followed by the 1:1 pitch, the control group receiving only the 1:1 pitch) to be able to reject the null hypothesis that the email does not increase the implementation rate, at a 95% confidence level with 80% power.
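
For reference, the same hand calculation can be reproduced in a few lines of Python. This is a minimal sketch using scipy, with the z quantiles computed exactly rather than read off a table, so the result lands at roughly the same per-group sample size as above.

```python
from scipy.stats import norm

p1, p2 = 0.30, 0.40        # current and expected implementation rates
alpha, beta = 0.05, 0.20   # significance level and (1 - power)
r = 1.0                    # ratio of treatment to control group sizes

sigma_sq = p1 * (1 - p1) + p2 * (1 - p2) * r   # 0.45
delta = p2 - p1                                # 0.10

z_alpha = norm.ppf(1 - alpha)   # ~1.645 for a one-sided test
z_beta = norm.ppf(1 - beta)     # ~0.842

n = (z_alpha + z_beta) ** 2 * sigma_sq / delta ** 2
print(n)   # ~278 users per group
```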


In the end, we were unable to reject the null hypothesis in favor of the alternate hypothesis: the experiment did not show that the email increases the implementation rate.


More details on power and significance level:

Statistical power: The probability of correctly detecting a difference between the control and treatment groups when one exists (here, 80%). A power of 0.80 means that, if the effect is real, there is an 80% chance the test will detect it, and only a 20% chance of missing it (a false negative).

Significance level: The probability of rejecting the null hypothesis (i.e., that there is no difference between the control and treatment groups) when it is actually true (here, 5%). The significance level α is the false positive rate: the proportion of experiments in which a difference would be declared even though none exists. With α = 0.05, there is less than a 5% chance of finding a difference between the control and the variant when no real difference exists, so you can be 95% confident that a detected difference is not a false positive.
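
One way to make power and significance concrete is to simulate the experiment many times. The sketch below (a rough illustration, assuming numpy/scipy and a simple pooled one-sided two-proportion z-test) draws 278 users per group with true rates of 30% and 40% and checks how often the test flags the lift; the detection frequency should land near the 80% power, while rerunning both groups at the 30% baseline should flag a difference only about 5% of the time, matching the significance level.

```python
import numpy as np
from scipy.stats import norm

def detects_lift(x_control, x_treatment, n, alpha=0.05):
    """One-sided two-proportion z-test: is the treatment rate higher than control?"""
    p_c, p_t = x_control / n, x_treatment / n
    p_pool = (x_control + x_treatment) / (2 * n)
    se = np.sqrt(p_pool * (1 - p_pool) * (2 / n))
    if se == 0:
        return False
    return (p_t - p_c) / se > norm.ppf(1 - alpha)

rng = np.random.default_rng(42)
n, sims = 278, 10_000

# Power check: control converts at 30%, treatment at 40%
power_hits = sum(
    detects_lift(rng.binomial(n, 0.30), rng.binomial(n, 0.40), n) for _ in range(sims)
)
print("empirical power:", power_hits / sims)          # should be close to 0.80

# False positive check: both groups at the 30% baseline
false_hits = sum(
    detects_lift(rng.binomial(n, 0.30), rng.binomial(n, 0.30), n) for _ in range(sims)
)
print("empirical false positive rate:", false_hits / sims)   # should be close to 0.05
```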

Daria Kokoreva, PhD

Product Data Analyst | Researcher

1 year ago

A nice post, thanks! Just wondering, do you know formulas for hard cases when we can't use a z-test or t-test and don't even know the metric distribution? Usually, I see Monte-Carlo simulation used for the analysis, but nothing about sample size calculation.

Gaurav Naidu

Senior Recruiter at Heitmeyer Consulting

2 years ago

Excellent details, thanks for sharing.

Nick Cooper

Product Leader | AI/ML | Strategy, Tactics, Leadership

2 years ago

An interesting refresher, thanks! One gets used to yes/no findings from A/B testing; it helps to get a reminder of where a decision comes from. Do you have an equivalent explanation for multi-variate testing?
