Basics and Example of A/B Test
In this article, we cover the basics of A/B testing.
Before getting into its various aspects, we need to know why A/B testing is needed.
A/B testing is used when we want to improve a particular aspect of a product and measure whether a change actually helps. For example, we might change the colour of the submit button on a form, or compare a three-page form with a one-page form containing all the information. Users are randomly split between the two versions: the current version is called the control, and the new version is called the treatment.
Consider an e-commerce website that plans to add a coupon code option at checkout and wants to test whether adding it reduces revenue.
Before we design an experiment to test our hypothesis, let's look at the user journey for this typical ecommerce website.
It is important to finalize the metric that will be used to evaluate the success of the experiment. This metric is known as the Overall Evaluation Criterion (OEC), and an appropriate OEC for this experiment is revenue per user.
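To make the OEC concrete, here is a minimal Python sketch of revenue per user. It assumes we have one revenue figure per user in a group (the data below are hypothetical), with 0 for users who did not buy. Those zeros matter: dividing by purchasers only would give a different metric, revenue per purchaser.

```python
def revenue_per_user(revenues: list[float]) -> float:
    """Revenue per user: total revenue divided by the number of users in the group.

    Users who did not purchase must be included with revenue 0.
    """
    if not revenues:
        raise ValueError("the group has no users")
    return sum(revenues) / len(revenues)


# Hypothetical group of five users; two of them bought something.
group = [0.0, 40.0, 0.0, 60.0, 0.0]
print(revenue_per_user(group))  # 20.0
```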
Step 1 - Users for our experiment:
We have the following three choices:
a) All the users who visited the site
b) Users who completed the purchase process
c) Users who start the purchase process
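Option c) seems to be the best choice. Option a) adds noise, because it includes many users who never reach checkout and are therefore unaffected by the change. Option b) assumes that the change affects only how much purchasers spend, not the percentage of users who complete the purchase; if the coupon code option makes some users abandon checkout, those users would drop out of the analysis and the revenue loss would go unnoticed.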
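The problem with option b) can be seen in a small worked example. The numbers below are hypothetical: suppose the coupon code option makes some users abandon checkout while the remaining buyers spend a little more. Revenue per user among checkout starters (option c) falls, yet revenue per purchaser (option b) rises, so option b) would hide the loss.

```python
def summarize(starters: int, purchasers: int, avg_order: float) -> dict[str, float]:
    """Compare the metric under option b (purchasers only) and option c (checkout starters)."""
    total_revenue = purchasers * avg_order
    return {
        "total_revenue": total_revenue,
        "per_purchaser": total_revenue / purchasers,  # option b denominator
        "per_starter": total_revenue / starters,  # option c denominator
    }


# Hypothetical figures: 1,000 users start checkout in each group.
control = summarize(starters=1000, purchasers=500, avg_order=40.0)
treatment = summarize(starters=1000, purchasers=400, avg_order=45.0)

print(control)    # {'total_revenue': 20000.0, 'per_purchaser': 40.0, 'per_starter': 20.0}
print(treatment)  # {'total_revenue': 18000.0, 'per_purchaser': 45.0, 'per_starter': 18.0}
```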
Step 2 - Hypothesis:
Adding a coupon code option to the checkout page will decrease revenue per user for users who start the checkout process.
H0: There is no difference in revenue per user between the control and treatment groups.
Ha: Revenue per user is lower in the treatment group (the group with the coupon code option).
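Written formally, with μ_C and μ_T standing for the true revenue per user in the control and treatment groups (symbols introduced here for illustration), the hypotheses are:

```latex
H_0 : \mu_T = \mu_C \qquad H_a : \mu_T < \mu_C
```

Because H_a points in one direction, this corresponds to a one-sided (left-tailed) test.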
Step 3 - Level of Significance (alpha):
It is the probability of rejecting the null hypothesis when it is actually true, also known as the Type I error rate.
For this experiment, we use 5%, the most common choice. It could be raised to 10% to be less conservative or lowered to 1% to be more conservative.
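One way to see what alpha means is to simulate A/A tests, where both groups come from the same distribution, so the null hypothesis is true by construction. With alpha = 5%, roughly 5% of these tests should still come out "significant". This is an illustration only: the data are synthetic normal draws, and the test is a one-sided z-test in the same direction as Ha (a reasonable stand-in for a t-test at these sample sizes), built from Python's standard library.

```python
import random
from statistics import NormalDist, fmean


def sample_variance(xs: list[float]) -> float:
    """Unbiased sample variance (n - 1 in the denominator)."""
    m = fmean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def one_sided_p_value(control: list[float], treatment: list[float]) -> float:
    """Left-tailed p-value for Ha: mean(treatment) < mean(control), normal approximation."""
    se = (sample_variance(control) / len(control)
          + sample_variance(treatment) / len(treatment)) ** 0.5
    z = (fmean(treatment) - fmean(control)) / se
    return NormalDist().cdf(z)


def false_positive_rate(n_tests: int, n_per_group: int, alpha: float, seed: int = 0) -> float:
    """Share of simulated A/A tests that (wrongly) reject H0 at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_tests):
        # Both groups come from the same distribution, so H0 is true by construction.
        control = [rng.gauss(20.0, 5.0) for _ in range(n_per_group)]
        treatment = [rng.gauss(20.0, 5.0) for _ in range(n_per_group)]
        if one_sided_p_value(control, treatment) < alpha:
            rejections += 1
    return rejections / n_tests


print(false_positive_rate(n_tests=2000, n_per_group=200, alpha=0.05))
```

The printed rate should land near 0.05; changing alpha to 0.01 or 0.10 moves it accordingly.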
Step 4 - Power of test:
Power is the probability of detecting a meaningful difference between the variants when one really exists. It is typically set between 80% and 90%; we use 80% for our experiment. The higher the power, the larger the required sample size.
Power equals 1 minus the Type II error rate, where a Type II error means failing to reject the null hypothesis when it is false.
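The link between power, alpha, and sample size can be made concrete with the textbook normal-approximation formula for a one-sided two-sample test of means: power = Φ(δ / SE − z_(1−α)), where δ is the true difference, SE = σ·√(2/n) is the standard error of the difference with n users per group, and σ is a common standard deviation assumed known. The values below are hypothetical; the point is that power grows with n, and the Type II error rate is 1 − power.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided(delta: float, sigma: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power of a one-sided two-sample z-test.

    delta: true difference in means we want to detect (assumed > 0)
    sigma: common standard deviation of the metric in each group (assumed known)
    """
    se = sigma * sqrt(2 / n_per_group)  # standard error of the difference in means
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # critical value for the one-sided test
    return NormalDist().cdf(delta / se - z_alpha)


for n in (1_000, 5_000, 20_000):
    power = power_one_sided(delta=0.5, sigma=10.0, n_per_group=n)
    print(f"n={n:>6}: power={power:.3f}, type II error={1 - power:.3f}")
```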
Step 5 - Lift / Practical Significance:
It is the minimum lift the business needs to see to make the investment in the experiment worthwhile.
For our experiment, we will take 1%.
Step 6 - Sample Size:
To calculate the sample size, we need the baseline conversion rate, i.e., the conversion rate of the existing system (control). We assume it to be 20% for our case.
Using a sample size calculator, we get a sample size of ~628K.
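The post does not say which calculator was used. As a sketch, the standard two-proportion sample-size formula below (normal approximation) gives about 628K users per group under these assumptions: baseline 20%, a 1% relative lift (20% to 20.2%), two-sided alpha of 5%, and 80% power. Treating the 1% as a relative lift and the test as two-sided is our assumption; it happens to reproduce the figure above.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(baseline: float, relative_lift: float,
                          alpha: float = 0.05, power: float = 0.8) -> int:
    """Users needed per group to detect a change in a conversion rate.

    Two-sided test; normal approximation for two proportions.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)  # treatment rate under the minimum lift
    delta = p2 - p1
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * sqrt(2 * p1 * (1 - p1))
                 + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return ceil(numerator / delta ** 2)


n = sample_size_per_group(baseline=0.20, relative_lift=0.01)
print(n)  # roughly 628K per group
```

A one-sided test, matching the hypothesis in Step 2, would use z_(1−α) instead and need fewer users. This formula is for a conversion rate; for a continuous metric such as revenue per user, the variance of revenue would take the place of p(1 − p).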
Step 7 - Number of days to run the test:
The duration is the required sample size divided by the average number of customers who initiate the checkout process each day; for our traffic, this works out to around 15 days.
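The duration arithmetic can be sketched as follows. The post does not give the daily traffic, so the 100,000 checkout starters per day used below is purely hypothetical, and we assume a 50/50 split so that both groups together need twice the per-group sample size.

```python
from math import ceil


def days_needed(n_per_group: int, daily_eligible_users: int, n_groups: int = 2) -> int:
    """Days to reach the sample size, assuming eligible users are split evenly across groups."""
    total_needed = n_per_group * n_groups
    return ceil(total_needed / daily_eligible_users)  # round up to whole days


# Hypothetical traffic: 100,000 users start checkout per day.
print(days_needed(n_per_group=628_000, daily_eligible_users=100_000))  # 13
```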
Points to consider when choosing the number of days:
a) Day-of-week effect - People behave differently on weekdays and weekends, so it is advisable to run the test for at least one full week.
b) Seasonality - Avoid running the experiment during events such as Christmas or Diwali, as users behave differently during these periods.
c) Primacy and novelty effects - Some new features are used immediately because users are curious to try them (novelty effect), while others take time to get used to (primacy effect). In the former case, usage will drop and stabilize over time; in the latter, usage will pick up.
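Points a) and b) can be turned into a simple scheduling check. This is a hypothetical helper, not something from the book: it rounds the duration up to whole weeks so every weekday is covered equally often, and flags any overlap with a blackout period such as a festival. The dates are illustrative only.

```python
from datetime import date, timedelta
from math import ceil


def plan_test(start: date, min_days: int,
              blackouts: list[tuple[date, date]]) -> tuple[date, bool]:
    """Return (end_date, overlaps_blackout) for a test rounded up to whole weeks.

    The end date is inclusive. Each blackout is an inclusive (first_day, last_day) range.
    """
    days = ceil(min_days / 7) * 7  # whole weeks, so each weekday appears equally often
    end = start + timedelta(days=days - 1)
    overlaps = any(first <= end and last >= start for first, last in blackouts)
    return end, overlaps


# Illustrative dates: a 15-day requirement becomes 21 days (three full weeks).
end, overlaps = plan_test(date(2023, 12, 4), 15, [(date(2023, 12, 24), date(2023, 12, 26))])
print(end, overlaps)  # 2023-12-24 True
```

Here the rounded-up test would run into the Christmas window, so it should be moved or extended past the holiday.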
Step 8 - Run the test and analyze the results
We can run a t-test and compare the resulting p-value with alpha.
The p-value is the probability of observing a difference between the treatment and control groups at least as extreme as the one actually observed, given that the null hypothesis is true.
If p is less than alpha, we reject the null hypothesis; if p is greater than or equal to alpha, we fail to reject it.
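The decision rule can be sketched in Python as below. With tens of thousands of users or more per group, the t distribution is practically identical to the normal distribution, so this sketch computes Welch's t statistic but takes the p-value from the normal distribution; with SciPy installed, `scipy.stats.ttest_ind(treatment, control, equal_var=False, alternative="less")` is the usual exact alternative. The test is one-sided to match Ha. The input data are synthetic normal draws (far simpler than real revenue data), not the post's results.

```python
import random
from statistics import NormalDist, fmean


def sample_variance(xs: list[float]) -> float:
    """Unbiased sample variance (n - 1 in the denominator)."""
    m = fmean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def one_sided_p_value(control: list[float], treatment: list[float]) -> float:
    """P-value for Ha: mean(treatment) < mean(control), Welch statistic with a normal approximation."""
    se = (sample_variance(control) / len(control)
          + sample_variance(treatment) / len(treatment)) ** 0.5
    t = (fmean(treatment) - fmean(control)) / se
    return NormalDist().cdf(t)  # left tail, because Ha says treatment is lower


def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare the p-value with alpha, as described in Step 8."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Synthetic revenue per user (hypothetical): treatment has a slightly lower mean.
rng = random.Random(42)
control = [rng.gauss(20.0, 15.0) for _ in range(50_000)]
treatment = [rng.gauss(19.5, 15.0) for _ in range(50_000)]

p = one_sided_p_value(control, treatment)
print(f"p = {p:.4g} -> {decide(p)}")
```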
For our case, the results are as follows:
The p-value is less than alpha, so we reject the null hypothesis.
This means adding a coupon code option to the checkout page lowers revenue per user, so it is not a good idea.
This post is based on material from the book Trustworthy Online Controlled Experiments by Ron Kohavi, Diane Tang, and Ya Xu.
Please feel free to correct me if there are any issues.
Cover Picture is taken from - https://www.techtarget.com/searchbusinessanalytics/definition/A-B-testing