Basics and Example of A/B Test

In this article, we will be covering the basics of A/B testing.


Before understanding the basics and various aspects of A/B testing, we need to know why A/B testing is required.


A/B testing is required when we want to improve a particular aspect of a product, for example by changing the colour of the submit button on a form, or by comparing a three-page form with a one-page form that holds all the information. The current version is called the control and the newer version is called the treatment.


Consider a website that plans to add a coupon code option and wants to test whether adding it leads to a reduction in revenue.

Before we design an experiment to test our hypothesis, let's look at the user journey for this typical ecommerce website.

[Image: user journey on the ecommerce website]

It is important to finalize the metric that will be used to evaluate the success of the experiment. An appropriate metric, also known as the Overall Evaluation Criterion (OEC), for this experiment is revenue-per-user.


Step 1 - Users for our experiment:

We have the following three choices:

a) All the users who visited the site

b) Users who completed the purchase process

c) Users who start the purchase process

Option C seems to be the best choice. Option A adds noise, because users who never start the purchase process are not exposed to the change. Option B assumes that the change affects only the amount purchased, not the percentage of users who complete the purchase.


Step 2 - Hypothesis:

Adding a coupon option to the checkout page will decrease the revenue-per-user for users who start the checkout process.

H0: There is no difference in the revenue-per-user

Ha: Revenue-per-user is lower in the treatment group (which has the coupon code option)
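Written in symbols, the two hypotheses look as follows. The notation is an assumption introduced here for illustration: \mu_C and \mu_T stand for the mean revenue-per-user in the control and treatment groups.

```latex
H_0:\ \mu_T = \mu_C
\qquad
H_a:\ \mu_T < \mu_C
```

Because Ha only looks for a decrease, this is a one-sided (lower-tail) test.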


Step 3 - Level of Significance (alpha):

It is the probability of rejecting the null hypothesis when it is actually true, that is, the probability of making a Type I error.

For this experiment, we will use the commonly chosen 5%. It could be raised to 10% to be less conservative or lowered to 1% to be more conservative.


Step 4 - Power of test:

It is the probability of detecting a meaningful difference between the variants when it really exists. It is generally taken between 80% and 90%, and we will take 80% for our experiment. The higher the power, the larger the required sample size.

The power of the test is also equal to 1 - β, where β is the probability of a Type II error (failing to reject the null hypothesis when it is false).


Step 5 - Lift/ Practical Significance:

It is the minimum lift the business wants to see for the change to be worth the investment in the experiment.

For our experiment, we will take 1%.


Step 6 - Sample Size:

To calculate the sample size, we need to know the baseline conversion, that is, the conversion rate of the existing system (control). We will consider it to be 20% for our case.

Using a sample size calculator, we get a sample size of ~628K.
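The article does not name the calculator, so the following is only a sketch of one standard approximation for comparing two proportions. It assumes the 1% lift is relative to the 20% baseline (an absolute change of 0.2 percentage points), a two-sided alpha of 5%, and that the result is the number of users per variant. The function name and its parameters are made up for this illustration.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate users needed per variant to detect a relative change
    in a conversion rate with a two-sided test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for a two-sided alpha
    z_power = NormalDist().inv_cdf(power)          # quantile matching the desired power
    delta = baseline * relative_lift               # absolute difference to detect
    variance = 2 * baseline * (1 - baseline)       # both groups assumed to share the baseline variance
    return ceil((z_alpha + z_power) ** 2 * variance / delta ** 2)


n = sample_size_per_variant(baseline=0.20, relative_lift=0.01)
print(n)
```

Under these assumptions the result lands close to the ~628K figure above. Raising the power or shrinking the lift to detect increases the required sample.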

Step 7 - Number of days the test should run:

Dividing the required sample size by the average number of customers who initiate the checkout process each day, we need around 15 days for the experiment to run.

Points to be noted while considering the number of days:

a) Day-of-Week Effect - People behave differently on weekdays and weekends, so it is advisable to run the test for at least one whole week.

b) Seasonality - Don't run the experiment during an event like Christmas or Diwali, as users behave differently during these times.

c) Primacy and Novelty Effect - Some new features are used heavily at first simply because they are new (novelty), while others take time for users to get used to (primacy). In the former case, usage will stabilize at a lower level over time, and in the latter case, usage will pick up.
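The duration estimate can be sketched as a small calculation. The daily traffic figure below is hypothetical (the article does not state it), and the sketch assumes the ~628K sample size applies to each of two variants. Rounding up to whole weeks is one possible way to account for the day-of-week effect.

```python
from math import ceil


def days_to_run(sample_per_variant, variants, daily_users, round_to_weeks=True):
    """Days needed to collect the sample, optionally rounded up to whole weeks."""
    days = ceil(sample_per_variant * variants / daily_users)
    if round_to_weeks:
        days = ceil(days / 7) * 7  # cover every day of the week equally often
    return days


# daily_users is a hypothetical figure, not taken from the article.
raw_days = days_to_run(628_000, variants=2, daily_users=85_000, round_to_weeks=False)
full_weeks = days_to_run(628_000, variants=2, daily_users=85_000)
print(raw_days, full_weeks)
```

With these made-up inputs the raw estimate is 15 days, and rounding up to whole weeks extends it to 21 days.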


Step 8 - Run the test and analyze the result

We can run a t-test and compare the p-value with alpha.

The p-value is the probability of observing a result at least as extreme as the one seen in the experiment, given that the null hypothesis is true.

If p is less than alpha, we reject the null hypothesis. If p is greater than alpha, we fail to reject the null hypothesis.
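As a minimal sketch of this decision rule, the code below computes Welch's t statistic for the one-sided alternative (treatment lower than control) and approximates the p-value with the normal distribution, which is reasonable for the large samples typical of online experiments. The data is synthetic and the function name is made up for this illustration. None of it comes from the actual experiment.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance


def one_sided_test(control, treatment):
    """Welch's t statistic for Ha: mean(treatment) < mean(control).

    The p-value uses a normal approximation instead of the exact
    t distribution, which is close for large samples."""
    se = sqrt(variance(control) / len(control) + variance(treatment) / len(treatment))
    t_stat = (mean(treatment) - mean(control)) / se
    p_value = NormalDist().cdf(t_stat)  # lower tail, because Ha is "lower"
    return t_stat, p_value


# Synthetic revenue-per-user data, made up purely for illustration.
random.seed(42)
control = [random.gauss(10.0, 3.0) for _ in range(5000)]
treatment = [random.gauss(9.5, 3.0) for _ in range(5000)]

alpha = 0.05
t_stat, p_value = one_sided_test(control, treatment)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"t = {t_stat:.2f}, p = {p_value:.4f}, decision: {decision}")
```

Real revenue data is usually skewed with many zeros, so the large-sample assumption matters more in practice than it does for this clean synthetic data.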

For our case, the results are below:

[Image: results of the test]

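The p-value is less than alpha, so we reject the null hypothesis.

Since the alternative hypothesis is that revenue-per-user is lower in the treatment group, this means adding a coupon code option on the checkout page is not a good idea.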


This post was created using material from the book Trustworthy Online Controlled Experiments by Ron Kohavi, Diane Tang, and Ya Xu.

Please feel free to correct me if there are any issues.

Cover Picture is taken from - https://www.techtarget.com/searchbusinessanalytics/definition/A-B-testing

