How to Master A/B Testing? Part 2
Imran Kapadia
Marketing Lead | Digital Performance Marketing | Google Black Belt | Growth Hacker | SEO | SEM | E-commerce | Analytics | PPC | CRM | Affiliates | CRO | Media | Marketing
In case you have not read my last post, you should go to this link and check it out.
For those of you who have read my last article, just to refresh your memory, we talked about:
- Why do we need to A/B test as a business?
- A framework that you can use for A/B testing your site.
- ROAR Model
- A/B test guide tool
- 6V Conversions Canvas
This week we will start by learning how to kick off your A/B testing: first, by working on your hypothesis.
We build the hypothesis on a premise (yes, it's always research-based, and yes, you are solving an already identified problem).
How to design and develop your hypothesis?
Whenever you design a hypothesis, it is better to use only an A and a B.
It should just be the default and a challenger.
Adding more test variations has an impact on your minimum detectable effect, which means you might only be able to detect impacts of more than 20%.
That is why it's better to run two simultaneous A/B experiments on the full population of the website in the same location, rather than a single test with many variations.
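To make this concrete, here is a minimal sketch in Python using statsmodels; the 10% baseline conversion rate and the 20,000 monthly visitors are my own illustrative assumptions, not figures from the course:

```python
import numpy as np
from statsmodels.stats.power import NormalIndPower

# All numbers are assumptions for illustration: a 10% baseline conversion
# rate and 20,000 monthly visitors split evenly across all variations.
baseline = 0.10
monthly_visitors = 20_000
analysis = NormalIndPower()

for n_variants in (2, 4):
    visitors_per_variant = monthly_visitors / n_variants
    # Smallest standardized effect (Cohen's h) detectable per comparison
    # at 80% power and a 5% significance level.
    h = analysis.solve_power(effect_size=None, nobs1=visitors_per_variant,
                             alpha=0.05, power=0.80)
    # Convert Cohen's h back into a relative lift over the baseline rate.
    challenger_rate = np.sin(np.arcsin(np.sqrt(baseline)) + h / 2) ** 2
    lift = (challenger_rate - baseline) / baseline
    print(f"{n_variants} variations -> minimum detectable lift: {lift:.1%}")
```

Halving the traffic per variation pushes the minimum detectable lift up sharply, which is the effect described above.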
How to configure the test?
Wesseling suggests that the first variation you pick should actually be the default.
This is important to understand because you want to control the percentage of people that each version of the page is shown to.
Set your first variation, the default, to 50%, and set the challenger to the other 50%.
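As a rough picture of what a testing tool does under the hood, here is a minimal sketch of a deterministic 50/50 split; the function name and the hashing scheme are assumptions for illustration, not any specific tool's implementation:

```python
import hashlib

def assign_variant(user_id: str, experiment: str, split: float = 0.5) -> str:
    """Deterministically bucket a user into 'default' or 'challenger'.

    Hashing user_id together with the experiment name keeps a user's
    assignment stable across visits and independent across experiments.
    """
    digest = hashlib.md5(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return "default" if bucket < split else "challenger"

print(assign_variant("user-42", "homepage-headline"))
```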
A lot of people don't ask these questions, but they should.
Is there enough data to run an A/B test? And what's a good test duration?
I have explained this in detail in my last article as well, but the rule of thumb is that you shouldn't run A/B tests on websites with fewer than 1,000 conversions per month.
Below that, you are unlikely to find a true winner; even when a real winner exists, your chances of detecting it are low, because the test would have to run for far too long.
This has to do with the concept of statistical power.
What the hell is statistical power?
One of the simplest definitions of statistical power is:
Statistical power is the likelihood that the experiment will detect an effect when there is an effect to be detected.
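To see how power drives sample size and test duration, here is a minimal sketch using statsmodels; the baseline rate, hoped-for lift, and weekly traffic are all illustrative assumptions:

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# All numbers are assumptions for illustration: a 2% baseline conversion
# rate, a hoped-for 10% relative lift, 80% power, and a 5% alpha.
baseline = 0.02
lifted = baseline * 1.10
effect = abs(proportion_effectsize(baseline, lifted))  # Cohen's h

visitors_per_arm = NormalIndPower().solve_power(effect_size=effect,
                                                alpha=0.05, power=0.80)
print(f"Visitors needed per variation: {visitors_per_arm:,.0f}")

# With an assumed 25,000 visitors per week split 50/50:
weeks = (2 * visitors_per_arm) / 25_000
print(f"Estimated test duration: {weeks:.1f} weeks")
```

Shrink the traffic or the expected lift and the duration balloons, which is exactly why low-conversion websites rarely reach a conclusive result.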
Before you jump in and start running tests, you should keep the above details in mind.
ALWAYS!!!
- Null Hypothesis
H0 states that your change will not have an impact on your test variation. If you fail to reject the null hypothesis, you proceed as if it were true, which means you should not launch your new feature.
Yes, this will happen, but don't worry: if something is not working, that is also a learning that it will not work for your category or audience. It should not stop you from testing.
It is part of an optimizer's mindset that you need to run a lot of tests. You will hate supporting a NULL HYPOTHESIS, but you will need to explain to yourself and your team that you are still moving in the right direction.
Just because something seems right does not mean it is RIGHT.
- Alternative Hypothesis
H1 is the alternative to the above-explained null hypothesis, whereby the change does have an effect on your test variation. If you reject the null hypothesis, you accept the alternative hypothesis, and you should then go with whatever change you tested.
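In practice, the reject or fail-to-reject decision comes down to a significance test. Here is a minimal sketch using a two-proportion z-test from statsmodels; the conversion counts are made-up numbers for illustration:

```python
from statsmodels.stats.proportion import proportions_ztest

# Made-up results for illustration: conversions and visitors for the
# default (A) and the challenger (B).
conversions = [300, 380]
visitors = [10_000, 10_000]

# H0: the challenger converts at the same rate as the default.
# H1: the two conversion rates differ.
z_stat, p_value = proportions_ztest(conversions, visitors)
if p_value < 0.05:
    print(f"p = {p_value:.4f}: reject H0 -- launch the challenger.")
else:
    print(f"p = {p_value:.4f}: fail to reject H0 -- keep the default.")
```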
In order to test your hypothesis, you need to understand the difference between the two error types: false positive (Type I) and false negative (Type II).
What is a false-positive error?
False-positive errors occur when your challenger page variation is not, in actual terms, better than the control page, but the test shows that it is. Or, in statistical language, you reject a true null hypothesis (your original page is always considered the null hypothesis, or H0).
What is a false-negative error?
False-negative errors occur when your control page does perform worse than the challenger, but the results show the opposite. Or, in statistical terms, you fail to reject a false null hypothesis.
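You can watch the false-positive rate happen by simulating A/A tests, where both variations are identical, so every "significant" result is by definition a Type I error. A minimal sketch on simulated data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# A/A simulation: both arms share the same 5% conversion rate, so any
# "significant" result is by definition a false positive (Type I error).
rate, n, runs = 0.05, 10_000, 2_000
false_positives = 0
for _ in range(runs):
    a = rng.binomial(n, rate)
    b = rng.binomial(n, rate)
    _, p_value, _, _ = stats.chi2_contingency([[a, n - a], [b, n - b]])
    false_positives += p_value < 0.05

print(f"Empirical false-positive rate: {false_positives / runs:.1%}")
```

The empirical rate lands at or just below the 5% significance level, which is exactly the risk you accept on every single test.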
You need to be mindful that the results you derive from your tests are actually true in a real sense.
Don't jump to the conclusion that you have a winner, or that your hypothesis failed, until you have enough data to validate or disprove your premise.
How to pick the right KPIs for A/B testing?
Wesseling shares a pyramid model of the five most popular A/B testing KPIs:
- Clicks
- Behaviour
- Transactions
- Revenue per user
- Potential LTV
At the bottom of the pyramid are the easiest KPIs to influence: clicks and behaviour. It's relatively easy to optimize for these types of actions because they're low effort for users. Personally speaking, though, more clicks do not mean that people will eventually convert. You need to be very mindful of the consumer journey here and of the next action you want the user to take.
I am saying this because these actions might not end up benefiting your company's bottom line (revenue).
Yet, when you have fewer than 1,000 conversions per month, these KPIs might be useful for you. If your data shows a strong correlation between certain website behaviours and conversions, you might opt for them. I call these micro-KPIs, which contribute directly to your macro-KPIs (business numbers).
In the middle of the pyramid, we have transactions, which of course is the most encouraging KPI and my personal favourite. It's the most common testing KPI and often the starting point for testing.
In some cases, though, focusing on transaction volume is not the best option.
Why?
If we are talking about leads, or about conversions with a uniformly low basket value, then optimizing for volume will benefit the business; but some transactions are more valuable than others.
That is why you ideally want to optimize towards revenue per user.
It's not always easy to set up this kind of test, as the tools are typically geared towards binary values such as transactions.
One workaround is to run the experiment counting only the users with a transaction value above a certain threshold (e.g. >= $50).
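Here is a minimal sketch of both routes on simulated data; the conversion rates and basket values are invented for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Simulated per-user revenue (all numbers invented): ~3% of users convert,
# and converters spend an exponentially distributed amount.
n = 20_000
control = rng.binomial(1, 0.030, n) * rng.exponential(40, n)
challenger = rng.binomial(1, 0.033, n) * rng.exponential(42, n)

# Option 1: test revenue per user directly with a nonparametric test,
# since revenue distributions are heavily skewed.
print(stats.mannwhitneyu(control, challenger))

# Option 2: the workaround above -- recode revenue into the binary KPI
# "order of $50 or more" that a standard testing tool can handle.
big_a, big_b = (control >= 50).sum(), (challenger >= 50).sum()
print(stats.chi2_contingency([[big_a, n - big_a], [big_b, n - big_b]]))
```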
At the top of the pyramid, we have potential LTV (lifetime value).
This is the most meaningful KPI for any business, but it's really difficult to evaluate, and most businesses do not calculate LTV at all.
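For teams that want at least a starting point, one common simplified estimate (a rough model assuming constant churn and constant average spend, with invented inputs, and no substitute for a proper cohort analysis) looks like this:

```python
# A deliberately simple LTV estimate (all inputs are illustrative), which
# assumes constant monthly churn and a constant average revenue per user.
arpu = 25.0           # average monthly revenue per user, in $
gross_margin = 0.70   # fraction of revenue the business keeps
monthly_churn = 0.05  # fraction of customers lost each month

# Geometric-series lifetime: expected customer lifetime = 1 / churn months.
ltv = arpu * gross_margin / monthly_churn
print(f"Estimated LTV per customer: ${ltv:.2f}")  # -> $350.00
```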
Way forward
If you are interested in learning more, then check out the resources available on CXL.
Weekly Posts
Week 1
What is Growth Hacking for Beginners? - Detailed Review
Week 2 (1/2)
What is User-Centric Marketing and Why is it Important today?
Week 3 (2/2)
How can you increase the Conversion Rate of your website?
Week 4
Conversion Research Essentials - What should you look for?
Week 5
How to work on your Digital Measurement Strategy?
Week 6
Optimizing Conversion Rate using Google Analytics
Week 7