Four Questions to Help You Predict the Uplift of an A/B Test
I was explaining to a friend of mine the other day how to calculate the sample size of an A/B test in order to estimate its duration: you take the current conversion rate of the page(s) you plan to test and the uplift (or minimum detectable effect) you predict the test will deliver, optionally choose a statistical significance level and power, and apply these to a statistical model to find the sample size.
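If it helps to see that calculation in practice, here is a minimal sketch using the standard two-proportion z-test sample size formula. The baseline rate, predicted uplift, and daily traffic figures are made-up numbers purely for illustration:

```python
# A minimal sketch of the sample-size calculation described above,
# using the standard two-proportion z-test formula. The inputs
# (baseline rate, uplift, traffic) are invented for illustration.
from scipy.stats import norm

def sample_size_per_variant(baseline_rate, relative_uplift,
                            alpha=0.05, power=0.80):
    """Visitors needed per variant to detect the given relative uplift."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_uplift)
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided significance
    z_beta = norm.ppf(power)
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * (2 * p_bar * (1 - p_bar)) ** 0.5
                 + z_beta * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return numerator / (p2 - p1) ** 2

n = sample_size_per_variant(0.03, 0.10)   # 3% baseline, predicted +10%
daily_visitors_per_variant = 2_000        # hypothetical traffic split
print(f"~{n:,.0f} visitors per variant, "
      f"~{n / daily_visitors_per_variant:.0f} days to run")
```

Note how the predicted uplift sits right at the heart of the formula: halve it and the required sample size (and therefore the test duration) roughly quadruples. That is exactly why predicting it well matters.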
Rightly so, he asked: how can you tell the amount of uplift you are going to achieve when you haven't run the test yet? I thought that was a brilliant question. While there are a number of articles on test prioritisation and sample size calculation that assume you already know the impact or uplift a test will achieve, when I Googled I was, to my surprise, unable to find a clear step-by-step approach or any method for predicting the uplift of an A/B test – which is clearly one of the most important things for any CRO practitioner!
This does not have to be a case of sticking your finger in the air and guessing a number! Trust me, in my experience it is a lot simpler than that. Ask yourself these four questions when trying to predict the uplift of a test:
1. How important is this element or the set of elements for the user to accomplish their task/journey in order to convert?
Two things to look for here. First, the more important the element is to accomplishing the task, the more likely you are to see a shift in behaviour. Second, the size of that shift depends on how much you change the element or set of elements.
CTA buttons are a classic example – "you should test your CTAs" – and the fundamental reason is that they sit directly on the users' journey to accomplishing a task. Within the CTA itself, changing the text, the colour, the position, or all of them together will produce different results.
By the way, I am not at all suggesting you spend your entire effort testing different variations of the CTA, as that will not be very cost effective. Consider making a significant amount of change first and then rolling it back in stages to build a better testing roadmap – you get the idea, right?
In contrast, let's say you are changing the recommended products displayed on the product details page. The product details page itself is extremely important (depending on the website and what you are selling); however, changing the recommended products (the "you may also like" section) on that page might not have a significant impact on users' decision to complete their task. Unless your target is to increase interaction with that area, you might want to predict a smaller uplift for those tests.
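One way to make this first question concrete is to score both factors and map them to an uplift band. The sketch below is purely illustrative: the 1–5 scales, the multiplication, and the bands are placeholder assumptions of mine, not figures from any study, so calibrate them against your own test history.

```python
# Purely illustrative: turning the two factors from question 1
# (importance of the element, size of the change) into a rough
# relative-uplift band. All scores and bands are invented placeholders.
def predicted_uplift_band(importance, change_magnitude):
    """Both inputs scored 1 (low) to 5 (high); returns a relative-uplift range."""
    score = importance * change_magnitude        # 1..25
    if score >= 16:
        return (0.05, 0.15)   # e.g. a reworked CTA on the main journey
    if score >= 8:
        return (0.02, 0.05)
    return (0.00, 0.02)       # e.g. tweaking "you may also like" products

print(predicted_uplift_band(importance=5, change_magnitude=4))  # (0.05, 0.15)
print(predicted_uplift_band(importance=2, change_magnitude=2))  # (0.0, 0.02)
```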
2. How persuasive are these changes in shifting users' behaviour towards conversion?
I have to refer to B. J. Fogg's paper "A Behavior Model for Persuasive Design", which I think is one of the best papers explaining how you can persuade someone to take an action. In summary, one axis represents the ability to take an action (the higher the ability, the more likely the action) and the other axis represents motivation (the more motivation you provide, the better the chance your user takes the action), and there is an area at the top-right corner where the combination is strong enough to persuade users to trigger the action.
When you come up with a test idea, simply ask yourself whether you are increasing one, the other, or both. Depending on the type of motivation your changes bring in, or the increased ability they introduce, you can assess and predict the behavioural shift. For example, fear is a great motivator: fear of losing something ("only 3 items left"), fear of missing out ("your friends already purchased this"), or even fear of not having a great Xmas present ("only 5 days left until Xmas") can work amazingly well to persuade your users to take an action.
Similarly, reducing disruption in your customers' journey can increase ability – for example, clearly defining the next steps, displaying clear steps within the funnel, or even, sometimes, a well-timed overlay.
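As a toy illustration of Fogg's idea – behaviour happens when motivation and ability together cross an "action line" at the moment of a prompt – here is a sketch. The 1–10 scales, the multiplicative combination, and the threshold value are my own simplifying assumptions, not part of Fogg's paper:

```python
# An illustrative reading of Fogg's model: an action is likely when
# motivation x ability crosses an "action line" when prompted.
# The 1-10 scales and the threshold are assumptions for this sketch.
def crosses_action_line(motivation, ability, threshold=25):
    """Scores from 1 (low) to 10 (high); True if the action is likely."""
    return motivation * ability >= threshold

# Control: modest motivation, clunky checkout.
print(crosses_action_line(motivation=4, ability=3))   # False
# Variant: adds urgency ("only 3 items left") and a clearer next step.
print(crosses_action_line(motivation=6, ability=5))   # True
```

The point of the sketch is the trade-off: a variant can cross the action line by raising motivation, by raising ability, or by nudging both a little.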
3. What did you learn from your previous experience?
Look back on the tests you have run before and their results. What did you learn from making similar changes? How much uplift did you achieve? Most importantly, try to explain the rationale behind the uplift you previously achieved by answering the first two questions for that earlier test. You will quickly see a pattern, and this will help you predict the uplift for the test at hand.
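If you keep a record of past tests, mining it for a prior can be as simple as the sketch below. The records, field names, and numbers here are hypothetical – substitute whatever your own test archive actually contains:

```python
# A minimal sketch of mining your own test archive for a prior, as
# question 3 suggests. Records, fields, and numbers are hypothetical.
past_tests = [
    {"element": "cta", "change": "copy",     "observed_uplift": 0.04},
    {"element": "cta", "change": "position", "observed_uplift": 0.02},
    {"element": "recommendations", "change": "layout", "observed_uplift": 0.005},
]

def historical_prior(element):
    """Average observed uplift from earlier tests on similar elements."""
    matches = [t["observed_uplift"] for t in past_tests
               if t["element"] == element]
    return sum(matches) / len(matches) if matches else None

print(f"Prior for a new CTA test: {historical_prior('cta'):.1%}")  # 3.0%
```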
4. What is the current conversion rate?
I kept this question until the end, and here is why: if the area you are trying to change already converts well, you will probably struggle to improve it further. For example, say the 3D Secure page (familiar to many card users who have to provide random characters of their password to complete a purchase) converts at around 90%. If you are trying to change that page to gain an extra 5%, you might, in theory, need to work very hard to do so. Once you have the answers to the first three questions, look at the current conversion rate to understand how much uplift you can realistically add on top: your predicted uplift should be higher when the current conversion rate is low, and vice versa.
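The headroom arithmetic behind this is worth spelling out, using the 3D Secure example above. A 90% baseline leaves only 10 percentage points of absolute headroom, so even a modest relative uplift consumes a large share of it:

```python
# Headroom arithmetic for question 4, using the 3D Secure example:
# a 90% baseline leaves only 10 points of absolute headroom, so even
# a modest relative uplift is a big ask.
baseline = 0.90
target_relative_uplift = 0.05                 # the "extra 5%" above

new_rate = baseline * (1 + target_relative_uplift)          # 0.945
headroom_consumed = (new_rate - baseline) / (1 - baseline)  # 0.45
print(f"New rate: {new_rate:.1%}")                          # 94.5%
print(f"Share of remaining headroom consumed: {headroom_consumed:.0%}")  # 45%
```

In other words, a 5% relative lift on a 90% baseline means converting 45% of the visitors who currently do not convert at that step – which puts the difficulty into perspective.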
Now, the most important thing to remember: even if you predict the uplift by answering all of the questions above, your prediction will not always be right – at the end of the day, this is a test. You have a hypothesis and you are TESTING that hypothesis. This is a guide to help you find the test duration and to prioritise your tests against each other based on the predicted uplift of each one. If predictions were always right, we would probably be in a completely different industry! I hope this helps you plan your tests moving forward.