How A/B Testing Works
This article summarises Tom Wesseling’s lecture on mastering A/B testing.
A/B testing is rapidly evolving and has gone through several changes in the past few years.
A/B testing, also known as split testing or bucket testing, compares two versions of a webpage or app against each other to determine which one performs better. It is essentially an experiment in which two or more variants of a page are shown to users at random, and statistical analysis is used to determine which variation performs better for a given conversion goal.
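To make the “statistical analysis” part concrete, here is a minimal sketch of how two variants might be compared on a conversion goal, written in Python with the statsmodels library; the visitor and conversion counts are invented for illustration:

```python
# A minimal sketch: comparing two variants with a two-proportion z-test.
# The counts below are invented for illustration.
from statsmodels.stats.proportion import proportions_ztest

conversions = [480, 530]    # conversions in variant A and variant B
visitors = [10000, 10000]   # visitors bucketed into each variant

z_stat, p_value = proportions_ztest(conversions, visitors)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")

# A common (but not universal) convention: only call a winner if p falls
# below a significance level agreed on before the test started.
if p_value < 0.05:
    print("The difference is statistically significant.")
else:
    print("No significant difference; don't call a winner yet.")
```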
A/B testing is seen as a silver bullet by companies that understand its value.
Jeff Bezos, the CEO of Amazon, attributes the success of his multinational technology company to the number of experiments it runs. Reflect on this question: if Amazon still runs experiments week after week, year after year, what stops you from carrying out A/B tests on your business, website, app, or campaign? (Unless, that is, your business has fewer than 1,000 conversions per month; more on that below.)
A/B testing puts effectiveness on top of efficiency. As Tom explains, the big promise of experimentation is certainty: you want to know for sure whether a change will actually have an impact before you make the move, rather than acting on a few sprints, tests, or opinions.
When to use A/B testing
There are three key objectives for running A/B tests:
· Deployment: When something is deployed on the website or app, it could be a new feature, an update, or any number of other changes. This is usually carried out by the engineering team. These deployments can be delivered as experiments to verify that they have no negative impact on the company’s KPIs.
· Research: This is using A/B testing for research before you optimize, which could take the form of a fly-in, a pop-up, or a conversion signal on your website, app, or even campaign. For instance, you want to see if adding some elements to your product page will affect conversion, so you make two variations of the page and run an A/B test to see whether the impact is negative or positive, then pick which one to optimize based on the results.
· Optimization: Also referred to as lean deployment, this is mostly done by marketing. After you have done your research to find out which change has more impact, you’re ready to optimize (still run as an experiment) while you look out for wins. Here, you’re looking for signals of impact to understand what needs to be done.
Tom divides the course into 5 stages: Intro | Planning | Execution | Results | Outro.
The Planning Stage
Do you have enough data to conduct A/B tests?
This stage introduces the ROAR model, which explains how to calculate if you have enough data to conduct A/B tests.
Wesseling’s rule-of-thumb border is 1,000 conversions per month. What counts as a conversion varies: it can be leads, purchases, or even clicks, depending on what you have chosen to measure. If you are below 1,000 conversions per month, you should not run A/B tests, because with so little data it is really hard to find a winner between two variants, and even if you do find one, chances are high it is not a real winner. This doesn’t stop you from deploying changes; you just can’t reliably create variants or splits. In other words, the higher your number of conversions, the more A/B tests you can run. Wesseling believes that with 10,000 conversions per month you can start four A/B tests per week, which adds up to roughly 200 A/B tests a year.
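As a back-of-the-envelope check on that border, you can estimate how many visitors a single test needs. Here is a rough sketch using statsmodels’ power calculation; the 5% baseline rate and 10% relative uplift are my own assumptions, not numbers from the lecture:

```python
# Rough power calculation: visitors needed per variant for one A/B test.
# Baseline rate and minimum detectable effect are assumed for illustration.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline = 0.05                 # assumed 5% conversion rate
mde = 0.10                      # detect a 10% relative uplift (5% -> 5.5%)
effect = proportion_effectsize(baseline, baseline * (1 + mde))

n_per_variant = NormalIndPower().solve_power(
    effect_size=effect, alpha=0.05, power=0.80, alternative="two-sided"
)
print(f"~{n_per_variant:,.0f} visitors per variant")  # roughly 15,000+
```

At a 5% conversion rate, 1,000 conversions per month is about 20,000 visitors, so a single test at these settings already takes well over a month, which is one way to see why the border sits roughly where it does.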
Deciding on KPIs for your tests
If you run an A/B test, what is/are your KPI(s)? Is it clicks? Is it transactions? Is it lifetime value? How do you decide?
A business probably has more than one key performance indicator, but your A/B test should have its own key performance indicator as well. In A/B testing, KPIs are often referred to as goal metrics.
Goal metrics are ranked from the most significant to the least, and this ranking can guide you in deciding on a goal metric for your A/B tests. Clicks are the easiest goal metric to move, and that’s why they rank lowest; it could be something as simple as changing the color of a button, making it bigger, or offering discounts. If you’re familiar with running ads and carrying out tests, you know this is accurate: a high CTR on a button doesn’t necessarily mean a significant uplift in behavior or transactions.
Measuring behavior as a goal metric, on the other hand, can be a good fit for businesses with low conversions, where it can be an interesting metric to observe. Essentially, though, the goal metrics really worth testing and optimizing for run from transactions all the way up to lifetime value.
Research
Understanding the 6V research model and how it can be used to generate user behavior insights
‘Without an analyzed reason to start, it doesn’t make any sense to start at all.’
The 6V research model is all about desk research and is divided into six parts: value, versus, view, voice, verified, validated.
Value: Which company values are important and relevant? What focus delivers the most business impact? You need to know the company’s short-term goals, long-term goals, and strategy, as well as its product focus and KPI focus.
Versus: Carry out competitor analyses. Who are your competitors? Are there any market best practices you can use? Maybe your company is small and you don’t know who your competitors are; one way to discover them is to search Google for the keywords you’d like your products and services to be found with and see which competitors show up on the SERP. You could also use a tool like Alexa: type in your company’s website and look at the sites with audience overlap.
Track changes on their websites. You want to know if they lower a price, launch a specific offer, change specific content, or update the USPs on their site, because in the end the customer journey on your website is similar to the journey on your competitors’ websites.
Once you know who your competitors are, visit them, use their service, buy their products, and go through the full customer journey to understand what it feels like to be their customer. Trust me; you’ll be fascinated by your findings.
View: Here, we’re looking at the view of the customer. This is where data scientists and analysts come in and really dive into behavioral analytics.
There are a number of basic questions that you’ll need answers to. What does the first experience with the website or app feel like? What is the experience with the landing pages or product pages? What is the behavior of the users? Why did they visit the website and where did they come from? Are there specific channels these users are coming from? Are there notable differences in the way they engage with the products? Look out for new visitors and returning visitors too. Is there a change in behavior after a transaction?
Traffic source is also important: is it mostly paid advertising? Is it lots of social traffic? Is it just domain type-ins? Does a significant number come through referral links? And which visitors already have a clear intent to do something?
What is the CTR? Where are customers dropping off or exiting the website? These are the questions you’ll find answers to when you dive into your data, and you can create visual reports to share with the team. Segment your conversion data into users per device, content consumption, and users per browser. You can use data sources like heat and scroll maps, as well as screen recordings, to check how long it takes users to move through your website and your funnel.
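As a small illustration of that segmentation step, here is a sketch in Python with pandas, assuming a flat session-level analytics export; the file path and column names are illustrative assumptions:

```python
# Conversion rates segmented per device and per device x browser.
# The CSV path and column names are illustrative assumptions.
import pandas as pd

sessions = pd.read_csv("sessions.csv")  # columns: device, browser, converted (0/1)

# Conversion rate per device
print(sessions.groupby("device")["converted"].mean())

# Conversion rate per device x browser, as a small pivot table
print(sessions.pivot_table(index="device", columns="browser",
                           values="converted", aggfunc="mean"))
```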
Voice: Analytics can show you the behavior of your customers, but it doesn’t tell you why they behave the way they do. So talk to your customer service team; even take some calls or listen in on them, go through chat logs, and if you use a CRM platform, check that too. If your email marketing doesn’t use a no-reply address, you can also read through the replies and questions customers send in response to your email campaigns. There’s also social media, where you can see what questions users have for your competitors. Create small feedback forms on your website, or send feedback surveys both to users who visited your website and to those who actually carried out a transaction.
Verified: What scientific research, insights, and models are available for the test you want to carry out? For every niche, every product, every service, and every segment of users, studies have been done, so it’s really valuable to understand what the existing scientific literature says about your specific product or company and your specific group of users. You can find such research on Google Scholar; there’s also DeepDyve (which requires a subscription). A general Google search can surface free existing research as well.
Validated: This stage explores already-validated insights from previous experiments you have carried out. Pull out those stored reports: there’s a chance you already have answers to questions about user behavior or transaction patterns, or you can at least refresh your memory on things you’ve learned but forgotten.
Hypothesis Setting
It’s very important to write a hypothesis before running a research or growth experiment. A hypothesis gets everyone on the team aligned on exactly what you want to research and why.
With a hypothesis,
· You want to describe a problem,
· You want to have a proposed solution,
· You want to predict the outcome.
Once you formulate a proper hypothesis (“we have this problem, this is why we are trying this, and we hope to get this result”), people at least know why they are doing something, and it helps reduce counter-arguments and discussions.
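One lightweight way to keep that problem/solution/outcome structure consistent across a team is a simple template. The sketch below is my own illustration rather than a format from the lecture, and the example contents are invented:

```python
# A minimal hypothesis record: problem, proposed solution, predicted outcome.
from dataclasses import dataclass

@dataclass
class Hypothesis:
    problem: str    # what your research uncovered
    solution: str   # the change you propose
    outcome: str    # the result you predict, tied to a goal metric

h = Hypothesis(
    problem="heatmaps show visitors miss the delivery info on product pages",
    solution="move the delivery promise next to the add-to-cart button",
    outcome="add-to-cart rate rises with no drop in transactions",
)
print(f"Because {h.problem}, we will {h.solution}, and we expect {h.outcome}.")
```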
Prioritize your A/B Tests
This is the last phase of the planning stage, and it focuses on prioritizing all of your hypotheses into a list and creating a roadmap of the experiments you want to run.
You can cross-reference your hypotheses against how strongly they match the 6 Vs, product KPIs, and A/B test KPIs. With these, you can fill out your list based on hypothesis strength and score your hypotheses in order, from those that will provide the highest value all the way down to the one with the least priority.
Wesseling highlights the PIPE (Potential, Impact, Power, and Ease) framework for scoring hypotheses (a small scoring sketch follows the list):
Potential: What is the chance of the hypothesis being true?
Impact: Where will this hypothesis have the biggest effect?
Power: What are the chances of finding a significant outcome?
Ease: How easy is it to test and implement?
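Here is a minimal sketch of turning PIPE scores into a prioritized roadmap; the hypotheses and the 1-to-5 scale are invented for illustration, since the lecture doesn’t prescribe a scale:

```python
# Prioritize hypotheses by a simple sum of PIPE scores (1-5 each, invented).
backlog = [
    {"name": "Move delivery info", "potential": 4, "impact": 3, "power": 4, "ease": 5},
    {"name": "Shorten checkout",   "potential": 5, "impact": 5, "power": 3, "ease": 2},
    {"name": "New button color",   "potential": 2, "impact": 1, "power": 5, "ease": 5},
]

for h in backlog:
    h["score"] = h["potential"] + h["impact"] + h["power"] + h["ease"]

# Highest total first: this ordering becomes your experiment roadmap.
for h in sorted(backlog, key=lambda h: h["score"], reverse=True):
    print(f"{h['score']:>2}  {h['name']}")
```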
In my next post on my learning experience at CXL Institute, I will write about the other parts of A/B testing: the execution and results stages. It was a really tough class, as it was my first time studying A/B testing.