A/B testing mastery

Introduction

A/B testing splits traffic between a control and a variation, typically 50/50. “A/B split testing” is a new term for an old technique: controlled experimentation.

A/B testing is fun. With so many easy-to-use tools, anyone can—and should—do it. Yet for all the content out there about it, people still test the wrong things and run A/B tests incorrectly.

There are plenty of great tools that make testing easy, but they don’t think for you. You still need to rack your brains to answer the numerous questions that abound in A/B testing, for example:

  • Should you lower the price to sell more?
  • Or raise it to increase the average basket, at the risk of a lower conversion rate?
  • Should products be sorted by increasing price? Or decreasing?
  • Should you broaden your range of products upward or downward? Or both? Or neither?
  • Is an offer such as “3 for the price of 2” a good way to increase your average basket?
  • Should you offer free shipping? Or only from a certain purchase basket value?

Wouldn’t it be great if you could run business experiments to test these hypotheses and make the right decisions? Unfortunately, the statistical analyses in common use today limit how much you can read into the results.

The Basic Principle of A/B Testing

A/B testing consists of exposing two variations (called A and B) of the same web page to two statistically similar populations, created by randomly splitting the website’s visitors.

For each variation, we can collect:

  • The number of visitors 
  • The number of purchases 
  • The value of the purchase basket

On paper, it should be quite simple to determine which variation generated the most revenue, and hence which is the better variation.

However, as in any experiment on human behavior, the data is subject to random fluctuation. If variation B generates a larger average basket than variation A during the test, it doesn’t necessarily mean that B will always be better than A.

Indeed, it is difficult to assert that a difference observed during a test will recur in the future.

That is the reason why A/B testing tools use statistical analyses to qualify the observed differences and identify the winning variation.

They aim to separate significant differences from random, unpredictable fluctuations that have nothing to do with the differences between the variations.
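To make this concrete, here is a minimal sketch of the kind of calculation involved: a two-proportion z-test comparing conversion rates, in Python with made-up numbers (this illustrates the general statistical idea, not the internals of any particular tool).

```python
from scipy.stats import norm

# Hypothetical data: visitors and purchases for each variation
visitors_a, conversions_a = 10_000, 500   # control: 5.0% conversion
visitors_b, conversions_b = 10_000, 560   # variation: 5.6% conversion

p_a = conversions_a / visitors_a
p_b = conversions_b / visitors_b

# Pooled rate under the null hypothesis that there is no real difference
p_pool = (conversions_a + conversions_b) / (visitors_a + visitors_b)
se = (p_pool * (1 - p_pool) * (1 / visitors_a + 1 / visitors_b)) ** 0.5

z = (p_b - p_a) / se
p_value = 2 * norm.sf(abs(z))  # two-sided p-value

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
# A p-value below the chosen threshold (commonly 0.05) suggests the observed
# difference is unlikely to be pure random fluctuation.
```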

In e-commerce, variation B can be considered a “winner” if it generates:

  • A conversion gain: more sales are concluded with this variation.
  • A gain in the average shopping basket: the average shopping basket of variation B is higher than A.
  • A mixed gain: variation B generates both a conversion gain and a gain in the average shopping basket.
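Since revenue per visitor is simply conversion rate times average basket, a mixed gain compounds. A quick sketch with hypothetical numbers:

```python
# Hypothetical per-variation results
conv_a, basket_a = 0.050, 80.0   # A: 5.0% conversion, $80 average basket
conv_b, basket_b = 0.056, 84.0   # B: 5.6% conversion, $84 average basket

rpv_a = conv_a * basket_a  # revenue per visitor, variation A
rpv_b = conv_b * basket_b  # revenue per visitor, variation B

print(f"RPV A = ${rpv_a:.2f}, RPV B = ${rpv_b:.2f}")   # $4.00 vs $4.70
print(f"Mixed gain: {rpv_b / rpv_a - 1:.1%}")          # about 17.6% uplift
```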

Common A/B test mistakes to avoid

However, there is more to it than just setting up a test. Tons of companies are wasting their time and money.

Here are the 12 A/B test mistakes that are most common across businesses:

  1. Calling A/B tests early;
  2. Not running tests for full weeks;
  3. Doing A/B tests without enough traffic (or conversions);
  4. Not basing tests on a hypothesis;
  5. Not sending test data to Google Analytics;
  6. Wasting time and traffic on stupid tests;
  7. Giving up after the first test fails;
  8. Failing to understand false positives;
  9. Running multiple tests at the same time on overlapping traffic;
  10. Ignoring small gains;
  11. Not running tests all the time;
  12. Not being aware of validity threats.


A/B testing plan

A strong A/B testing plan will allow you to increase your revenue and learn valuable insights about your customers.

Below is a list of the basic issues and concerns to wrap your head around and focus on in A/B testing:

  1. What is A/B testing?
  2. How to Improve A/B Test Results
  3. How to Prioritize A/B Test Hypotheses
  4. How Long to Run A/B Tests
  5. How to Set up A/B Tests
  6. How to Analyze A/B Test Results
  7. How to Archive Past A/B Tests
  8. A/B Testing Statistics
  9. A/B Testing Tools and Resources

A/B testing basics

When researchers test the efficacy of new drugs, they use a “split test.”

Most research experiments could be considered a “split test,” complete with a hypothesis, a control, a variation, and a statistically calculated result.

For example, a simple A/B test is a 50/50 traffic split between the original page and a variation.
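Here is a minimal sketch of how such a split is commonly implemented (one typical approach, not specific to any tool): hash a stable visitor ID so each visitor is assigned deterministically and sees the same variation on every visit.

```python
import hashlib

def assign_variation(visitor_id: str, experiment: str = "homepage-test") -> str:
    """Deterministically bucket a visitor into A or B with a 50/50 split."""
    digest = hashlib.sha256(f"{experiment}:{visitor_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100          # roughly uniform value in 0..99
    return "A" if bucket < 50 else "B"

print(assign_variation("visitor-42"))  # the same visitor always sees the same page
```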


For conversion optimization, the main difference is the variability of Internet traffic. In a lab, it’s easier to control for external variables. Online, you can mitigate them, but it’s difficult to create a purely controlled test.

In addition, testing new drugs requires an almost certain degree of accuracy. Lives are on the line. In technical terms, the period of “exploration” can be much longer, because researchers need to be damn sure they don’t commit a Type I error (false positive).

False Positive

A false positive, as you might guess, is when a test result indicates a variation shows an improvement when in reality it doesn’t.

It’s often the case with false positives that version B gives the same results as version A (not that it performs less well than version A).

While by no means innocuous, false positives certainly aren’t a reason to abandon A/B testing. Instead, you can adjust your confidence level to fit the risk associated with a potential false positive.
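To see what that threshold buys you, here is a small simulation (illustrative only) of many A/A-style tests in which version B truly equals version A: at a 95% confidence level, roughly 5% of such tests will still declare a “winner” by chance.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
alpha, n, p = 0.05, 10_000, 0.05   # threshold, visitors per arm, true rate
trials, false_positives = 2_000, 0

for _ in range(trials):
    # Both arms share the same true conversion rate, so any "winner" is noise
    conv_a = rng.binomial(n, p)
    conv_b = rng.binomial(n, p)
    p_pool = (conv_a + conv_b) / (2 * n)
    se = (p_pool * (1 - p_pool) * (2 / n)) ** 0.5
    z = (conv_b / n - conv_a / n) / se
    false_positives += (2 * norm.sf(abs(z)) < alpha)

print(f"False positive rate: {false_positives / trials:.1%}")  # close to 5%
```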

Online, the process for A/B split-testing considers business goals. It weighs risk vs. reward, exploration vs. exploitation, science vs. business. Therefore, we view results through a different lens and make decisions differently than those running tests in a lab.

You can, of course, create more than two variations. Tests with more than two variations are known as A/B/n tests. If you have enough traffic, you can test as many variations as you like.

Here’s an example: in an A/B/C/D test, traffic is typically split evenly, with 25% allocated to each of the four variations.


A/B/n tests are great for implementing more variations of the same hypothesis, but they require more traffic because they split it among more pages.
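To quantify that traffic cost, here is a rough sample-size sketch using the standard two-proportion approximation (assumed inputs: a 5% baseline conversion rate, a 10% relative lift you want to detect, 95% confidence, 80% power):

```python
from scipy.stats import norm

def sample_size_per_variation(p_base, rel_lift, alpha=0.05, power=0.80):
    """Approximate visitors needed per arm to detect a relative lift."""
    p_var = p_base * (1 + rel_lift)
    z_alpha = norm.ppf(1 - alpha / 2)   # two-sided significance
    z_beta = norm.ppf(power)            # desired statistical power
    variance = p_base * (1 - p_base) + p_var * (1 - p_var)
    return int(variance * (z_alpha + z_beta) ** 2 / (p_var - p_base) ** 2)

n = sample_size_per_variation(0.05, 0.10)
print(f"~{n:,} visitors per variation")           # roughly 31,000 per arm
print(f"~{4 * n:,} total for an A/B/C/D test")    # four arms multiply the cost
```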

A/B tests, while the most popular, are just one type of online experiment. You can also run multivariate and bandit tests.

A/B testing, multivariate testing, and bandit algorithms: what’s the difference?

A/B/n tests are controlled experiments that run one or more variations against the original page. Results compare conversion rates among the variations based on a single change.

Multivariate tests

Multivariate tests combine changes to several elements of the same page to isolate which attributes cause the largest impact. In other words, they are like A/B/n tests in that they test an original against variations, but each variation combines multiple changed design elements.

For example, rather than testing one headline change, a multivariate test might combine two headlines, two hero images, and two button colors, producing eight distinct versions (2 × 2 × 2).
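A tiny sketch (hypothetical element names) of how those combinations multiply:

```python
from itertools import product

# Hypothetical elements under test
headlines = ["Save time", "Save money"]
images = ["product.png", "lifestyle.png"]
buttons = ["green", "orange"]

versions = list(product(headlines, images, buttons))
print(f"{len(versions)} versions to test")  # 2 x 2 x 2 = 8
for version in versions:
    print(version)
```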


Each type of test has a specific impact and use case to help you get the most out of your site. Here’s how:

  • Use A/B testing to determine the best layouts.
  • Use multivariate tests to polish layouts and ensure all elements interact well together.

You need a ton of traffic to the page you’re testing before even considering multivariate testing.

But if you have enough traffic, you should use both types of tests in your optimization program.

Most agencies prioritize A/B testing because you’re usually testing more significant changes (with bigger potential impacts) and because they’re simpler to run. As Peep once said, “Most top agencies that I’ve talked to about this run ~10 A/B tests for every 1 MVT.”

Bandit algorithms

Bandit algorithms are A/B/n tests that update in real time based on the performance of each variation.

In essence, a bandit algorithm starts by sending traffic to two (or more) pages: the original and the variation(s). Then, to “pull the winning slot machine arm more often,” the algorithm updates based on which variation is “winning.” Eventually, the algorithm fully exploits the best option:
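Here is a minimal epsilon-greedy sketch, one simple bandit strategy among several, with simulated conversion rates standing in for real traffic:

```python
import random

# Simulated true conversion rates (unknown to the algorithm in practice)
true_rates = {"A": 0.050, "B": 0.065}
epsilon = 0.10  # fraction of traffic reserved for exploration

shown = {arm: 0 for arm in true_rates}
converted = {arm: 0 for arm in true_rates}

def observed_rate(arm: str) -> float:
    return converted[arm] / shown[arm] if shown[arm] else 0.0

random.seed(1)
for _ in range(20_000):                                # one pass per visitor
    if random.random() < epsilon:
        arm = random.choice(list(true_rates))          # explore
    else:
        arm = max(true_rates, key=observed_rate)       # exploit the current best
    shown[arm] += 1
    converted[arm] += random.random() < true_rates[arm]

print(shown)  # over time, most traffic flows to the better-performing arm
```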

One benefit of bandit testing is that bandits mitigate “regret,” which is the lost conversion opportunity you experience while testing a potentially worse variation.

Bandits and A/B/n tests each have a purpose. In general, bandits are a great fit for short-lived tests, such as headlines and time-limited campaigns, where there isn’t time to wait for a classic test to reach significance.

No matter what type of test you run, it’s important to have a process that improves your chances of success. This means running more tests, winning more tests, and making bigger lifts.

Research: Getting data-driven insights

To begin optimization, you need to know what your users are doing and why.

Before you think about optimization and testing, however, solidify your high-level strategy and move down from there. So, think in this order:

  1. Define your business objectives.
  2. Define your website goals.
  3. Define your Key Performance Indicators.
  4. Define your target metrics.

Once you know where you want to go, you can collect the data necessary to get there.

To do this, you can use the highly recommended ResearchXL Framework by CXL, summarized here:

  1. Heuristic analysis;
  2. Technical analysis;
  3. Web analytics analysis;
  4. Mouse-tracking analysis;
  5. Qualitative surveys;
  6. User testing and copy testing.

