
The benefits of structured A/B testing

Iqbal Ali

Text-mining and analysis, founder @Ressada | Experimentation consultant, coach, training | Developer | Writer & Comics Author

Published: 28 September 2020

I’ve seen firsthand how well-defined processes can transform teams into factories which efficiently deliver one effective experiment after another (I’ll define what I mean by “effective” later in the article).

The idea is to have documented processes for everything: from hypothesising new ideas, to prioritising experiments, to determining what metrics we need for specific experiments, and more.

Having these processes means everyone involved with experimentation knows what they’re responsible for. They also know how to undertake their respective tasks so that they are done in a consistent and predictable way.

Since many of my articles involve processes, I thought I’d give you a demonstration of their power with a simple and fun experiment. If you’re interested, or if you need to be persuaded about the usefulness of processes, read on.

The drawing experiment

During the lockdown, my 8-year-old son got into drawing. Not being super confident with his art skills, he started following YouTube video guides to help him create pictures of his favourite characters.

These guides present easy-to-follow processes that have viewers create great-looking drawings by following the step-by-step instructions. Take, for example, this drawing of Obi-Wan Kenobi:

The YouTuber in question talks through the creation of each line, communicating the purpose of each mark and describing how to draw the shapes necessary to recreate the character accurately. The entire process takes approximately 15 minutes.

I thought I’d use this video as an experiment for my son. He was thrilled to get involved!

The idea was to have him draw Obi-Wan twice. Once by referring only to the prototype drawing above; then a second time, going through the process presented in the video.

I’d then compare and contrast the two drawings to judge the effectiveness of the drawing process. Since it’s crucial to avoid subjectivity when reviewing the final works of art (because I’m going to love both drawings equally), I’m going to need some objective criteria to score each drawing.

Here is what I settled on:

Number of details captured
Number of lines captured
Accuracy of shape and line

Note that the goal is not to create a great-looking Obi-Wan, but instead, to draw one that matches the prototype as closely as possible. Freestyling was not allowed for this experiment.

Here’s how it all went.

Drawing 1: drawing WITHOUT a process

The following is what my son achieved after about 10 minutes or so of drawing. He had the prototype up on the screen and tried to recreate it as accurately as he could.

It’s a great drawing, but let’s evaluate it using those objective criteria we defined earlier — yeah, yeah, I know, I’m a fun dad:

When put side-by-side, we can tell straight away that details are missing (our first criterion). For example, among other things, Obi-Wan’s boots are missing, as are his eyebrows, and his all-important lightsaber!

Overall, there are fewer lines than the prototype (our second criterion). Notice how there are fewer lines to define the hair, beard and folds in the clothing.

In terms of accuracy of line and shape (our third and final criterion), we can see one arm is bigger than the other, and the head shape is a little wonky.

So, can a process help achieve a closer match to the prototype?

Drawing 2: drawing WITH a process

Full disclosure: my son never made it to the end of the process. By minute 13, my son was keen to get back to watching Star Wars: The Clone Wars. This is why he missed out the lightsaber. You see, I was eating into his TV time with this experiment. But you’ll be relieved to know that I duly extended his TV time as a reward for entertaining my obsessive need to make a point.

Anyway, by following the video guide, he achieved this:

Already, you can tell that the two are supposed to look the same.

Reviewing it against the criteria, we can tell that the drawing has missed fewer details. For instance, Obi-Wan has eyebrows now. He also has boots. The lightsaber is still missing (as I already mentioned), but you can’t have everything.

In terms of the number of lines: there’s roughly the same number of lines as the prototype. There are lines in the beard and hair, as well as folds in the clothing.

In terms of accuracy of line and shape: the head shape looks closer, and the two arms are more comparable in size now. The legs are a little far apart for my liking, but overall the accuracy is pretty decent. Like I said before: you can tell that the two are meant to be the same drawing.

Overall, the win goes to the second drawing!

What all this means

Okay, so what if one drawing captures more detail than another? And so what if one drawing is closer to the prototype? Both drawings look perfectly fine, right? Furthermore, what does all this have to do with A/B testing?

Let’s tackle that last question first, and the rest will fall into place. You see, there are many nuances in running an effective test program.

First, let’s define what makes an individual experiment effective. We can say an experiment has been effective if:

the experiment provides a read we can trust
the experiment provides us with actionable insights/learnings
the experiment either proves or disproves our hypothesis

This means if an experiment fails to meet the criteria above, then the experiment has failed to be effective.

Now, these individual experiments are cogs in a larger machine that we’ll refer to as the experiment program. An experiment program is when multiple experiments work together, each building on one another to create valuable learnings and improve our critical metrics (e.g. conversion rate).

Below is a list of things which make an experiment program effective. We can say a program is effective if:

the program enables a high output of tests
the program enables a steady flow of test results
the program enables a test and learn culture
the program focuses time and resources on experiments which have the highest return on investment

There’s more we can add, but these are a good starting point.

Failing to meet the criteria above means an experiment program will cost the business time and resources. It will also impact the volume of tests which are run. There may also be impacts on learning from the experiments in general — especially if you can’t rely on the test reads.

Overall, all of those criteria I’ve mentioned above constitute our Obi-Wan prototype. Not living up to this model means we have ineffectual experiments in a potentially failing program.

Details

Just like remembering to draw details such as Obi-Wan’s boots and eyebrows, a good experiment process means we avoid missing aspects of our build. These missing details can result in our test reads being invalid.

Imagine having skewed segments in our test groups. That would render the test read invalid. A process can help avoid that. Failing that, it could highlight the problem to us so we don’t make expensive decisions based on a potentially wrong outcome.
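As a rough illustration of the kind of check a process might include here, one common option is a sample ratio mismatch test: compare the number of users in each group against the split we intended. The article doesn't name a specific check, so the function name, the 50/50 default split and the 0.001 threshold below are all assumptions made for this sketch.

```python
import math


def sample_ratio_mismatch(control_users, variant_users,
                          expected_control_share=0.5, alpha=0.001):
    """Chi-square goodness-of-fit test (1 degree of freedom) on group sizes.

    Returns (chi2, p_value, flagged). `flagged` is True when the observed
    split is unlikely under the intended split. The alpha threshold is an
    assumption for this sketch, not a recommendation from the article.
    """
    total = control_users + variant_users
    expected_control = total * expected_control_share
    expected_variant = total - expected_control
    chi2 = ((control_users - expected_control) ** 2 / expected_control
            + (variant_users - expected_variant) ** 2 / expected_variant)
    # With 1 degree of freedom, the chi-square tail probability
    # reduces to the complementary error function.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value, p_value < alpha


# Hypothetical counts: a near-even split, then a visibly skewed one.
for control, variant in [(5000, 5050), (5000, 5500)]:
    chi2, p_value, flagged = sample_ratio_mismatch(control, variant)
    print(control, variant, round(chi2, 2), flagged)
```

A flagged result doesn't say what went wrong, only that the read shouldn't be trusted until the cause of the skew is found.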

The following are examples of some processes we could use:

how to balance risk when testing big changes
how to ensure the right metrics are added to an experiment
how to prioritise our backlog of experiments

Lines

Even if we’ve captured these details, missing intricacies is like missing the lines from the beard. For us, this can also result in failed experiments. Capturing these nuances helps us get closer to our idealised prototype model.

Examples: when deciding how to approach testing big changes or adding secondary metrics to your experiment, a process helps capture the nuances of specific scenarios, so you don’t miss those essential details. For prioritisation of experiments, a process ensures tightness and objectivity.
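To make the prioritisation point concrete, here is a minimal sketch of an objective scoring rule. The article doesn't prescribe a scoring method, so the three criteria (impact, confidence, effort), the 1 to 5 scales, the formula and the backlog entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ExperimentIdea:
    name: str
    impact: int      # expected effect on the key metric, 1 (low) to 5 (high)
    confidence: int  # strength of supporting evidence, 1 (low) to 5 (high)
    effort: int      # build cost, 1 (cheap) to 5 (expensive)


def priority_score(idea):
    """Higher impact and confidence raise the score; higher effort lowers it."""
    return idea.impact * idea.confidence / idea.effort


def prioritise(backlog):
    """Return the backlog ordered from highest to lowest score."""
    return sorted(backlog, key=priority_score, reverse=True)


# Hypothetical backlog, for illustration only.
backlog = [
    ExperimentIdea("Redesign checkout", impact=5, confidence=2, effort=5),
    ExperimentIdea("Shorten signup form", impact=3, confidence=4, effort=1),
    ExperimentIdea("New hero image", impact=2, confidence=2, effort=2),
]

for idea in prioritise(backlog):
    print(idea.name, priority_score(idea))
```

The exact formula matters less than the fact that everyone scores ideas the same way, which is where the tightness and objectivity come from.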

Overall accuracy

So then let’s look at accuracy. The drawing process ensured we drew a decent head shape and ensured that arms are the same size. Repetition and practice further improve skills in those areas. In the same way, having a process for experimentation means we enable greater accuracy by having everyone practise the same techniques. We also ensure consistency.

Examples: for designing big tests, setting up of secondary metrics, and prioritising experiments: a process helps create guardrails ensuring details are consistent and recognisable. This in turn helps debug and review experiments if issues arise. Processes also allow for learning of techniques and skills at a deeper level as they are practised.
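One way to picture such a guardrail is a simple completeness check on an experiment brief before it goes to build. This is only a sketch: the field names and the idea of a dictionary-based brief are assumptions for illustration, not something the article specifies.

```python
# Hypothetical list of fields every experiment brief must fill in.
REQUIRED_FIELDS = ("hypothesis", "primary_metric", "secondary_metrics", "audience")


def missing_fields(brief):
    """Return the required fields that are absent or empty in a brief."""
    return [field for field in REQUIRED_FIELDS if not brief.get(field)]


# Hypothetical brief with no secondary metrics and no audience defined.
brief = {
    "hypothesis": "A shorter signup form will increase completed signups",
    "primary_metric": "signup_conversion_rate",
    "secondary_metrics": [],
}

print(missing_fields(brief))  # → ['secondary_metrics', 'audience']
```

Because every brief is checked against the same list, gaps show up in the same place each time, which makes reviewing and debugging easier.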

Wrap up

When it comes to experiments, we have either generalists (resources who are responsible for a broad selection of tasks) or specialists (resources who are responsible for their specialised area of expertise). Processes help both.

If you’re a generalist, a process helps with quality and thoroughness. If you’re a specialist, it doesn’t necessarily mean you have experience dealing with experiments, in which case a process helps fill in those knowledge gaps and maintain consistency and (again) thoroughness.

Note: we must never underestimate the importance of consistency. It helps make reviewing and debugging easier, even allowing the creation of new processes to cover those aspects! All this is especially useful if dealing with multiple experiments from multiple teams.

You might have noticed that I’m a fan of processes. They’ve not only helped me roll out an experiment program across an organisation, but they’ve also enabled me to create multiple graphic novels, something I had long struggled with before I developed those processes.

No kids had their feelings hurt during the making of this article.

I’m Iqbal Ali. Former Head of Optimisation at Trainline. Now an Optimisation Consultant, helping companies achieve success with their experimentation programs. I’m also a graphic novelist in my spare time.

