
All Else Equal

Daniel Tunkelang

Query Understanding

Published: September 10, 2024

In The Three-Body Problem, Liu Cixin describes how an alien species drives scientists to suicide by making it impossible for them to produce consistent experimental results. Some might find it difficult to relate to the scientists’ existential despair, but I found the premise compelling and chilling.

In this post, I do not tackle anything so sinister or abstract. Rather, I challenge a key assumption of A/B testing — namely, that all else is equal. I hope to inspire curiosity and reflection rather than existential despair.

A/B Testing: A Simple Example

A/B testing is the most popular method for online experimentation. It compares two versions of an application to determine which one performs better. Typically, one is the “treatment” we are considering as a change and the other is a “control” that represents the current state of the application.

For example, consider a simple A/B test to determine whether increasing the page size for search results from 10 (control) to 20 (treatment) leads to an increased conversion rate. This is about as simple an A/B test as it gets.

Let us imagine that the test is successful, delivering a statistically significant increase in the conversion rate. What does this tell us?
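To make "statistically significant" concrete, here is a minimal sketch of a two-proportion z-test using only the Python standard library. The conversion counts, session counts, and the function name are hypothetical, chosen for illustration; they are not results from a real experiment, and a production analysis would likely use an established statistics package.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    Arm A is the control and arm B is the treatment. Uses the pooled
    standard error, which assumes large samples in both arms.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: page size 10 (control) vs. page size 20 (treatment).
control_conversions, control_sessions = 500, 10_000
treatment_conversions, treatment_sessions = 580, 10_000

z, p_value = two_proportion_z_test(
    control_conversions, control_sessions,
    treatment_conversions, treatment_sessions,
)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

With these made-up numbers the difference clears the conventional 0.05 threshold, which is all the test itself can say: the treatment converted better under the conditions in which it was run.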

The World is Not So Simple

The answer may seem obvious: doubling the page size increases the conversion rate. More precisely, this result holds only if all else is equal, since the change in page size might interact with other changes, such as a new page design. However, we have to be even more pedantic: the result only holds given the current state of the world.

Consider the factors of screen size and network latency, both of which are determined by the searchers’ devices and locations. Both of these factors interact with page size to affect the experience. Increasing the page size may increase conversion in one set of conditions but decrease it in others.
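As a rough illustration of this interaction, the sketch below breaks a hypothetical test result down by segment and then recomputes the expected effect under a different traffic mix. The segment names, counts, and traffic shares are all invented for this example, and it assumes the per-segment effects themselves stay fixed while only the mix changes.

```python
# Hypothetical per-segment results: (conversions, sessions) for each arm.
results = {
    "large screen, fast network": {"control": (300, 5_000), "treatment": (390, 5_000)},
    "small screen, slow network": {"control": (200, 5_000), "treatment": (190, 5_000)},
}

def rate(conversions, sessions):
    return conversions / sessions

def lift_by_segment(results):
    """Absolute treatment-minus-control conversion-rate difference per segment."""
    return {
        segment: rate(*arms["treatment"]) - rate(*arms["control"])
        for segment, arms in results.items()
    }

def lift_under_mix(results, mix):
    """Expected overall lift if traffic were split across segments as in `mix`.

    `mix` maps each segment name to its share of traffic (shares sum to 1).
    """
    lifts = lift_by_segment(results)
    return sum(mix[segment] * lifts[segment] for segment in lifts)

lifts = lift_by_segment(results)

# Traffic mix observed during the (hypothetical) test: an even split.
lift_at_test_time = lift_under_mix(
    results,
    {"large screen, fast network": 0.5, "small screen, slow network": 0.5},
)

# A later, hypothetical world in which almost all traffic is small-screen.
lift_after_shift = lift_under_mix(
    results,
    {"large screen, fast network": 0.05, "small screen, slow network": 0.95},
)

for segment, lift in lifts.items():
    print(f"{segment}: {lift:+.4f}")
print(f"at test time: {lift_at_test_time:+.4f}, after shift: {lift_after_shift:+.4f}")
```

In this toy setup the treatment wins overall at test time but loses once the audience shifts toward the segment it hurts, even though nothing about the treatment itself has changed.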

In the physical world, we do not generally worry about the laws of nature being time-dependent. We treat the law of gravity and the speed of light as constants. However, the digital world changes far more rapidly than the physical one, as does user behavior. That makes it dangerous to assume that the conditions for an experiment hold indefinitely.

Do Not Despair!

If you rely on A/B testing as part of your day job, you might find this state of affairs disheartening. But please do not despair! We have it much better than the scientists in The Three-Body Problem. No aliens are out to get us!

Fortunately, there are things we can do to detect changes likely to invalidate our experiments over time. Here are a few:

Reverse Testing. You can revisit an A/B test by reversing it: that is, using the current version as the control and the old version as the treatment. The catch is that maintaining the ability to perform reverse tests requires discipline and can incur technical debt.
Long-Term Holdout. Typical A/B tests are short, e.g., two weeks. Running the control for longer (e.g., three months or a year) hedges against conditions changing during that time.
Monitoring. While it is important to look at metrics when evaluating a change as part of an A/B test, it is also important to look broadly at metrics over time when you are not making any changes. Trends or sudden changes in metrics can tell you when the world is changing.
Snapshots. While monitoring can alert you to unexpected changes in metrics, a more direct approach is to take a snapshot of the metrics that hold at the time an A/B test is conducted. Changes in those metrics are particularly likely to invalidate the test results.
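The snapshot idea can be sketched in a few lines: record some context metrics when the test runs, then compare them against current values later. The metric names, values, and the 10% tolerance below are hypothetical assumptions for illustration, not recommended settings.

```python
def drifted_metrics(snapshot, current, tolerance=0.10):
    """Return metrics whose relative change from the snapshot exceeds `tolerance`.

    Assumes every snapshot value is nonzero and that `current` contains
    every metric named in `snapshot`.
    """
    drifted = {}
    for name, baseline in snapshot.items():
        change = (current[name] - baseline) / baseline
        if abs(change) > tolerance:
            drifted[name] = change
    return drifted

# Hypothetical context metrics recorded when the A/B test was conducted.
snapshot = {
    "mobile_share": 0.40,
    "median_latency_ms": 200.0,
    "avg_results_viewed": 8.0,
}

# Hypothetical values for the same metrics some months later.
current = {
    "mobile_share": 0.55,
    "median_latency_ms": 210.0,
    "avg_results_viewed": 7.9,
}

drifted = drifted_metrics(snapshot, current)
for name, change in drifted.items():
    print(f"{name} changed by {change:+.1%} since the test; consider re-testing")
```

A flagged metric does not prove the old result is wrong. It is a prompt to revisit the test, for example with a reverse test.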

This list is not exhaustive. Hopefully, it helps you think about ways to keep in mind conditions outside the explicit scope of your experiments.

The Only Constant is Change

As Heraclitus said, the only constant is change. When we perform A/B tests, we need to bear in mind that the results assume present conditions that are subject to change. As Ferris Bueller warned us, “Life moves pretty fast. If you don’t stop and look around once in a while, you could miss it.”

Russell Jurney

Graphs and Generative AI

2 months ago

Hahahaha, nice opener.


Joel Barajas

PhD, Principal Data Scientist, Ad Measurement Architect at Walmart Ads

2 months ago

I am glad you are talking about the dynamics of changing conditions (often overlooked by people running the tests). You are mainly touching on the external validity of a result: whether a result tested in, say, June continues to hold later on (say, during the Christmas holidays). I tend to believe that the misconception comes from importing RCTs from medical treatments, where people’s health outcomes are easier to extrapolate. Long-term holdouts are probably best, but they are noisier (smaller groups) and sometimes difficult to disentangle if you have >5 small changes released over the course of the holdout. It is probably better to keep questioning side effects or unexpected behavior that trigger a new test, again changing the “improved version”. This is probably why running A/B tests, even for mature products, is needed to adapt to an ever-changing landscape. Just a POV

