The Problem with Explanationless A/B Testing

Al Pittampalli

Senior Data / Decision Scientist

Published: May 2, 2022


Progress According to Pragmatists

In the wide world of A/B testing, you will encounter two kinds of experimenters: pragmatists and realists. Pragmatists are focused on usefulness, on discovering what works. They wake up each morning with the goal of generating effective treatments.

Realists, on the other hand, are concerned with understanding. They want to know why things work. They yearn, oh how they yearn, to generate good explanations: thick descriptions of what is happening behind the curtain, back there in the hidden-from-view realm we call reality.

I’m a card-carrying realist. Always have been. However, much to my chagrin, the vast majority of A/B testing professionals are pragmatists. Take just one example of the kind of usefulness-centered tests that pragmatists conduct. This one, taken from the Netflix Tech Blog, aimed to identify the best-performing image for the original Netflix movie The Short Game.

Their process was straightforward: select a sample of users, choose a few promising variants, randomize, and then observe which one achieves the highest clickthrough rate. In this case, the middle image depicting an adult caddy and young golfer emerged as the winner. Presto, there you have it, hail the pragmatists: a genuine, hot-off-the-grill, scientifically validated, effective treatment.
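To make that workflow concrete, here is a minimal simulation sketch of it. Everything in it is an assumption made for illustration: the variant names, the "true" click probabilities, and the sample size are invented, and none of it comes from Netflix's actual test.

```python
import random


def run_ab_test(click_prob, users_per_variant, seed=0):
    """Randomly assign simulated users to variants and return observed CTRs."""
    rng = random.Random(seed)
    variants = list(click_prob)
    shown = {v: 0 for v in variants}
    clicks = {v: 0 for v in variants}
    for _ in range(users_per_variant * len(variants)):
        v = rng.choice(variants)            # randomization step
        shown[v] += 1
        if rng.random() < click_prob[v]:    # simulated user either clicks or not
            clicks[v] += 1
    return {v: clicks[v] / shown[v] for v in variants}


# Hypothetical click probabilities, invented purely for this sketch.
TRUE_CLICK_PROB = {"image_a": 0.040, "image_b": 0.055, "image_c": 0.042}

ctr = run_ab_test(TRUE_CLICK_PROB, users_per_variant=20_000)
winner = max(ctr, key=ctr.get)              # "scale the winner"
print(winner, ctr)
```

Notice what the procedure returns: a variant name and a set of rates. Nothing in the output says what it was about the winning image that moved users.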

What you don’t have, though, realists mutter under our breaths, is an explanation. For that you would need a clear understanding, the deeper the better, of what it was about that particular image, in contrast to the others, that brought about the positive effect.

Pragmatists, though, remain unbothered by this lack of understanding. To be sure, they rarely dismiss explanation entirely. They just don’t see it as a priority. Ask a pragmatist, as I sometimes do, one who just unearthed a shiny, new, effective treatment, why they think their treatment worked, and they may giddily, perhaps on a lunch break, join you in a speculative, explanation-focused conversation. So many interesting theories might they conjure that, honestly, you may not be able to shut them up. But then lunch will end. And your colleague will insist on getting back to their real job: generating more effective treatments. Now, should you attempt to continue the conversation, inquiring seriously about how we might use the tools of scientific experimentation to go beyond speculation, they will, eventually, inevitably, grow irritated with what they view as intellectual indulgence. “Al, if you want so badly to know why my treatment worked, maybe you ought to take yourself to academia,” they’ll say. “Al, not sure you understand, buddy, we get paid to generate solutions here, not explanations.” Keep pressing and watch them get downright rude: “Al, it’s 2 a.m. Stop. Calling. Me.”

To pragmatists, explanations are unnecessary because what they’re after is progress. And when pragmatists examine their progress-focused workflow (select treatments, perform tests, scale the winners), they don’t see how any one of these steps hinges on explanations.

What’s more, some pragmatists, I think, deep down believe, whether they admit it or not, that explanations aren't just unnecessary, they're counterproductive. In Experimentation Works, Harvard Business School Professor Stefan Thomke argues that since “most progress is achieved by implementing hundreds or thousands of minor improvements that can have a big cumulative impact,” organizations need to perform a “massive number of experiments” fast. Fast. FASTER. An approach he calls high velocity incrementalism. To be sure, in true pragmatist form, Thomke doesn’t completely jettison the notion of explanation; he even pays lip service to it a few times. But the vast majority of A/B testing examples he cites are flagrantly explanationless. Unsurprising, since slowing down to seek out explanations is simply incompatible with his high velocity approach. It’s impossible, therefore, for a reader to walk away from his book with anything other than the message: “Explanations? Don’t bother.”

In this essay, I will argue that the pragmatists are wrong. Full stop. Progress does hinge on explanations. The physicist David Deutsch puts it even more bluntly in his eye-opening book The Beginning of Infinity: “All progress, both theoretical and practical” has resulted from the “quest for good explanations.” Consequently, pragmatists who continue to enact their form of understanding-free, explanationless science can expect little more than pseudo-progress, at best unstable and at worst non-existent. In order to see why, we’ll need to first wrap our heads around what an explanation actually is. And to do that we'll make use of a slightly whimsical (some might say over-the-top, and to them I would say no one asked you) allegory. Ladies and gentlemen, welcome to BoxTown.

The Instability and Mediocrity of Explanationless Progress

BoxTown was an ordinary world much like ours, until the day, the very bizarre day indeed, that the boxes arrived. Every resident woke up to a mysterious, large, dense, grey box on the front lawn of their home. Each one identical, one large opening in the front labeled ‘IN’, another in the back labeled ‘OUT’. On the side of the box was engraved a dubiously optimistic inscription:

The Promise: With this Box any state of affairs can be improved, any problem solved. Its potential is limited only by the laws of physics.

That was it. No instruction manual, no context, no clues.

The first thing residents did to investigate was to look inside the box, peeking through its two openings, but the view proved impossibly opaque. Then they began trying to gently, and then forcefully, pry the box open, but that too proved futile. Soon, out of other ideas, filled with skepticism, but also impatient curiosity, residents began placing objects inside the box at random. Lamps, bottles of water, bicycle pumps. The box, however, failed to dispense anything in return. But the residents were persistent, putting more and more objects inside, until, just as the sun began to set, one of the residents, Hyun Bin, shrieked, “I got something!” Nearby residents sprinted over to Hyun’s home and saw her triumphantly holding an object that she informed them had slid out of the box seconds after she inserted a fresh deck of playing cards: a bright red Swiss Army knife.

Now the residents’ skepticism turned into unabashed excitement. Without a word spoken among them, the collective objective became clear: insert as many things in the boxes as fast as possible, in order to discover which ones generated useful things in return. And so the rush began. Balloons, plants, Pelotons, muskets, headphones, Hall & Oates albums: anything residents could find was deemed a candidate for going in the box.

Over the course of several months, from this mad dash, residents successfully generated many useful items: from knit caps, to tennis rackets, to sociology textbooks. And every time a resident earned a positive result, he or she would document the winning intervention in a continuously growing shared list. As the list grew, so did the residents’ pride in the rapid progress they were making.

But soon, rumors began to emerge that not every resident was so satisfied. One day, a cadre of these dissenters marched to one of the boxes at the center of the neighborhood. The leader, an old man with a short grey beard, dark glasses, and a colorful dashiki, climbed up onto the box with the help of his compatriots. Rising to his feet, megaphone in hand, the old man began to speak: “Good people of BoxTown, lend me your ears. We have come here today to warn you that something is wrong. Very wrong. The progress we’ve been led to believe we are making is not as it appears.” At this point, many residents stopped what they were doing and attended to the speech.

“There are three problems with our shared list of so-called successful interventions. First, the winners are by no means universal. What works in one resident’s box does not necessarily work for everyone else’s. For instance, when Sara Bakersfield placed a 90-watt light bulb inside her box, out came a beautiful pearl necklace. But when my lovely wife Taraji placed the very same kind of light bulb inside our box, she received a DVD box set of the show Friends, which is, and I’ll try to be kind here: awful. Just absolutely dreadful. I mean, how on Earth does anyone find these people funny? What’s more, there doesn’t appear to be any way to predict which boxes will deliver us the desired result and which ones won’t. We’ve long assumed that all our boxes, which look the same on the outside, are also the same on the inside. But as time goes by, that assumption seems less and less plausible.

Second, interventions that worked well in the past don’t always continue to work in the future. When Timothy McCoy first put bananas in his box, out would come cleaning products: laundry detergent, Windex, feather dusters. But now instead he’s getting rodents: hamsters, capybaras, chinchillas. Adorable, of course, but I’m told those SOBs bite. We believe that whatever is inside these boxes is changing over time. But again, we can’t predict when and how they’re changing and which interventions those changes might affect.

Third, and this is the biggest problem of all, there is The Promise. Dear residents of BoxTown, you all remember The Promise, don’t you? The one inscribed on the side of each of our boxes. It promises us solutions to any problem, limited only by the laws of physics. I know many find that claim to be outlandish, but we very much believe it. Many of us think the boxes hold the potential to cure diseases, to transport us to far-off locations, to create extraordinary experiences that will lead us to new levels of psychological and emotional well-being. And yet here we are settling for cheap jewelry and Mr. Clean. Residents of BoxTown, we’re here to remind you: we can do better. We must do better.”

At this point, many in the audience began to nod their heads in agreement, albeit some with attendant frustration. One such member yelled out, “So, wise sage, what pray tell do you propose that we do?”

The old man paused, slowly scanning the audience, which by now had grown to about eighty people. He took a deep breath and then proclaimed: “Ladies and gentlemen, we must stop this frenzied sprint to discover what interventions work. Instead we must slow down to understand why things work. In other words, we must focus our limited efforts on answering the pivotal question: what lies inside the box?”

The Arrogance of Effective Treatments and the Appreciation of Effective Partnerships

What exactly, you might be asking, do the large, mysterious boxes in BoxTown represent? The answer goes by many names: the underlying structure of reality, the hidden potential in nature. But we will use a term preferred by many philosophers: mechanisms. Mechanisms are the true source of all progress.

Sadly, this point is missed by pragmatists who see treatments as the star of the show. Indeed, when pragmatists perform an A/B test and observe a positive effect, they say that their treatment “brought about”, “caused”, or is “responsible for” that effect. But no treatment brings about outcomes by itself. Instead, treatments make their mark on the world by leveraging often hidden-from-view mechanisms. For example, vaccines induce protection against serious infection by leveraging underlying immune systems. Rockets propel into the stratosphere by harnessing unobservable laws of motion. Products and services win over markets because they tap into the psychology (the unseen beliefs and desires) of users. In other words, the potential for progress that we seek generally resides in the mechanism, while treatments simply unlock that power. And so the key to progress is not, as pragmatists assume, discovering effective treatments. It’s developing effective partnerships. Partnerships, that is, between the treatment and the underlying mechanism.

Locating effective partnerships, however, is no easy task. Mechanisms are often complex, discriminating, idiosyncratic. For a treatment to fit a mechanism, then, it needs to take on a rare configuration, one tailor-made to match its particular shape. For instance, an mRNA vaccine must deliver precisely the right coded genetic message to the cells it’s targeting: instructions that the cells can properly translate, producing parts of the virus and giving the immune system a low-stakes opportunity to train itself. A slight deviation in those instructions, however, might render the vaccine completely worthless.

As a result, one’s best hope of finding these rare configurations is by possessing a deep understanding of that mechanism. Robust knowledge of its shape, its characteristics, what makes it tick. And now, finally, we’ve reached the point where we can reveal what realists even mean by this seemingly pedestrian word explanation.

An explanation is, in short, a description of the underlying mechanism of interest. As Deutsch defines it, it illuminates “what is there, what it does, and how and why.” In other words, explanations describe the seen (in our case, the observed treatment and resulting effect) in terms of the unseen (the underlying reality that helped bring about the effect). To put it in the context of our story: heretofore, BoxTown residents have witnessed something going into the box and then occasionally something else coming out. But an explanation would disclose what happened in between.

Now that we have a better grasp, I hope, of what an explanation is, it ought to be easier to see why pragmatists’ view of progress is so problematic. Without a deep understanding of the underlying mechanism to guide them, they are unlikely to find—through mere blind trial and error—the rare configurations required to develop effective partnerships.

Granted, pragmatists may find themselves at times, even if by chance, in the ballpark. In fact, any A/B test that results in a significant positive effect should signal as much. Remember the positive effect the pragmatists at Netflix observed in their The Short Game A/B test? That result ought to be interpreted as a hint that their winning image managed to tap into, albeit obliquely, a genuine and potentially powerful desire residing in the user’s mind. But sadly, pragmatists on the hunt for effective treatments, rather than effective partnerships, take it instead to mean they’ve reached their final destination. And so they stop far short of what could become an effective partnership, ending up with one that is depressingly mediocre.

What’s more, even if by blind luck pragmatists do manage to stumble into an effective partnership, there’s no guarantee it will hold up as they try to scale it. After all, mechanisms notoriously differ across populations. For example, the collective immune systems of individuals who volunteered to take part in a randomized controlled trial for a drug may be significantly different from those of the general population. And so even though pragmatists observe a significant effect in their study sample, with no explanation to orient them, they can’t capably predict how well their results will generalize to the full population. In other words, if you don’t know why something worked in the past, you can’t know where it will work in the future.
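A small worked example shows how this can play out arithmetically. It is a sketch under invented assumptions: suppose the treatment only engages the mechanism in one "responsive" segment, and that segment is over-represented in the study sample. The segment names, shares, and lift values are hypothetical.

```python
def average_lift(segment_share, segment_lift):
    """Average lift as the share-weighted sum of per-segment lifts."""
    if abs(sum(segment_share.values()) - 1.0) > 1e-9:
        raise ValueError("segment shares must sum to 1")
    return sum(segment_share[s] * segment_lift[s] for s in segment_share)


# Hypothetical: the treatment lifts the outcome by 2 points in one segment only.
SEGMENT_LIFT = {"responsive": 0.020, "unresponsive": 0.000}

# Hypothetical compositions of the study sample and the full population.
study_sample = {"responsive": 0.50, "unresponsive": 0.50}
full_population = {"responsive": 0.10, "unresponsive": 0.90}

sample_lift = average_lift(study_sample, SEGMENT_LIFT)
population_lift = average_lift(full_population, SEGMENT_LIFT)
print(f"study sample lift:    {sample_lift:.3f}")
print(f"full population lift: {population_lift:.3f}")
```

Under these made-up numbers the lift seen in the study is five times the lift the full population would deliver. The catch is that the calculation requires knowing which segment responds and why, which is exactly what an explanationless test does not supply.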

Pragmatists who conduct online A/B tests, however, often presume they’re immune from these kinds of generalization issues. They rest assured that in their tests users are not asked to volunteer but rather drafted automatically, thus mitigating selection bias. But here pragmatists have another issue to be worried about: mechanisms can easily differ over time. Indeed, mechanisms can be altered by shifts within the mechanism itself or by changes in the surrounding environment, often breaking existing partnerships. But with no explanation to guide them, pragmatists can’t predict which kinds of scenarios would lead to such breakdowns. Sure, Netflix’s winning image for The Short Game produced a lift in clickthrough rate this month, but what about next month (or next year) after users have endured considerably more exposures? What if Google decides to change the layout and design of its Chrome browser? What if there's a mass exodus from social media, harmful 24-hour news cycles, and toxic political tribalism, causing anxiety to plummet dramatically, and thereby reducing the incentive to watch streaming content late into the night in order to drown out nagging thoughts of inadequacy and self-loathing? What if they bring back the McRib?

The point is that as the world changes, and it always does, even with a deep understanding of the mechanism, it can be challenging for realists to discern what practical consequences, if any, those changes will entail. But with no understanding of the underlying mechanism, pragmatists are effectively blind, in a sense at the mercy of Nature's whims. As the Nobel Laureate Angus Deaton and Professor of Philosophy Nancy Cartwright put it in their paper Understanding and Misunderstanding Randomized Controlled Trials, when a treatment generates a positive result it only means it’s “capable of working somewhere.” Without additional justification, however, it should by no means be interpreted to mean it will work elsewhere. For that warrant, you’ll need an explanation.

Here Come the Realists

Realists are right: progress does require explanations. In a world where treatment-mechanism partnerships are the ticket, it’s only those armed with a deep understanding of the underlying mechanisms who are equipped to locate them.

Nor should we be surprised, since explanation-led progress is, in a nutshell, the story of science at large. As Deutsch puts it, “Progress that is both rapid enough to be noticed and stable enough to continue over many generations has been achieved only once in the history of our species. It began at approximately the time of the scientific revolution, and is still under way.” I think it’s difficult for 21st-century humans, who every year are exposed to new medicines, smartphones, and parenting books, to appreciate that our ancestors never experienced anything close to this kind of sustained, rapid improvement. Instead, they generally died under standards of living similar to those into which they were born. “What were people now doing, for the first time, that made the difference?” asks Deutsch. The answer he puts forth persuasively: systematically seeking out good explanations. Indeed, early humans were largely, in my terminology, pragmatists. Yes, they managed to discover stone tools, and fire, and clothes, but, crucially, they did not understand why those things worked. Therefore they only scratched the surface of nature’s hidden potential. It was only when realists swaggered onto the scene, buoyed by a culture of criticism and anti-authority, that the world began to improve rapidly and sustainably.

It took a realist like James Watt to understand precisely how heat and water interact, as William Rosen chronicles in his book The Most Powerful Idea in the World, before a steam engine efficient enough to power the Industrial Revolution could be invented. It took a realist like Alan Turing to create explanations of how computers could work to drive the digital revolution. Put another way, it wasn’t until humans began paying attention to what was happening inside the box, rather than simply what was going in and coming out of it, that progress began to skyrocket.

Now that we know why explanations are so vital to progress, we ought to ask: what might our Short Game example have looked like, had experimenters slowed down to understand why their image worked rather than simply that it worked?

Admittedly, I have no idea why this particular image performed better than the others (nor have I even seen the movie), but for the sake of example, allow me to build castles in the air:

Imagine a number of Netflix subscribers are fathers who use the service primarily as a way to bond with their sons. Every week, leading up to their father-son movie night, dads scour Netflix for a flick they hope might bring the two closer together. When they find themselves landing on this image containing an adult and child pair, they presume the characters to be father and son. Fathers think to themselves: this looks like the perfect movie to watch with my son. And so they click.

With this understanding of the underlying mechanism to guide them, Netflix can do better than a treatment that merely grazes that mechanism. Instead they can design one that locks in tight. Perhaps a more effective image would make it even more obvious that the characters are father and son by showing them putting their arms around each other. Or why not abandon subtlety completely and just include a big caption that reads: “Critics agree: watch this movie with your son, and it will have you both in tears.” Perhaps if Netflix believes this mechanism contains a lot of potential value, they may want to go beyond this one movie and create a “Best movies to watch with your son” content category, each movie hand-selected to spur the kinds of conversations that fathers deeply want to have with their sons. Maybe Netflix can even lead a global father-son community that synchronizes their movie nights while hosting post-movie discussions.

And these ideas, realists know, are merely the tip of the iceberg. The deeper we understand the underlying mechanism, the distinct, precise shape of users’ beliefs and desires, the more competently we can partner with it, and the more rapid, durable, and generalizable the progress we can expect to achieve.

The Power of Explanatory or “Crucial” A/B Tests

How exactly do realists use A/B tests to generate good explanations? With a subtly yet substantively different conception of testing. While pragmatists set up A/B tests as a competition between rival treatments, realists create a competition between rival explanations. It was Galileo who first popularized this type of explanation-centered experiment, but as the epistemology-focused podcaster Brett Hall explains, it took the philosopher Karl Popper to “construct the theoretical apparatus” which taught us “how it all worked.” I like to refer to this kind of experiment as an explanatory A/B test, and to see how it works, it’s back to BoxTown.

***

“Ladies and gentlemen of BoxTown, only by seeking explanations of what’s inside the box can we ever know how to command its full potential. And to do this, we of course can’t look inside directly; we’ve already tried as much. Instead, we must take an indirect approach.

First, we must imagine. We must use our creativity to make bold conjectures about what could be in the box and how it might work. Many of these ideas will be implausible, contradicting our best existing theories of how the world works, so we should discard them. But explanations that we do judge to be viable, we must devise clever ways to falsify. And when our best explanations make conflicting predictions, there exists an opportunity to conduct the most powerful test of all, a “crucial” test, one that pits these rival explanations against each other in a way so unforgiving that by the end of the experiment only one will be left standing. When an explanation survives any of these falsification attempts it doesn’t, to be sure, make it true. But when it holds up better than its rivals, we ought to take it seriously as our best guess of what’s inside the box, our best description of reality.

And once we know what’s inside the box, how it works, and why, we can then, at last, infer where it’s capable of taking us, and specifically what interventions will get us there. Moreover, we’ll finally know how to reliably scale these winners to other boxes and into the future, ensuring our progress can be sustained.

But we shouldn’t stop there. Even our best explanations will contain error. That’s why we must continue this tradition of criticism ad infinitum, forever seeking better and better explanations. For the better our explanations, the better our solutions, and the grander the problems we can confront.

Residents of BoxTown, it’s only by daring to understand what’s inside the box that we can make the kind of extraordinary progress that every one of us (those of you who are still here, listening to my words) deep down knows is possible. It’s only through good explanations that we can achieve progress that is in principle limitless. It’s only through a commitment to ever-deepening understanding that we can have a shot, a real honest chance at fulfilling The Promise.”
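Back in our world, the crucial test the old man describes can be sketched in code. What follows is a toy illustration, not a real analysis. It pits the speculative father-son story from the Short Game example against an invented rival ("users click because they like golf"), using two hypothetical image variants on which the explanations disagree. The observed clickthrough rates are made up, and `min_gap` is a crude stand-in for a proper significance test.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Explanation:
    name: str
    predicted_winner: str  # variant this explanation says earns the higher CTR


def crucial_test(explanations, observed_ctr, min_gap):
    """Split explanations into (survivors, falsified) given observed CTRs.

    If the top two variants differ by less than min_gap, the test is treated
    as inconclusive and nothing is falsified.
    """
    ranked = sorted(observed_ctr, key=observed_ctr.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    if observed_ctr[best] - observed_ctr[runner_up] < min_gap:
        return list(explanations), []
    survivors = [e for e in explanations if e.predicted_winner == best]
    falsified = [e for e in explanations if e.predicted_winner != best]
    return survivors, falsified


# Two rival explanations that make conflicting predictions (both hypothetical).
rivals = [
    Explanation("father-son bonding", predicted_winner="pair_without_golf"),
    Explanation("interest in golf", predicted_winner="golf_without_pair"),
]

# Invented results: one image keeps the adult-child pair but drops the golf
# cues; the other keeps the golf cues but drops the pair.
observed = {"pair_without_golf": 0.051, "golf_without_pair": 0.038}

survivors, falsified = crucial_test(rivals, observed, min_gap=0.005)
print("survived: ", [e.name for e in survivors])
print("falsified:", [e.name for e in falsified])
```

Surviving does not make an explanation true. It only makes it the best current guess, and the next test should go looking for a rival that can knock it down.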

Conclusion

Most modern A/B testers are pragmatists who see explanations as dispensable. But their worldview is based on a misconception, a failure to see the crucial role of mechanisms in bringing about progress. They fail to see that the key to forward movement is not effective treatments but rather effective treatment-mechanism partnerships. Aiming to discover merely what works leads to mediocre, brittle partnerships, ones that fail to harness the enormous potential residing in underlying mechanisms.

Realists, contrastingly, understand that explanations serve as a compass, pointing them to the rare treatments necessary for building effective partnerships. What’s more, explanations guide realists in knowing how to scale these partnerships to other situations and domains, magnifying their impact. And so they dedicate their limited time and energy to seeking out good explanations. There’s another name for this kind of rigorous explanation-seeking: science. And right now, we need more of it.
