The vibes about A/B testing are wrong: Why the backlash is a big mistake
There's a wave of anti-A/B testing sentiment in the air. More leaders are talking about the need for taste-based decision-making; many are actively deriding testing as a crutch that stifles creativity and sidesteps leadership.
They point to tweetable statements like:
As a data leader who's led large-scale testing initiatives, here's my take: of course taste is important for decision-making; you were kidding yourself if you ever thought otherwise.
But that doesn't mean you should throw the baby out with the bath water. A/B testing is an invaluable tool that enables your company to truly learn and scale by systematically separating the wheat from the chaff. You're also kidding yourself if you think otherwise.
Let’s start by unpacking what’s gone wrong with experimentation.
Where A/B testing goes wrong
Testing gets a bad rap for two reasons:
How testing gets misappropriated for strategy
In the early days at Uber, the company was laser-focused on driving trip volume. This was beautiful in how easy it was to measure and to communicate. But here's the problem: "more trips" isn't a strategy; it's a metric.
Focusing on trips led us to double down on creating trips at any cost, for example with short and cheap trips in products like Uber Pool. That's different from building repeated, high-quality experiences, and led to a tradeoff that nobody intended to make: short-term growth vs. long-term customer value.
The problem wasn't our testing — it was the lack of coherent strategy. When this happens in your organization, data leaders need to have a hard conversation. If you're being asked to 'test what features users want', that's a sign to push back. What kind of experience are teams trying to create? What's the long-term product vision?
It may feel uncomfortably close to saying, "You need to do your job first, so I can do mine." But the strategy needs to come first, and then A/B testing can help you test your path toward it.
There's a helpful analogy here: A/B testing isn't going to tell you what hill to climb. Rather, once you pick a hill, A/B testing will help you find your way to the top.
How teams screw up A/B testing
I’ve also seen the call coming from inside the house — tests are developed, run, or reported on poorly. The problem is that if testing isn’t executed well, then business leaders will justifiably ask, what’s the point?
There are a few common ways teams fall down on testing. These are obvious in hindsight, but easy to trip up on in practice:
1/ Not accounting for seasonality
Teams sometimes run tests during atypical time periods and fail to contextualize the results accordingly. For example, a promotion test that happens to fall over a holiday might show spectacular conversion rates that would be impossible to maintain year-round. If you then extrapolate those results as if they're representative of normal conditions, business stakeholders immediately start questioning your judgement.
2/ Missing long-term effects
Short-term metrics frequently hide longer-term consequences that aren't captured in the initial testing window. A feature might drive an immediate conversion uplift of 15% — while simultaneously increasing negative reviews or return rates that only become apparent weeks later. Without proper longer-term measurement, these tests can lead to features that optimize immediate results while quietly eroding brand equity and customer lifetime value.
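To make that tradeoff concrete, here's a minimal back-of-envelope sketch. All of the numbers (conversion, return rates, order value) are hypothetical; the point is only to show how a headline lift can shrink once later returns show up.

```python
# Hypothetical illustration: an immediate conversion lift can be offset by a
# rise in returns that the initial test window never sees.
baseline_conversion = 0.050   # 5.0% of visitors buy today
lift = 0.15                   # +15% relative conversion lift reported by the test
baseline_return_rate = 0.08   # 8% of orders currently get returned
later_return_rate = 0.14      # hypothetical: returns creep up weeks after launch
avg_order_value = 80.0        # dollars per order

def net_revenue_per_visitor(conversion, return_rate, order_value):
    """Revenue per visitor after refunding returned orders."""
    return conversion * order_value * (1 - return_rate)

before = net_revenue_per_visitor(baseline_conversion, baseline_return_rate, avg_order_value)
after = net_revenue_per_visitor(baseline_conversion * (1 + lift), later_return_rate, avg_order_value)

print(f"net revenue per visitor: before ${before:.2f}, after ${after:.2f}")
# ~$3.68 before vs ~$3.96 after: still a win here, but once the return rate
# passes ~20% the "15% winner" quietly becomes a loss that the two-week readout never showed.
```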
3/ The sum of the parts is just too big
When multiple teams run parallel experiments, each claiming significant improvements, the reported combined impact often exceeds what seems possible. I've seen it firsthand: six teams each claim 5% improvements, while the entire business only grew 20% during that period. There are lots of reasons this can happen (we won't unpack those here), and it's critical that data leaders be careful when rolling these effects up together. Again, claiming nonsensical victories creates serious credibility issues.
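A quick sanity check makes the mismatch obvious. The sketch below just reuses the made-up numbers from the example above (six 5% claims against 20% actual growth); it isn't a decomposition method, only arithmetic.

```python
# Six teams each claim a 5% lift; what do those claims imply in aggregate?
claimed_lifts = [0.05] * 6

additive_total = sum(claimed_lifts)        # naive sum of claims: 30%

compounded_total = 1.0
for lift in claimed_lifts:
    compounded_total *= 1 + lift
compounded_total -= 1                      # compounded claims: ~34%

actual_growth = 0.20                       # what the business actually grew

print(f"additive claim:   {additive_total:.0%}")
print(f"compounded claim: {compounded_total:.0%}")
print(f"actual growth:    {actual_growth:.0%}")
# Either way the claimed total overshoots reality, which is the cue to haircut,
# check for overlapping audiences, and audit each team's methodology.
```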
Six common-sense practices for strong A/B testing
After seeing both the successes and failures of testing programs firsthand, I've found these six best practices make the difference between testing that drives good decisions and testing that drives skepticism.
1/ Define strong success metrics
When your experiments don't have predefined success criteria, you'll likely end up cherry-picking whatever looks good in the data. Establish 1-2 primary metrics that directly tie to your hypothesis, along with several secondary metrics to catch potential negative impacts in other areas.
So if your primary metric is conversion rate, track secondary metrics like time-on-page, user satisfaction scores, and 30-day retention to ensure you're not creating downstream problems.
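One lightweight way to enforce this is to write the metrics into the test definition before launch. The structure and metric names below are purely illustrative, not a prescribed schema.

```python
# Illustrative pre-registered test definition: success criteria are written down
# before launch, so nobody cherry-picks a favorite metric after results come in.
experiment = {
    "name": "search_results_density_test",
    "primary_metrics": [
        {"metric": "conversion_rate", "expected_direction": "up"},
    ],
    "secondary_metrics": [  # guardrails to catch downstream damage
        {"metric": "time_on_page", "expected_direction": "flat"},
        {"metric": "user_satisfaction_score", "expected_direction": "flat_or_up"},
        {"metric": "retention_30d", "expected_direction": "flat_or_up"},
    ],
}
```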
2/ Write down a clear hypothesis
Vague experimentation without clear direction wastes resources and creates confusion about what insights to extract. Instead of approaching tests with a generic "Let's see what happens if we change X" mindset, frame each experiment with a specific hypothesis: "We believe changing X will improve Y because Z."
For example, "We believe showing fewer search results per page will increase conversion because it reduces cognitive load for customers." This structure forces teams to articulate their reasoning and creates natural guardrails for interpretation when results come in.
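If it helps to make the habit stick, the same template can be captured programmatically. This is only a sketch of the "We believe X will improve Y because Z" framing, with hypothetical field names.

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    change: str      # X: what we are changing
    metric: str      # Y: what we expect to improve
    mechanism: str   # Z: why we believe it will work

    def statement(self) -> str:
        return f"We believe {self.change} will improve {self.metric} because {self.mechanism}."

h = Hypothesis(
    change="showing fewer search results per page",
    metric="conversion",
    mechanism="it reduces cognitive load for customers",
)
print(h.statement())
```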
3/ Run tests long enough
The pressure to move quickly often leads teams to cut testing windows short, missing critical medium and long-term effects. Make sure your tests have sufficient time to capture the full impact of your changes, particularly for features that might influence customer behavior patterns over time.
Consider that a pricing change might show an immediate uptick in conversions — but lead to decreased customer lifetime value that only becomes apparent after a few months.
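Working backwards from the effect you care about makes the window less arbitrary. Below is a rough power-calculation sketch using the standard normal-approximation formula for comparing two proportions; the baseline conversion, target lift, and traffic figures are hypothetical.

```python
from scipy.stats import norm

def sample_size_per_arm(p_control, p_treatment, alpha=0.05, power=0.8):
    """Approximate visitors needed per arm to detect p_control -> p_treatment
    in a two-sided test of two proportions (normal approximation)."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return (z_alpha + z_beta) ** 2 * variance / (p_control - p_treatment) ** 2

# Hypothetical numbers: 5.2% baseline conversion, hoping to detect a lift to 5.8%.
n = sample_size_per_arm(0.052, 0.058)
daily_visitors_per_arm = 1_500   # hypothetical: 3,000 visitors/day, split 50/50
days_needed = n / daily_visitors_per_arm

print(f"~{n:,.0f} visitors per arm, roughly {days_needed:.0f} days at current traffic")
# About 23,000 visitors per arm, i.e. ~15 days, and that is before adding time
# to observe slower-moving effects like repeat purchases or churn.
```

The point isn't the exact formula; it's that the duration gets decided before the test starts, not when the dashboard happens to look good.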
4/ Share both positive and negative results
The nature of experimentation means some hypotheses won't pan out. But don't sweep failed tests under the rug — create a culture where every test is valued as a learning opportunity.
For example: "Our test to simplify the checkout flow actually decreased conversion by 2%, teaching us that users value security indicators more than we expected." This transparency builds credibility with stakeholders and creates institutional knowledge that prevents teams from repeatedly testing bad ideas.
As Ramesh Johari explained on our podcast, High Signal, this is critical to becoming what he calls "a self-learning organization".
5/ Haircut appropriately
No test exists in a perfect vacuum, and pretending otherwise undermines trust. Acknowledge when a test might be impacted by external factors, and apply appropriate "haircuts" to results when reporting up the chain.
For instance, "We saw a 10% improvement, but since it was during our peak season, we're conservatively estimating a 5% annual impact." This honest approach builds confidence in your reporting and establishes your team as trustworthy partners rather than metric chasers who don’t understand business context.
6/ Lastly: Run your program tightly
When different teams use different methodologies and reporting approaches, it's impossible to compare results across experiments — and easy for the entire testing program to lose credibility. Instead, ensure all teams use the same standardized approach to measuring and reporting impact.
For example, require every test report to include both the relative improvement ("conversion improved by 12%") and the absolute change ("from 5.2% to 5.8%") so business stakeholders don't misread one as the other.
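A small shared helper is one way to bake that convention in. The function below is a hypothetical sketch, but it shows the idea: every team computes and phrases both numbers the same way.

```python
def format_lift(control_rate: float, treatment_rate: float) -> str:
    """Report both the relative and the absolute change so readers can't confuse the two."""
    absolute = treatment_rate - control_rate
    relative = absolute / control_rate
    return (
        f"conversion improved by {relative:.1%} "
        f"(from {control_rate:.1%} to {treatment_rate:.1%}, "
        f"{absolute * 100:+.1f} percentage points)"
    )

print(format_lift(0.052, 0.058))
# "conversion improved by 11.5% (from 5.2% to 5.8%, +0.6 percentage points)"
```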
Running a tight, high-quality testing program is such an easy way to drive significant impact and credibility in your organization.
Striking the balance: taste for strategy, testing for tactics
Back at Uber, when the company finally figured out that we needed to be more deliberate about our strategy, we had a realization: a trip isn't just a trip. Not all trips are created equal, and we shouldn't indiscriminately optimize for trips alone. We needed to define a comprehensive strategy, and then a basket of metrics that reflected those goals.
Leaders sometimes complain that A/B testing encourages focusing on the numbers rather than the big picture, but that's actually the whole point. It just needs to happen within the right context:
Once you’re using the right metrics, A/B testing can do what it’s supposed to: enable more reliable (and much easier) tactical decision-making. It allows teams to quickly, independently, and consistently learn what works and make decisions accordingly — allowing a large organization to get far more done.
So the next time you hear leaders criticizing A/B testing, listen carefully to what they're actually saying. It doesn't need to be an OR here; it should be an AND. Testing is a compass, not a roadmap. Pick the right hill, and use testing to chart the fastest path.