The Three Most Common Problems With Experimentation Programs

Experimentation and A/B testing have worked their way into practically every industry and function, but just because you are running experiments doesn’t mean you’re running the RIGHT experiments. Even the most structured and disciplined experimentation programs can fall prey to three common problems.

1. TOO LITTLE BENEFIT

It’s tempting to choose experiments that you believe have the best chance of generating a positive outcome, but experimentation is not always about making safe bets. In many cases, after you factor in the cost of running the experiment and the cost of a full rollout if it succeeds, the potential ROI is low. It may not be negative, but it may be small compared to the opportunity cost of not deploying those resources elsewhere. Every company has a limited capacity to run experiments and make changes, so tying up too much of that capacity on tiny incremental wins can prevent you from finding the big wins.
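To make that trade-off concrete, here is a minimal back-of-the-envelope sketch in Python. All figures and the expected_net_value helper are hypothetical, not taken from any real program; it simply compares a "safe bet" against a bigger swing that would consume the same experimentation slot.

```python
# Hypothetical screen for candidate experiments.
# All figures are made-up placeholders; plug in your own estimates.

def expected_net_value(p_success, annual_uplift, rollout_cost, experiment_cost):
    """Expected value of running an experiment and rolling out only if it wins."""
    return p_success * (annual_uplift - rollout_cost) - experiment_cost

# A "safe bet": very likely to win, but the prize is tiny.
safe_bet = expected_net_value(p_success=0.8, annual_uplift=20_000,
                              rollout_cost=5_000, experiment_cost=8_000)

# A riskier idea: less likely to win, but a much bigger prize if it does.
big_swing = expected_net_value(p_success=0.25, annual_uplift=400_000,
                               rollout_cost=50_000, experiment_cost=8_000)

print(f"Safe bet expected value:  ${safe_bet:,.0f}")
print(f"Big swing expected value: ${big_swing:,.0f}")
# With these invented numbers the safe bet nets about $4,000 while the big
# swing nets about $79,500 -- and both occupy the same experimentation slot.
```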

2. TOO MUCH NOISE

Most experiments are geared towards evaluating one or two success metrics. Those metrics are run through tests of statistical significance, ideally providing a meaningful level of confidence that the change worked or didn’t. The problem is that most business experiments don’t happen in a tightly controlled lab environment. They happen “in the wild”, where many outside factors can affect the behavior of the test subjects.

Each of these outside factors adds a certain amount of “noise” to the results, obscuring the “signal” from the experiment. If those outside factors have too much natural variation, they can prevent your experiment from producing a clean result. When that happens, the numbers may calculate as statistically significant yet still not reflect the truth. Here are two common scenarios where this happens:

Scenario 1: When you can’t run an A/B test

Sometimes the complexity of a business, the cost of implementation, or other factors prevent you from running a true A/B test. In situations like this, people often look to other time periods to set their baseline for comparison, but this can be a very tricky process to get right.

Take, for example, a streaming service trying to decide whether to “drip” a new series or release all the episodes at once for a “binge”. Functionally, they could split their audience and let one half binge while the other half waits for the drip. Practically, this would be a PR and marketing nightmare, likely creating a huge level of customer dissatisfaction. So the service has to choose whether to binge or drip, and then compare the results to another series with the opposite release strategy. The problem is that every series is different, with a different audience, different appeal and different competition from other media. Any attempt to determine the success or failure of the release strategy for a single series is bound to be overwhelmed by noise.

(On the other hand, comparing the relative success of MANY binge releases vs. MANY drip releases can yield much better quality results.)
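As a rough illustration of why aggregation helps, here is a minimal simulation sketch. The completion rates, lift, and noise levels are invented assumptions; the point is only that a single binge-vs-drip comparison swings wildly, while averaging over many releases recovers a small true effect.

```python
# Hypothetical simulation: series-to-series variation vs. a small true effect.
import numpy as np

rng = np.random.default_rng(42)
true_binge_lift = 2.0          # assume binge adds ~2 completion points on average
series_to_series_noise = 10.0  # but individual series vary far more than that

def observed_lift(n_series_per_strategy):
    """Difference in average completion rate across n series per strategy."""
    drip  = rng.normal(60.0, series_to_series_noise, n_series_per_strategy)
    binge = rng.normal(60.0 + true_binge_lift, series_to_series_noise, n_series_per_strategy)
    return binge.mean() - drip.mean()

print("One series each:   observed lift =", round(observed_lift(1), 1))
print("Fifty series each: observed lift =", round(observed_lift(50), 1))
# A single pair of series can show a "lift" of tens of points in either
# direction purely from noise; averaging over many releases lands much
# closer to the true +2.
```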

Scenario 2: When the outliers throw off the results

As experimentation has become more democratized, and no longer just the domain of pure scientists with statistics degrees, we have become more and more reliant on the built-in analysis of the platforms we use. Those platforms may have logic to handle outliers, but even that may not be enough to get to the truth of the test.

Imagine an online watch store that, on average, sells six hundred $10 watch batteries, a hundred $100 watches, twenty $500 watches, and five $10,000 Rolexes each week. If you are A/B testing for average order value, having just a few more of those Rolex buyers fall into the B group could throw the results way off. Excluding the Rolex purchases doesn’t solve the problem either, as losing a single one of those sales can make hundreds of other incremental sales meaningless.
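A quick simulation makes the problem concrete. The sketch below uses the invented order mix from the example and shuffles the same weekly orders into A and B groups with no real difference between them at all, then measures how far apart the average order values still land.

```python
# Hypothetical illustration: a handful of $10,000 orders can swamp the
# average order value (AOV) even when A and B are identical.
import numpy as np

rng = np.random.default_rng(7)

# Weekly order mix from the example: price and typical weekly count.
prices = np.array([10, 100, 500, 10_000], dtype=float)
counts = np.array([600, 100, 20, 5])
orders = np.repeat(prices, counts)

def aov_gap_under_no_effect():
    """Randomly assign each order to A or B and return the AOV difference."""
    in_a = rng.random(orders.size) < 0.5
    return orders[in_a].mean() - orders[~in_a].mean()

gaps = np.array([aov_gap_under_no_effect() for _ in range(1_000)])
print(f"Overall AOV:                    ${orders.mean():,.2f}")
print(f"Typical A-vs-B gap (std dev):   ${gaps.std():,.2f}")
# With NO real difference between the groups, the AOV gap still swings by
# tens of dollars each week, depending on which arm catches more Rolex buyers.
```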

3. TOO MUCH RISK

Experiments tend to be measured on short-term success metrics rather than long-term outcomes. This is largely by necessity, since you can’t wait years to evaluate whether an experiment has worked. That said, there are plenty of times when a change can produce a “successful” experiment yet introduce far too much long-term risk to the business.

Imagine a consumer electronics company that wants to test whether offering a 2-year free-replacement warranty (versus its existing 1-year repair-or-replace warranty) would increase sales. It’s entirely likely that the experiment would show a statistically significant increase in sales, but how much risk would the company take on if it rolled that change out globally? Unless it already had great data on product reliability and lifespan, it might risk giving back all of the gains from those higher sales and then some. And the experiment itself would do nothing to provide visibility into that risk.
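A back-of-the-envelope sketch shows why. The unit volumes, prices, lift, and failure rates below are all invented assumptions, but they illustrate how the same “successful” sales lift can be comfortably positive or deeply negative depending on a failure rate the experiment never measured.

```python
# Hypothetical warranty trade-off: the experiment measures the sales lift,
# but the replacement-claim cost only shows up later. All rates are invented.

units_per_year      = 100_000
price_per_unit      = 200.0
observed_sales_lift = 0.05        # the "successful" 5% lift the test showed
replacement_cost    = 120.0       # cost to ship a free replacement unit

# The number the experiment does NOT give you: how many units fail in year 2.
for year2_failure_rate in (0.01, 0.05, 0.10):
    extra_revenue = units_per_year * observed_sales_lift * price_per_unit
    extra_claims  = (units_per_year * (1 + observed_sales_lift)
                     * year2_failure_rate * replacement_cost)
    net = extra_revenue - extra_claims
    print(f"Year-2 failure rate {year2_failure_rate:>4.0%}: net impact ${net:>12,.0f}")
# At a 1% failure rate the new warranty is comfortably positive; at 10% it
# wipes out the gain -- and nothing in the A/B test tells you which world you're in.
```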

When selecting and designing your experiments, it is important to consider what additional risks they might introduce, and to recognize that you won’t be able to measure that risk as part of the analysis of the experiment itself. That means you have to add scenario planning to your experimentation program, exploring:

  • What could go wrong?
  • Would we be able to fix those problems?
  • How much impact could they have?
  • What’s the worst that could happen?

Choosing the right experiments

Experimentation and testing are immensely valuable, but it is critically important to choose the right experiments to run and to design them in ways that let you feel confident (statistically and otherwise) in the results. Avoid the three problems outlined above and you’ll be in a much better position to profit from your experimentation program.

#abtesting #experimentation #strategy #marketing #digitalproduct

Glymr can help you make the most of your experimentation program and maximize the value of all of the data in your company.
