Stat-sig on a Shoestring Traffic Budget
Status: Early Draft
There comes a time in every company’s life when the CEO says “it’s growth time.” You get ready to run lots of experiments, until you realize, “actually, we don’t quite have the numbers.”
“Given our traffic, what kind of a win could we reasonably detect within a couple of weeks?”
20% minimum? That is a crazy high win. Even 10% would be a huge winner. Finding a 20% win is very rare.
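To put rough numbers on that question, here is a minimal Python sketch using the standard normal-approximation formula for a two-sided test on two conversion rates. The function name, the 5% baseline conversion rate, and the 5,000 visitors per variant are illustrative assumptions, not figures from a real product.

```python
from math import sqrt
from statistics import NormalDist


def relative_mde(visitors_per_variant, baseline_rate, alpha=0.05, power=0.8):
    """Approximate smallest relative lift detectable with the given traffic.

    Uses the normal approximation and assumes both variants have roughly
    the baseline variance, so treat the result as a ballpark figure.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = z.inv_cdf(power)          # threshold for the desired power
    # Standard error of the difference between two conversion rates.
    se = sqrt(2 * baseline_rate * (1 - baseline_rate) / visitors_per_variant)
    absolute_mde = (z_alpha + z_power) * se
    return absolute_mde / baseline_rate


# Hypothetical inputs: 5,000 visitors per variant, 5% baseline conversion.
print(f"{relative_mde(5_000, 0.05):.0%}")
```

With these made-up inputs the detectable lift comes out at roughly 24%, which is the kind of answer that prompts the reaction above.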
So, what do you do? Do you give up? Probably not. Instead, you find ways to run the experiments you’re planning anyway, but with an updated methodology.
Here’s how.
Part 1: Squint Hard Enough: If you’re only missing your stat-sig target by a bit
Take bigger swings & batch by theme
Say you have a hypothesis that positioning your product more like X is likely to help with conversion. You come up with 4 potential changes to experiment with & see if they help.
Great. Now combine them into one mega-experiment & run that instead.
If you can only detect a 10% win, only take 10% swings. When prioritizing your experiment backlog, de-prioritize (or batch) ideas that wouldn’t have a big impact if they were successful.
However, when productionizing a batched win, it’s harder to tell which part of the change was actually effective. In the near term, this is tolerable - when the company gets to a larger scale and you revisit this theme, you can always tease out the underlying causes through follow-up experiments.
Use Fewer Variants: A/B/C/D becomes A/B
You need a certain number of visitors to both your “winning variant” and “control” to determine the result. Every variant you add reduces the oxygen flow to your winning variant.
If you’re tight on traffic, stick to two variants.
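As a rough illustration of the oxygen problem, the sketch below splits a fixed traffic budget evenly across variants and reports the approximate relative lift each comparison against control could detect. The 20,000 total visitors and 5% baseline are hypothetical, and the sketch ignores any multiple-comparison correction, which would make the multi-variant case look even worse.

```python
from math import sqrt
from statistics import NormalDist


def mde_by_variant_count(total_visitors, baseline_rate, variant_counts,
                         alpha=0.05, power=0.8):
    """Map each variant count to the approximate relative detectable lift."""
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    results = {}
    for k in variant_counts:
        per_variant = total_visitors // k  # even split, control included
        se = sqrt(2 * baseline_rate * (1 - baseline_rate) / per_variant)
        results[k] = z_total * se / baseline_rate
    return results


# Hypothetical budget: 20,000 visitors in total, 5% baseline conversion.
for k, mde in mde_by_variant_count(20_000, 0.05, [2, 3, 4]).items():
    print(f"{k} variants: detectable lift of about {mde:.0%}")
```

Going from four variants to two doubles the traffic per variant, which shrinks the detectable lift by a factor of about 1.4 (the square root of 2).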
However, this might slow down your pace of learning, until you realize you can also:
Run in Parallel: A/B/C/D becomes A/B, A/C, and A/D
If you can isolate the assumption behind each variant, and the variants are independent, you can run multiple A/B tests at once, each with two variants. This approach won’t let you try 4 versions of a hero header, but you could factor out your header, subheader, and call-to-action changes into separate experiments. This way, every hypothesis gets 50% of traffic, enough to see how it performs relative to control.
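One common way to keep parallel experiments independent is to assign each visitor separately per experiment, for example by hashing the visitor ID together with the experiment name. This is a minimal sketch of that idea, not a description of any particular testing tool; the experiment names and ID format are hypothetical.

```python
import hashlib


def assign(user_id: str, experiment: str) -> str:
    """Deterministically place a user in one arm of a 50/50 experiment.

    Salting the hash with the experiment name means a user's arm in one
    experiment tells you nothing about their arm in another.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 2 else "control"


# Hypothetical experiments running at the same time.
for name in ("header", "subheader", "call-to-action"):
    print(name, assign("user-123", name))
```

Because the assignment depends only on the ID and the experiment name, a returning visitor always sees the same combination, and each experiment still gets its full 50/50 split.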
However, you might run into experiment interactions (i.e., C only wins when B is also on). This happens from time to time; the easiest solution is to analyze the experiments together and identify interactions at that point. After all, there’s nothing wrong with shipping both B and C.
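Analyzing two parallel experiments together can be as simple as comparing the effect of C among visitors who saw B with the effect of C among visitors who did not. The sketch below does this with a normal-approximation z-test on the difference between those two effects; the function name and the conversion counts are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def interaction_test(cells):
    """cells maps (saw_b, saw_c) -> (conversions, visitors).

    Returns (difference in C's effect with vs. without B, z-score,
    two-sided p-value).
    """
    rate = {key: conv / n for key, (conv, n) in cells.items()}
    variance = {key: rate[key] * (1 - rate[key]) / cells[key][1] for key in cells}
    effect_c_without_b = rate[(False, True)] - rate[(False, False)]
    effect_c_with_b = rate[(True, True)] - rate[(True, False)]
    difference = effect_c_with_b - effect_c_without_b
    # The four cells are independent, so their variances add up.
    se = sqrt(sum(variance.values()))
    z = difference / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return difference, z, p_value


# Hypothetical data in which C only helps when B is also on.
cells = {
    (False, False): (100, 2000),
    (False, True): (100, 2000),
    (True, False): (100, 2000),
    (True, True): (160, 2000),
}
difference, z, p_value = interaction_test(cells)
print(f"interaction: {difference:+.3f}, z = {z:.2f}, p = {p_value:.4f}")
```

Keep in mind that each cell holds only a quarter of the traffic, so on a tight budget this check will only catch large interactions.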
Run tests for longer: 7 days becomes 2-3 weeks
This is perhaps the most obvious strategy: the longer a test runs, the more traffic it collects, and a larger sample size means you can detect a smaller win with stat-sig.
However, especially at earlier-stage companies, I rarely see the discipline to actually leave an experiment running longer than a couple of weeks. Inevitably, an executive will pop in and demand an update. “Trending positive?” they’ll say. “Great, let’s just ship it, it’s fine.” And it is fine, except now you’ve committed the cardinal sin of peeking, your likelihood of a false positive has gone up, and you have not truly “learned” anything with confidence.
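To see why peeking matters, here is a small simulation sketch of A/A tests, where there is no real difference between variants. It compares how often a test looks significant if you check every day and stop at the first "win" against checking only once at the end. It models the daily z-statistic directly under a normal approximation with equal traffic each day; the 14-day duration and the number of simulations are arbitrary assumptions.

```python
import random
from math import sqrt
from statistics import NormalDist


def false_positive_rates(days=14, alpha=0.05, simulations=20_000, seed=0):
    """Return (rate with daily peeking, rate with one final look)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    peeking_hits = 0
    final_hits = 0
    for _ in range(simulations):
        running_sum = 0.0
        crossed_early = False
        z = 0.0
        for day in range(1, days + 1):
            # Each day adds an independent chunk of noise; with no true
            # effect, the cumulative z-statistic is the scaled running sum.
            running_sum += rng.gauss(0, 1)
            z = running_sum / sqrt(day)
            if abs(z) > z_crit:
                crossed_early = True  # a daily peek would have shipped here
        peeking_hits += crossed_early
        final_hits += abs(z) > z_crit
    return peeking_hits / simulations, final_hits / simulations


peeking, single_look = false_positive_rates()
print(f"daily peeking: {peeking:.1%}, single final look: {single_look:.1%}")
```

The single-look rate should land near the nominal 5%, while the daily-peeking rate should come out several times higher, even though nothing changed between the variants.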
Aside: If you often find yourself shipping tests earlier than planned, consider moving away from a fixed-sample frequentist methodology to one that is designed for early looks, such as Bayesian or sequential statistics.
Get Comfortable with False Positives: p<0.05 becomes p<0.2
Another tolerable trade-off: in a world where a false positive is harmless, get comfortable raising your Type 1 error tolerance (i.e., how often you can live with a false positive) from the traditional α=0.05 (a 5% false positive rate) to as high as α=0.2 (a 20% false positive rate).
However, don’t use this approach for any significant changes, such as a new pricing strategy. Also, realize that your ability to trust your learnings from experiments is decreased, since there’s a greater chance your “insights” are now coming from randomness and not reality.
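For a sense of how much traffic a looser threshold buys, the sketch below computes the approximate visitors needed per variant for a two-sided test on two conversion rates at different alpha levels. The function name, the 5% baseline, the 10% relative lift, and the 80% power are illustrative assumptions.

```python
from math import ceil
from statistics import NormalDist


def required_visitors_per_variant(baseline_rate, relative_lift,
                                  alpha=0.05, power=0.8):
    """Approximate sample size per variant (normal approximation)."""
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    treated_rate = baseline_rate * (1 + relative_lift)
    variance = (baseline_rate * (1 - baseline_rate)
                + treated_rate * (1 - treated_rate))
    return ceil(z_total ** 2 * variance / (treated_rate - baseline_rate) ** 2)


# Hypothetical scenario: 5% baseline conversion, hoping to detect a 10% lift.
for alpha in (0.05, 0.10, 0.20):
    n = required_visitors_per_variant(0.05, 0.10, alpha=alpha)
    print(f"alpha={alpha:.2f}: about {n:,} visitors per variant")
```

Under these assumptions, moving from α=0.05 to α=0.2 cuts the required traffic by roughly 40%, in exchange for four times as many false positives.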
All of the above work so long as your available traffic is close to what you need. Other times, your traffic is off by an order of magnitude. What desperate measures do you take during desperate times?
Part 2: Desperate Measures is coming soon. Sign up at https://tinyletter.com/engineering-growth to be notified.