Capping is the low-hanging fruit you should pick, unless the whales matter
Jayakrishnan Vijayaraghavan
I've mastered half a dozen hard-learned ways of what works in Data Science (and a thousand ways of what doesn’t). Happy to share, learn, and evolve!
Note: This is the second part of a series of articles discussing variance reduction techniques in A/B testing. The first part can be found here: All strata are useful, but some strata are more useful than the others.
When considering variance reduction techniques, there is a very simple yet surprisingly effective technique known as Capping. Capping handles outliers by setting an upper and/or lower limit on the range of values your metric can take. For example, if your metric is Revenue per user and you cap it at $1,000, a user who brings in $5,000 still contributes a capped metric value of only $1,000.
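As a minimal sketch (using NumPy and made-up values rather than anything from the article), capping revenue per user at $1,000 is a one-liner:

import numpy as np

# Hypothetical revenue-per-user values; the $5,000 user is the outlier from the example above
revenue_per_user = np.array([12.0, 35.5, 8.0, 5000.0, 21.0])

CAP = 1000.0  # upper limit chosen for this metric

# Capping: values above the threshold are replaced by the threshold, not removed
capped = np.clip(revenue_per_user, None, CAP)

print(capped)  # -> [12., 35.5, 8., 1000., 21.]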
A few questions arise as a consequence of this:
- Why not just remove the outliers rather than cap the value?
- How to choose the capped value?
- Are we not arbitrarily skewing the metric by capping it?
- What if I truly care about the whales?
Let's deal with each of the above questions in this article.
Why not just remove the outliers rather than cap the value?
Let's assume your metric has a right-skewed distribution (which is the case with most revenue-based metrics), with the mode at around $10 and the mean around $21, representing about 100K non-power users. If you also have 1,000 power users whose average purchase value is around $100, you already know that removing those "power users" from the overall data is not going to move your metric's mean by a lot. But you will also observe that your variance drops considerably.
In the above example, the mean value is not impacted a lot (~4% drop), and I'll celebrate a 45% drop in variance any day. Now let me make the case for capping the metric rather than removing the outliers.
Capping has a gentler impact on the mean (~2.6% deviation from the actual mean), and I'll still happily take the 37% drop in variance. I know I'm making the case for capping a little harder for myself here: 45% is always better than 37%, right? In the world of variance reduction, of course it is, but I still think capping deserves to stay in the game.
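Here is a quick sketch of how a removal-versus-capping comparison like this can be reproduced. The simulated distribution and the $200 cap are illustrative stand-ins, not the article's actual data:

import numpy as np

rng = np.random.default_rng(42)

# Roughly mimic the setup above: ~100K non-power users from a right-skewed
# distribution (mean around $21) plus ~1K power users averaging around $100.
# These parameters are illustrative, not the author's actual data.
regular = rng.lognormal(mean=2.8, sigma=0.7, size=100_000)  # mean ~= exp(2.8 + 0.7**2 / 2) ~= 21
power   = rng.lognormal(mean=4.5, sigma=0.3, size=1_000)    # mean ~= exp(4.5 + 0.3**2 / 2) ~= 94
revenue = np.concatenate([regular, power])

CAP = 200.0  # illustrative threshold; in practice pick it via EDA (next section)

removed = revenue[revenue <= CAP]        # option 1: drop the outlying values entirely
capped  = np.clip(revenue, None, CAP)    # option 2: cap them at the threshold

for name, x in [("original", revenue), ("removed", removed), ("capped", capped)]:
    print(f"{name:>8}: n={len(x):>7,}  mean={x.mean():6.2f}  variance={x.var():9.2f}")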
How to choose the capped value?
Choosing the capped value for a metric is as much art as science. Which is not lingo for "go figure it out yourself"; I come in peace, and here are some steps to guide you in choosing an appropriate capped value.
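One common starting point, shown below as a minimal sketch with made-up data (a heuristic I'm assuming here, not a prescribed recipe), is to anchor the cap at a high percentile of the metric's historical distribution and then sanity-check how many users it touches and how much it moves the mean:

import numpy as np

def suggest_cap(values: np.ndarray, percentile: float = 99.5) -> float:
    """Candidate cap at a high percentile of historical data (a heuristic, not a rule)."""
    return float(np.percentile(values, percentile))

# Hypothetical historical revenue-per-user data
rng = np.random.default_rng(0)
historical = rng.lognormal(mean=2.8, sigma=0.9, size=50_000)

for p in (99.0, 99.5, 99.9):
    cap = suggest_cap(historical, p)
    affected   = np.mean(historical > cap)                               # share of users touched by the cap
    mean_shift = 1 - np.clip(historical, None, cap).mean() / historical.mean()
    print(f"p{p}: cap ~= {cap:8.2f}, users affected = {affected:.2%}, mean shift = {mean_shift:.2%}")

Whichever percentile you start from, the final cap should be reviewed and aligned on with whoever owns the metric, since it effectively becomes part of the metric's definition.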
Are we not arbitrarily skewing the metric by capping it?
Capping a metric to a specific range can alter the original distribution of the data. If the capped value is set too low or too high, it can significantly distort the data distribution, but leaving outliers in is worse for concluding significance: a few outliers are enough to cause a false stat-sig result.
Using the numbers from the earlier example, let's say our experiment's treatment group has 100,000 users with an average revenue of $20.73, for a total revenue of 100,000 × $20.73 ≈ $2.073M. A 2% lift is therefore about $41K in absolute terms. Sometimes (rarely) a single user purchases around $4,000, roughly a tenth of that entire lift in one transaction. Whichever arm that user happens to fall into absorbs the whole amount, which is enough to significantly skew the result.
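Spelling the arithmetic out with the same illustrative numbers:

# Same illustrative numbers as above
n_users   = 100_000
avg_rev   = 20.73
total_rev = n_users * avg_rev          # ~= $2.073M
lift_2pct = 0.02 * total_rev           # ~= $41.5K in absolute terms

whale_purchase = 4_000.0               # one rare, very large purchase

print(f"total revenue       ~= ${total_rev:,.0f}")
print(f"2% lift             ~= ${lift_2pct:,.0f}")
print(f"one $4,000 purchase  = {whale_purchase / lift_2pct:.0%} of that entire lift")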
What if I truly care about the whales?
You need not be James Cameron to care about them. Capping is a pretty effective strategy for handling outliers, except when your revenue model is dependent on the "whales". The term comes from the gaming industry, where a microscopic minority of users (<2% of users), known as the "whales", are known to bring in nearly 50-80% of all revenue. Any capping strategy will essentially underplay their revenue impact, since it keeps the whales at bay (you can appreciate me for the wordplay later). In the above example, let's assume that the average revenue per user brought in by the power users (in this case, the whales) is around $1,000. See how the mean value changes between the capped distribution and the overall distribution.
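A simulated illustration of this point (the distribution parameters are illustrative; the whale share and the exact deviation will differ for your data):

import numpy as np

rng = np.random.default_rng(1)

# Illustrative whale-driven distribution: 98% regular users (mean ~= $21)
# and 2% whales averaging roughly $1,000. Not the author's actual data.
regular = rng.lognormal(mean=2.8, sigma=0.7, size=98_000)
whales  = rng.lognormal(mean=6.85, sigma=0.3, size=2_000)   # mean ~= exp(6.85 + 0.045) ~= 990
revenue = np.concatenate([regular, whales])

CAP = 200.0  # a cap that looks reasonable if you only glance at the regular users
capped = np.clip(revenue, None, CAP)

print(f"whale share of revenue   : {whales.sum() / revenue.sum():.0%}")
print(f"true mean                : {revenue.mean():.2f}")
print(f"capped mean              : {capped.mean():.2f}")
print(f"deviation from true mean : {1 - capped.mean() / revenue.mean():.1%}")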
The deviation from the actual mean is too big to ignore, whatever variance reduction it provides.
In short, do a bit of EDA to identify and align on an appropriate capping strategy for the key metrics your team/org cares about. This provides an easy and efficient way to handle outliers, reduce sample size requirements, and reduce false positives. Unless your business model thrives on the whales.
Happy Friday!
PART I: https://www.dhirubhai.net/pulse/all-strata-useful-some-more-than-others-jayakrishnan-vijayaraghavan
PART III: https://www.dhirubhai.net/pulse/cuped-what-you-know-before-experiment-matters-much-vijayaraghavan/