Most read articles on A/B testing

While my book contains my best work and my latest takes on statistical issues in A/B testing, the many articles I've written over the years remain a treasure trove for those willing to delve into A/B testing stats. Here I'll share a list of them, sorted by most read to least read (in 2019), with brief notes on each:

  1. Statistical Significance in A/B Testing – a Complete Guide - unsurprisingly, the most important concept for any statistical analysis gets the most attention from readers. If you want to understand p-values, significance thresholds, and statistical significance, how these help us quantify uncertainty, and what caveats you should be aware of while using them, it is a solid read.
  2. One-tailed vs Two-tailed Tests of Significance in A/B Testing - crucial reading for anyone looking to refine their use of significance tests and confidence intervals. It can help you improve the efficiency of your A/B tests (if you were using two-tailed ones so far) and make sure the statistical hypotheses you use align with the business questions being posed. I've refined my arguments and exposition over the years and even spun off a separate website (onesided.org) where the more scientifically curious can dig into more than a dozen articles dedicated to various aspects of the topic.
  3. Confidence Intervals & P-values for Percent Change / Relative Difference - happy to see this one so high up, as the issue of using a statistical model for absolute change while inferring about relative change is, I believe, as rampant as it has ever been. Looking at these numbers, I hope more people today realize the need to use the correct statistical model when speaking about lift, percent improvement, etc.
  4. Statistical Significance for Non-Binomial Metrics – Revenue per User, AOV, etc. - more and more CROs realize that using various conversion rates as KPIs is no good when there is variability in AOV. While continuous metrics do pose some complications, metrics like Average Revenue Per User (ARPU) make for much more informative A/B test outcomes.
  5. The Importance of Statistical Power in Online A/B Testing - statistical power naturally takes its place just below a bunch of articles dealing with type I error. While the type II error rate, which power controls and which determines how often false negatives occur, is the less important of the two, it is still crucial to understand it. It is also the more difficult concept. I see many misconceptions and misunderstandings about a concept closely tied to power: the minimum detectable effect (MDE), which I dub the minimum effect of interest (MEI) or minimum reliable detectable effect (MRDE) in an effort to curb those.
  6. Running Multiple A/B Tests at The Same Time: Do’s and Don’ts - a lot of intuitions and writing about the running of concurrent A/B tests are plain wrong and this article explains why. I wish it also provided some easy solutions, but there are none, as far as I'm aware.
  7. The Bane of AB Testing: Reaching Statistical Significance - I'm obviously biased, but I've yet to see an easier-to-grasp explanation of the issue of peeking at the outcomes of an A/B test while using a statistical model which does not permit that ("unaccounted peeking with intent to stop" is the term I deem accurate enough, though it is a mouthful). Hopefully in 2020 we'll see less peeking and more properly run sequential tests (as in tests with proper sequential monitoring of data, such as AGILE).
  8. The Case for Non-Inferiority A/B Tests - as far as I'm aware, this is the first public argument for using non-inferiority hypotheses in online A/B testing made at length. IMO these are still underused, and so are what I dub 'strong superiority tests'. The post is accompanied by a short white paper, available for free.
  9. A/B Testing with a Small Sample Size - written in 2019, it still made the top 10, which is not surprising given it tackles the old foe of all optimizers - small sample size. Did I do a good enough job? I think so, but, as with other complex problems, I can't give you any easy and universally applicable solutions.
  10. Multivariate Testing – Best Practices & Tools for MVT (A/B/n) Tests - given their ubiquitous use, I'm surprised A/B/n tests are not more popular. I don't really get into any particular methods here (e.g. Dunnett's correction) but the article tackles several common myths and misconceptions about MVTs floating around the CRO world.
  11. Should you do A/A, A/A/B or A/A/B/B tests in CRO? - in this early article of mine I deal with a few misconceptions about the way one can use A/A, A/A/B, A/A/B/B, etc. tests in CRO. This one would benefit from an expansion, and I'm especially grateful to Chad Sanderson for expanding my views on this issue. Would you like to see a new article on the topic?
  12. The Google Optimize Statistical Engine and Approach - the steps back made by Google Optimize compared to earlier Google A/B testing products in terms of statistics are illustrated by the extremely poor documentation for the statistical engine used in Optimize which I discuss in this installment. I mean, even calling it 'documentation' is stretching it. Hopefully things will improve in the next generation of products.
  13. 5 Reasons to Go Bayesian in A/B Testing – Debunked - tempted by the promises of Bayesian methods in A/B testing? I recommend you read this before you succumb to those temptations. P.S. Still zero responses to this article by proponents of Bayesian inference, I wonder why?
  14. Analysis of 115 A/B Tests: Average Lift is 4%, Most Lack Statistical Power - curious about what others are doing with A/B tests? Check out this insightful meta-analysis of 115 A/B tests from the database of my good friend Jakub Linowski from GoodUI. It has its limitations, but I'm not aware of a work with similar scope and depth.
  15. Improving ROI in A/B Testing: the AGILE AB Testing Approach - it's been three years since its debut and AGILE A/B testing is gaining a good amount of traction. Its efficiency and flexibility, combined with statistical rigor, remain its strong points. Furthermore, I think it is the only frequentist sequential testing solution which has easy-to-use tools available to the public (meaning you don't need to do your own coding in R to get it to work).
  16. Efficient AB Testing with the AGILE Statistical Method - a step-by-step guide for applying AGILE A/B Testing. Works best if you also have the tool open in another tab.
  17. Costs and Benefits of A/B Testing: A Comprehensive Guide - fundamental reading if you want to understand the role of A/B tests as tools for managing business risk. (has it been 3 years since I wrote this??? feels like yesterday)
  18. Bayesian AB Testing is Not Immune to Optional Stopping Issues - for those on the Bayesian train who believe that just because they use Bayes' rule their inferences are immune to the effects of optional stopping / peeking. Nope, they are not, and I'm not aware of an easy way to account for the problem. I'm aware of the hypothetical solutions put forth by A. Gelman, but I've yet to see them in operation.
  19. Risk vs. Reward in A/B Tests: A/B testing as Risk Management - want to run A/B tests which strike the balance between risk and reward for your clients (internal or external)? Ever wondered how to select a proper significance threshold and how to choose an optimal sample size for your test? This is a great starting point. The article contains a bunch of original research. It also has a not-so-hidden challenge to Bayesians or would-be Bayesians, but it's been three years and no one has brought it up yet. Do you see it?
  20. Representative samples and generalizability of A/B testing results - I'm not all about statistics, you know. While good statistics will ensure good internal validity, external validity / generalizability is of equal importance to the practical application of online controlled experiments. I think this is the first methodical exploration of the issue. It should definitely be in the top 10 if we're going by importance of the topic.
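
To make item 2 a bit more concrete, here is a minimal Python sketch (not taken from any of the articles above) of how one-tailed and two-tailed p-values relate for a simple difference in conversion rates. It assumes a pooled two-proportion z-test with a normal approximation; the function names and the example counts are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """z statistic for the absolute difference p_b - p_a (pooled standard error)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se


def p_values(z):
    """Return (one_tailed, two_tailed) p-values under a standard normal.

    The one-tailed value tests H0: variant B is no better than control A.
    """
    std_normal = NormalDist()
    one_tailed = 1 - std_normal.cdf(z)
    two_tailed = 2 * (1 - std_normal.cdf(abs(z)))
    return one_tailed, two_tailed


if __name__ == "__main__":
    # Hypothetical data: 100/1000 conversions in control, 130/1000 in the variant.
    z = two_proportion_z(100, 1000, 130, 1000)
    one_tailed, two_tailed = p_values(z)
    print(f"z = {z:.3f}, one-tailed p = {one_tailed:.4f}, two-tailed p = {two_tailed:.4f}")
```

When the observed difference is in the direction of the alternative, the one-tailed p-value is half the two-tailed one, which is one way to see the efficiency point made in item 2. Note that this sketch models the absolute difference only; it does not address the relative-difference issue from item 3.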

While there are many more articles I can add to this list, some of them were written in 2019 and so the ranking wouldn't be fair. Still, I'll add some quick honorable mentions:

Why Every Internet Marketer Should Be a Statistician - my first article on A/B testing statistics, still good, but can certainly be improved upon.

Futility Stopping Rules in AGILE A/B Testing - if only more people understood the utility of futility stopping, A/B testing would be much better off.

Designing successful A/B tests in Email Marketing - there are issues specific to A/B testing of emails, and I address some of them here.

20-80% Faster A/B Tests? Is it real? - a detailed comparison of the efficiency of AGILE A/B testing with the simpler fixed sample size models.

The A/B Testing Guide to Surviving on a Deserted Island - I was accused of click-bait for using this title. Is it click-bait if you deliver on it?

Affordable A/B Tests: Google Optimize & AGILE A/B Testing - give this a read if you are looking for an affordable yet high quality A/B testing setup.

Inherent costs of A/B testing: limited risk results in limited gains - of course no one reads this. I mean, who wants to hear about the costs of A/B testing? Give me the juicy wins, costs are for someone else to take care of!

Does Your A/B Test Pass the Sample Ratio Mismatch Check? - written late in 2019 (Nov 15), this one didn't get enough reads to make it into the top, but given the importance of the topic, I'm sure it will rise up in the 2020 rankings.
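
For readers who have not met the term, a sample ratio mismatch check compares the number of users actually assigned to each group with the split the test was designed for. Below is a minimal sketch, assuming a two-group test and a chi-square goodness-of-fit test with one degree of freedom; it is not the procedure from the article, and the function name, the default 50/50 split, and the example counts are hypothetical.

```python
from math import erfc, sqrt


def srm_p_value(users_a, users_b, expected_share_a=0.5):
    """p-value of a chi-square goodness-of-fit test for a two-group split.

    With one degree of freedom, the chi-square survival function
    equals erfc(sqrt(chi2 / 2)), so no external library is needed.
    """
    total = users_a + users_b
    expected_a = total * expected_share_a
    expected_b = total - expected_a
    chi2 = ((users_a - expected_a) ** 2 / expected_a
            + (users_b - expected_b) ** 2 / expected_b)
    return erfc(sqrt(chi2 / 2))


if __name__ == "__main__":
    # Hypothetical counts for a test designed with equal allocation.
    print(srm_p_value(5000, 5300))
```

A very small p-value suggests the observed split is unlikely under the intended allocation and that the test's data may be compromised; which threshold to use is a judgment call that this sketch does not settle.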

I hope you like this brief reading list on statistics in A/B testing. Is there a glaring omission here? Any topic you think is of vital importance to the successful application of statistics in A/B testing that you can't find in the above list? Just let me know!

Elliott Golden

Leading product & helping progressive causes gain fired-up supporters

5 years ago

Epic list. Thanks for sharing!

Gergana Tyaneva

Data Analytics Lead I 10+ Experience in CRO, Product Analytics, Web Analytics

5 years ago

I am not sure if that has been tackled, but I think the topic of measuring delayed or long-term effects is quite interesting. Thanks for sharing!?
