Bayesian A-B Testing
Harry Powell
Data science leader with track record of innovation and value creation
… or how we were able to make decisions about price tests in half the time...
The Bank I worked for offers small and medium-sized unsecured loans to consumers. These loans are priced across 500 discrete segments according to loan size, loan term, and credit risk. The Bank tries to optimise loan pricing by running A-B tests in various segments, where one part of the segment population is offered the original price and another is offered a test price. If the Bank offers a lower test price, the question is whether volume will increase sufficiently to offset the lower margin. Similarly, a higher test price asks whether the higher margin will offset lower volume. This is very standard price optimisation practice.
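To make the trade-off concrete, here is a toy break-even calculation: profit in a segment is roughly volume times margin per loan, so a lower test price only pays off if take-up rises enough to offset the thinner margin. The margins below are illustrative assumptions, not the Bank's numbers.

```python
# Toy break-even calculation for a lower test price (illustrative numbers only).
margin_original = 120.0  # assumed profit per loan at the original price, in GBP
margin_test = 100.0      # assumed profit per loan at the lower test price, in GBP

# Volume must grow by at least this factor for the test price to break even,
# since profit ~ volume * margin per loan.
breakeven_uplift = margin_original / margin_test
print(f"Take-up must rise by at least {(breakeven_uplift - 1) * 100:.0f}% to break even")
# -> Take-up must rise by at least 20% to break even
```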
However, the Bank has a problem. Consumer lending is a highly regulated industry, and every A-B test must be approved by the regulator. This has two important implications. Firstly, the Bank can run only a few price tests compared with an online retailer, which means we have too little data to estimate the elasticity of demand at each price point. Secondly, because every pricing decision has a large impact on profitability, we need to be sure that it is right. In practice this meant running each price test for a long time, around three months, in order to get statistically significant results, with each test costing hundreds of thousands of pounds in lost revenue even when the test was successful. If we could reduce the time needed to reach a statistically significant result, it would have a very positive impact on the profitability of the consumer loans business.
We built a hierarchical Bayesian model both to design pricing experiments and to evaluate the results. We were able to use previous experience across many segments to build the prior distribution of the hyperparameters, and we modelled profitability explicitly through the network rather than modelling take-up, loan size, loan term and price independently and then combining them. The modelling exercise turned out to be quite hard and forced us to think very deeply about the assumptions underlying a price test. Through this process we realised that much of the conventional frequentist statistical testing that had been done in the past was not valid, even though at first sight it looked reasonable. We estimated the coefficients of the model using standard MCMC libraries, which performed reasonably well, and we were able to present our results using visualisation techniques that our stakeholders found intuitive and very helpful for decision making.
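To give a rough idea of the structure described above, here is a minimal sketch of a hierarchical take-up model written with PyMC. The data, segment count, priors and hyperpriors are all assumptions for illustration; the actual model additionally propagated loan size, term and margin through to profitability rather than modelling take-up alone.

```python
# A minimal sketch (not the Bank's production model) of a hierarchical Bayesian
# take-up model for a price test, written with PyMC. Segment-level price
# sensitivities are partially pooled through shared hyperpriors, which is where
# experience from previous tests across many segments enters as prior information.
import numpy as np
import pymc as pm

# Hypothetical data: one row per offer made during the test.
rng = np.random.default_rng(42)
K = 8                                      # number of pricing segments in the test
n = 4000                                   # number of offers
segment_idx = rng.integers(0, K, size=n)   # which segment each offer belongs to
price = rng.normal(0.0, 1.0, size=n)       # offered price, centred and scaled per segment
taken = rng.binomial(1, 0.25, size=n)      # 1 if the customer took the loan

with pm.Model() as model:
    # Hyperpriors: this is where knowledge from previous price tests is encoded.
    mu_alpha = pm.Normal("mu_alpha", mu=-1.0, sigma=1.0)
    sigma_alpha = pm.HalfNormal("sigma_alpha", sigma=1.0)
    mu_beta = pm.Normal("mu_beta", mu=-0.5, sigma=0.5)   # demand slopes expected to be negative
    sigma_beta = pm.HalfNormal("sigma_beta", sigma=0.5)

    # Segment-level intercepts and price sensitivities (partial pooling).
    alpha = pm.Normal("alpha", mu=mu_alpha, sigma=sigma_alpha, shape=K)
    beta = pm.Normal("beta", mu=mu_beta, sigma=sigma_beta, shape=K)

    # Probability that an offer at a given price is taken up.
    p_take = pm.math.invlogit(alpha[segment_idx] + beta[segment_idx] * price)
    pm.Bernoulli("obs", p=p_take, observed=taken)

    # Standard MCMC (NUTS) to estimate the coefficients.
    trace = pm.sample(1000, tune=1000, target_accept=0.9)
```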
One problem we came up against when we were prototyping was how to answer the question “why is your model better than ours”, given that both models were trying to come up with the same answer and that, in the limit, the Bayesian approach converges to the frequentist one. We tried all sorts of rather abstract arguments, but at the end of the day we simply had to assert that in general it would reach an answer much more quickly, although it is hard to say this with certainty. We were, however, able to use predictive power analysis to estimate the number of samples we would need to collect in order to have some degree of confidence in our recommendation, but again we found it difficult to present this information in a way that was convincing to our commercial stakeholders.
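The kind of power analysis described above can be run by simulation. The sketch below deliberately uses a simplified conjugate Beta-Binomial take-up model rather than the full hierarchical one, and the take-up rates, priors and decision threshold are illustrative assumptions: it estimates how often a test of a given size would reach a decisive posterior.

```python
# Simulation-based power analysis sketch: for a given sample size per arm, how
# often does the posterior become decisive, i.e. P(test take-up > control
# take-up | data) exceeds a threshold? Uses a simple Beta-Binomial model with
# illustrative numbers, not the Bank's hierarchical model or figures.
import numpy as np

rng = np.random.default_rng(0)

def prob_decisive(n_per_arm, p_control=0.20, p_test=0.23,
                  threshold=0.95, n_sims=500, n_draws=4000):
    decisive = 0
    for _ in range(n_sims):
        # Simulate one experiment of this size under the assumed effect.
        x_c = rng.binomial(n_per_arm, p_control)
        x_t = rng.binomial(n_per_arm, p_test)
        # Conjugate posterior draws under Beta(1, 1) priors.
        draws_c = rng.beta(1 + x_c, 1 + n_per_arm - x_c, size=n_draws)
        draws_t = rng.beta(1 + x_t, 1 + n_per_arm - x_t, size=n_draws)
        if (draws_t > draws_c).mean() > threshold:
            decisive += 1
    return decisive / n_sims

# How the chance of reaching a decision grows with the number of offers per arm.
for n in (500, 1000, 2000, 4000):
    print(n, prob_decisive(n))
```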
It turns out that, depending upon the prior used, Bayesian A-B testing allowed us to make a decision in around half the time, reducing the length of the test from three months to around a month and a half. This represented an enormous saving for the Bank. On the first test run (which rejected the test price in favour of the original price) we were able to finish the exercise early, and this saved the Bank £300,000. Of course, we still had to explain to some of the stakeholders why this was a great result given that our statistics had rejected their idea!
One of the nice things about this methodology is that we have been able to hand over all the models and libraries that we built to the team that was originally doing the optimisation. After making a couple of hires and doing some training, that team is now fully able to implement and reason about a Bayesian approach to price testing. This is a real success because my team was not resourced to maintain and run models in production in the long term, and we relied upon other parts of the organisation to adopt our technologies.
We learned an enormous amount from this project.
The most important learning was realising how much probability and statistics people forget, even those with good, recent quantitative degrees. They seem to lose the knowledge virtually as soon as they have taken their final-year exams. Even the basic assumptions of statistical tests are often forgotten; for example, it is rare for anyone to question whether the underlying distribution assumed in a test is in fact normal. I found this quite concerning, and in fact put in place a statistical training programme for the analytics teams to remind them of the importance of good methodology when analysing data. All of this can be quite tricky, given that questioning an analyst's statistical skill set can make them defensive. In order to drive change one needs, on the one hand, to be insistent, but on the other, to be kind. Analysts only forget statistical skills because the organisations they work for do not always value them.
The other interesting thing about this project is that it is often assumed that statistics don't matter much in the era of Big Data. The reasoning goes that the law of large numbers and the central limit theorem make anything other than the t-test unnecessary. However, what we found is that we now have the data and computational power to personalise, and so while we have a huge dataset in aggregate, we need to make decisions about individuals, on whom we may have very limited data. Good analytics will always be pushing the limits of what can be inferred from a dataset, and this will very often require a deep understanding of statistical methodologies.
Comments

SVP, AI Research | Capital Markets: Hi Harry, thanks a lot for sharing! One question: did this sort of model fall within the model validation scope at the bank you worked for? Cheers!

Ioannis Bakagiannis, Data Science Manager at Skyscanner: From the outside, setting priors seems easy, but oh my, the implications they have on the posteriors. Super cool use case. Thanks for sharing!