Mortgage Retention Case Study

…or how we identified £750 million a year of mortgages that were going to be repaid in the next 30 days.

A Bank had a large home loans, or “mortgage”, business in the UK, with a book of around £100 billion of assets, around half of which could be repaid by the borrower at any time without any sort of prepayment penalty. The repayment of these “free to go” mortgages presented a real opportunity for the Bank. A large proportion of these repayments were customers remortgaging with another provider, and it turned out that, for whatever reason, many of those people had not really considered remortgaging with the Bank. If we could identify who was about to repay their mortgage, and intervene at the right moment, we would have a chance of keeping a good proportion of those customers on the Bank’s mortgage book, albeit on a different mortgage product.

It is not that difficult to identify which customers will repay their mortgage early; there are a number of strong demographic indicators for this behaviour. What is hard is identifying when they will repay, within a tight enough time window for the bank to intervene meaningfully. Predict too far ahead and you risk an irrelevant intervention; predict too close to the event and the customer will already have made their remortgage decision. A time horizon of around 30 to 60 days was thought to be the appropriate range. Anyone who has built a prediction model will recognise that this is a tight window to work in.
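
To make that target concrete, here is a minimal sketch of how a 30–60 day label could be constructed. The column names and data layout (customer_id, event_date, repaid_date) are hypothetical illustrations, not the Bank’s actual schema or pipeline.

```python
import pandas as pd

# Hypothetical sketch of labelling a 30-60 day repayment horizon.
# Column names are illustrative only.

def label_windows(events: pd.DataFrame, repayments: pd.DataFrame,
                  cutoff: pd.Timestamp,
                  horizon_min: int = 30, horizon_max: int = 60) -> pd.DataFrame:
    """Label each customer 1 if they repay 30-60 days after the cutoff date."""
    # Only use activity observed before the cutoff, so no future information
    # leaks into the features.
    window = events[events["event_date"] < cutoff]

    # Days from the cutoff to each customer's repayment date.
    days_to_repay = (repayments.set_index("customer_id")["repaid_date"] - cutoff).dt.days

    in_horizon = ((days_to_repay >= horizon_min)
                  & (days_to_repay <= horizon_max)).astype(int)

    # One row per customer: a simple activity count plus the label
    # (customers with no repayment record get label 0).
    return (window.groupby("customer_id").size().rename("n_events").to_frame()
                  .join(in_horizon.rename("repays_in_horizon"))
                  .fillna({"repays_in_horizon": 0}))
```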

The main innovation the team brought to the modelling of this repayment process was to treat the sequences of customer interactions leading up to a repayment event a bit like sequences of words in a simple language. These interactions are a form of communication between the customer and the bank in which order and timing matter. We were therefore able to compile a sequence of interactions (e.g. the customer has been into a branch; they have made an enquiry about their mortgage account) and implied events (e.g. they have been paid a bonus; they have changed jobs) and apply methodologies taken from natural language processing. These are effectively dimensionality reduction techniques, which allowed us to combine that information with broader demographic data (age, gender, income, credit rating etc.) and market context (changes in interest rates etc.) and then model the differences between windows of activity leading up to repayment events and windows not leading up to repayment events.
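
As an illustration of the idea only: event sequences treated as short “sentences”, reduced to a dense representation with standard NLP tooling, and combined with tabular features. The article does not name the specific techniques that were used, so TF-IDF plus truncated SVD and a gradient-boosted classifier stand in here, and all tokens, features and figures are made up.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.ensemble import GradientBoostingClassifier

# Each customer window becomes a "sentence" of ordered event tokens (made up here).
sequences = [
    "branch_visit mortgage_enquiry salary_bonus",
    "job_change branch_visit",
    "direct_debit_change mortgage_enquiry rate_rise_news",
]
demographics = np.array([   # e.g. [age, income_k, credit_score] - illustrative values
    [34, 62, 710],
    [48, 95, 640],
    [29, 41, 690],
])
y = np.array([1, 0, 1])     # 1 = repaid within the 30-60 day horizon

# Turn the ordered event tokens into a sparse "bag of events" with bigrams,
# so short ordered patterns survive, then reduce the dimensionality.
tfidf = TfidfVectorizer(ngram_range=(1, 2), token_pattern=r"\S+")
X_seq = TruncatedSVD(n_components=2).fit_transform(tfidf.fit_transform(sequences))

# Combine the reduced sequence representation with demographic / market features.
X = np.hstack([X_seq, demographics])

model = GradientBoostingClassifier().fit(X, y)
print(model.predict_proba(X)[:, 1])   # probability of repayment in the horizon
```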

It turns out that incorporating sequencing into the model is pretty powerful. The model offered a 10x uplift in precision over the baseline, which equated to around £750 million of mortgages in a typical year.
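
For intuition, a precision uplift is simply the ratio of the model’s precision among flagged customers to the base rate you would get by contacting customers at random. The figures below are hypothetical; only the roughly tenfold ratio comes from the article.

```python
# Hypothetical figures - only the ~10x uplift itself comes from the article.
base_rate = 0.01         # chance a randomly contacted customer repays in the window
model_precision = 0.10   # repayment rate among customers the model flags
print(f"uplift: {model_precision / base_rate:.0f}x")   # -> uplift: 10x
```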

So you’d think that would be a great result, wouldn’t you? Well, it doesn’t always work out like that in a large organisation. To begin with, some predictive modelling was already being done for the mortgage business by our customer communications team, and they didn’t seem that pleased when we turned up uninvited with our model.

We agreed that their team could run tests to validate the model. Although we were well aware that we might have ruffled some feathers, we were confident enough in our technology to believe that the benefits it would bring would eventually be recognised. But when they insisted that the model be tested against “live” data (rather than the comprehensive set of historic data on which we had trained the model), we began to lose a little control of the process. We were in a political situation, not a scientific one, and we were a little underprepared.

Be aware that subtleties in experimental design can have a very great effect on the outcome of a test. You probably know this already. What surprised us was that many people within corporations, although quite used to the vocabulary of statistical analysis (p-values, normal distributions and so on), can be very inexperienced when it comes to understanding the underlying assumptions of what they are doing and how a test should be structured. It turned out that there were two very significant problems with the tests our model was subjected to. Firstly, they somehow used the target value as a kind of predictor in their analysis, a form of target leakage. Secondly, they bundled the predictive power of the model (does it identify people who are going to repay within 30 days?) with the effectiveness of the intervention (do those people, once contacted, in fact remortgage with the Bank?). The combination of these two problems killed the experiment, even though the model continued to be highly predictive. Just being accurate wasn’t enough.
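
By way of contrast, here is a sketch, on simulated toy data rather than the Bank’s experiment, of a design that keeps the two questions apart: score first, observe outcomes later, and randomise the intervention only within the flagged group.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy, simulated data - not the Bank's experiment. Scores are produced before
# outcomes are known, so the target can never leak back in as a predictor.
n = 10_000
model_scores = rng.random(n)                                    # predicted propensity to repay
repaid_within_30d = rng.random(n) < 0.02 + 0.20 * model_scores  # outcome observed later
flagged = model_scores > 0.90                                   # customers the model picks out

# Question 1 - model precision: of the flagged customers, how many actually
# repaid? No intervention is needed to measure this.
precision = repaid_within_30d[flagged].mean()

# Question 2 - intervention effect: randomise contact *within* the flagged
# group, so the campaign's effectiveness is measured separately from the
# model's accuracy rather than bundled together with it.
contacted = rng.random(n) < 0.5
retained = rng.random(n) < np.where(contacted, 0.30, 0.20)      # toy retention rates
uplift_from_contact = (retained[flagged & contacted].mean()
                       - retained[flagged & ~contacted].mean())

print(f"model precision: {precision:.1%}, intervention uplift: {uplift_from_contact:.1%}")
```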

So what are the learnings here?

On a technical level, we learnt that thinking of sequences of events as similar to language allows us to increase precision very significantly compared with a standard model built on a cross-sectional data set.

However, on an organisational note, it was clear that we had put too much trust in the performance of the model, simply assuming that the results would speak for themselves. We should have put much more effort into keeping the incumbent analytics team on side, so that they were less inclined to try to sabotage our work. And we perhaps should not have assumed that the basics of experimental design are well understood by those who run tests in a large corporation.

Edosa Odaro

AI | Value | Advisor | Data | Author | LinkedIn Top Voice | Board NED | Keynote Speaker

Thanks for posting and will be on the lookout for them all as they come...

Victor Paraschiv

Entrepreneur, engineer, scientist.

Good summary Harry. Would you be able to think of a Data Transformation Strategy that could fix this in the organisation you are describing? It would dovetail very well with one of your previous articles about digital transformation.

Haider Hayat

Head of Data & AI

I’m currently working on a predictive maintenance project that is quite similar, in the sense that it requires the target variable to be predicted within a tight enough window to be useful. I’m also using sensor anomalies and system-generated alarms, where the sequence and timing boost the predictive power. I’ve never really looked at it from the perspective of treating it like natural language. Would love to understand more about this NLP crossover. Perhaps we can have a chinwag some time?

Tim Latham

Head of Data Intelligence

Hi Harry, I'd love to connect as an old JLR alumnus myself, now working in the AI field. It sounds like you've got a great team there (and Tata Consulting to call on if required), but if you'd be interested in exploring how explainable DL models might be able to squeeze some further predictive capability out of your data, I'd love to have a chat. I have a partner doing some cutting-edge work supporting banking and other enterprise teams.

David Hampton

Partner at Advanced Analytics Solutions LLP

Great story, Harry. The NLP analogy was very interesting and the insights about managing stakeholders rang very true. I remember seeing a conference presentation from a very large UK mobile phone network boasting that they could predict precisely which customers would leave. I asked why, then, they had cut me off because I refused to make a pre-payment for my unusually large phone bill, losing a business customer who had spent thousands with them over a 15-year period. She admitted that sometimes Finance don't give a monkey's what the prediction model says! I went to EE as soon as my contract allowed, taking my whole family with me, and have had no trouble with them.
