Predicting Defaulting on Credit Cards
When customers run into financial difficulties, it usually does not happen all at once. There are indicators that can be used to anticipate the final outcome, such as late payments, calls to customer service, enquiries about products, or a different browsing pattern on the web or mobile app. By acting on such patterns it is possible to prevent default, or at least guide the process, providing better service for the customer as well as reduced risk for the bank.
In this tutorial, we will look at how to predict defaulting using statistics, machine learning and deep learning. We will also look at how to explain the model itself using TSNE and Topological Data Analysis (TDA, Kepler Mapper). Finally, we will look at how to APIfy the model and use it for account alerting.
Notebook here:
https://github.com/natbusa/deepcredit/blob/master/default-prediction.ipynb
Above is a TSNE plot of defaulting (red) vs. non-defaulting (blue) credit card accounts. What can we say about those clusters? Have a look at the full article!
Synopsis
This notebook unfolds in the following phases; a short data-loading sketch follows the list:
- Getting the Data
- Data Preparation
- Descriptive analytics
- Feature Engineering
- Dimensionality Reduction
- Modeling
- Explainability
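As a concrete starting point for the first two phases, here is a minimal sketch, assuming the UCI "Default of Credit Card Clients" dataset saved as a local CSV; the file name and column names are assumptions and may differ from what the notebook actually uses.

```python
# A minimal sketch of "Getting the Data" and "Data Preparation".
# Assumption: the UCI "Default of Credit Card Clients" dataset saved locally
# as UCI_Credit_Card.csv (file name and column names are hypothetical).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("UCI_Credit_Card.csv")
y = df["default.payment.next.month"].values        # 1 = defaulted, 0 = paid
X = df.drop(columns=["ID", "default.payment.next.month"]).values

# Stratified hold-out split, then standardize the features.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
```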
Modeling (Random Forest vs Boosted Trees vs Deep Learning vs ...)
I will compare the predictive power of four well-known classes of algorithms; a minimal comparison sketch follows the list:
- Logistic regression (scikit-learn)
- Random Forests (scikit-learn)
- Boosted Trees (xgboost)
- Deep Learning (keras/tensorflow)
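Continuing from the loading sketch above, here is a minimal sketch of how such a comparison might look. The hyperparameters, the layer sizes and the metric (ROC AUC) are illustrative choices, not the notebook's exact settings.

```python
# Train each classifier on the same split and score it with ROC AUC.
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier
from tensorflow import keras

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=300, random_state=42),
    "boosted trees": XGBClassifier(n_estimators=300, learning_rate=0.1),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")

# A small feed-forward network as the deep learning contender
# (the architecture is illustrative; the inner "latent" layer is
# named here so its activations can be extracted later).
nn = keras.Sequential([
    keras.layers.Input(shape=(X_train.shape[1],)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(16, activation="relu", name="latent"),
    keras.layers.Dense(1, activation="sigmoid"),
])
nn.compile(optimizer="adam", loss="binary_crossentropy")
nn.fit(X_train, y_train, epochs=20, batch_size=256, verbose=0)
print("deep learning: AUC =",
      roc_auc_score(y_test, nn.predict(X_test).ravel()))
```

Keep in mind that the dataset is heavily unbalanced, so ranking metrics such as ROC AUC are more informative than plain accuracy.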
Interpretability and Explainability
Understanding why a given model predicts the way it does is probably just as important as achieving good accuracy on the predictions. I will present three methods that can be used to interpret and explain the results and gain a better understanding of the dataset and the various data clusters.
We will use the latent space of an inner (second-to-last) neural network layer as the starting point for our analysis. The approach, sketched in code after this list, is to:
- Extract activations from an inner layer of the neural network
- Apply TSNE on the activation data for dimensionality reduction
- OPTICS for variable density clustering
- Kepler Mapper using the calculated TSNE lens for data topology analysis
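Below is a sketch of this pipeline, continuing from the network in the previous sketch. The layer name "latent" and all parameter values (perplexity, min_samples, cover settings) are assumptions made for illustration; the notebook's settings may differ.

```python
import numpy as np
from tensorflow import keras
from sklearn.manifold import TSNE
from sklearn.cluster import OPTICS
import kmapper as km

# 1. Extract activations of the inner (second-to-last) layer.
encoder = keras.Model(inputs=nn.inputs,
                      outputs=nn.get_layer("latent").output)
activations = encoder.predict(X_test)

# 2. Project the activations to 2-D with TSNE.
lens = TSNE(n_components=2, perplexity=30,
            random_state=42).fit_transform(activations)

# 3. Variable-density clustering of the projection with OPTICS.
labels = OPTICS(min_samples=20).fit_predict(lens)
print("OPTICS clusters (excluding noise):", len(set(labels) - {-1}))

# 4. Topological summary with Kepler Mapper, using the TSNE projection as lens.
mapper = km.KeplerMapper(verbose=1)
graph = mapper.map(lens, activations,
                   cover=km.Cover(n_cubes=15, perc_overlap=0.4))
# The resulting HTML graph can then be inspected node by node, e.g. to see
# where defaulting accounts concentrate.
mapper.visualize(graph, path_html="default_prediction_mapper.html")
```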
About Natalino Busa:
Currently available for projects. Contact me via LinkedIn or [email protected]
- Principal Data Engineer
- Principal Data Scientist
- Head of Data Analytics
Principal Data Scientist, Director for Data Science, AI, Big Data Technologies. O’Reilly author on distributed computing and machine learning.
Natalino is proficient at the definition, design and implementation of data-driven financial and telecom applications and AI/ML driven data pipelines. He has previously served as Enterprise Data Architect, Principal Engineer at ING in the Netherlands, focusing on fraud prevention/detection, SoC, cybersecurity, customer experience, core banking processes, personalized marketing and ML/AI operational optimization.
Prior to that, he had worked as senior researcher at Philips Research Laboratories in the Netherlands, on the topics of system-on-a-chip architectures, distributed computing and compilers. All-round Technology Manager, Product Developer, and Innovator with 15+ years track record in research, development and management of distributed architectures, scalable services and data-driven applications.
AI & Data passionate. Using technology to create value.
Comments
- (7 years ago) Interesting. Very professionally and thoroughly done. Thanks for sharing it, it is inspiring. P.S. Did I miss the model APIfy part?
- Project Manager (7 years ago): Very well written notebook, very clear approach, thanks. Have you tried downsampling or upsampling techniques? As it is a very unbalanced dataset, predicting all the samples to be non-default would bring 78% accuracy on the training dataset.
- Adriano Batista Prieto, MSc. Diego Rodrigues, Lead Data & Software (7 years ago): Very inspiring and interesting, thanks for sharing!
- Banking, Data Science and Quantum Computing (Qiskit Advocate) (7 years ago): Great work! I like it, especially using Deep Learning.