Predicting Defaulting on Credit Cards


When customers run into financial difficulties, it usually does not happen all at once. There are indicators that can be used to anticipate the final outcome, such as late payments, calls to customer service, enquiries about products, or a different browsing pattern on the web or mobile app. By using such patterns it is possible to prevent default, or at least to guide the process, providing a better service for the customer as well as reduced risk for the bank.

In this tutorial, we will look at how to predict defaulting using statistics, machine learning and deep learning. We will also look at how to explain the model itself using TSNE and Topological Data Analysis (TDA, Kepler-Mapper). Finally, we will look at how to APIfy the model and use it for account alerting.

Notebook here:

https://github.com/natbusa/deepcredit/blob/master/default-prediction.ipynb

Figure: TSNE plot of defaulting (red) vs. non-defaulting (blue) credit card accounts. What can we say about those clusters? Have a look at the full notebook linked above!


Synopsis

This notebook unfolds in the following phases:

  • Getting the Data
  • Data Preparation
  • Descriptive analytics
  • Feature Engineering
  • Dimensionality Reduction
  • Modeling
  • Explainability

Modeling (Random Forest vs Boosted Trees vs Deep Learning vs ...)

I will compare the predictive power of the following four classes of algorithms:

  • Logistic regression (scikit-learn)
  • Random Forests (scikit-learn)
  • Boosted Trees (xgboost)
  • Deep Learning (keras/tensorflow)
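
As a rough illustration of what such a comparison can look like, here is a minimal sketch. It is not the notebook's code. It assumes synthetic data from `make_classification` in place of the credit card dataset, and to stay within scikit-learn it swaps in `GradientBoostingClassifier` for xgboost and `MLPClassifier` for a keras/tensorflow network. The hyperparameters and the choice of ROC AUC as the metric are also assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the credit card data (assumption): an imbalanced
# binary target where the positive class plays the role of "default".
X, y = make_classification(
    n_samples=2000, n_features=20, n_informative=8,
    weights=[0.78, 0.22], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# One representative per model class. The last two are scikit-learn
# stand-ins for xgboost and keras/tensorflow (assumption).
models = {
    "logistic regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "boosted trees": GradientBoostingClassifier(random_state=0),
    "neural network": make_pipeline(
        StandardScaler(),
        MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500, random_state=0),
    ),
}

# Fit each model and score it on the held-out set with ROC AUC, which is
# less misleading than plain accuracy on an imbalanced target.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    proba = model.predict_proba(X_test)[:, 1]
    scores[name] = roc_auc_score(y_test, proba)

for name, auc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name:20s} ROC AUC = {auc:.3f}")
```

The ranking printed here says nothing about the real dataset. The point is only the shape of the comparison: same split, same metric, one loop over interchangeable estimators.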

Interpretability and Explainability

Understanding why a given model predicts the way it does is probably just as important as achieving good accuracy on the predicted results. I will present three methods that can be used to interpret and explain the results and to gain a better understanding of the dataset and its various data clusters.

We will use the latent space of an inner (second-to-last) neural network layer as the starting point for our analysis. We will first extract the activations from that inner layer, and then apply the following:

  • Apply TSNE on the activation data for dimensionality reduction
  • OPTICS for variable density clustering
  • Kepler Mapper using the calculated TSNE lens for data topology analysis
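
The first steps of this analysis can be sketched as follows. This is a hedged example, not the notebook's implementation: it assumes synthetic data, a small scikit-learn `MLPClassifier` in place of the keras model, and a hypothetical helper `hidden_activations` that replays the forward pass up to the last hidden layer. The layer sizes, `perplexity` and `min_samples` are illustrative guesses. The Kepler Mapper step is left out to keep the sketch free of extra dependencies.

```python
import numpy as np
from sklearn.cluster import OPTICS
from sklearn.datasets import make_classification
from sklearn.manifold import TSNE
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the credit card accounts (assumption).
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8,
    weights=[0.78, 0.22], random_state=0,
)
X = StandardScaler().fit_transform(X)

# Small network with two hidden layers; the 8-unit layer is the
# second-to-last layer, right before the output.
net = MLPClassifier(
    hidden_layer_sizes=(32, 8), activation="relu",
    max_iter=500, random_state=0,
).fit(X, y)


def hidden_activations(model, data):
    """Hypothetical helper: forward pass through the ReLU hidden layers only."""
    out = data
    # coefs_[-1] / intercepts_[-1] belong to the output layer, so skip them.
    for weights, bias in zip(model.coefs_[:-1], model.intercepts_[:-1]):
        out = np.maximum(out @ weights + bias, 0.0)
    return out


# Step 1: latent representation of every account, shape (600, 8).
latent = hidden_activations(net, X)

# Step 2: TSNE reduces the latent space to two dimensions for plotting.
embedding = TSNE(
    n_components=2, perplexity=30, init="pca", random_state=0
).fit_transform(latent)

# Step 3: OPTICS finds clusters of varying density; label -1 marks noise.
labels = OPTICS(min_samples=20).fit_predict(embedding)

for cluster in sorted(set(labels)):
    mask = labels == cluster
    tag = "noise" if cluster == -1 else f"cluster {cluster}"
    print(f"{tag:10s} size={mask.sum():4d} default rate={y[mask].mean():.2f}")
```

Comparing the default rate per cluster is one simple way to start asking what distinguishes the groups. The same two-dimensional `embedding` is what would be handed to Kepler Mapper as the lens in the last step.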


About Natalino Busa:

Currently available for projects. Contact me via linkedin or [email protected]

  • Principal Data Engineer
  • Principal Data Scientist
  • Head of Data Analytics

Principal Data Scientist, Director for Data Science, AI, Big Data Technologies. O’Reilly author on distributed computing and machine learning.

Natalino is proficient at the definition, design and implementation of data-driven financial and telecom applications and AI/ML driven data pipelines. He has previously served as Enterprise Data Architect, Principal Engineer at ING in the Netherlands, focusing on fraud prevention/detection, SoC, cybersecurity, customer experience, core banking processes, personalized marketing and ML/AI operational optimization.

Prior to that, he worked as a senior researcher at Philips Research Laboratories in the Netherlands, on the topics of system-on-a-chip architectures, distributed computing and compilers. All-round Technology Manager, Product Developer, and Innovator with a 15+ year track record in research, development and management of distributed architectures, scalable services and data-driven applications.

Massimo Albani

AI & Data passionate. Using technology to create value.

7 years ago

Interesting. Very professionally and thoroughly done. Thanks for sharing it, it is inspiring. P.S. Did I miss the model APIfy part?

Very well written notebook, very clear approach, thanks. Have you tried downsampling or upsampling techniques? As it is a very unbalanced dataset, predicting all the samples to be non-default would bring 78% accuracy on the training dataset.

Antoine Jeannot

Lead Data & Software

7 years ago

Very inspiring and interesting, thanks for sharing!

Prof. Dr. Gerhard Hellstern

Banking, Data Science and Quantum Computing (Qiskit Advocate)

7 years ago

Great work! I like it, especially using Deep Learning.
