Predicting Defaulting on Credit Cards
When customers run into financial difficulties, it usually does not happen all at once. There are indicators that can be used to anticipate the final outcome, such as late payments, calls to customer service, enquiries about products, or a different browsing pattern on the web or mobile app. By acting on such patterns it is possible to prevent default, or at least guide the process, providing better service for the customer as well as reduced risk for the bank.
In this tutorial, we will look at how to predict defaulting using statistics, machine learning and deep learning. We will also look at how to explain the model itself using TSNE and Topological Data Analysis (TDA, Kepler Mapper). Finally, we will look at how to APIfy the model and use it for account alerting.
Notebook here:
https://github.com/natbusa/deepcredit/blob/master/default-prediction.ipynb
Above is a TSNE plot of defaulting (red) vs. non-defaulting (blue) credit card accounts. What can we say about those clusters? Have a look at the full article!
Synopsis
This notebook unfolds in the following phases; a short data-loading sketch follows the list:
- Getting the Data
- Data Preparation
- Descriptive analytics
- Feature Engineering
- Dimensionality Reduction
- Modeling
- Explainability
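As a concrete starting point for the first two phases, here is a minimal sketch, assuming the UCI "Default of Credit Card Clients" dataset saved as a local CSV; the file name and column names are assumptions and may differ from what the notebook actually uses.

```python
# A minimal sketch of "Getting the Data" and "Data Preparation".
# Assumption: the UCI "Default of Credit Card Clients" dataset saved locally
# as UCI_Credit_Card.csv (file name and column names are hypothetical).
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("UCI_Credit_Card.csv")
y = df["default.payment.next.month"].values        # 1 = defaulted, 0 = paid
X = df.drop(columns=["ID", "default.payment.next.month"]).values

# Stratified hold-out split, then standardize the features.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
```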
Modeling (Random Forest vs Boosted Trees vs Deep Learning vs ...)
I will compare the predictive power of four well-known classes of algorithms; a minimal comparison sketch follows the list:
- Logistic regression (scikit-learn)
- Random Forests (scikit-learn)
- Boosted Trees (xgboost)
- Deep Learning (keras/tensorflow)
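Continuing from the loading sketch above, here is a minimal sketch of how such a comparison might look. The hyperparameters, the layer sizes and the metric (ROC AUC) are illustrative choices, not the notebook's exact settings.

```python
# Train each classifier on the same split and score it with ROC AUC.
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from xgboost import XGBClassifier
from tensorflow import keras

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=300, random_state=42),
    "boosted trees": XGBClassifier(n_estimators=300, learning_rate=0.1),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
    print(f"{name}: AUC = {auc:.3f}")

# A small feed-forward network as the deep learning contender
# (the architecture is illustrative; the inner "latent" layer is
# named here so its activations can be extracted later).
nn = keras.Sequential([
    keras.layers.Input(shape=(X_train.shape[1],)),
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(16, activation="relu", name="latent"),
    keras.layers.Dense(1, activation="sigmoid"),
])
nn.compile(optimizer="adam", loss="binary_crossentropy")
nn.fit(X_train, y_train, epochs=20, batch_size=256, verbose=0)
print("deep learning: AUC =",
      roc_auc_score(y_test, nn.predict(X_test).ravel()))
```

Keep in mind that the dataset is heavily unbalanced, so ranking metrics such as ROC AUC are more informative than plain accuracy.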
Interpretability and Explainability
Understanding why a given model predicts the way it does is probably just as important as achieving good accuracy on the predictions. I will present three methods that can be used to interpret and explain the results and gain a better understanding of the dataset and the various data clusters.
We will use the latent space of an inner (second-to-last) neural network layer as the starting point for our analysis. The approach, sketched in code after this list, is to:
- Extract activations from an inner layer of the neural network
- Apply TSNE on the activation data for dimensionality reduction
- OPTICS for variable density clustering
- Kepler Mapper using the calculated TSNE lens for data topology analysis
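Below is a sketch of this pipeline, continuing from the network in the previous sketch. The layer name "latent" and all parameter values (perplexity, min_samples, cover settings) are assumptions made for illustration; the notebook's settings may differ.

```python
import numpy as np
from tensorflow import keras
from sklearn.manifold import TSNE
from sklearn.cluster import OPTICS
import kmapper as km

# 1. Extract activations of the inner (second-to-last) layer.
encoder = keras.Model(inputs=nn.inputs,
                      outputs=nn.get_layer("latent").output)
activations = encoder.predict(X_test)

# 2. Project the activations to 2-D with TSNE.
lens = TSNE(n_components=2, perplexity=30,
            random_state=42).fit_transform(activations)

# 3. Variable-density clustering of the projection with OPTICS.
labels = OPTICS(min_samples=20).fit_predict(lens)
print("OPTICS clusters (excluding noise):", len(set(labels) - {-1}))

# 4. Topological summary with Kepler Mapper, using the TSNE projection as lens.
mapper = km.KeplerMapper(verbose=1)
graph = mapper.map(lens, activations,
                   cover=km.Cover(n_cubes=15, perc_overlap=0.4))
# The resulting HTML graph can then be inspected node by node, e.g. to see
# where defaulting accounts concentrate.
mapper.visualize(graph, path_html="default_prediction_mapper.html")
```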
About Natalino Busa:
Currently available for projects. Contact me via LinkedIn or [email protected]
- Principal Data Engineer
- Principal Data Scientist
- Head of Data Analytics
Principal Data Scientist, Director for Data Science, AI, Big Data Technologies. O’Reilly author on distributed computing and machine learning.
Natalino is proficient at the definition, design and implementation of data-driven financial and telecom applications and AI/ML driven data pipelines. He has previously served as Enterprise Data Architect, Principal Engineer at ING in the Netherlands, focusing on fraud prevention/detection, SoC, cybersecurity, customer experience, core banking processes, personalized marketing and ML/AI operational optimization.
Prior to that, he had worked as senior researcher at Philips Research Laboratories in the Netherlands, on the topics of system-on-a-chip architectures, distributed computing and compilers. All-round Technology Manager, Product Developer, and Innovator with 15+ years track record in research, development and management of distributed architectures, scalable services and data-driven applications.
AI & Data passionate. Using technology to create value.
Comments
- (7 years ago) Interesting. Very professionally and thoroughly done. Thanks for sharing it, it is inspiring. P.S. Did I miss the model APIfy part?
- Project Manager (7 years ago): Very well written notebook, very clear approach, thanks. Have you tried downsampling or upsampling techniques? As it is a very unbalanced dataset, predicting all the samples to be non-default would bring 78% accuracy on the training dataset.
- Adriano Batista Prieto, MSc. Diego Rodrigues, Lead Data & Software (7 years ago): Very inspiring and interesting, thanks for sharing!
- Banking, Data Science and Quantum Computing (Qiskit Advocate) (7 years ago): Great work! I like it, especially using Deep Learning.