Results obtained building a predictive model for credit risk analysis

Credit risk analysis is a key component in maintaining the health of a financial institution's balance sheet. Keeping the default rate low ensures that the loans being made are profitable. To this end, the use of machine learning to build models capable of identifying patterns and predicting whether a customer is likely to default has intensified.


Click here to read this article in Portuguese.


* Note

This is a summarized article that shows the main results.

To see the full study, including the code and methodology used, click here.


The Study

This project aimed to create a machine-learning model that predicts whether a new customer is likely to default.

Initial considerations

The dataset used in this project was originally made available by Nubank. It contains 45,000 records and 43 attributes.

Some issues were identified in this dataset, the most detrimental being the imbalance in the default variable, as the majority of records were non-default. However, this was expected and was duly addressed when constructing the predictive model.
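
As a rough, hedged illustration of how such an imbalance can be treated (the exact strategy used is documented in the full notebook), the sketch below rebalances a synthetic stand-in for the data by undersampling the majority class with the imbalanced-learn library:

```python
# Illustrative sketch only: a synthetic stand-in for the dataset is
# rebalanced by undersampling the majority (non-default) class.
from sklearn.datasets import make_classification
from imblearn.under_sampling import RandomUnderSampler

# Assumed class split (~16% defaults), chosen purely for illustration.
X, y = make_classification(n_samples=45_000, n_features=43,
                           weights=[0.84], random_state=42)

rus = RandomUnderSampler(random_state=42)
X_bal, y_bal = rus.fit_resample(X, y)
print(f"positives before: {y.mean():.2%} | after: {y_bal.mean():.2%}")
```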


Model Decision

The algorithm chosen to create the prediction model, XGBoost, comes from the family of supervised classifiers of the decision-tree type. Its name stands for Extreme Gradient Boosting, and it has been widely used by professionals in the field due to the high degree of precision and accuracy of the models it produces.

This is partly due to the large number of hyperparameters that can be adjusted, significantly improving the model's performance. It can also be applied to various types of problems across a wide range of sectors.
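
As a minimal sketch (not the study's configuration), fitting an XGBoost classifier looks like the following; the hyperparameter values are illustrative placeholders, and synthetic data stands in for the real features:

```python
# Minimal, illustrative XGBoost fit; the hyperparameter values are
# placeholders, not the tuned configuration from the study.
from sklearn.datasets import make_classification
from xgboost import XGBClassifier

X, y = make_classification(n_samples=5_000, n_features=43, random_state=42)

model = XGBClassifier(
    n_estimators=200,     # number of boosted trees
    learning_rate=0.1,    # shrinkage applied to each tree's contribution
    max_depth=4,          # depth of each individual tree
    eval_metric="logloss",
    random_state=42,
)
model.fit(X, y)
```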


Performance Assessment Metric

Among the metrics used to evaluate the model's performance, the main one was Recall, which best fits the problem under study. The reason is that, in the case of defaults, False Negatives are more harmful to a company than False Positives. In other words, it is better for the model to err by flagging a customer as a defaulter when in reality they are not; indicating that a customer will not default when they actually will leads to direct losses for the business. With this in mind, the higher the Recall value, the better the model's performance.
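
To make the computation concrete, here is a toy illustration with invented labels (1 = defaulter, 0 = non-defaulter) of how Recall penalizes False Negatives:

```python
# Toy example: Recall = TP / (TP + FN), so every missed defaulter
# (False Negative) pulls the score down.
from sklearn.metrics import confusion_matrix, recall_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]  # two defaulters missed (False Negatives)

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp / (tp + fn))                # 0.5
print(recall_score(y_true, y_pred))  # same value via scikit-learn
```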


Study Development

First, a base model was created using Logistic Regression. This provided a benchmark of what an algorithm without further adjustments could achieve, resulting in a Recall of 0.0290.
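
A sketch of this kind of baseline, with synthetic data standing in for the real dataset, might look like this:

```python
# Illustrative baseline: a plain Logistic Regression with no scaling,
# balancing, or tuning, evaluated by Recall on a held-out split.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the credit data.
X, y = make_classification(n_samples=45_000, n_features=43,
                           weights=[0.84], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)

baseline = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(recall_score(y_te, baseline.predict(X_te)))
```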

Then, after standardizing and balancing the data, 7 other models were created, also with the aim of comparing Recall values. The best model at this stage was the LGBM Classifier with a Recall of 0.6562, while XGBoost was in second place with 0.6483. However, after optimizing the hyperparameters of XGBoost, this value increased to 0.6640.
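
A hedged sketch of such a tuning step, optimizing directly for Recall with a randomized search (the parameter distributions below are assumptions, not the search space actually used):

```python
# Illustrative hyperparameter search targeting Recall; the parameter
# distributions are assumptions, not the study's actual search space.
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBClassifier

X, y = make_classification(n_samples=5_000, n_features=43, random_state=42)

param_dist = {
    "n_estimators": randint(100, 500),
    "max_depth": randint(3, 10),
    "learning_rate": uniform(0.01, 0.3),
    "subsample": uniform(0.6, 0.4),
}
search = RandomizedSearchCV(
    XGBClassifier(eval_metric="logloss", random_state=42),
    param_distributions=param_dist,
    n_iter=10,
    scoring="recall",  # optimise for the study's main metric
    cv=5,
    random_state=42,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 4))
```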

In an effort to further improve the model, feature engineering was performed, creating 4 new variables. Once again, the base model was run (Recall of 0.0513), and after standardizing and balancing the data, the 7 models were recreated. This time, half of them performed better than before, and the improvements were larger than the deteriorations.
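
The four variables actually created are described in the full study; purely to illustrate the pattern, the sketch below derives hypothetical ratio features from invented columns with pandas:

```python
# Hypothetical illustration of feature engineering: the column names and
# ratios below are invented, not the study's actual four variables.
import pandas as pd

df = pd.DataFrame({
    "income": [3000, 5500, 1200],
    "credit_limit": [1500, 8000, 600],
    "n_loans": [1, 3, 2],
})
# New features derived by combining existing columns:
df["limit_to_income"] = df["credit_limit"] / df["income"]
df["income_per_loan"] = df["income"] / df["n_loans"]
print(df)
```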

Once again, the hyperparameters for XGBoost were optimized, reaching a Recall value of 0.6663, the best value found so far.

When the test data were run through the XGBoost models created with and without feature engineering, the Recall values obtained were 0.6872 and 0.6547, respectively. This means the model with feature engineering was 3.25 percentage points better than the model without.
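
A quick back-of-the-envelope check of that gap:

```python
# Difference between the two reported test Recall values, in percentage points.
recall_with_fe, recall_without_fe = 0.6872, 0.6547
print(f"{(recall_with_fe - recall_without_fe) * 100:.2f} pp")  # 3.25 pp
```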

To confirm the superiority of one model over the other, a hypothesis test (z-test) was conducted, yielding a p-value of 4.41e-08. This statistically confirms that the model with feature engineering indeed performs better than the one without.
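
One plausible way to run such a test (statsmodels' two-proportion z-test is assumed here, and the number of actual defaulters n in the test set is a made-up figure, so the printed p-value will differ from the study's):

```python
# Sketch of a one-sided two-proportion z-test comparing the two Recall
# values; n is an assumed count of actual defaulters in the test set,
# so the printed p-value will not match the study's 4.41e-08.
from statsmodels.stats.proportion import proportions_ztest

n = 3000
successes = [int(0.6872 * n), int(0.6547 * n)]  # with / without feature eng.
stat, p_value = proportions_ztest(successes, [n, n], alternative="larger")
print(f"z = {stat:.2f}, p-value = {p_value:.2e}")
```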


Conclusion

After optimizing the hyperparameters of the XGBoost algorithm and applying feature engineering, a model was achieved with a Recall value of 0.6872 on the test data, the best value among the 18 models created in this study. Moreover, this improved XGBoost's own test Recall by 0.0325 points (3.25 percentage points), as confirmed by a statistical hypothesis test.

This underscores the importance and influence of both hyperparameter optimization and the execution of feature engineering in enhancing machine learning models.


Get to know more about this study

This study is available on Google Colab and on GitHub. Just click on the images below to be redirected.


[LoffredoDS] Credit Risk Analysis.ipynb


raffaloffredo/credit_risk_analysis



Let's Connect!
