Building Random Forest Scorecards
In the lending industry and credit risk research, practitioners routinely encounter Weight-of-Evidence logistic regressions (WOE LR) or various flavours of gradient boosted decision trees (GBDT) used for standardized credit risk assessments. Implementations of Random Forests for Probability of Default (PD) scoring, however, are far less common. This can be partially attributed to the algorithm's inherently random nature, which is often seen, not least by regulatory authorities, as ill-suited to predictive modeling in high-risk domains.
To the best of the author's knowledge, there have been no attempts to incorporate Random Forests into scorecard methodologies. This post aims to bridge that gap by demonstrating how Random Forest scorecards can be developed and by highlighting several benefits they offer for risk model development and validation.
Random Forest scorecards
Popular gradient boosting libraries such as XGBoost and LightGBM support several types of base learners, including linear trees and random forests. With linear trees as base estimators, the idea is to fit a linear regression within each leaf of a decision tree rather than use a constant leaf value. Although this feature can boost performance in some scenarios, it can also break the monotonicity of the risk score regardless of monotonic constraints, and the resulting model is considerably harder to interpret.
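As a point of reference, here is a minimal sketch of how the linear-tree option can be switched on in LightGBM via its linear_tree flag (available in recent versions). The data and all parameter values are illustrative assumptions, not a recommended configuration.

```python
# Sketch: linear trees as base learners in LightGBM (illustrative parameters).
import lightgbm as lgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=5000, n_features=10, random_state=42)
train_set = lgb.Dataset(X, label=y)

params = {
    "objective": "binary",
    "linear_tree": True,   # fit a linear model in each leaf instead of a constant value
    "max_depth": 2,
    "verbosity": -1,
}
model = lgb.train(params, train_set, num_boost_round=50)
```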
In contrast, the use of random forests as base learners remains relatively unexplored, with even the official XGBoost documentation providing limited insights. According to the documentation, the parameter num_parallel_tree allows a combination of trees (a forest) to be grown at each iteration instead of a single decision tree. Notably, when both num_parallel_tree and num_boost_round (the number of boosting iterations) exceed 1, training becomes a blend of random forests and gradient boosting.
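The sketch below shows one way such a boosted-random-forest setup can be configured in XGBoost. The synthetic data and all parameter values are illustrative assumptions; only num_parallel_tree, num_boost_round, and max_depth correspond to the settings discussed in this post.

```python
# Sketch: each boosting round grows a small random forest of shallow trees.
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=5000, n_features=10, random_state=42)
dtrain = xgb.DMatrix(X, label=y)

params = {
    "objective": "binary:logistic",  # raw margins are log-odds, convenient for scorecards
    "max_depth": 2,                  # shallow, two-level trees
    "num_parallel_tree": 2,          # each boosting round grows a forest of 2 trees
    "subsample": 0.8,                # row subsampling supplies the "random" in random forest
    "colsample_bynode": 0.8,         # feature subsampling per split
    "learning_rate": 0.1,
}

# num_boost_round > 1 together with num_parallel_tree > 1 yields the blend of
# random forests and gradient boosting described above.
booster = xgb.train(params, dtrain, num_boost_round=5)
print(len(booster.get_dump()), "trees in total")  # rounds x trees per round = 10
```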
This combination makes it possible to construct random scoring sub-models that are incorporated into a larger model. To further enhance the interpretability and readability of the model outputs, these sub-models and their components can be converted into a scorecard format. The approach is particularly convenient for two-level models, where each tree is grown to a depth of at most two (max_depth=2).
Below is an example featuring four trees forming the first two random forests of the scorecard. In XGBoost's implementation, each individual tree produces log-odds scores, which are aggregated into a forest by summing the margins (see the technical appendix for more details). These log-odds scores are then transformed into a traditional scorecard that assigns higher points to good risks and lower points to bad risks.
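One common way to perform this transformation is the "points to double the odds" (PDO) scaling used in classical scorecards; the sketch below is not necessarily the exact transformation used in the appendix, and the target score, target odds, and PDO values are illustrative assumptions.

```python
# Sketch: map log-odds margins to scorecard points (higher points = lower risk).
import numpy as np

def logodds_to_points(log_odds_bad, target_score=600, target_odds=50, pdo=20):
    """Convert log-odds of default into points using PDO scaling."""
    factor = pdo / np.log(2)
    offset = target_score - factor * np.log(target_odds)
    # log-odds of "good" is minus the log-odds of "bad", so better risks score higher
    return offset + factor * (-np.asarray(log_odds_bad))

# Raw per-observation margins from an XGBoost binary:logistic booster are log-odds:
# margins = booster.predict(dtrain, output_margin=True)
# scores = logodds_to_points(margins)
print(logodds_to_points([-3.0, 0.0, 2.0]))  # low-risk, neutral, high-risk examples
```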
Practical applications of Random Forest scorecards
One can think of several practical applications of this modeling technique.
In the example below, we illustrate how such a scoring model can be employed for risk management purposes. Notably, Trees 4, 5, and 8 stand out, with individual Gini scores exceeding 50%.
By aggregating the scores of these influential components, we achieve a Gini score of 70%. This composite score retains 93% of the rank-ordering power of the final model, as indicated by the last bar. Arguably, this approach is faster and more efficient than the traditional routine of analyzing univariate correlations and excluding features with lower rank-ordering power from the combination.
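A minimal sketch of this kind of analysis is shown below: compute Gini = 2*AUC - 1 for each tree's score and for the sum of the strongest trees. The tree_scores table and the target vector here are synthetic stand-ins; in practice they would come from the trained booster and the development sample.

```python
# Sketch: per-tree Gini and the Gini of an aggregated score (illustrative data).
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

def gini(y_true, score):
    return 2 * roc_auc_score(y_true, score) - 1

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=1000)                      # stand-in default flag
tree_scores = pd.DataFrame(
    {f"tree_{i}": rng.normal(loc=0.5 * y, scale=1.0) for i in range(1, 9)}
)  # stand-in per-tree log-odds scores

per_tree_gini = {c: gini(y, tree_scores[c]) for c in tree_scores.columns}
strong = sorted(per_tree_gini, key=per_tree_gini.get, reverse=True)[:3]  # e.g. trees 4, 5, 8
composite = tree_scores[strong].sum(axis=1)             # summed margins of the strongest trees
print("strongest trees:", strong)
print("composite Gini:", round(gini(y, composite), 3))
```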
Concluding remarks
Random Forests are not common candidates for credit risk models because their intrinsic randomness impedes interpretability, a critical requirement in high-risk predictive modeling domains such as credit scoring. The non-conventional approach to scorecard building presented here enables risk practitioners to explore and discover valuable customer risk segments and to craft predictive features for risk management models, further optimizing risk strategies.
--
I hope you have enjoyed reading this post!
The technical appendix with the code can be found in this notebook.
All views expressed are my own.