Exploring Interpretable Scorecard Boosting

Credit scorecards provide lenders with a standardized and objective method to assess credit risk and make informed, automated lending decisions. They play a crucial role in streamlining the lending strategy, minimizing the potential for human bias, and enabling effective risk management.

One of the key methods utilized in scorecard development is the Weight-of-Evidence Logistic Regression (WOE LR) model design. These "glass-box" models have been in use for decades and are known for their ease of interpretation. What is often overlooked is that they have also proven reliable in production and can be deployed using SQL alone.
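As a brief illustration, here is a minimal sketch of the WOE transformation behind such models (the column names are hypothetical; the WOE-encoded features would then feed a logistic regression):

```python
import numpy as np
import pandas as pd

def weight_of_evidence(df: pd.DataFrame, bin_col: str, target: str) -> pd.Series:
    """WOE per bin: ln(share of goods / share of bads)."""
    stats = df.groupby(bin_col)[target].agg(total="count", bads="sum")
    goods = stats["total"] - stats["bads"]  # target == 1 flags a bad loan
    return np.log((goods / goods.sum()) / (stats["bads"] / stats["bads"].sum()))
```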

However, due to their relative simplicity, a single such scorecard cannot describe the risk attributes of different segments and products. Maintaining numerous scorecards, in turn, has several drawbacks: processes such as beta tuning and retraining, validation, versioning, deployment, and maintenance require significant manual effort and subject matter expertise.

In this post, building upon previous work by Scotiabank's ML team, I will follow their credit risk modeling technique based on scorecard boosting. This approach to good-bad analysis can enhance scorecard performance without compromising model interpretability, which is crucial for explaining credit decisions to model users, customers, and regulators alike.

Scorecard boosting

Credit scoring is a high-stakes field where accuracy and interpretability are crucial for real-world usability. As a result, many complex "black-box" algorithms have not gained widespread adoption in lending. An excellent example highlighting this is the outcome of the FICO Explainable Machine Learning Challenge, where a team from Duke University emerged as the winner by building a fully transparent "glass-box" risk model. Their model not only outperformed more complex alternatives in terms of interpretability but also exhibited superior predictive power.

The need for reliable scoring models is also highlighted by regulation and model risk management best practices. These requirements run up against what is known as the "accuracy-explainability" trade-off: it may not be feasible to improve one dimension without sacrificing the other.

In his lecture Machine Learning for Retail Credit Risk for NVIDIA, Paul Edwards presented a well-known accuracy-explainability diagram that specifically addressed the intricacies of the credit risk use-case:

[Image: Adapted from "Machine Learning in Retail Credit Risk: Algorithms, Infrastructure, and Alternative Data — Past, Present, and Future" - NVIDIA]

As a challenger approach to WOE LR, Scotiabank's ML team proposed a boosting technique for credit scorecard development. Such models, as demonstrated in the presentation, have the same positive properties as a linear model, yet rely on more advanced tree-based estimation, which yields substantial gains in scoring accuracy.

In the next sections, we will explore how a prototype of a boosted scorecard can be built on the FICO xML Challenge dataset.

How does it work?

A boosted scorecard can be described as an ensemble of decision trees trained iteratively using the gradient boosting algorithm. In the context of credit decisioning, specific constraints are often applied: monotonicity constraints (to control the relationship with the default rate) and interaction constraints (to eliminate unwanted feature interactions).
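As a sketch, such constraints can be expressed directly in XGBoost (X_train and y_train are assumed placeholders for the training features and the good/bad flag; the parameter values are illustrative, not the exact configuration used below):

```python
import xgboost as xgb

# With max_depth=1 every tree is a single split (a "stump"), so the
# ensemble is purely additive and feature interactions cannot occur.
# Monotone constraints force each feature's relationship with the
# predicted log odds to be one-directional (-1: non-increasing,
# +1: non-decreasing), listed in column order.
model = xgb.XGBClassifier(
    objective="binary:logistic",
    n_estimators=45,
    max_depth=1,
    learning_rate=0.3,
    monotone_constraints=(-1, -1),  # illustrative: one entry per feature
)
model.fit(X_train, y_train)
```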

Since we're working with a binary target variable, the leaf predictions generated by each consecutive tree can be interpreted as log odds. Consequently, they can be converted into scorecard points, similar to the WOE LR approach, as Weights & Biases' notebook and model card have shown.
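The conversion typically follows the standard points-to-double-the-odds (PDO) scaling; a minimal sketch under that convention (the calibration constants are illustrative assumptions):

```python
import numpy as np

pdo, base_score, base_odds = 20.0, 600.0, 50.0  # illustrative calibration

factor = pdo / np.log(2)                 # points per doubling of the odds
offset = base_score - factor * np.log(base_odds)

def leaf_to_points(leaf_log_odds: float, n_trees: int) -> float:
    # Each tree receives an equal share of the offset; its leaf value
    # (log odds of being bad) is scaled and subtracted so that lower
    # risk translates into a higher score.
    return offset / n_trees - factor * leaf_log_odds
```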

To demonstrate the application of boosted scorecards, we fit a model using a pre-defined set of features with monotonic trends taken from the OptBinning library's example. Our simple boosted scorecard with a maximum tree depth of 1 consists of 45 individual trees. Below we can see the first five trees:

[Image: Boosted scorecard: first five trees]
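A table like this can be reproduced by dumping the fitted booster into a flat frame; a minimal sketch, continuing from the hypothetical model above:

```python
# One row per node: the split feature and threshold for internal nodes;
# for leaf nodes the "Gain" column holds the leaf value (raw log odds,
# before conversion to points).
trees = model.get_booster().trees_to_dataframe()
print(trees.loc[trees["Tree"] < 5, ["Tree", "Node", "Feature", "Split", "Gain"]])
```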

It can be seen from the table above that customers with a credit bureau score below 74 are assigned 0 points, while those at 74 or above receive 24 points in the first iteration (Tree 0).

We validate our scorecard by checking it against the underlying XGBoost model's first and second tree plots:

[Image: Boosted scorecard: first tree plot]
[Image: Boosted scorecard: second tree plot]
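Such plots can be drawn with XGBoost's built-in plotting utility (it requires the graphviz package); a minimal sketch:

```python
import matplotlib.pyplot as plt
import xgboost as xgb

# Draw the first two stumps of the fitted booster for a visual check
# against the scorecard table.
fig, axes = plt.subplots(1, 2, figsize=(14, 4))
xgb.plot_tree(model, num_trees=0, ax=axes[0])
xgb.plot_tree(model, num_trees=1, ax=axes[1])
plt.show()
```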

After fitting the scorecard, we can generate model predictions in scorecard points on the test data. The ultimate goal of scorecard development is to determine a cut-off point, a "sweet spot" where most high-risk applicants are rejected while the majority of low-risk applicants are retained.

We further visualize this threshold in the following chart:

[Image: Boosted scorecard: cut-off]

As expected, a higher credit score corresponds to lower risk. By setting a cut-off at roughly 30 points for our known good-bad population (rejecting applicants who score below it), we can reject approximately 80% of bad risk.
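A minimal sketch of how such a cut-off could be evaluated on a scored test set (score_points and y_test are assumed placeholders, with y_test == 1 marking bad loans):

```python
import numpy as np

def rejection_stats(scores, y_true, cutoff):
    """Share of bads rejected and goods retained at a given cut-off."""
    rejected = scores < cutoff
    bad_rejected = (rejected & (y_true == 1)).sum() / (y_true == 1).sum()
    good_retained = (~rejected & (y_true == 0)).sum() / (y_true == 0).sum()
    return bad_rejected, good_retained

# Scan candidate cut-offs to locate the "sweet spot".
for cutoff in np.arange(10, 60, 10):
    bad, good = rejection_stats(score_points, y_test, cutoff)
    print(f"cut-off {cutoff}: {bad:.0%} of bads rejected, {good:.0%} of goods retained")
```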

Model interpretability

A key advantage of boosted scorecards is their global and local interpretability. Since the final score is the sum of points assigned to each feature in our scorecard, feature importances are straightforward to calculate.
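One possible definition, given that each depth-1 tree splits on exactly one feature, is to aggregate the range of points each feature can contribute; a sketch building on the hypothetical tree dump and factor scaling above:

```python
trees = model.get_booster().trees_to_dataframe()

# Map every tree to the single feature it splits on.
split_feature = trees.loc[trees["Feature"] != "Leaf"].set_index("Tree")["Feature"]

# Range of leaf values (log odds) per tree, scaled to points and
# aggregated per feature.
leaf_range = (
    trees.loc[trees["Feature"] == "Leaf"]
    .groupby("Tree")["Gain"]
    .agg(lambda v: v.max() - v.min())
    .mul(factor)
)
importances = leaf_range.groupby(split_feature).sum().sort_values(ascending=False)
print(importances)
```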

We can visualize the scorecard's custom feature importances below:

[Image: Boosted scorecard: feature importances]

We can observe that the external score has the highest impact on predictions, followed by delinquency and utilization features.

To validate our results, we can further look at a similar diagram using the SHAP global importance plot, which is a common model-agnostic method for interpreting model results:

[Image: Boosted scorecard: SHAP feature importances]
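A minimal sketch of this computation (X_test is an assumed placeholder for the test features):

```python
import shap

# TreeExplainer computes exact SHAP values for tree ensembles.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)

# Global importance: mean absolute SHAP value per feature, as a bar chart.
shap.summary_plot(shap_values, X_test, plot_type="bar")
```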

As we can see from the list of top features selected by SHAP, both methods yield highly consistent results, confirming the reliability and interpretability of the boosted scorecard model.

Concluding remarks

Explainability is a crucial element in credit decisioning, as consumers have the right to understand why their applications were rejected and to question the data and methods behind the decision. While scorecards based on linear models have traditionally been considered the gold standard for standardized and reliable credit risk assessment, a more advanced technique called scorecard boosting has the potential to improve the predictive power of models without sacrificing their explainability.

--

I hope you have enjoyed reading this post!

The technical appendix with the code can be found in my GitHub.

All views expressed are my own.
