Scorecarding with Naïve Bayes

In the consumer lending domain, credit scorecards serve as the fundamental pillars of decision-making in determining who qualifies for a loan. Lenders rely on these scorecards as benchmarks, collecting evidence of a customer's creditworthiness. Traditionally, credit scorecards are constructed through well-established methods, namely Logistic Regression and a combination of Weight of Evidence (WOE) + Logistic Regression. This text explores a non-parametric alternative: employing WOE as a Naïve Bayes estimator to build a logarithmic scoring system, providing a simpler methodology for credit risk modeling.

Common Ways to Build Credit Scorecards

There are two commonly accepted ways to build credit scorecards for decision-making in the lending industry.

1. Logistic Regression

When we build credit scorecards using logistic regression, we first standardize input variables to a common scale (e.g., using z-scores) and convert the resulting model outputs to credit scores using a technique like Points to Double Odds (PDO).
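A minimal sketch of this approach (assuming `X_train` and `y_train` exist; the PDO parameters shown are common illustrative choices, not the article's):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardize inputs, then fit a logistic regression
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# Points to Double Odds (PDO): map log-odds onto a score scale
pdo, target_score, target_odds = 30.0, 500, 20.0
factor = pdo / np.log(2)                      # points per doubling of the odds
offset = target_score - factor * np.log(target_odds)

log_odds = model.decision_function(X_train)   # log-odds of default
scores = offset - factor * log_odds           # higher score = lower risk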

2. Weight of Evidence (WOE) + Logistic Regression

With this “semi-Naïve” approach, we create distinct risk groups for each input feature and calculate WOE scores. As a next step, we fit a logistic regression to "fine-tune" and dampen the weights of features. Finally, we can derive credit scores from WOE-transformed features and logistic regression weights, for example, using the PDO method.
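A hedged sketch of this variant (assuming `X_train_woe` already holds WOE-transformed features; all names are illustrative):

from sklearn.linear_model import LogisticRegression

# Fit a logistic regression on WOE-transformed features; the learned
# coefficients "dampen" each feature's WOE before PDO scaling
lr = LogisticRegression()
lr.fit(X_train_woe, y_train)

# Coefficients near 1.0 keep a feature's WOE almost as-is;
# smaller values shrink weak or noisy features
print(lr.coef_)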

Yet, building scorecards using WOE without logistic regression is not so common, largely because practitioners focus heavily on the role of logistic regression in this process. This text aims to show an alternative way to build credit scorecards with a non-parametric approach in mind: leveraging WOE as a Naïve Bayesian estimator.

WOE-Na?ve Bayes Scorecard Ingredients

The following diagram describes the modeling workflow for building a scorecard with the Naïve Bayes approach:

To understand the modeling workflow, we need to introduce some definitions.

Binning: Binning is the process of grouping or discretizing numerical data into intervals or "bins." For example, numerical features can be divided into bins with an equal number of points or using a clustering technique like k-means.

Target encoding: Target encoding involves calculating the average value of the target variable (the default or bad rate in the credit scoring case) for a given bin or group.

Log-odds: Log-odds (used interchangeably with "logit" or "logits") represent the natural logarithm of the odds of belonging to class 1 (default) versus class 0 (no default), where the odds are p / (1 − p).

Weight of Evidence (WOE): WOE in the Naïve Bayes scorecarding process can be seen as a "distance" in log-odds between a particular risk group's log-odds of default and the average log-odds of the entire sample.

Weight of Evidence formula: WOE_i = logit(p_i) − logit(p̄), where p_i is the default rate of risk group i and p̄ is the average default rate of the sample.
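To make these definitions concrete, here is a small illustrative sketch (toy numbers, not the article's data) that bins one feature, target-encodes the bins, and turns the encoded default rates into WOE scores:

import numpy as np
from scipy.special import logit

# Toy data: one numeric feature and a binary default flag (illustrative)
x = np.array([21, 25, 33, 38, 45, 52, 58, 63, 70, 75])
y = np.array([ 1,  1,  0,  1,  0,  0,  0,  0,  0,  0])

# Binning: three bands split at 40 and 60
bins = np.digitize(x, bins=[40, 60])          # bin index 0, 1, or 2

# Target encoding: average default rate per bin
default_rate = np.array([y[bins == b].mean() for b in range(3)])

# Log-odds per bin and for the whole sample
bin_log_odds = logit(default_rate + 1e-8)     # eps guards against logit(0)
sample_log_odds = logit(y.mean())

# WOE: distance between each bin's log-odds and the sample log-odds
woe = bin_log_odds - sample_log_odds
print(woe)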

The Origins of WOE Theory

The Weight of Evidence (WOE) concept, which found wide application in credit scoring, has its roots in the 19th century. For example, C.S. Peirce, a well-known thinker, used the term to describe how strongly evidence supports a hypothesis, and also introduced the term log-odds. Jumping to the 20th century, Irving John Good, who was the main statistical assistant to Alan Turing, developed, among his other contributions to Bayesian analysis and statistics, the theory of Weight of Evidence for hypothesis testing.

In his brief survey on Weight of Evidence published in 1985, Good provided an extensive overview of the concept. Most notably, he wrote:

"The final probability of H, should depend only on the weight of evidence and on the initial probability, say

P(H E) = g[W(H:E), P(H)]

If H and ?H are simple statistical hypotheses, and if E is one of the possible experimental outcomes occurring in the definitions of H and ?H, then P(E H) / P(E ∣?H) is a simple likelihood ratio, but this is a very special case. In general, this ratio is regarded as meaningful only to a Bayesian. It could be called a ratio of Bayesian likelihoods."

He provides the formula for WOE, which in modern notation reads W(H : E) = log [ P(E | H) / P(E | H̄) ].

Further in his summary, Good provides the formula for the Bayes Factor (likelihood ratio), which he also refers to as the Bayes-Jeffreys-Turing factor, summarizing its contributors:

Bayes factor: F(H : E) = O(H | E) / O(H) = P(E | H) / P(E | H̄)

"The right side of equation can be described in words as the ratio of the final odds of H to its initial odds, or the ratio of the posterior to the prior odds, or the factor by which the initial odds of H are multiplied to give the final odds.

In current Bayesian literature it is usually called the Bayes factor in favour of H provided by E. Thus weight of evidence is equal to the logarithm of the Bayes factor."

Going back to the WOE definition described at the beginning of this text, we can see that the posterior log-odds decompose into the sample log-odds plus a sum of per-feature WOE terms:

log-odds(default | x_1, …, x_n) = log-odds(default) + Σ_i WOE_i(x_i)

Because of the independence conditions ("naïve"), Bayes factors are multiplicative and weights of evidence (WOE) are additive.
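A quick numerical check of this multiplicative/additive duality (the factors are illustrative):

import numpy as np

prior_odds = 0.1 / 0.9                    # sample odds of default
bayes_factors = np.array([2.0, 0.5, 3.0]) # one factor per independent feature

# Factors multiply on the odds scale ...
posterior_odds = prior_odds * np.prod(bayes_factors)

# ... which is the same as adding WOE terms on the log-odds scale
woe = np.log(bayes_factors)
posterior_log_odds = np.log(prior_odds) + np.sum(woe)

assert np.isclose(np.log(posterior_odds), posterior_log_odds)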

When comparing this WOE calculation to the traditional credit risk definition, one may observe that the signs are "flipped." This is because in credit risk we normally define the same entity using the log-odds of goods (class 0) to bads (class 1), known as the good-to-bad ratio. Under that convention, the sign is arguably more intuitive: a negative WOE marks a riskier group and drives the credit score down.

How It Works

We can visualize the process step-by-step on a small data subset with 10 samples (the model is pre-trained on the full sample).

Step 1: Raw features

Step 2: Binning

Step 3: Target encoding

Step 4: Target encoding to log-odds

Step 5: Average default rate

Step 6: Log-odds of average default rate

Step 7: Calculating WOE scores (Step 4 − Step 6)

In this step, we subtract the sample log-odds constant from Step 6 (−2.1972, the logit of a 10% average default rate) from each target-encoded log-odds value from Step 4.
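As a quick check of this step (the 25% bin default rate is illustrative; −2.1972 is the sample constant quoted above):

import numpy as np

sample_log_odds = np.log(0.10 / 0.90)   # -2.1972, the Step 6 constant
bin_log_odds = np.log(0.25 / 0.75)      # -1.0986 for an illustrative bin

woe = bin_log_odds - sample_log_odds    # 1.0986: riskier than average
print(round(woe, 4))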

Step 8: Inference: calculating the sum of WOE scores

Step 9: Inference: adding the bias

In this step, we add the bias (the sample log-odds) from Step 6 to the prediction column from Step 8 to get the final prediction (the other columns are not affected).

Step 10: Obtaining a probability of default (PD)

We can convert the final log-odds score to a probability using the sigmoid function.
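A compact sketch of Steps 8-10 (the WOE values are stand-ins, not the article's exact numbers):

import numpy as np
from scipy.special import expit   # the sigmoid function

# Step 8: sum one applicant's WOE scores across features (illustrative)
woe_scores = np.array([1.10, -0.40, 0.25, -0.80, 0.05, 0.30])
evidence = woe_scores.sum()

# Step 9: add the bias (the sample log-odds from Step 6)
log_odds = evidence + (-2.1972)

# Step 10: convert the final log-odds to a probability of default
pd_estimate = expit(log_odds)
print(round(pd_estimate, 4))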

Step 11: Calculating credit scores from WOE scores

We convert WOE scores to credit scores using the PDO method.

In this process, we define the PDO, target score, and target odds parameters to calculate the factor and offset parameters. Each feature receives an equal share of the offset and bias (since we are adding pieces of evidence), and the bias is the sample log-odds.

The following parameters were used for the transformation of WOE scores into credit scores: PDO = 30, Target Score = 500, and Target Odds = 20. These give factor = 30 / ln(2) ≈ 43.28 and offset = 500 − 43.28 × ln(20) ≈ 370.3.

Step 12: Calculating the final credit score

The final score is the sum of the individual scores across the six features.

Below we can visualize the distribution of credit scores on the hold-out set of data points, which were not used in estimating the model:

Credit score distribution

To test the performance of the WOE Naïve Bayes model, we can benchmark the discriminatory power and calibration accuracy of the WOE Naïve Bayes score against logistic regression.

Both discrimination and calibration metrics indicate that the WOE Naïve Bayes model surpasses its linear challenger, with better Gini and Brier scores. This indicates superior performance both in discriminating between good and bad credit risks and in producing more accurate probability of default (PD) estimates.
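A minimal sketch of such a benchmark (assuming `y_test` and PD predictions from both models exist; the names `pd_nb` and `pd_lr` are illustrative):

import numpy as np
from sklearn.metrics import roc_auc_score, brier_score_loss

# Assumes y_test plus PD estimates from both models (illustrative names)
for name, pd_pred in [("WOE Naive Bayes", pd_nb), ("Logistic Regression", pd_lr)]:
    gini = 2 * roc_auc_score(y_test, pd_pred) - 1   # discrimination
    brier = brier_score_loss(y_test, pd_pred)       # calibration accuracy
    print(f"{name}: Gini={gini:.3f}, Brier={brier:.4f}")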

The WOE-Naïve Bayes scorecard introduces a novel and simplified approach to credit risk assessment in consumer lending. It harmonizes with traditional scoring methods while offering a fresh perspective through its minimal reliance on assumptions, aside from independence (still a bold one). This approach suggests that a simpler model can effectively support credit risk assessment, maintaining clarity without compromising on insight.


Pipeline

Below is a scikit-learn pipeline implementing the WOE Naïve Bayes estimator:

import numpy as np
from sklearn.preprocessing import (
    KBinsDiscretizer,
    TargetEncoder,
    FunctionTransformer
)
from sklearn.pipeline import make_pipeline
from scipy.special import logit

# Assumes X_train and y_train are already defined
# Prior (sample) log-odds of default: the scorecard bias
base_log_odds = np.log(np.mean(y_train) / (1 - np.mean(y_train)))

def convert_to_woe(X: np.ndarray):
    # Turn target-encoded default rates into WOE scores:
    # each bin's log-odds minus the sample log-odds
    eps = 1e-8  # guards against logit(0) in bins with no defaults
    X = logit(X + eps)
    X = X - base_log_odds
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    return X

woe_pipeline = make_pipeline(
    KBinsDiscretizer(n_bins=10, encode="ordinal", strategy="kmeans"),
    TargetEncoder(smooth=0.0001, cv=2),
    FunctionTransformer(convert_to_woe, validate=False),
)

woe_pipeline.fit(X_train, y_train)
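Once fitted, scoring new data follows Steps 8-10: sum the WOE columns, add the bias, and apply the sigmoid (a usage sketch; `X_test` is assumed to exist):

from scipy.special import expit

# WOE-transform the hold-out data (one WOE column per feature)
X_test_woe = woe_pipeline.transform(X_test)

# Naive Bayes prediction: sum of evidence plus the prior log-odds
log_odds = X_test_woe.sum(axis=1) + base_log_odds
pd_estimates = expit(log_odds)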

Credit score

The code snippet below illustrates how to create credit scores from the WOE Naïve Bayes estimator:

import numpy as np

# PDO parameters: 30 points double the odds; a score of 500 maps to 20:1 odds
pdo = 30.0
target_score = 500
target_odds = 20.0

factor = pdo / np.log(2)
offset = target_score - (factor * np.log(target_odds))

# woe_df holds one WOE column per feature; the bias (sample log-odds)
# and the offset are spread equally across the 6 features.
# The minus sign flips to the good-to-bad convention, so riskier
# groups lower the score.
bias_ind = base_log_odds / 6
woe_df['score'] = np.sum(-(woe_df + bias_ind) * factor + (offset / 6), axis=1)

--

I hope you have enjoyed reading this post!

I would like to thank Joachim Nsofini, PhD, Guillermo Navas Palencia, PhD, and Paul Edwards for their valuable feedback.

All views expressed are my own.

Explore more on #CreditRiskModeling, #Lending, and #ModelRiskManagement, and stay updated by subscribing here: https://linktr.ee/deburky


