Credit Scoring (III)

“Medieval man was a cog in a wheel he did not understand; modern man is a cog in a complicated system he thinks he understands.”

Nassim Nicholas Taleb, The Bed of Procrustes


This is the third part of my article on credit scoring. The previous two can be found here:

  1. Credit Scoring (I)
  2. Credit Scoring (II)

To reduce the number of variables in a model, and hence make it more concise and faster to evaluate, one should perform variable selection. When using logistic regression, variable selection is based on the following statistical hypothesis test:

$$H_0: \beta_i = 0 \quad \text{versus} \quad H_A: \beta_i \neq 0$$

In logistic regression, the test statistic is the Wald statistic:

$$\chi^2 = \left(\frac{\hat{\beta}_i}{\text{s.e.}(\hat{\beta}_i)}\right)^2$$

Under H0, it follows a chi-square distribution with 1 degree of freedom.

The null hypothesis H0 is rejected if the estimated coefficient $\hat{\beta}_i$ is large in absolute value compared to its standard error $\text{s.e.}(\hat{\beta}_i)$.
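The following is a minimal sketch of the Wald test for a single coefficient. It assumes you already have $\hat{\beta}_i$ and its standard error from a fitted logistic regression. The function name `wald_test` is hypothetical. Because a chi-square variable with 1 degree of freedom is the square of a standard normal, its tail probability can be computed with the standard library's `math.erfc`, so no statistics package is needed.

```python
import math
from typing import Tuple


def wald_test(beta_hat: float, std_err: float) -> Tuple[float, float]:
    """Wald chi-square statistic and p-value for H0: beta_i = 0 (hypothetical helper)."""
    chi2 = (beta_hat / std_err) ** 2
    # A chi-square with 1 df is Z**2 for a standard normal Z, so
    # P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


chi2, p = wald_test(0.5, 0.25)  # coefficient twice its standard error
print(round(chi2, 4), round(p, 4))  # → 4.0 0.0455
```

The sign of the coefficient does not matter because the statistic squares the ratio. Only its size relative to the standard error drives the decision.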

Based on the value of the test statistic, we calculate the p-value, which is the probability, under H0, of obtaining a value at least as extreme as the one observed. In practice, the p-value is compared against a significance level. Common thresholds used for the decision are:

  • If p-value < 0.01, the variable is highly significant
  • If 0.01 ≤ p-value < 0.05, the variable is significant
  • If 0.05 ≤ p-value < 0.1, the variable is weakly significant
  • If p-value ≥ 0.1, the variable is insignificant
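The decision bands above can be written as a small helper. The function name is hypothetical. The handling of a p-value that falls exactly on a threshold is an assumption: such a value goes to the less significant band, because these cut-offs are conventions rather than fixed rules.

```python
def significance_label(p_value: float) -> str:
    """Map a p-value to the conventional significance bands (hypothetical helper)."""
    if p_value < 0.01:
        return "highly significant"
    if p_value < 0.05:
        return "significant"
    if p_value < 0.10:
        return "weakly significant"
    return "insignificant"


print([significance_label(p) for p in (0.004, 0.03, 0.08, 0.25)])
# → ['highly significant', 'significant', 'weakly significant', 'insignificant']
```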

Various variable selection procedures can now be used based on the p-value. An important point is that as the number of variables increases, the search space grows exponentially. With n candidate variables, there are $2^n - 1$ possible non-empty subsets (15 for 4 variables). Below you can find a graphical representation of the different possible subsets for a case with 4 variables:

[Figure: the possible variable subsets for a case with 4 variables]
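To make the exponential growth concrete, the sketch below enumerates every non-empty subset of a small candidate list with `itertools.combinations`. It then prints $2^n - 1$ for a few values of n. The variable names `x1` to `x4` are placeholders.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def all_subsets(variables: Sequence[str]) -> List[Tuple[str, ...]]:
    """Every non-empty subset of the candidate variables (the exhaustive search space)."""
    return [subset
            for k in range(1, len(variables) + 1)
            for subset in combinations(variables, k)]


print(len(all_subsets(["x1", "x2", "x3", "x4"])))  # → 15

# The count roughly doubles with every extra variable: 2**n - 1
for n in (4, 10, 20, 30):
    print(n, 2 ** n - 1)
# → 4 15 / 10 1023 / 20 1048575 / 30 1073741823 (one pair per line)
```

Fitting a model for each of the roughly one billion subsets at n = 30 is clearly impractical, which is why heuristic procedures are used instead.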

To keep the search space under control, heuristic search procedures are required. Using the p-values, the variable space can be navigated in three possible ways:

  1. Forward regression: starts from the empty model and, at each step, adds the variable with the lowest p-value, as long as that p-value is below a chosen threshold.
  2. Backward regression: starts from the full model and, at each step, removes the variable with the highest p-value, as long as that p-value is above a chosen threshold.
  3. Stepwise regression: starts from the empty model, as in forward regression, but once the second variable has been added, it re-checks the variables already in the model and removes those that turn out to be less significant according to their p-values.
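The three procedures can be sketched as follows. This is not a definitive implementation, and several parts of it are assumptions.

* The model fit is abstracted behind a hypothetical callable `pvalues(variables)`. It should fit a logistic regression on exactly those variables and return each variable's Wald p-value. One option would be to wrap statsmodels' `Logit(y, X).fit().pvalues`.
* The thresholds `alpha_in = 0.05` and `alpha_out = 0.10` are assumed values.
* `toy_pvalues` is made-up illustration data, not real results.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: fit a logistic regression on exactly the given
# variables and return {variable: Wald p-value}.
PValueFn = Callable[[Tuple[str, ...]], Dict[str, float]]


def _best_entry(selected: List[str], remaining: List[str], pvalues: PValueFn):
    """Remaining variable with the lowest p-value when added to the current model."""
    trial = {v: pvalues(tuple(selected) + (v,))[v] for v in remaining}
    best = min(trial, key=trial.get)
    return best, trial[best]


def _drop_worst(selected: List[str], pvalues: PValueFn, alpha_out: float) -> bool:
    """Remove the least significant variable if its p-value >= alpha_out."""
    current = pvalues(tuple(selected))
    worst = max(selected, key=lambda v: current[v])
    if current[worst] < alpha_out:
        return False
    selected.remove(worst)
    return True


def forward_selection(candidates: Sequence[str], pvalues: PValueFn,
                      alpha_in: float = 0.05) -> List[str]:
    selected, remaining = [], list(candidates)
    while remaining:
        best, p = _best_entry(selected, remaining, pvalues)
        if p >= alpha_in:
            break
        selected.append(best)
        remaining.remove(best)
    return selected


def backward_elimination(candidates: Sequence[str], pvalues: PValueFn,
                         alpha_out: float = 0.05) -> List[str]:
    selected = list(candidates)
    while selected and _drop_worst(selected, pvalues, alpha_out):
        pass
    return selected


def stepwise_selection(candidates: Sequence[str], pvalues: PValueFn,
                       alpha_in: float = 0.05, alpha_out: float = 0.10) -> List[str]:
    # Assumes alpha_in <= alpha_out so a just-added variable is not dropped at once.
    selected, remaining = [], list(candidates)
    while remaining:
        best, p = _best_entry(selected, remaining, pvalues)
        if p >= alpha_in:
            break
        selected.append(best)
        remaining.remove(best)
        # With two or more variables in, re-check them and drop any that became
        # insignificant. Dropped variables are not re-offered, to avoid cycling.
        while len(selected) > 1 and _drop_worst(selected, pvalues, alpha_out):
            pass
    return selected


def toy_pvalues(variables: Tuple[str, ...]) -> Dict[str, float]:
    """Made-up p-values for illustration only."""
    base = {"debt_ratio": 0.001, "income": 0.01, "months_employed": 0.03, "noise": 0.4}
    out = {v: base[v] for v in variables}
    # Mimic multicollinearity: income loses significance once months_employed is in.
    if "income" in out and "months_employed" in out:
        out["income"] = 0.3
    return out


CANDIDATES = ["debt_ratio", "income", "months_employed", "noise"]
print(forward_selection(CANDIDATES, toy_pvalues))     # → ['debt_ratio', 'income', 'months_employed']
print(backward_elimination(CANDIDATES, toy_pvalues))  # → ['debt_ratio', 'months_employed']
print(stepwise_selection(CANDIDATES, toy_pvalues))    # → ['debt_ratio', 'months_employed']
```

The toy run shows the practical difference between the procedures. Forward regression keeps `income` even after it becomes insignificant. The stepwise re-check removes it.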

Besides statistical significance at least three other criteria should be considered when selecting the variables:

  1. Interpretability: Checking whether the sign of each regression coefficient is in line with the expectations of the credit expert is highly desirable, as it provides interpretability and builds confidence in the model. Coefficients can have unexpected signs due to statistical mumbo jumbo such as multicollinearity, noise, or small sample effects. Sign restrictions can easily be enforced in a forward regression setup by preventing variables with the wrong sign from entering the model.
  2. Operational efficiency: The resources needed to collect and process a variable should be reasonable from a practical point of view. For variables that are expensive to process, such as trends, or for external data that is hard to update, a correlated but less predictive alternative might be more suitable.
  3. Legal/ethical issues: some countries already forbid the use of gender, age, ethnic origin, nationality or religious beliefs in credit scorecards.
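The sign restriction from point 1 can be sketched as a forward regression that skips, at each step, any candidate whose coefficient has the wrong sign. The sketch also excludes prohibited variables from the candidate pool up front, as point 3 suggests. Several parts of it are assumptions.

* `fit` is a hypothetical callable that returns `(coefficient, p-value)` per variable.
* The model is assumed to predict default (target = 1). Under that convention, a higher debt ratio is expected to have a positive coefficient.
* The expected signs, the forbidden list and the `TOY` numbers are made up for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: fit a logistic regression (target = 1 for default) on
# exactly the given variables and return {variable: (coefficient, Wald p-value)}.
FitFn = Callable[[Tuple[str, ...]], Dict[str, Tuple[float, float]]]


def sign_restricted_forward(candidates: Sequence[str], fit: FitFn,
                            expected_sign: Dict[str, int],
                            forbidden: Sequence[str] = (),
                            alpha_in: float = 0.05) -> List[str]:
    """Forward selection that never admits a variable whose coefficient sign
    contradicts the expert's expectation (+1 or -1; missing = unrestricted)."""
    banned = set(forbidden)  # e.g. attributes excluded for legal/ethical reasons
    remaining = [v for v in candidates if v not in banned]
    selected: List[str] = []
    while remaining:
        admissible: Dict[str, float] = {}
        for v in remaining:
            coef, p_value = fit(tuple(selected) + (v,))[v]
            if expected_sign.get(v, 0) * coef < 0:
                continue  # wrong sign: not allowed to enter at this step
            admissible[v] = p_value
        if not admissible:
            break
        best = min(admissible, key=admissible.get)
        if admissible[best] >= alpha_in:
            break
        selected.append(best)
        remaining.remove(best)
    return selected


# Made-up (coefficient, p-value) pairs, for illustration only.
TOY = {"debt_ratio": (0.8, 0.001), "income": (0.2, 0.01),
       "months_employed": (-0.3, 0.02), "gender": (0.5, 0.001)}


def toy_fit(variables: Tuple[str, ...]) -> Dict[str, Tuple[float, float]]:
    return {v: TOY[v] for v in variables}


EXPECTED = {"debt_ratio": +1, "income": -1, "months_employed": -1}
print(sign_restricted_forward(list(TOY), toy_fit, EXPECTED, forbidden=["gender"]))
# → ['debt_ratio', 'months_employed']
```

In the toy run, `income` is never admitted because its coefficient is positive where a negative one is expected. `gender` never reaches the candidate pool at all.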


Source: Credit Risk Analytics: Measurement Techniques, Applications, and Examples in SAS, 2016


More articles by Asif Rajani

  • 10 Ideas In Asset Management For 2024

    This is a summary of Oliver Wyman's 10 Asset Management trends for 2024. The original article can be found here.

  • CS: the Archegos case (I)

    In this article, I connect insights from the "Three Lines of Defence" section of my book with the challenges seen in…

  • Collateral Allocation and Optimization (II)

    This article is the second part dedicated to Collateral Allocation and Optimization. Here is the first part for your…

  • Collateral Allocation and Optimization (I)

    To illustrate a specific case of collateral allocation, let’s consider an obligor of the bank: the company We Make…

  • Economics and Banking (I)

    In previous articles about inflation and its impact on banking loan losses and profitability, I refer to a very common…

  • An ECL Stress Model (III)

    This is the last article on an ECL Stress Model. The first two can be found here and here.

  • An ECL Stress Model (II)

    In a previous article we went through forecasted transition matrices conditional on scenarios. In this article, we will use a…

  • An ECL Stress Model (I)

    A potential ECL model flow to be used for stress testing is presented in my book. In this article I present a summary…

  • Inflation and Profitability

    In my previous articles, I focused on the effect of inflation in the Loan losses, i.e.

  • Inflation and Loan Losses (II)

    In the first article about inflation and Loan losses I focused on households. In this article I continue exploring the…
