6. Credit Risk Modelling for MSMEs - feature engineering & model development

Vivek Chaturvedi

Leader in Advanced Analytics and Data Science | Retail, SME, Corporate, Treasury, Transaction Banking & Wealth Management | India & MENA | Thought leader | Public speaker | IIM Bangalore

Published: January 23, 2023

In the last article we saw the various types of financial variables that are used to assess credit risk. Over the past few years, better computing power has allowed us to try out many more combinations of variables before finalizing the ones to use in a probability of default model. The process of creating variables and shortlisting the useful ones is called feature engineering (the independent variables being referred to as features). In this article we will use the words feature and variable interchangeably.

Feature creation

As we have seen, variables such as gross profit margin, fixed asset turnover ratio and inventory turnover ratio can be used to analyze the creditworthiness of a borrower. We go one step further and create more features (variables) from these. This step requires imagination on the part of the model developer, who has to think of features that should have an effect on the outcome. One example is dividing this year's current ratio by last year's current ratio to check whether the liquidity profile of the borrower has improved or deteriorated. I have listed a few features below. The list is not exhaustive, as the purpose is to illustrate the process.

a. Cash and bank balance / Adjusted tangible net worth

b. Cash and bank balance / Net sales

c. Total outside liabilities / Total assets

Continuing in this way, we can create 300 to 500 features from financial statements.
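As a minimal sketch of the ratio features listed above, assuming each borrower's financials are held in a plain dictionary. The field names (cash_and_bank, adjusted_tnw and so on), the helper names and the figures are hypothetical and chosen only for illustration.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the ratio is undefined."""
    if numerator is None or denominator is None or denominator == 0:
        return None
    return numerator / denominator


def create_features(current, previous):
    """Build ratio features from this year's and last year's financials."""
    features = {
        "cash_to_tnw": safe_ratio(current["cash_and_bank"], current["adjusted_tnw"]),
        "cash_to_sales": safe_ratio(current["cash_and_bank"], current["net_sales"]),
        "tol_to_total_assets": safe_ratio(
            current["total_outside_liabilities"], current["total_assets"]
        ),
    }
    # Year-on-year liquidity trend: this year's current ratio / last year's.
    cr_now = safe_ratio(current["current_assets"], current["current_liabilities"])
    cr_prev = safe_ratio(previous["current_assets"], previous["current_liabilities"])
    features["current_ratio_yoy"] = safe_ratio(cr_now, cr_prev)
    return features


# Hypothetical figures for one borrower (illustrative only).
this_year = {
    "cash_and_bank": 50.0,
    "adjusted_tnw": 200.0,
    "net_sales": 1000.0,
    "total_outside_liabilities": 300.0,
    "total_assets": 500.0,
    "current_assets": 300.0,
    "current_liabilities": 200.0,
}
last_year = {"current_assets": 250.0, "current_liabilities": 200.0}

features = create_features(this_year, last_year)
for name, value in features.items():
    print(name, value)
```

Returning None for an undefined ratio, rather than zero, keeps missing values visible so that they can be counted at the fill-rate stage.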

Feature selection

The next step is to reduce the number of features by selecting a few and rejecting others. Feature selection should be a stepwise process, allowing the model developer to analyze and evaluate features gradually.

Step 1 – Fill Rate & Univariate Gini

In this step we use two criteria to shortlist features: fill rate and univariate Gini. These are the simplest filters to apply.

Fill rate

The first filter is fill rate. We want to keep only those features which are available for a sufficiently large proportion of the population. A variable may be a great predictor of default, but if it is not available for many of the candidates it is of little use. We measure fill rate as the percentage of records for which the variable is available out of total records, and reject all variables whose fill rate is below a threshold. Selecting the threshold is a judgment call for the model developer. We have required a fill rate of at least 90% for a feature to be retained.
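The fill-rate filter can be sketched as follows, assuming records are dictionaries in which a missing value is stored as None. The feature names and the ten records are hypothetical.

```python
def fill_rate(records, feature):
    """Share of records in which the feature is present (not None)."""
    if not records:
        return 0.0
    filled = sum(1 for record in records if record.get(feature) is not None)
    return filled / len(records)


def filter_by_fill_rate(records, features, threshold=0.90):
    """Keep only the features whose fill rate is at or above the threshold."""
    return [f for f in features if fill_rate(records, f) >= threshold]


# Ten hypothetical records: one feature fully filled, one at 90%, one at 60%.
records = (
    [{"cash_to_sales": 0.05, "interest_coverage": 1.2, "inventory_turnover": 4.0}
     for _ in range(6)]
    + [{"cash_to_sales": 0.07, "interest_coverage": 0.9, "inventory_turnover": None}
       for _ in range(3)]
    + [{"cash_to_sales": 0.02, "interest_coverage": None, "inventory_turnover": None}]
)

candidates = ["cash_to_sales", "interest_coverage", "inventory_turnover"]
kept = filter_by_fill_rate(records, candidates)
print(kept)
```

Here a feature sitting exactly at the 90% threshold is retained, since only features below the threshold are rejected.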

Univariate Gini

Gini is a measure of the predictive power of a model[1]. Univariate Gini tells us how good a variable is at predicting default all by itself. A good cut-off can be 10% or 20%, depending on the number of features created. One may use packages or libraries in R or Python to calculate Gini.
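For illustration, univariate Gini can be computed from the area under the ROC curve (AUC) as Gini = 2 × AUC − 1, treating the raw feature value as the score. The sketch below uses only the standard library and a simple pairwise comparison; taking the absolute value, so that features which fall as risk rises are scored the same way, is an assumption made here, and the six data points are hypothetical.

```python
def univariate_gini(values, defaults):
    """Gini of a single feature used as a score: |2 * AUC - 1|.

    values   : feature values, one per borrower
    defaults : 1 if the borrower defaulted ("bad"), 0 otherwise ("good")
    """
    bads = [v for v, d in zip(values, defaults) if d == 1]
    goods = [v for v, d in zip(values, defaults) if d == 0]
    if not bads or not goods:
        raise ValueError("need at least one bad and one good")
    # AUC = probability that a random bad has a higher value than a random
    # good, counting ties as one half.
    wins = 0.0
    for b in bads:
        for g in goods:
            if b > g:
                wins += 1.0
            elif b == g:
                wins += 0.5
    auc = wins / (len(bads) * len(goods))
    # The absolute value makes the measure indifferent to the feature's direction.
    return abs(2.0 * auc - 1.0)


# Hypothetical leverage ratio (total outside liabilities / total assets).
leverage = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
defaulted = [1, 1, 0, 1, 0, 0]

gini = univariate_gini(leverage, defaulted)
print(round(gini, 4))
```

The pairwise loop is quadratic in the number of records, so on a real portfolio a library routine for AUC would be the more practical choice.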

Step 2 – Principal Component Analysis

In the next step we conduct Principal Component Analysis (PCA) to identify features which might have a bearing on the dependent variable.

Principal Component Analysis

Principal component analysis, or PCA, is a statistical procedure that allows you to summarize the information content in large data tables by means of a smaller set of “summary indices” that can be more easily visualized and analyzed[2]. PCA arranges the existing features in groups. Group 1 should have the highest number of features, which indicates that those features are better predictors of default. Group 2 has fewer features, and so on. We should select features in such a way that we take as many as possible from the top-numbered groups while also covering the various categories (turnover, profitability etc.; please refer to the previous articles).
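The article does not say how the groups are formed, so the sketch below rests on an assumption: each feature is assigned to the principal component on which it has the largest absolute loading. PCA itself does not use the default flag, so the groups reflect correlation among the features only. The code uses only the standard library (power iteration on the correlation matrix) to stay self-contained; it assumes no feature is constant and that k does not exceed the number of meaningful components. The feature names and values are hypothetical.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equal-length feature columns."""
    n = len(columns[0])
    standardized = []
    for col in columns:
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        standardized.append([(x - mean) / sd for x in col])
    return [
        [sum(a * b for a, b in zip(ci, cj)) / n for cj in standardized]
        for ci in standardized
    ]


def top_components(matrix, k, iterations=500):
    """First k eigenvectors of a symmetric matrix (power iteration + deflation)."""
    size = len(matrix)
    work = [row[:] for row in matrix]
    components = []
    for _ in range(k):
        vec = [1.0 + 0.1 * i for i in range(size)]  # arbitrary non-zero start
        for _ in range(iterations):
            nxt = [sum(work[i][j] * vec[j] for j in range(size)) for i in range(size)]
            norm = math.sqrt(sum(x * x for x in nxt))
            vec = [x / norm for x in nxt]
        eigenvalue = sum(
            vec[i] * sum(work[i][j] * vec[j] for j in range(size))
            for i in range(size)
        )
        components.append(vec)
        # Deflate: remove the component just found before looking for the next.
        for i in range(size):
            for j in range(size):
                work[i][j] -= eigenvalue * vec[i] * vec[j]
    return components


def group_features(names, columns, k):
    """Assign each feature to the component on which it loads most heavily."""
    comps = top_components(correlation_matrix(columns), k)
    groups = {c: [] for c in range(1, k + 1)}
    for idx, name in enumerate(names):
        best = max(range(k), key=lambda c: abs(comps[c][idx]))
        groups[best + 1].append(name)
    return groups


# Hypothetical data: three strongly correlated liquidity features, one unrelated.
names = ["current_ratio", "quick_ratio", "cash_ratio", "gross_margin"]
columns = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    [1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.0, 8.1],
    [0.9, 2.2, 2.8, 4.1, 5.2, 5.9, 7.1, 7.9],
    [3.0, -1.0, 2.0, -2.0, -2.0, 2.0, -1.0, 3.0],
]

groups = group_features(names, columns, k=2)
print(groups)
```

In practice a numerical library would replace the hand-written eigenvector routine; the grouping rule is the part that depends on the assumption stated above.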

Step 3 – Binning and Variable Transformation

In this step we perform two exercises. We bin the variables, creating categorical variables from numerical ones, and we transform the variables to be used for modelling.

Binning

We put the variables in bins or class intervals and observe the “bad rate” in those bins. Let us take a short detour and use an analogy to understand the importance of binning. Imagine that there are two kids aged five and four, and you are asked whether you think they differ significantly in their ability to talk and communicate. Your answer will most likely be no. But if the two kids are aged three and two and you are asked the same question, your answer might change.

When it comes to behavior, the absolute difference in value might not matter as much as the threshold, and that is what binning captures. Whether the interest coverage ratio of a company is 1.2 or 1.3 might not be very significant information, but whether it is 1 or 0.9 might make a big difference to its creditworthiness.

We bin the variables based on the bad rates in those bins. We start with 20 bins and gradually merge them until we get monotonicity[3] in the bad rate. Please see the images below.

Figure 1 - Original bins

Figure 2 - Modified bins
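The exact merging rule is not spelled out in the article. One plausible sketch, assuming we simply pool any two adjacent bins that break the desired trend (similar in spirit to the pool-adjacent-violators approach), is shown below. It uses six bins instead of 20 for brevity, assumes every bin is non-empty, and the counts and bin edges are hypothetical.

```python
def bad_rate(bin_):
    """Share of bads among all borrowers in the bin."""
    return bin_["bads"] / (bin_["goods"] + bin_["bads"])


def merge_until_monotonic(bins, increasing=False):
    """Merge adjacent bins until the bad rate is monotonic across bins.

    bins: list of dicts ordered by feature value, each with keys
          "upper" (upper edge of the bin), "goods" and "bads" (counts).
    """
    merged = [dict(b) for b in bins]
    i = 0
    while i < len(merged) - 1:
        left, right = bad_rate(merged[i]), bad_rate(merged[i + 1])
        violates = right < left if increasing else right > left
        if violates:
            # Pool the two bins: keep the higher edge, add up the counts.
            merged[i] = {
                "upper": merged[i + 1]["upper"],
                "goods": merged[i]["goods"] + merged[i + 1]["goods"],
                "bads": merged[i]["bads"] + merged[i + 1]["bads"],
            }
            del merged[i + 1]
            # The pooled bin may now break the trend against its left neighbour.
            i = max(i - 1, 0)
        else:
            i += 1
    return merged


# Hypothetical bins for interest coverage ratio; bad rate should fall as it rises.
original_bins = [
    {"upper": 0.5, "goods": 80, "bads": 20},
    {"upper": 1.0, "goods": 85, "bads": 15},
    {"upper": 1.5, "goods": 82, "bads": 18},
    {"upper": 2.0, "goods": 92, "bads": 8},
    {"upper": 3.0, "goods": 90, "bads": 10},
    {"upper": float("inf"), "goods": 96, "bads": 4},
]

modified_bins = merge_until_monotonic(original_bins, increasing=False)
for b in modified_bins:
    print(b["upper"], round(bad_rate(b), 3))
```

The direction of the trend (rising or falling bad rate) is passed in explicitly, since it depends on what the feature measures.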

Variable Transformation

In my previous articles I have argued that we should use derived variables instead of raw variables for model development. The simple logic is that it does not matter how much current assets you have; what matters is the proportion of current assets to current liabilities. I then extended that argument to the creation of CHAID variables. I now take that argument one step further and say that what matters for a feature to be useful is the share of “bads” contained in the bin relative to the share of “goods”. The natural logarithm of this ratio is referred to as Weight of Evidence (WOE), and we use it for the model development.
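As a rough illustration, WOE is usually computed per bin as the natural logarithm of the bin's share of all bads divided by its share of all goods. The sketch below follows the bads-over-goods ordering used in this article; many texts use the reverse ordering, which only flips the sign. It assumes every bin contains at least one good and one bad, and the counts are hypothetical.

```python
import math


def weight_of_evidence(bins):
    """WOE per bin = ln(share of all bads in bin / share of all goods in bin)."""
    total_goods = sum(b["goods"] for b in bins)
    total_bads = sum(b["bads"] for b in bins)
    woe = []
    for b in bins:
        bad_share = b["bads"] / total_bads
        good_share = b["goods"] / total_goods
        woe.append(math.log(bad_share / good_share))
    return woe


# Hypothetical bins of an interest coverage ratio, ordered from low to high.
bins = [
    {"goods": 80, "bads": 20},
    {"goods": 167, "bads": 33},
    {"goods": 182, "bads": 18},
    {"goods": 96, "bads": 4},
]

woe_values = weight_of_evidence(bins)
for b, w in zip(bins, woe_values):
    print(b, round(w, 4))
```

With this ordering a positive WOE marks a bin that holds more than its proportionate share of bads, and a negative WOE marks a safer bin.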

Model development

The new variables thus developed can be used as independent variables in a logistic regression model. It is a very standard process and therefore I will not describe it in detail here.
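Purely as an illustration of this final step, the sketch below fits a logistic regression with a single WOE-coded variable by plain gradient descent on grouped counts. It is not the author's implementation: the function name, learning rate, number of epochs and the counts are assumptions, and a real model would use several variables and a statistical library.

```python
import math


def fit_logistic(groups, learning_rate=0.5, epochs=2000):
    """Fit P(default) = sigmoid(intercept + slope * x) by gradient descent.

    groups: list of (x, goods, bads), i.e. the feature value shared by a
            group of borrowers and the counts of goods and bads in it.
    """
    total = sum(goods + bads for _, goods, bads in groups)
    intercept, slope = 0.0, 0.0
    for _ in range(epochs):
        grad_b = grad_w = 0.0
        for x, goods, bads in groups:
            p = 1.0 / (1.0 + math.exp(-(intercept + slope * x)))
            # Sum of (p - y) over the group's records, with y = 1 for bads.
            residual = (goods + bads) * p - bads
            grad_b += residual / total
            grad_w += residual * x / total
        intercept -= learning_rate * grad_b
        slope -= learning_rate * grad_w
    return intercept, slope


# Hypothetical (goods, bads) counts per bin; WOE = ln(bad share / good share).
counts = [(80, 20), (167, 33), (182, 18), (96, 4)]
total_goods = sum(g for g, _ in counts)
total_bads = sum(b for _, b in counts)
groups = [
    (math.log((b / total_bads) / (g / total_goods)), g, b) for g, b in counts
]

intercept, slope = fit_logistic(groups)
print(round(intercept, 3), round(slope, 3))
```

As a sanity check, with only one WOE-coded variable the fitted slope should come out close to 1 and the intercept close to the log of total bads over total goods, because the WOE already encodes each bin's log-odds relative to the portfolio.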

[1] https://www.crisil.com/content/dam/crisil/our-analysis/publications/default-study/crisil-ratings-annual-default-and-ratings-transition-study-fy-2022.pdf

[2] https://www.sartorius.com/en/knowledge/science-snippets/what-is-principal-component-analysis-pca-and-how-it-is-used-507186

[3] https://www.dhirubhai.net/pulse/understanding-financial-statements-2-vivek-chaturvedi/

Advanced Analytics in Banking
