Precision in Credit Risk Models: The Art of Defining Business-Centric Target Variables

In the world of fintech, where we rely on data to make decisions, defining the right target variables is like an artist crafting a masterpiece. It's not just about technical skills; it's about blending real-world business knowledge with careful analysis.

Imagine a fintech company using machine learning to build a credit risk model. They have a powerful algorithm and big ambitions to improve their risk assessments. But if the target variable, which is like the model's core, doesn't perfectly match the specific business problem they're tackling, everything can fall apart.

Even if a model is super accurate in its predictions, it can turn into a financial disaster if the target variable doesn't fit the problem it's meant to solve. This misalignment can lead to big losses instead of the expected gains.

To succeed in fintech, models need to start by getting the basics right – choosing and defining the right target variables. It's not just about being precise; it's about making sure that every prediction helps steer decisions in line with the business goals. Essentially, aligning target variables with business needs is the foundation for unlocking the full potential of credit risk models in fintech.

One effective way to understand this concept is by examining it through the lens of an Acquisition model or Application Scorecard. These models play a crucial role in the fintech industry, significantly influencing important decisions.

A target variable can take various forms, such as 90 days past due (DPD) within 12 months on book (MOB), 90 DPD within 18 MOB, 60 DPD within 9 MOB, 45 DPD within 6 MOB, or 30 DPD within 3 MOB. Established banks and financial institutions typically follow standard targets, like 90 DPD within 12 MOB for secured loans such as home and car loans, and 60 DPD within 9 MOB for unsecured loans. However, these standards may not be suitable for every fintech portfolio.
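To make these definitions concrete, here is a minimal sketch of how an "ever X DPD within Y MOB" flag could be computed for a single loan. The function name `is_bad`, the input layout (a dictionary of month on book to days past due) and the sample loan are all assumptions made for illustration, not part of any particular system.

```python
def is_bad(dpd_by_mob, dpd_threshold, mob_window):
    """Return True if the loan ever reaches dpd_threshold within mob_window months.

    dpd_by_mob: dict mapping month on book (1, 2, ...) to the days past due
    observed at that month-end (hypothetical input layout).
    """
    return any(
        dpd >= dpd_threshold
        for mob, dpd in dpd_by_mob.items()
        if mob <= mob_window
    )


# Hypothetical loan: slips to 65 DPD at month 7, then cures.
loan = {1: 0, 2: 0, 3: 0, 4: 15, 5: 35, 6: 45, 7: 65, 8: 30, 9: 0}

# The same loan is "bad" or "good" depending on the definition chosen.
flags = {
    "90 DPD in 12 MOB": is_bad(loan, 90, 12),
    "60 DPD in 9 MOB": is_bad(loan, 60, 9),
    "45 DPD in 6 MOB": is_bad(loan, 45, 6),
    "30 DPD in 3 MOB": is_bad(loan, 30, 3),
}

for name, flag in flags.items():
    print(f"{name}: {'bad' if flag else 'good'}")
```

The same repayment history produces different labels under different definitions, which is why the choice of DPD and MOB deserves its own analysis.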

So, the question arises: How can you determine the ideal combination of DPD and MOB that perfectly aligns with your unique fintech portfolio or book?

This article walks through how to choose the combination that fits your specific fintech environment, a choice that sets the stage for data-driven success in your acquisition models.

Let's break it down into three key stages that serve as the foundation for robust target variable definition:

1. Roll Rate Analysis: The first stage involves meticulous Roll Rate analysis. Here, the crucial decision is to select the right Days Past Due (DPD) threshold for your target definition. This step identifies the specific credit risk you aim to assess. Selecting the appropriate DPD ensures that your model focuses on the most relevant delinquency categories, aligning it with your business objectives.

In the roll rate flow diagram, consider individuals with loan accounts in the 30-59 days past due (DPD) category. Approximately 83% of these individuals transitioned towards a lower risk profile. Conversely, for those with loan accounts in the 60-89 DPD category, nearly 85% shifted to a higher risk profile, with only 15% moving back to a lower risk category. In the 90-119 DPD category the trend is even stronger, with around 95% progressing to a higher risk profile and only 5% returning to a lower risk category.

This observation reflects a common practice within the fintech domain. When a substantial majority of accounts in a delinquency bucket, typically more than 80%, roll forward to a higher risk profile, that bucket is a sensible choice for the target DPD. Here, the 60-89 DPD bucket is the earliest one to cross that mark, so 60 DPD emerges as a strong candidate for the DPD threshold in the target variable.
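The roll rate logic described here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the bucket labels, the function names `roll_rates` and `first_bucket_rolling_forward`, and the synthetic transitions (made up to mirror the percentages quoted in this section) are all hypothetical, and the 80% cutoff is the rule of thumb mentioned above, not a fixed standard.

```python
from collections import Counter, defaultdict

# Delinquency buckets ordered from lowest to highest risk (assumed labels).
BUCKETS = ["0-29", "30-59", "60-89", "90-119", "120+"]
RANK = {bucket: i for i, bucket in enumerate(BUCKETS)}


def roll_rates(transitions):
    """Share of accounts per starting bucket that get better, stay, or get worse.

    transitions: iterable of (start_bucket, end_bucket) pairs, one per account,
    observed between two points in time.
    """
    counts = defaultdict(Counter)
    for start, end in transitions:
        if RANK[end] > RANK[start]:
            counts[start]["worse"] += 1
        elif RANK[end] < RANK[start]:
            counts[start]["better"] += 1
        else:
            counts[start]["same"] += 1
    rates = {}
    for start, c in counts.items():
        total = sum(c.values())
        rates[start] = {k: c[k] / total for k in ("better", "same", "worse")}
    return rates


def first_bucket_rolling_forward(rates, cutoff=0.80):
    """Lowest-risk bucket whose roll-forward share exceeds the cutoff."""
    for bucket in BUCKETS:
        if bucket in rates and rates[bucket]["worse"] > cutoff:
            return bucket
    return None


# Synthetic transitions, 100 accounts per bucket, for illustration only.
transitions = (
    [("30-59", "0-29")] * 83 + [("30-59", "60-89")] * 17
    + [("60-89", "30-59")] * 15 + [("60-89", "90-119")] * 85
    + [("90-119", "60-89")] * 5 + [("90-119", "120+")] * 95
)

rates = roll_rates(transitions)
target_bucket = first_bucket_rolling_forward(rates)
print(target_bucket)
```

On real data the transitions would come from comparing each account's bucket at two observation dates, and accounts that stay in the same bucket would need an explicit treatment, which this sketch reports separately as "same".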

2. Vintage Analysis: In the second stage, Vintage analysis takes center stage. Now, the focus shifts to choosing the right performance month window. This window defines the timeframe within which you'll track the delinquency trends. Selecting the optimal performance window is essential for capturing the right data points and ensuring that your model's predictions remain accurate and actionable.

Now that we've established 60 days past due (DPD) as our delinquency threshold through the Roll Rate analysis discussed in the previous section, we proceed to determine the appropriate performance window, which forms the second part of our target variable definition.

To achieve this, we calculate and visualize the Cumulative Bad Rate: for each origination cohort, the share of loans that have reached the 60 DPD threshold for the first time by each month on book. Our goal is to identify the month on book at which the cumulative bad rate curve stabilizes.

Upon examining the graph above, a clear trend emerges. For most loans originated in different quarters of the financial year, the curves tend to stabilize around the 8-month mark. Therefore, in this context, an 8-month performance window appears to be the most appropriate period for performance observation.

It's important to note that individuals who hit the 60 DPD mark within this 8-month performance window are marked as 'bad' in our analysis. This allows us to precisely capture and assess credit risk over time.
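As a rough sketch of the vintage calculation, the snippet below builds a cumulative bad rate curve for one origination cohort and looks for the month on book after which it stops rising meaningfully. The function names, the cohort counts and the tolerance of 0.2 percentage points per month are assumptions chosen for illustration; in practice the flattening point is usually judged by eye across several cohorts.

```python
def cumulative_bad_rate(originated, first_bad_by_mob, max_mob):
    """Cumulative share of a cohort that has hit the DPD threshold by each MOB.

    originated: number of loans in the cohort.
    first_bad_by_mob: dict mapping MOB to the number of loans reaching the
    threshold for the first time in that month (hypothetical input layout).
    """
    curve = []
    bad_so_far = 0
    for mob in range(1, max_mob + 1):
        bad_so_far += first_bad_by_mob.get(mob, 0)
        curve.append(bad_so_far / originated)
    return curve


def stabilisation_mob(curve, tolerance):
    """Earliest MOB after which every monthly increase is below the tolerance.

    If the curve is still rising at the end, this returns the last MOB, which
    means the observation period is too short to judge.
    """
    for i in range(len(curve)):
        later_increases = [curve[j] - curve[j - 1] for j in range(i + 1, len(curve))]
        if all(increase < tolerance for increase in later_increases):
            return i + 1
    return None


# Synthetic cohort: 1,000 loans, new 60 DPD entries by month on book.
first_bad = {3: 5, 4: 10, 5: 15, 6: 12, 7: 8, 8: 5, 10: 1, 12: 1}

curve = cumulative_bad_rate(originated=1000, first_bad_by_mob=first_bad, max_mob=12)
stable_at = stabilisation_mob(curve, tolerance=0.002)
print(stable_at, curve[stable_at - 1])
```

Running the same calculation for each quarterly cohort and plotting the curves together gives the kind of chart described in this section.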

3. Coverage Analysis: The final stage involves Coverage Analysis, where you verify the target definition derived from both Roll Rate and Vintage analysis. This step acts as a quality check, ensuring that your chosen target variable aligns seamlessly with your fintech portfolio's unique characteristics. By confirming alignment, you mitigate the risk of misclassification and enhance the overall reliability of your credit risk model.

In the lending business, the significance of 90 days past due (DPD) lies in its classification as Non-Performing Assets (NPAs). Therefore, when we make a decision to choose a DPD threshold below 90 DPD, we must exercise due diligence in assessing the accuracy of our selection. To achieve this, we construct a table akin to a confusion matrix that compares our chosen DPD threshold with the standard 90 DPD.

Within this table, which covers various combinations of DPD threshold and months on book (MOB), two figures matter. First, our chosen definition of 60 DPD in 8 MOB flags 91.50% of NPAs (loans that ever reach 90 DPD). Second, 81% of the loans it classifies as 'bad' ultimately become NPAs.

As an illustration, consider the case of 30 DPD in 10 MOB, which accurately tags 96% of NPAs. However, only 56% of loans identified as 'bad' by the 30 DPD in 10 MOB criterion evolve into NPAs in the future.

The central objective of this analysis is to identify a combination that achieves high percentages on both fronts, as the 60 DPD in 8 MOB scenario does. On this evidence, our selected bad definition of 60 DPD in 8 MOB is the most robust of the combinations considered.
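The two percentages used in the coverage check can be computed from a simple pair of flags per loan. The sketch below is illustrative only: the function name `coverage`, the result keys and the loan counts are assumptions, and the synthetic numbers are not the portfolio figures quoted above.

```python
def coverage(loans):
    """Compare a candidate bad definition with the 90 DPD (NPA) standard.

    loans: list of (flagged_bad, became_npa) boolean pairs, one per loan.
    Returns the share of NPAs the candidate flags, and the share of
    flagged loans that go on to become NPAs.
    """
    both = sum(1 for bad, npa in loans if bad and npa)
    flagged = sum(1 for bad, _ in loans if bad)
    npas = sum(1 for _, npa in loans if npa)
    return {
        "npa_captured": both / npas if npas else None,
        "bad_to_npa": both / flagged if flagged else None,
    }


# Synthetic portfolio of 1,000 loans, for illustration only.
loans = (
    [(True, True)] * 80       # flagged bad and later became NPA
    + [(True, False)] * 20    # flagged bad but never reached 90 DPD
    + [(False, True)] * 10    # missed by the candidate definition
    + [(False, False)] * 890  # good under both definitions
)

result = coverage(loans)
print(result)
```

Repeating this for every DPD and MOB combination fills in the comparison table, and a combination is attractive when both shares are high at the same time.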

In fintech, precision isn't a luxury; it's a necessity. By mastering the art of crafting business-centric target variables, we empower our models to make informed, profitable decisions in this ever-evolving industry. It's the path to achieving financial masterpieces and shaping the future of finance.

Author: Vishal Kumar, Fintech Lead Data Scientist at Paytm

LinkedIn: https://www.dhirubhai.net/in/vishal-kumar-906581ab/