Beyond Probabilities: How Survival Analysis Enhances Customer Insight and Credit Risk Management
Source: own elaboration based on model MTLR


Survival analysis, a statistical method that focuses on the time to an event, has proven to be an invaluable tool across various domains, including credit risk analysis, probability of default assessment, and propensity to purchase analysis. Its unique approach provides a deeper layer of insight, complementing traditional probability-based methodologies. Here’s a closer look at why survival analysis is an indispensable complement:

1. Precise Risk Modeling:

In the realm of credit risk analysis and default prediction, the element of time is paramount. Some borrowers may default immediately after acquiring a loan, while others maintain a good payment record for an extended period before encountering financial difficulties. Survival analysis allows for a more nuanced modeling of this temporal aspect, acknowledging that not all credit events unfold simultaneously.

2. Data Censoring:

Real-world data is often incomplete. For many customers, the event of interest, such as a loan default, has not yet occurred when the observation window closes, so we know only that they "survived" at least that long. Such right-censored records are a problem for traditional methods, which must either discard them or treat them as non-events. Survival analysis is designed to accommodate censoring, so even partial information contributes to modeling and prediction.
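To make the censoring point concrete, here is a minimal sketch of the Kaplan-Meier estimator in plain Python. It is an illustration only, not the code behind this article: the six loans and their durations are hypothetical, and the function name `kaplan_meier` is my own.

```python
from collections import Counter

def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] at each distinct default time.

    durations: time on book for each loan (here, in months)
    observed:  True if a default was seen, False if the loan is right-censored
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # every loan leaves the risk set at its own time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Censored loans still count in the denominator up to their exit time.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve

# Hypothetical data: six loans, three still performing when observation stopped.
durations = [3, 5, 5, 8, 12, 12]
observed = [True, True, False, True, False, False]
km_curve = kaplan_meier(durations, observed)
for t, s in km_curve:
    print(f"month {t}: S(t) = {s:.3f}")
```

The censored loans are never counted as defaults, yet they stay in the risk set until the month they drop out of observation, which is how partial information still shapes the estimate.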

3. Identification of Risk Factors:

Survival analysis empowers risk analysts to identify factors that influence the timing of the event in question. By examining variables like income, credit history, age, and more, we can gain a deeper understanding of how they affect the likelihood of default or propensity to make a purchase, and how these relationships evolve over time.

4. Customer Portfolio Monitoring:

In the context of propensity to purchase analysis, survival analysis proves beneficial in monitoring customer behavior post-acquisition. It helps businesses assess how long customers maintain their relationship with the company before potentially switching to a competitor or canceling a service.

5. Modeling Recurrent Events:

Some events, such as loan defaults, can be recurrent. A customer who defaults once may be more likely to do so again in the future. Survival analysis is well-equipped to model these repeat occurrences, offering a dynamic approach to understanding and predicting these events over time.
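Recurrent-event models (the Andersen-Gill extension of the Cox model is one common choice) usually expect each customer's history split into (start, stop, event) intervals. The sketch below shows that data layout only; the helper name `to_counting_process` and the day counts are hypothetical.

```python
def to_counting_process(default_times, follow_up_end):
    """Split one customer's history into (start, stop, event) intervals.

    default_times: sorted times (in days) at which the customer defaulted
    follow_up_end: day on which observation of the customer stopped
    """
    rows = []
    start = 0
    for t in default_times:
        rows.append((start, t, 1))  # this interval ends with a default
        start = t
    if start < follow_up_end:
        rows.append((start, follow_up_end, 0))  # last interval is censored
    return rows

# Hypothetical customer: defaults on day 120 and day 300, observed for one year.
history = to_counting_process([120, 300], follow_up_end=365)
print(history)
```

Each row restarts the clock at the previous default, so a model can learn whether an earlier default changes the risk of the next one.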

So, survival analysis serves as an essential complement to traditional probability-based risk assessment and customer behavior analysis. By considering the temporal dimension, it provides a more comprehensive understanding of risks and customer behaviors, empowering businesses to make informed decisions and develop more effective strategies for risk management and customer retention.

An example: 1,000 people and 10 behavioral variables.

1. Calculate the Default Probability or Propensity.

Source: own elaboration based on synthetic data and R code

The recommended model depends on the evaluation metric and the specific goals of your project. The provided summary reports three evaluation metrics (MAE, RMSE, and Rsquared) for five models: glm (generalized linear model), tree (decision tree), Random Forest, SVM, and XGBoost. Choose the model according to the metric most relevant to your use case. Here are some general considerations:

  1. Mean Absolute Error (MAE): The “tree” model has the lowest MAE, suggesting it is the best model in terms of making predictions with the lowest average absolute error. If minimizing absolute error is critical for your application, this might be the model of choice.
  2. Root Mean Squared Error (RMSE): In terms of RMSE, the “tree” model also has the lowest value. If minimizing root mean squared error is important, this model is a strong candidate.
  3. R-squared (Rsquared): The “Random Forest” model has the highest Rsquared, indicating that it explains the most variance in the data. If you want a model that fits the data better in terms of explained variability, this might be the best choice.
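For reference, the three metrics above can be computed as in the following sketch. The observed and predicted values are hypothetical, and R-squared is taken here as 1 minus the ratio of residual to total sum of squares, which is an assumption: some tools report the squared correlation between observed and predicted values instead.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large misses more than MAE does."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Share of variance explained: 1 - SS_res / SS_tot (assumed definition)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted default propensities for four customers.
y_true = [0.10, 0.40, 0.35, 0.80]
y_pred = [0.20, 0.30, 0.35, 0.70]
print(f"MAE      = {mae(y_true, y_pred):.4f}")
print(f"RMSE     = {rmse(y_true, y_pred):.4f}")
print(f"Rsquared = {r_squared(y_true, y_pred):.4f}")
```

Because the three metrics weigh errors differently, one model can lead on MAE and RMSE while another leads on Rsquared, as in the comparison above.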

Model selection also depends on other factors such as interpretability, model complexity, and available computational resources. Here are some general recommendations based on the provided metrics:

  • If you are looking for a simple and interpretable model, consider the “tree” model. Although it doesn’t have the highest Rsquared, it has low MAE and RMSE, suggesting good predictions in terms of absolute and squared errors.
  • If you prioritize the ability to explain variance in your data and are willing to trade off a bit in terms of absolute errors, choose the “Random Forest.”
  • If you have specific performance requirements or limited computational resources, select a model that fits your constraints and offers a suitable balance between accuracy and complexity.

Ultimately, it’s important to consider the needs of your project and the metrics that are most relevant to your application before choosing the right model.

So far, this is the traditional workflow of statistical inference and predictive modeling (I didn't show it in the example). It applies equally to a credit scoring model, propensity model, churn model, purchase prediction model, recommendation model, breakage model, or asset value prediction model, among others. However, the analysis is not yet finished.

Now, let's complete the analysis for better decision making.

2. Calculate the Moment of Default or Propensity.

Source: own elaboration based on model MTLR

Based on the output of the survival regression analysis (MTLR in this case), we can draw some general conclusions:

  1. Annual Income: Annual income has a significant impact on survival over time. As annual income increases, survival tends to improve, as the weights are negative at most time points, indicating a protective influence.
  2. Credit Card Balance and Total Debt: These financial variables seem to influence survival. Weights indicate that a higher credit card balance and total debt can increase the risk of adverse events.
  3. Credit Score: Credit score is also a crucial factor in survival. As credit score increases, survival tends to improve, as the weights are negative at most time points, suggesting a lower probability of adverse events.
  4. Age: Age appears to influence survival over time. The weights for age are positive at various time points, which could indicate a higher risk of events at older ages.
  5. Marital Status and Education Level: These categorical variables also impact survival. Weights may vary at different times, suggesting that their influence can change over time. For example, divorced individuals may have a different risk profile compared to singles.
  6. Employment Tenure and Credit Limit Usage: These variables also affect survival. Weights may vary, indicating that their influence can change over time.
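To show how per-time-point weights like these become a survival curve, here is a minimal sketch of the MTLR scoring step as I understand the published formulation. It is not the fitted model behind the figure: the two standardized features, the weights, and the biases are all invented for illustration.

```python
import math

def mtlr_survival(x, weights, biases):
    """Survival probability at each time point for one feature vector x.

    weights[j] and biases[j] belong to time point j. The score of
    "event happens in interval k" is the sum of the linear terms for all
    time points j >= k; "no event by the last time point" scores 0.
    """
    m = len(weights)
    linear = [sum(w * xi for w, xi in zip(weights[j], x)) + biases[j] for j in range(m)]
    scores = [sum(linear[k:]) for k in range(m)] + [0.0]
    top = max(scores)  # subtract the maximum for numerical stability
    expo = [math.exp(s - top) for s in scores]
    total = sum(expo)
    probs = [e / total for e in expo]
    # S(t_j) = probability that the event falls in a later interval than j.
    return [sum(probs[j + 1:]) for j in range(m)]

# Hypothetical weights for (income, total debt) at three time points:
# negative income weights are protective, positive debt weights add risk.
weights = [[-0.8, 0.5], [-0.6, 0.6], [-0.4, 0.7]]
biases = [-1.0, -0.5, 0.0]
high_income = mtlr_survival([1.0, -0.5], weights, biases)
low_income = mtlr_survival([-1.0, 0.5], weights, biases)
print("high income:", [round(s, 3) for s in high_income])
print("low income: ", [round(s, 3) for s in low_income])
```

With these invented numbers, the customer with higher income and lower debt has the higher survival probability at every time point, which is the reading given to negative and positive weights in the list above.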

You now know not only the probability of default or propensity (and the variables that carry the most weight for it), but also the expected time, in days, until the default or purchase occurs. You also know which variables raise that risk and which lower it.
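One simple way to turn a predicted survival curve into a "time in days" figure is the median survival time, the point at which the curve crosses 0.5. The sketch below assumes linear interpolation between time points, which is my simplification; the curve values are hypothetical.

```python
def median_survival_time(times, survival):
    """Time at which survival first reaches 0.5, by linear interpolation.

    times:    increasing time points (in days)
    survival: survival probability at each time point
    Returns None if the curve stays above 0.5 over the whole horizon.
    """
    prev_t, prev_s = 0.0, 1.0  # everyone is event-free at time zero
    for t, s in zip(times, survival):
        if s <= 0.5:
            frac = (prev_s - 0.5) / (prev_s - s)
            return prev_t + frac * (t - prev_t)
        prev_t, prev_s = t, s
    return None

# Hypothetical predicted curve for one customer, evaluated every 90 days.
times = [90, 180, 270, 360]
survival = [0.9, 0.7, 0.4, 0.2]
median_days = median_survival_time(times, survival)
print(f"median time to event: about {median_days:.0f} days")
```

A result of None is informative in itself: the customer is more likely than not to remain event-free for the whole horizon.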

This is explained in my new book.


More articles by Diego Vallarino, PhD (he/him)
