Black Box Models: A Governance Approach
The deployment of AI, and in particular Machine Learning (ML), has been a hot topic in almost all business sectors in recent years. In the banking industry, one of the most commonly cited use cases is credit risk/credit scoring. While ML models are being used in certain lending segments, the operational use of ML and AI in credit scoring is not yet prevalent, at least not among regulated banks. Some of the main challenges cited include:
- Inability to explain complex models, and how their outputs are generated
- Governance and transparency issues, which create resistance from both internal and external parties to the use of such models
- Increasing customer concerns around data privacy and the ability to understand and challenge decisions made by banks
To deal with the above issues, many organisations are researching ways to explain and interpret the more complex modeling algorithms. Historically, complete model explainability has been a key, and important, part of model governance. However, with the increasing use of more complex models in the risk domain, this policy, as well as other rules around the governance of models, may need to be revisited. Recognising that modeling itself is only part of a much larger process, a more holistic approach that takes into account all the steps in model development should be considered to get past the explainability issue. In fact, what is needed is a satisfactory approach to model governance that assumes the model itself is not explainable.
In this article, I present a high-level framework that can be used to allay some of the main concerns around model interpretability and governance.
Let's start by defining the process of model development across three broad areas (recognising that model development is a complex process with many more tasks):
1. Data Management: the process of acquiring data, performing exploratory work on it, data cleansing, variable transformation, etc. that takes place before any model fitting is done.
2. Model Training: the process of taking the above data and applying predictive modeling algorithms to it. This is the area where the use of black box models and their interpretability causes the most issues.
3. Model Output: the predictions/outputs of the model.
A comprehensive approach to model governance would cover all three areas to provide end users with increased confidence around black box models. Let's consider each of these model development areas in sequence.
Data Management
If the model itself is complex and cannot be explained, understanding its input variables offers several ways to avoid some of the problems this might cause.
These include:
- Making sure that the data is clean, free from significant biases, and does not contain any variables that could cause legal or ethical issues or have dubious causality. I have seen, for example, development datasets that contain variables such as ‘likes’ and ‘follows’ of various media outlets. These are obtained from social media accounts and can cause problems. In multi-cultural countries such as the UK, Canada, the US and Australia, immigrants may choose to like and follow ethnic or religious media. If these variables enter a model, you may be making decisions based on race, religion, ethnic origin or language, which would contravene laws in many countries. If all the input variables are deemed clean and free from the issues cited above, we can then accept that all of them are viable candidates for the model. This should allay some concerns even if the content of the model is unknown. We recognise, of course, that removing these obvious problematic variables does not eliminate the issue altogether, as there may still be other variables that can indirectly cause legal/ethical problems (e.g. merchant codes for credit card transactions).
- Performing various tests to ensure that variables have explainable relationships to the target. This can be done via variable ranking algorithms as well as Weight of Evidence (WoE) based groupings. The latter is useful in that it shows the nature of the relationships as well as the statistical correlations, and can help get buy-in from business users. While recognising that the more complex ML algorithms exploit interactions and relationships that cannot be captured by simple bivariate analysis, this at least demonstrates that the input dataset does not contain data with counter-intuitive or unexplainable relationships to the target. If all the data used in the model is of reasonable strength and can be explained, this would again allay concerns around the use of black box models. A minimal sketch of such a check is shown after this list.
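As an illustration of the kind of bivariate check described above, here is a minimal sketch of a Weight of Evidence / Information Value calculation in Python. It assumes a pandas DataFrame `df` with a binary target column (1 = bad, 0 = good); the column names and binning choices are purely illustrative, not prescriptive.

```python
import numpy as np
import pandas as pd

def woe_table(df, feature, target, bins=5):
    """Bin a numeric feature and compute Weight of Evidence (WoE) and
    Information Value (IV) against a binary target (1 = bad, 0 = good)."""
    binned = pd.qcut(df[feature], q=bins, duplicates="drop")
    grouped = df.groupby(binned)[target].agg(["count", "sum"])
    grouped.columns = ["total", "bad"]
    grouped["good"] = grouped["total"] - grouped["bad"]

    # Distribution of goods and bads per bin; a small epsilon avoids log(0)
    eps = 1e-6
    dist_good = (grouped["good"] + eps) / (grouped["good"].sum() + eps)
    dist_bad = (grouped["bad"] + eps) / (grouped["bad"].sum() + eps)

    grouped["woe"] = np.log(dist_good / dist_bad)
    grouped["iv_contrib"] = (dist_good - dist_bad) * grouped["woe"]
    return grouped, grouped["iv_contrib"].sum()

# Hypothetical usage: rank candidate variables by Information Value and
# check that the WoE pattern across bins is monotonic / business-sensible.
# table, iv = woe_table(df, "utilisation_ratio", "bad")
# print(table[["total", "bad", "woe"]]); print("IV =", round(iv, 3))
```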
The above two steps would result in an input dataset that has been accepted by all stakeholders, including model development, model validation and business users. While we may not know which variables end up in the final model, we would at least have confidence that whatever has gone in is of good quality, unbiased, legal/ethical, and has strong, explainable relationships to the target being modeled.
Model Training
This is the step where the data from above is used to build models. Some modeling techniques, such as regression, are more explainable than others, such as Random Forests or Neural Networks. In cases where direct explanation of the model is not possible, various techniques provide proxies, including:
- Surrogate Models: more transparent models (e.g. points-based scorecards) are built to predict the outcome of the black box one. This can be done at a global or local level. A commonly used local surrogate method is Local Interpretable Model-agnostic Explanations (LIME).
- Sensitivity analyses: inputs to the black box model are perturbed and the impact measured in various ways, including changes in the predictions and errors. Methods include feature importance based on variable perturbation, Partial Dependence plots, Individual Conditional Expectation plots and Shapley values. A brief sketch of a perturbation-based check follows this list.
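To make the sensitivity analysis idea concrete, below is a minimal sketch using scikit-learn's permutation importance and partial dependence utilities. The synthetic data and the gradient boosting classifier are stand-ins for the cleaned development dataset and the black box model; the variable names are illustrative only.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance, PartialDependenceDisplay
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned development dataset produced in the
# Data Management step; in practice X and y would come from that process.
X, y = make_classification(n_samples=5000, n_features=8, n_informative=5,
                           random_state=0)
X = pd.DataFrame(X, columns=[f"var_{i}" for i in range(8)])
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

model = GradientBoostingClassifier().fit(X_train, y_train)  # the "black box"

# Permutation importance: shuffle each input column in turn and measure the
# drop in validation AUC; large drops flag the variables the model leans on.
result = permutation_importance(model, X_valid, y_valid, scoring="roc_auc",
                                n_repeats=10, random_state=0)
for name, drop in sorted(zip(X.columns, result.importances_mean),
                         key=lambda t: -t[1]):
    print(f"{name:8s} mean AUC drop: {drop:.4f}")

# Partial dependence: average predicted probability as one variable is swept
# across its range; the shape should match business intuition.
PartialDependenceDisplay.from_estimator(model, X_valid, features=["var_0"])
```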
All the techniques described above rely on approximations and have their own advantages and disadvantages. In particular, the use of a simple model to explain a much more complex one can be problematic. However, in the absence of 100% certainty, we can use a combination of several methods to get some sense of the variables that may be in the model and their relative contributions.
In addition to looking at the variables, one would of course follow the usual model governance practices, such as out-of-sample validation and review of model fit statistics.
Model Output
After checking the data and the model, the third part of the process is to look at the output and perform some simple tests on it. The results of these tests are meant to give us a sense of comfort around the robustness of the black box model. Some of these tests are standard practice for most models developed in banks.
- Benchmarking: compare the results of the black box models with those from simpler, explainable ones. These can include comparisons of model fit statistics such as Area Under the ROC Curve (AUC), KS, Gini, AR and so on, the distribution of cases by risk/score bands, and measures of false and true positives/negatives, depending on the underlying business case for the model.
- Stability: check the overall population stability of both types of models over time (the Population Stability Index is a common measure for this). We can only measure overall stability, since the stability of inputs cannot be checked individually when we do not know which ones are in the model. The black box model should not be significantly less stable.
- Backtesting: check the performance of the black box and benchmark models on historical data, as well as the stability of predicted values against actuals. This can be extended to include impacts on capital, approval rates or other relevant measures over time. In credit risk, for example, we want to make sure that the model will not result in large swings in such measures. This is in addition to the ‘out of sample’ validation done as part of the modeling exercise. A sketch of some of these checks follows this list.
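As a rough illustration of these output checks, the sketch below computes Gini, KS and a Population Stability Index from predicted probabilities. The names `y_valid`, `p_blackbox`, `p_benchmark` and `p_recent` are hypothetical placeholders for the validation target, the two models' predictions, and a more recent out-of-time score sample; the decile-based PSI is one common convention, not the only one.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def gini(y_true, y_score):
    """Gini coefficient derived from the AUC (2*AUC - 1)."""
    return 2 * roc_auc_score(y_true, y_score) - 1

def ks_statistic(y_true, y_score):
    """Kolmogorov-Smirnov statistic: maximum separation between the
    cumulative score distributions of goods and bads."""
    order = np.argsort(y_score)
    y = np.asarray(y_true)[order]
    cum_bad = np.cumsum(y) / y.sum()
    cum_good = np.cumsum(1 - y) / (1 - y).sum()
    return float(np.max(np.abs(cum_bad - cum_good)))

def psi(expected, actual, bins=10):
    """Population Stability Index between the development-time score
    distribution and a more recent one, using decile cut-offs taken
    from the development sample."""
    expected, actual = np.asarray(expected), np.asarray(actual)
    cuts = np.quantile(expected, np.linspace(0, 1, bins + 1))[1:-1]
    exp_pct = np.bincount(np.digitize(expected, cuts), minlength=bins) / len(expected)
    act_pct = np.bincount(np.digitize(actual, cuts), minlength=bins) / len(actual)
    eps = 1e-6
    return float(np.sum((act_pct - exp_pct) * np.log((act_pct + eps) / (exp_pct + eps))))

# Hypothetical comparison: p_blackbox and p_benchmark are predicted
# probabilities on the validation sample, p_recent on an out-of-time sample.
# print("Gini:", gini(y_valid, p_blackbox), "vs", gini(y_valid, p_benchmark))
# print("KS:  ", ks_statistic(y_valid, p_blackbox))
# print("PSI: ", psi(p_blackbox, p_recent))  # > 0.25 is often treated as a red flag
```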
The approach I have outlined can help improve trust in cases where the model algorithm is a black box and cannot be explained. By validating all the processes surrounding the model, it should be possible to use such models with some confidence.
Naeem Siddiqi is the author of Intelligent Credit Scoring (Wiley & Sons, 2017).