20 Risk Clinic - Adrift in a sea of data

Model monitoring is a component of the standard model (risk) management cycle that will only gain in importance as financial institutions implement more machine learning, data science and AI applications. It will increasingly (have to) complement model validation as an essential pillar of model risk mitigation, both because the models in use are becoming more opaque and complex, and because rapidly evolving operational circumstances demand a timely reaction, e.g. through more frequent model recalibration.

Model monitoring (for risk models in a narrow sense) can be defined as the continuous backtesting of model performance as the data comes in, to ensure the model remains fit for purpose. How often do we see a loss in the trading book exceeding the VaR calculated at the close of business the day before? How does the observed default rate (for a set of debtors with a similar risk profile) correspond to the forecasted default rate (for the expected point in the business cycle!)? How do the true positive/false positive rates compare between the male and female population, e.g. in lending decisions? The PRA’s Supervisory Statement SS1/23 defines model monitoring as the ongoing testing to confirm that the model continues to perform as intended, referring back (at least) to the criteria that were judged relevant during model development. All the above examples can be labelled outcome monitoring. More often than not, this outcome monitoring will lead to the need to understand the evolution of the inputs to the model (input monitoring), both to assess the need to change the model and to identify the management steps one can take on the back of the monitoring signal. This risk clinic does nothing more than rephrase the above few observations in formulae.
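
For the VaR example, outcome monitoring can be as simple as counting exceptions and mapping the count to the Basel traffic-light zones (green up to 4, amber 5 to 9, red from 10 exceptions over 250 trading days at the 99% level). A minimal sketch, with purely illustrative function names and numbers:

```python
# Minimal sketch of outcome monitoring for a 99% VaR model: count the days on
# which the realised loss exceeded the previous day's VaR and map the count to
# the Basel traffic-light zones. Names and the toy data are illustrative only.
from typing import Sequence

def var_exceptions(losses: Sequence[float], var_forecasts: Sequence[float]) -> int:
    """Number of days on which the realised loss exceeded the VaR forecast."""
    return sum(1 for loss, var in zip(losses, var_forecasts) if loss > var)

def traffic_light(n_exceptions: int) -> str:
    """Basel traffic-light zone for 250 observations at the 99% level."""
    if n_exceptions <= 4:
        return "green"
    return "amber" if n_exceptions <= 9 else "red"

# Toy example: one exception in this (far too short) sample -> green zone.
print(traffic_light(var_exceptions([1.2, 0.4, 2.1], [1.0, 1.5, 2.5])))
```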

Another aspect of monitoring is the frequency at which it happens. In the examples above, actual transactions accrue too slowly to draw statistically meaningful conclusions on an ongoing basis (e.g. VaR backtesting). This does not mean that a single observed VaR excess is not a meaningful signal; it definitely triggers a heightened state of alert, but it does not, in and of itself, constitute a challenge to the model. In contrast, model driven process (risk) management such as fraud and transaction monitoring, automated decision-making in the context of consumer credit, FX hedging over the weekend for international payment providers, algorithmic trading, … requires more frequent and real-time monitoring, as the operating environment can evolve quickly, potentially requiring recalibration or even more fundamental remodelling. Moreover, in those more real-time environments, monitoring can be linked to escalation triggers in the operational processes, e.g. to ensure real-time human intervention when the model hits an instance outside its remit, or to shut down a trading algorithm that goes rogue.
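
As an illustration of how such a trigger could be wired into an operational process, consider a sketch in which a monitored metric is compared against a soft and a hard limit; the metric, thresholds and actions are assumptions for illustration, not prescriptions:

```python
# Hedged sketch of a real-time escalation trigger: compare a monitored metric
# (e.g. the share of instances falling outside the model's training domain over
# the last hour) against a soft and a hard threshold. Thresholds and names are
# purely illustrative assumptions.
def escalation_action(metric: float, soft_limit: float, hard_limit: float) -> str:
    if metric >= hard_limit:
        return "HALT"    # e.g. pause the algorithm, route decisions to a human
    if metric >= soft_limit:
        return "ALERT"   # e.g. notify the model owner, increase monitoring frequency
    return "OK"

print(escalation_action(0.07, soft_limit=0.05, hard_limit=0.10))  # -> "ALERT"
```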

Credit risk PD models are often presented as modelling challenges where a probability (of default) is estimated as a function of explanatory variables or features, for instance using a logit functional relation between a linear combination of the features and the default probability. This presentation hides the fact that we are in reality taking a view (through the model) on the joint distribution of the features and the target variable (e.g. default, fraud, …):

P(x, y)
Simply put, when we apply our model to newly presented instances to estimate the target y, we tacitly assume that they are drawn from roughly the same population that the training sample hopefully adequately covered, and that the relationship between the features and the target is stable. Monitoring aims to understand the drift of this joint distribution. The name dataset drift is sometimes used for the change of the joint distribution [1].

Mathematically, Bayes’ theorem helps distinguish the two components:

P(x, y) = P(y | x) · P(x)

The joint distribution is determined by the distribution of the features, and the likelihood that any given set of features leads to the target event being realised.

Modelling in fact means that we generalise (or approximate …) the conditional probability representing the relationship between features and target:

f(x) ≈ P(y | x)
Decomposing the joint distribution this way allows us to distinguish different components of monitoring, and naturally leads to the questions to address in the monitoring. When the model supports a decision with a human-in-the-loop making the final call, monitoring should look both at the model output alone and at the value as overridden by the human decision maker.

Covariate drift refers to changes in the distribution of the covariates, or the features, or the risk drivers in a risk context, of the population (more precisely, of the sample you get through the door). It pertains to changes in

P(x)
A popular metric in the world of credit models is the PSI, the population stability index, where the distribution of a current sample over a binned continuous variable is compared to the distribution of the training data set over that same variable. The variable is often chosen to be the actual score coming out of the model, aggregating the changes in the different covariates into their impact on the score, and so using the actual model in the test. Mathematically, the PSI is closely related to the information value that is equally prevalent in credit modelling: both are symmetrised forms of the Kullback-Leibler divergence. It hence suffers from the same drawbacks, namely a dependence on the actual granularity of the binning and a sparsity of literature on what actually represent good threshold values of the PSI on which to base a decision.
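
As a concrete illustration, a minimal sketch of a PSI computation, with the bin edges fixed on the training (reference) sample; the decile binning, the small-count floor and the synthetic example data are illustrative assumptions:

```python
# Minimal PSI sketch: PSI = sum_i (q_i - p_i) * ln(q_i / p_i), where p_i and q_i
# are the bin shares of the reference (training) and current samples.
import numpy as np

def psi(reference: np.ndarray, current: np.ndarray, n_bins: int = 10) -> float:
    # Bin edges taken from the reference sample, here as equal-frequency deciles.
    edges = np.quantile(reference, np.linspace(0, 1, n_bins + 1))
    lo, hi = edges[0], edges[-1]
    # Clip both samples into the training range so nothing falls outside the bins.
    p = np.histogram(np.clip(reference, lo, hi), bins=edges)[0] / len(reference)
    q = np.histogram(np.clip(current, lo, hi), bins=edges)[0] / len(current)
    p, q = np.clip(p, 1e-6, None), np.clip(q, 1e-6, None)  # avoid log(0) in sparse bins
    return float(np.sum((q - p) * np.log(q / p)))

rng = np.random.default_rng(0)
train_scores = rng.normal(0.0, 1.0, 10_000)
new_scores = rng.normal(0.3, 1.0, 5_000)    # a population whose scores have shifted
print(round(psi(train_scores, new_scores), 3))
```

Note that the outcome depends on the number and placement of the bins, which is exactly the granularity drawback mentioned above.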

There are, though, relevant monitoring tests one can (and should) do on the covariates themselves, without using the model: monitoring their marginal distributions, their volatilities and correlations, the characteristics of the missing data (i.e. are data issues evolving and are they in any way systematic?), … These are prerequisites for assessing whether degrading model performance is due to a changing population or a changing relationship between the covariates and the target.
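
A sketch of such model-free input monitoring, assuming numeric covariates (categorical ones would call for e.g. a chi-squared comparison); column names and the report layout are illustrative:

```python
# Per-covariate comparison of marginal distributions (two-sample Kolmogorov-
# Smirnov test) and missing-value rates between training and incoming data.
import pandas as pd
from scipy.stats import ks_2samp

def input_monitoring(train: pd.DataFrame, current: pd.DataFrame) -> pd.DataFrame:
    """One row per covariate: distance between marginals and missing-data rates."""
    rows = []
    for col in train.columns:
        res = ks_2samp(train[col].dropna(), current[col].dropna())
        rows.append({
            "covariate": col,
            "ks_statistic": res.statistic,     # distance between the two marginals
            "ks_p_value": res.pvalue,          # small value suggests a shifted marginal
            "missing_rate_train": train[col].isna().mean(),
            "missing_rate_current": current[col].isna().mean(),
        })
    return pd.DataFrame(rows)
```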

Also, checking that any given instance to which the model is (going to be) applied is not an outlier relative to the training data set should be common practice, to ensure that the model is applied within its domain of applicability on an ongoing basis. Certainly in an ML/AI world, the role of continuous monitoring includes providing this real-time guardrail.
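
A hedged sketch of such a per-instance guardrail, here using the Mahalanobis distance to the training data with a chi-squared cut-off (one simple choice among many; isolation forests or reconstruction errors are common alternatives):

```python
# Flag instances that sit far from the training data in Mahalanobis distance.
import numpy as np
from scipy.stats import chi2

class DomainGuardrail:
    def __init__(self, X_train: np.ndarray, quantile: float = 0.999):
        self.mean = X_train.mean(axis=0)
        self.cov_inv = np.linalg.pinv(np.cov(X_train, rowvar=False))
        # Under (approximate) multivariate normality the squared distance
        # follows a chi-squared distribution with d degrees of freedom.
        self.threshold = chi2.ppf(quantile, df=X_train.shape[1])

    def is_out_of_domain(self, x: np.ndarray) -> bool:
        d = x - self.mean
        return float(d @ self.cov_inv @ d) > self.threshold

rng = np.random.default_rng(0)
guard = DomainGuardrail(rng.normal(size=(5_000, 4)))
print(guard.is_out_of_domain(np.array([0.1, -0.2, 0.3, 0.0])))   # within domain -> False
print(guard.is_out_of_domain(np.array([8.0, 8.0, 8.0, 8.0])))    # far outside -> True
```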

Concept drift or model drift happens when the relationship between the covariates and the target changes, i.e.

P(y | x)

has changed but the covariate distribution P(x) has stayed the same. This will generally show up as a degradation of the standard model performance characteristics.
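
One way to make this operational is to track a discrimination metric on consecutive windows of instances for which the outcome has since been observed, and to flag a sustained drop versus the value at development. A sketch, with the window size and tolerance as illustrative assumptions:

```python
# Outcome monitoring for concept drift: AUC per window of realised outcomes,
# compared against the value measured at model development.
import numpy as np
from sklearn.metrics import roc_auc_score

def rolling_auc(y_true: np.ndarray, y_score: np.ndarray, window: int = 1000) -> list:
    """Discrimination (AUC) per consecutive window of realised outcomes.

    Assumes every window contains both classes; in practice windows are often
    defined per month or per cohort rather than per fixed number of instances.
    """
    return [
        roc_auc_score(y_true[i:i + window], y_score[i:i + window])
        for i in range(0, len(y_true) - window + 1, window)
    ]

def flag_concept_drift(aucs, development_auc: float, tolerance: float = 0.05):
    """Flag windows where discrimination dropped materially below development."""
    return [auc < development_auc - tolerance for auc in aucs]
```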

Finally, there is prior probability drift, a shift in the likelihood of the target itself (think of an overall higher default rate during an economic downturn). This means that

P(y)

has changed. Again in credit modelling, notice that the split the EBA set out between risk differentiation (the concept or model) and risk calibration (prior probability drift) fits nicely in this framework. The PiT and TtC debate in IFRS 9 or CECL credit models could perhaps be phrased in a more controllable way in this framework too. There, on the contrary, some of the time-dependent probability of default is often captured in the model/concept, by adding macroeconomic variables to the covariates.
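
A minimal sketch of monitoring this calibration component: compare the observed default count in a cohort with the count implied by the forecast PD, using an exact binomial test. In practice this would be run per rating grade and with the point in the cycle in mind; the names and numbers below are illustrative assumptions.

```python
# Simple calibration check for prior probability drift: is the observed default
# count statistically consistent with the forecast PD for the cohort?
from scipy.stats import binomtest

def calibration_check(n_obligors: int, n_defaults: int, forecast_pd: float,
                      alpha: float = 0.05) -> bool:
    """True if the observed default count is consistent with the forecast PD."""
    result = binomtest(n_defaults, n_obligors, forecast_pd, alternative="two-sided")
    return result.pvalue >= alpha

# Toy example: 38 defaults among 2,000 obligors against a forecast PD of 1.5%.
print(calibration_check(2_000, 38, 0.015))
```

Note that this simple test ignores default correlation within the cohort, which in practice widens the range of outcomes consistent with the forecast.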

Notice we left it vague whether the distributions we are handling are merely empirical, defined by histograms, or whether assumptions are made on the shape of the distribution(s). In the latter case, any monitoring (and validation!) should be cognizant of the assumptions made in specifying a shape. What if the shape of the distribution itself changes?

In a podcast on climate modelling, I recently heard Gavin Schmidt state something along the following lines, when asked whether the weather experience of the last few years is evidence that the climate has become more erratic: “Definitely, the average temperature has gone up. And most places have fewer cold days than they used to. But to conclude whether this is because the entire distribution shifted upwards, or whether it actually has changed variance, is something for which we don’t have enough data yet.” Going from monitoring to inference is a delicate journey!

Appendix

We give here some key considerations from the PRA’s SS1/23 on Model Risk Management when it comes to model monitoring.

- Model monitoring is an activity that consists of several components that must be clearly defined and standardised to ensure it maximises its value: the criteria used to measure model performance, trigger/decision thresholds, a systematic approach to recalibration or redevelopment, and processes for root cause analyses to understand performance deterioration.

- Within the three lines of defence governance structure, model monitoring can happen at different lines. For instance, monitoring the covariates of the population naturally sits close to the business using the model for its decisions. Assessing concept drift is likely a more technical endeavour and can be lodged more in the 2nd LoD (for example …).

- For third party vendor models, model monitoring takes on even more importance, given that black box models limit the model risk management that can be achieved through model validation.

- The Supervisory Statement helpfully enumerates the objectives of model monitoring:

(i) ensure that parameter estimates and model constructs are appropriate and valid; [CONCEPT DRIFT]

(ii) ensure that assumptions are applicable for the model’s intended use; [DATASET DRIFT]

(iii) assess whether changes in products, exposures, activities, clients, or market conditions should be addressed through model adjustments (CONCEPT DRIFT), recalibration (PRIOR PROBABILITY DRIFT), or redevelopment, or by the model being replaced; [DATASET DRIFT]

(iv) assess whether the model has been used beyond the intended use and whether this use has delivered acceptable results. [COVARIATE DRIFT]

- A range of tests should form part of model monitoring, including benchmarking, sensitivity testing, analysis of overrides, and parallel outcomes analysis.

- The lessons learnt and conclusions from the monitoring should be reported at the appropriate levels in the organisation.


References


  1. Jose G. Moreno-Torres, Troy Raeder, Rocio Alaiz-Rodriguez, Nitesh V. Chawla, Francisco Herrera, A unifying view on dataset shift in classification, Pattern Recognition 45 (2012) 521–530.
  2. Bank of England PRA, Model Risk Management Principles for Banks, Supervisory Statement 1/23, May 2023.
  3. The European Banking Authority, EBA/GL/2017/16, 23/04/2018, Guidelines on PD estimation, LGD estimation and the treatment of default exposures.
  4. https://www.preposterousuniverse.com/podcast/2024/05/20/276-gavin-schmidt-on-measuring-predicting-and-protecting-our-climate/. Global climate modelling, … that’s an area where model risk really matters.


[1] While the different types of drift are readily found in the literature, there is very little consensus on the terms used to describe them. I have followed reference 1 here, as this seemed a sensible proposition.
