Making Credit Scoring “fair”
Nigel Kingsman, Holistic AI - 16th February 2022
What is fairness? A seemingly simple question, and one for which there is no “right” answer. Broadly speaking, many commentators would understand fairness to mean that an individual has equal access to resources or opportunities irrespective of their socioeconomic status, gender, race, and so on. Despite the imprecision with which we define fairness, it has nonetheless been an ideal that humans have petitioned for throughout history, leading to the UN’s Universal Declaration of Human Rights, an overarching set of principles promoting non-discrimination across a number of attributes including race, colour, sex, language and religious beliefs, amongst others. Such principles have been enshrined in law across a number of countries. In the UK, the Equality Act 2010 consolidated earlier acts whilst extending anti-discrimination law to cover nine protected characteristics.
The “slipperiness” of defining fairness has led the scientific research community to propose a range of mathematically precise definitions of bias (we think of ‘bias’ as the quantifiable cousin of ‘fairness’). Looking across the literature, those bias definitions primarily fall into three groups: equality of outcome (results being the same across privileged and unprivileged groups), equality of opportunity (privileged and unprivileged groups having the same opportunities to achieve positive results), and equality of performance (the performance of a system or process being consistent between groups).
Equality of outcome and equality of opportunity are already well-understood notions within the fairness debate. Equality of performance is less familiar, with pulse oximetry measurement providing a good example of divergent performance between racial groups.
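To make these three groups a little more concrete, the sketch below computes one representative metric from each for a binary classifier and a single binary protected attribute: a statistical parity difference (equality of outcome), an equal opportunity difference (equality of opportunity) and an accuracy gap (equality of performance). This is an illustrative sketch only, not the formulation used in the thesis; the function and variable names are hypothetical.

```python
import numpy as np

def group_fairness_metrics(y_true, y_pred, protected):
    """Illustrative group-fairness metrics for a binary classifier,
    comparing an unprivileged group (protected == 1) with a privileged
    group (protected == 0). Assumes both groups contain positive labels."""
    y_true, y_pred, protected = map(np.asarray, (y_true, y_pred, protected))
    priv, unpriv = (protected == 0), (protected == 1)

    # Equality of outcome: difference in positive prediction rates.
    spd = y_pred[unpriv].mean() - y_pred[priv].mean()

    # Equality of opportunity: difference in true positive rates.
    tpr_priv = y_pred[priv & (y_true == 1)].mean()
    tpr_unpriv = y_pred[unpriv & (y_true == 1)].mean()
    eod = tpr_unpriv - tpr_priv

    # Equality of performance: difference in overall accuracy.
    acc_gap = (y_pred[unpriv] == y_true[unpriv]).mean() - (y_pred[priv] == y_true[priv]).mean()

    return {"statistical_parity_diff": spd,
            "equal_opportunity_diff": eod,
            "accuracy_diff": acc_gap}
```

Under each definition, values closer to zero indicate less bias; a system that is “unbiased” under all three notions must drive all three quantities towards zero, and must do so for every protected characteristic of interest.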
When it comes to systems that use artificial intelligence (AI) technologies, ensuring that such systems are “fair” has, together with other concerns around AI, become a priority for research and debate. AI systems have a range of stakeholders, including persons directly impacted by such systems, system owners, regulators, the media and the public at large, and it is therefore likely that several notions of equality, captured via different mathematical definitions of bias, will be expected to hold with respect to several (and possibly all) of the protected characteristics.
This raises the following question: what happens to an AI system when we impose a number of notions of fairness over a number of protected characteristics?
To answer this question, my Master’s thesis investigated how the performance of a credit scoring AI system is impacted as we seek to make the system free of bias simultaneously for a number of definitions of bias over a number of protected characteristics. Credit scoring is a particularly relevant domain for such an investigation, given both the wide adoption of AI technologies in the credit scoring industry and the European Commission’s classification of AI credit scoring systems as “high-risk”, subject to mandatory requirements for trustworthy AI together with assessment procedures.
Over a number of credit datasets, we found that, as each marginal equality requirement was imposed, the credit scoring AI system’s accuracy progressively worsened: each time an additional bias was removed from the system, its ability to correctly predict whether an individual was a good or bad credit risk suffered. In the most extreme case, imposing three equality requirements reduced the accuracy of the system from above 90% to below 70%. Moreover, for some datasets it was not possible to arrive at a system from which bias had been fully removed across the three mathematical formulations of bias that we investigated (covering the three groups of bias definitions discussed earlier) and two protected characteristics simultaneously.
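The toy example below gives a flavour of why this trade-off arises. It is not the thesis experiment and does not use the thesis datasets; it simply trains a scoring model on synthetic data, then enforces a single equality-of-outcome constraint by choosing group-specific decision thresholds and compares accuracy before and after. All data, features and thresholds are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 20_000
group = rng.integers(0, 2, n)                               # protected attribute
income = rng.normal(group * 0.8, 1.0, n)                    # feature correlated with group
repaid = (income + rng.normal(0, 1.0, n) > 0).astype(int)   # true outcome

X = np.column_stack([income])
clf = LogisticRegression().fit(X, repaid)
scores = clf.predict_proba(X)[:, 1]

# Unconstrained decision: a single threshold for everyone.
pred = (scores > 0.5).astype(int)
print("accuracy, unconstrained:", (pred == repaid).mean())
print("approval rates by group:", [pred[group == g].mean() for g in (0, 1)])

# Constrained decision: per-group thresholds chosen so that both groups
# are approved at the same rate (equality of outcome).
target_rate = pred.mean()
fair_pred = np.zeros(n, dtype=int)
for g in (0, 1):
    thr = np.quantile(scores[group == g], 1 - target_rate)
    fair_pred[group == g] = (scores[group == g] > thr).astype(int)

print("accuracy, outcome-equalised:", (fair_pred == repaid).mean())
print("approval rates by group:", [fair_pred[group == g].mean() for g in (0, 1)])
```

In this sketch, equalising approval rates forces the model to approve some lower-scoring applicants in one group and decline some higher-scoring applicants in the other, so overall accuracy falls; adding further equality requirements, or further protected characteristics, only tightens the constraints on the decision rule.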
We can see, therefore, that seeking to remove bias across the majority of the nine protected characteristics of the UK’s Equality Act 2010, and doing so for a collection of definitions of bias, could lead to even more extreme reductions in system performance or, indeed, no feasible system at all.
Where does this leave the credit scoring industry? Does it result in credit providers simply charging more to offset the expected losses (more lending to those who might not be able to pay back the loan) and forgone profits (reduced lending to those who can pay back the loan) resulting from de-biasing their credit scoring AI systems? Doing so would make credit less attainable for everyone and would likely be regressive.
Or does it result in a spectrum of credit providers, some of whom de-bias their systems less whilst charging less for loans, and some of whom de-bias their systems more with a commensurately higher loan charge? Such an outcome would effectively funnel disadvantaged groups towards the more expensive providers; the credit landscape would not have become fairer, rather the manifestation of the lack of fairness would simply have been transformed.
Or do we encourage the creation of new credit datasets? Datasets where the ‘outcomes’ are not historic credit decisions (invariably made by humans, and often marred by bias, unconscious or otherwise) but are instead the outcomes (paid back or not paid back) of loans that were actually granted.
At present, credit providers struggle to generate such datasets: their credit scoring AI systems are trained on past data, and so fairly-priced loans (without a premium attached) continue to be withheld from disadvantaged groups.
So we finish with the questions posed above. These questions, and many others, are a subject for further research and debate.
Nigel Kingsman ([email protected]) is part of the Solution Delivery team at Holistic AI. He previously spent over 20 years across a range of derivatives trading and structuring roles within the investment banking industry, most recently with a focus on highly-structured fund-linked and fund-wrapped products. He holds a Master of Mathematics degree from the University of Durham and a Master’s degree in Machine Learning from UCL.