Machine learning and Artificial Intelligence can correct decades of inequity in financial services, if used responsibly
Asmau Ahmed
A few weeks ago, I applied for a mortgage, a process that thousands and thousands of women like me (divorced, immigrant, and Black) go through every day. But this simple act of self-determination wasn’t always so straightforward. Shockingly, it wasn’t until 1974 that the Equal Credit Opportunity Act (ECOA) made it illegal for lenders to discriminate based on race, sex, marital status, religion, age, place of birth or prior residence, or assistance from social or public services. Until then, women were often denied access to credit in their own names; they could only obtain it through their husbands, a particularly bewildering and unreasonable predicament for any woman who was single, divorced or widowed.
While long-awaited new laws began to widen the path to opportunity and access, they could not reach back across history. They could not halt the biases of loan officers or reverse hundreds of years of state-sanctioned and covert institutional discrimination that denied people equal pay, education and the loans necessary to build wealth. By the time these laws were enacted, the cycles of wealth and poverty had already become entrenched in the system, and the lived realities across groups could not be so easily erased.
The flywheel of who gets wealthy and who stays poor was set in motion a long time ago and it’s still going
The rapidly increasing use of Artificial Intelligence (AI) and Machine Learning (ML) in the lending industry presents an opportunity to stop and reverse the flywheel, but without thoughtful controls, it could also very easily accelerate it. ML holds both promise and disaster.
To design machines for a truly accurate, fair and ethical outcome, we first need to understand the origins of historical data used to train and build lending and underwriting decision models. Critically, we also need to correct for those obstacles and inequitable outcomes that live and thrive in the data used to make lending and underwriting decisions, even today.
First, let’s take a minute to understand how predictive modeling works, and how it’s used in credit underwriting today. Predictive modeling is the practice of learning patterns from data in order to make predictions. Machine learning (ML) is a particularly interesting and common set of modern predictive modeling methods. Unlike traditional methods that have been used in economics and statistics for decades, machine learning can take into account vastly more data, allowing it to make more accurate assessments. The process is straightforward: you feed massive amounts of data on criteria such as income, credit utilization, and the number of on-time and late payments, along with the outcomes you want to predict (for example, whether a loan was repaid), into a learning algorithm. The algorithm learns patterns in the data and creates a model that reflects those patterns. When an individual applies for a loan, the model, based on the patterns the algorithm learned, assesses the likelihood that the loan will be repaid and determines whether to grant or deny it. For example, if you have a sufficient down payment and the model assesses your likelihood of repayment as above 99%, you might be automatically approved. The most relied-upon model in banking is the FICO score, which is used in all kinds of high-stakes decisions.
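To make the mechanics concrete, here is a minimal sketch of this pattern using synthetic data and scikit-learn: it trains a repayment-prediction model and applies an approval threshold. The features, the threshold, and the data are invented for illustration; this is not FICO’s model or any lender’s production system.

```python
# A minimal, hypothetical sketch of ML-based credit underwriting.
# Synthetic data only; real underwriting involves far more features,
# careful validation, and regulatory review.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical applicant features: income, credit utilization, late payments.
income = rng.normal(60_000, 20_000, n).clip(10_000)
utilization = rng.uniform(0, 1, n)
late_payments = rng.poisson(1.0, n)

# Synthetic "ground truth": repayment is more likely with higher income,
# lower utilization, and fewer late payments.
logit = 2.0 + income / 40_000 - 2.5 * utilization - 0.8 * late_payments
repaid = rng.uniform(0, 1, n) < 1 / (1 + np.exp(-logit))

X = np.column_stack([income, utilization, late_payments])
X_train, X_test, y_train, y_test = train_test_split(X, repaid, random_state=0)

# Learn patterns from historical outcomes...
model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)

# ...then score a new applicant and apply an approval threshold.
applicant = np.array([[55_000, 0.30, 0]])
p_repay = model.predict_proba(applicant)[0, 1]
print(f"Estimated repayment probability: {p_repay:.2%}")
print("Approve" if p_repay > 0.90 else "Refer for manual review")
```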
FICO, as designed, is flawed at best.
The FICO score, a credit scoring model designed by the Fair Isaac Corporation, was introduced in 1989 as an attempt to move past the overt and covert biases of loan officers who make lending decisions at their discretion. FICO’s website reads “[t]he FICO Score replaced hunches with calculations, and took prejudice out of the equation, literally. The score’s criteria for evaluating potential borrowers are focused solely on factors related to a person’s ability to repay a loan, rather than one’s ZIP code or social status.” In other words, FICO asserts that blind mathematical calculations, essentially algorithms, are impartial as compared to the biased human alternative.
FICO’s algorithm considers five essential factors: debt owed (30%), payment history (35%), length of credit history (15%), credit mix (10%), and new credit (10%). At first glance, this seems clear-cut and faultless, with no discriminatory undertones. On closer inspection, however, the same historical lending and wealth disparities manifest even here. Groups that have historically been denied access to credit are very likely to have little or no credit history, new credit, or credit mix across generations. Payment history is determined on products that are traditionally less accessible to historically marginalized groups, like mortgages, auto loans, and retail credit accounts. Meanwhile, better indicators of creditworthiness and financial reliability, such as cell phone, utility, and rental payments, have historically been excluded from consideration.
A 2021 NPR article reported that the average Black borrower’s credit score is about 60 points lower than the average white borrower’s. Lower scores translate to stricter loan terms and denials. For example, in 2018, the Reveal Group released a study that found that, in some neighborhoods, African Americans were denied mortgage loans 2.7 times more often than whites. Bloomberg reported in 2020 that there were stark disparities between whites and even the wealthiest African-American applicants, who were approved for mortgage refinance loans by a large bank half as often.
Individuals deemed less creditworthy are subject to predatory loan terms, including punitive “payday” loans (short-term unsecured loans with exceptionally high interest rates, due when the borrower receives their next paycheck) that keep under-resourced people in vicious cycles of poverty. This combination of low pay and seemingly vindictive financial products keeps poor people poor, perpetuating a debt and poverty cycle that can last for generations.
ML / AI can correct these inequities.
With ML, we can make our financial system substantially more fair than it is today, without impacting bank profits or the stability of our economy. AI provides the ability to run simulations of numerous different decision algorithms trained in many different ways, in order to find the one that is fairest. In addition, as mentioned above, machine learning algorithms allow us to create models that consider vastly more data than ever before, making predictions more accurate. Traditional methods utilized five to ten factors; machine learning models can consider hundreds or thousands of factors, if not more. Remarkably, one Google language model uses over 1 trillion parameters. Accuracy is essential, since loans should be granted to people who have the ability to repay them. The number of algorithms, data augmentation techniques, and computational resources available today all tremendously exceed what has historically been possible, presenting new and powerful opportunities. Engineers who choose to ignore such advancements run the risk of amplifying the systemic bias in the historical data used to train models.
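To illustrate the idea of simulating many candidate decision algorithms and choosing the fairest, here is a hypothetical sketch that trains several models on synthetic data and prefers the one with the smallest approval-rate gap between groups, among those whose accuracy stays close to the best. The parity metric, the accuracy tolerance, and the data are assumptions for illustration, not a regulatory standard.

```python
# Hypothetical sketch: compare several candidate models and prefer the one
# with the smallest approval-rate gap between groups, among models whose
# accuracy stays within a chosen tolerance of the best.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(1)
n = 4_000
X = rng.normal(size=(n, 6))
group = rng.integers(0, 2, n)                      # synthetic 0/1 group flag
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 1, n) > 0).astype(int)

train, test = np.arange(n) < 3_000, np.arange(n) >= 3_000

candidates = {
    "logreg": LogisticRegression(max_iter=1_000),
    "gbm": GradientBoostingClassifier(random_state=0),
    "rf": RandomForestClassifier(n_estimators=200, random_state=0),
}

results = []
for name, model in candidates.items():
    model.fit(X[train], y[train])
    approved = model.predict(X[test])
    acc = accuracy_score(y[test], approved)
    rate_0 = approved[group[test] == 0].mean()
    rate_1 = approved[group[test] == 1].mean()
    results.append((name, acc, abs(rate_0 - rate_1)))

best_acc = max(r[1] for r in results)
viable = [r for r in results if r[1] >= best_acc - 0.01]   # accuracy tolerance: an assumption
fairest = min(viable, key=lambda r: r[2])
print(f"Chosen model: {fairest[0]} (accuracy={fairest[1]:.3f}, approval gap={fairest[2]:.3f})")
```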
Best practices to consider.
It’s important for corporations, institutions and boards to identify and mitigate bias. Teams working to create a fairer model can engage in best practices ranging from enumerating the fairness risks that might be present in their model all the way up to modifying training algorithms to penalize unfair behavior. Change can be daunting in any ecosystem, so it’s important to start thoughtfully and with the basics.
Understand the fairness risks.
A good first step is to simply write down the tasks and problems that a model is meant to solve (and not solve), the fairness risks for the model and how likely those risks are to occur. Having this kind of foundational statement in place begins the journey of quantifying fairness problems and ultimately mitigating them.
Use tools like datasheets and model cards to surface fairness concerns.
Once the fairness risks and tasks of a model are well understood, the second step is to produce datasheets for the data that were used to train the model. Datasheets are a standardized, relatively non-technical format for reporting on the characteristics of data. Producing a datasheet requires teams to walk through a basic understanding of their data, and be able to clearly communicate that understanding to others. Often, simply producing the datasheet will surface fairness concerns that might not have been visible before (e.g., representation of different groups in the data, missing data, errors in the data). Resolving these concerns then becomes the next step on the fairness journey.
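A datasheet is ultimately a document, but much of its quantitative content can be generated programmatically. Here is a hypothetical sketch that summarizes group representation and missing data for a training table; the column names and data are assumptions for illustration.

```python
# Hypothetical sketch: auto-generate the quantitative portion of a datasheet
# (group representation, missingness) for a training dataset.
# Column names ("race", "gender", "income") are invented for illustration.
import pandas as pd

def datasheet_summary(df: pd.DataFrame, group_cols: list[str]) -> dict:
    summary = {
        "n_rows": len(df),
        "n_columns": df.shape[1],
        "missing_share_by_column": df.isna().mean().round(3).to_dict(),
    }
    for col in group_cols:
        summary[f"representation_{col}"] = (
            df[col].value_counts(normalize=True, dropna=False).round(3).to_dict()
        )
    return summary

if __name__ == "__main__":
    df = pd.DataFrame({
        "race": ["white", "white", "black", "white", None],
        "gender": ["F", "M", "F", "M", "F"],
        "income": [52_000, 61_000, None, 48_000, 75_000],
    })
    print(datasheet_summary(df, group_cols=["race", "gender"]))
```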
Similarly, it is good practice to produce a model card, for the same reasons it’s a good idea to produce a datasheet. The model card is a standardized, low-tech reporting framework that outlines what a model does, how it was trained, and how it is evaluated. Again, it is often the case that simply producing a model card surfaces concerns that might not have been obvious previously (e.g., errors in how model evaluations were calculated).
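In the same spirit, a model card can be drafted as a simple structured record. The fields below follow the commonly used model-card outline (intended use, training data, disaggregated evaluation); every value shown is a placeholder, not a real model or result.

```python
# Hypothetical sketch: a model card as a structured record, with evaluation
# metrics reported per group rather than only in aggregate. Values are placeholders.
from dataclasses import dataclass, field, asdict
import json

@dataclass
class ModelCard:
    model_name: str
    intended_use: str
    out_of_scope_use: str
    training_data: str
    evaluation_data: str
    metrics_overall: dict = field(default_factory=dict)
    metrics_by_group: dict = field(default_factory=dict)
    fairness_considerations: str = ""

card = ModelCard(
    model_name="consumer-credit-risk-v1 (hypothetical)",
    intended_use="Rank applicants by estimated repayment probability.",
    out_of_scope_use="Employment screening, insurance pricing.",
    training_data="2015-2020 loan outcomes; see accompanying datasheet.",
    evaluation_data="2021 holdout, checked for group representation.",
    metrics_overall={"auc": 0.81},
    metrics_by_group={"group_a": {"auc": 0.82}, "group_b": {"auc": 0.78}},
    fairness_considerations="AUC gap between groups reviewed by the fair lending team.",
)
print(json.dumps(asdict(card), indent=2))
```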
Use representative data.
AI blind spots are particularly problematic in lending because most of the data is about white, male borrowers, and machine learning models have the potential to amplify bias in data. Ensuring that the datasets used to train models adequately represent women, people of color and other under-represented groups helps to mitigate these blind spots.
This is especially important in financial services, where 45 million Americans have no credit report or have too little data to generate a traditional credit score. Finding predictive data sources and data about underrepresented borrowers is central to improving equity. However, because many people from historically marginalized groups fear that disclosure will be used against them, they decline to provide their information, resulting in a catch-22.
Equally important is representation in data used to validate or test models. Many consider accuracy of model predictions to be the most important metric, but accuracy can only be measured against ground truth. When the ground truth data used to evaluate models is not representative of everyone, a false sense of security can be created, and underrepresented groups can suffer as a result.
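One way to operationalize this is a simple representation check that compares group shares in the training and evaluation sets against a reference population. The group names, reference shares, and tolerance below are assumptions for illustration.

```python
# Hypothetical sketch: flag under-representation in training and evaluation data
# relative to a reference population. Groups and reference shares are assumptions.
import pandas as pd

reference_shares = {"group_a": 0.60, "group_b": 0.25, "group_c": 0.15}

def representation_report(labels: pd.Series, name: str, tolerance: float = 0.05) -> None:
    shares = labels.value_counts(normalize=True)
    for group, expected in reference_shares.items():
        observed = float(shares.get(group, 0.0))
        flag = "UNDER-REPRESENTED" if observed < expected - tolerance else "ok"
        print(f"{name}: {group}: observed={observed:.2f} expected={expected:.2f} [{flag}]")

train_groups = pd.Series(["group_a"] * 700 + ["group_b"] * 200 + ["group_c"] * 100)
eval_groups = pd.Series(["group_a"] * 180 + ["group_b"] * 15 + ["group_c"] * 5)

representation_report(train_groups, "train")
representation_report(eval_groups, "eval")
```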
Have a fair lending analytics team, separate from the model developers.
Some financial services companies believe in considering demographic and identity information to address bias, while others believe in blinding the data to avoid bias. When demographic information is available, it becomes possible to test for disparate treatment and disparate impact, so that biases can be identified and mitigated. However, it is illegal to use demographic attributes, or variables that could be considered close proxies, in the model itself. For example, it’s illegal to deny credit based on race; this is considered disparate treatment under the ECOA and is prohibited by law. It is best practice, however, to use race to test whether the model is representative, or whether the model is discriminating based on racial identity.
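A common starting point for such testing is the adverse impact ratio: the approval rate for a protected group divided by the approval rate for a reference group. The sketch below uses synthetic decisions, and the 0.8 threshold reflects the “four-fifths rule” used as a rough screening heuristic, not a legal bright line.

```python
# Hypothetical sketch: adverse impact ratio on model approval decisions.
# An AIR well below ~0.8 is often treated as a signal for deeper fair
# lending review (a screening heuristic, not a legal determination).
import numpy as np

def adverse_impact_ratio(approved: np.ndarray, group: np.ndarray,
                         protected: str, reference: str) -> float:
    rate_protected = approved[group == protected].mean()
    rate_reference = approved[group == reference].mean()
    return rate_protected / rate_reference

# Synthetic decisions for illustration only.
rng = np.random.default_rng(2)
group = rng.choice(["protected", "reference"], size=2_000, p=[0.3, 0.7])
approved = np.where(group == "reference",
                    rng.uniform(0, 1, 2_000) < 0.55,
                    rng.uniform(0, 1, 2_000) < 0.38).astype(float)

air = adverse_impact_ratio(approved, group, "protected", "reference")
print(f"Adverse impact ratio: {air:.2f}", "-> review" if air < 0.8 else "-> ok")
```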
Some firms manage the potential for bias entering into the model development process by ensuring model developers are unaware of demographic attributes of the people represented in the data. The concern is that knowledge of sensitive demographic characteristics can allow unconscious bias to influence the model development process in subtle ways, even when modelers are trained to avoid it.
In more overt cases of bias, variables given weight by some models include things like poor use of capitalization, writing in all caps, or poor spelling, which negatively affect groups such as people with dyslexia and immigrants learning English as a second language.
In financial services, a best practice is to have a separate fair lending analytics team trained to handle demographic information with care and to use it responsibly to evaluate models and suggest how they may be improved to achieve greater demographic parity.
It’s also important for the analytics team to review the attributes used in predictive models to see whether they are correlated with a protected status. Lending models can be fraught with such issues; education information, for instance, can inadvertently identify an applicant’s gender, race or ethnicity. Applicants who went to a historically Black college could be viewed as higher risk by models, which would be a clear violation of fair lending laws.
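One simple screen for proxies, assuming the review team has access to a protected attribute for testing purposes only, is to check how well each candidate feature predicts that attribute. The features, data, and the 0.6 AUC threshold below are all assumptions for illustration.

```python
# Hypothetical sketch: proxy screening. If a candidate feature predicts a
# protected attribute far better than chance, treat it as a potential proxy
# and escalate to the fair lending team. Data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(3)
n = 3_000
protected = rng.integers(0, 2, n)

features = {
    "credit_utilization": rng.uniform(0, 1, n),   # unrelated to the protected attribute
    "attended_school_x": (protected * 0.7 + rng.uniform(0, 1, n) > 0.85).astype(float),  # correlated
}

for name, values in features.items():
    auc = cross_val_score(LogisticRegression(), values.reshape(-1, 1), protected,
                          scoring="roc_auc", cv=5).mean()
    flag = "possible proxy" if auc > 0.6 else "low proxy risk"   # threshold is an assumption
    print(f"{name}: AUC for predicting protected attribute = {auc:.2f} ({flag})")
```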
Identify the drivers of disparity with rigorous explainability models
?
Early model explainability methods attempted to identify how machine learning models cause differences in outcomes, but these methods were of limited usefulness because they lacked mathematical rigor. More rigorous methods have since been developed that can accurately identify drivers of disparities by leveraging the mathematics of Nobel Prize-winning game theorists. Using rigorous explainability methods is a requirement in high-stakes applications of AI like financial services.
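The game-theoretic mathematics referenced here is the Shapley value, and one widely used open-source implementation is the shap package. The sketch below shows one possible way to use it: compare mean attributions across groups on synthetic data to see which features drive a score gap. It is an illustrative approach, not the specific tooling any particular lender uses.

```python
# Hypothetical sketch: use Shapley-value attributions to see which features
# drive the score gap between two groups. Uses the open-source `shap` package
# on a tree model; synthetic data, illustrative only.
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(4)
n = 2_000
group = rng.integers(0, 2, n)
X = np.column_stack([
    rng.normal(0, 1, n),                 # feature 0: neutral
    rng.normal(0, 1, n) + 0.8 * group,   # feature 1: shifted by group (proxy-like)
])
y = (X[:, 0] + X[:, 1] + rng.normal(0, 1, n) > 0).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)   # shape (n_samples, n_features) for this model

# Average attribution per feature for each group; large gaps point to the
# features responsible for score disparities.
gap = shap_values[group == 1].mean(axis=0) - shap_values[group == 0].mean(axis=0)
for i, g in enumerate(gap):
    print(f"feature_{i}: mean SHAP gap between groups = {g:+.3f}")
```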
Take the bias out of models
?
Yet another vital task in building AI models is figuring out how to optimize for fairness by identifying and removing bias from models. Zest AI, for example, has developed adversarial debiasing, a technique in which two machine learning models compete with each other in order to arrive at the most truthful and accurate assessment. The first model predicts creditworthiness and the second tries to predict the race, gender, or other potentially protected class attributes of the applicant scored by the first model. This competition improves the first model until the second can no longer infer race or gender from its outputs, resulting in a final model that is both accurate and fair.
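The paragraph above describes the approach at a high level; below is a generic, simplified sketch of adversarial debiasing in PyTorch, not Zest AI’s implementation. The network sizes, the fairness_weight, and the synthetic data are all assumptions. The fairness_weight controls the trade-off: higher values push harder toward a score the adversary cannot decode, potentially at some cost to predictive accuracy.

```python
# Hypothetical, simplified sketch of adversarial debiasing in PyTorch.
# The predictor learns to estimate repayment; the adversary tries to recover
# the protected attribute from the predictor's score; the predictor is
# penalized in proportion to the adversary's success.
import torch
import torch.nn as nn

torch.manual_seed(0)
n, d = 2_000, 8
group = torch.randint(0, 2, (n, 1)).float()
X = torch.randn(n, d) + 0.5 * group          # synthetic features, partly group-shifted
y = ((X[:, :2].sum(dim=1, keepdim=True) + 0.5 * torch.randn(n, 1)) > 0).float()

predictor = nn.Sequential(nn.Linear(d, 16), nn.ReLU(), nn.Linear(16, 1))
adversary = nn.Sequential(nn.Linear(1, 8), nn.ReLU(), nn.Linear(8, 1))

opt_p = torch.optim.Adam(predictor.parameters(), lr=1e-2)
opt_a = torch.optim.Adam(adversary.parameters(), lr=1e-2)
bce = nn.BCEWithLogitsLoss()
fairness_weight = 1.0                        # assumption: strength of the fairness penalty

for epoch in range(200):
    # 1) Train the adversary to predict the protected attribute from the score.
    score = predictor(X).detach()
    opt_a.zero_grad()
    adv_loss = bce(adversary(score), group)
    adv_loss.backward()
    opt_a.step()

    # 2) Train the predictor to predict repayment while fooling the adversary.
    opt_p.zero_grad()
    score = predictor(X)
    task_loss = bce(score, y)
    fool_loss = bce(adversary(score), group)
    (task_loss - fairness_weight * fool_loss).backward()
    opt_p.step()
```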
Another technique is the use of synthetic counterfactuals, where a model is trained using synthetic data that might not ordinarily have existed in its training material. For example, we can train an image-understanding model to recognize that both men and women can be CEOs by making synthetic images labeled “CEO” that are identical except for the gender presentation of the CEO. By using counterfactuals, machine learning training material can move from historical data, rife with blind spots, biases, and inequities, to aspirational data that reflects a world-that-could-be: fair, equitable, and truly representative.
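In a text setting, a hedged version of the same idea is counterfactual data augmentation: for every labeled example, add a copy with gendered terms swapped while keeping the label, so the model sees both versions. The word list and dataset below are illustrative assumptions; production pipelines handle pronoun ambiguity and many more terms.

```python
# Hypothetical sketch: counterfactual data augmentation for text, in the spirit
# of the "CEO" example. Each labeled example gets a gender-swapped twin with the
# same label. The swap list is deliberately tiny and illustrative.
import re

SWAPS = {"he": "she", "she": "he", "his": "her", "her": "his",
         "man": "woman", "woman": "man"}

def swap_gendered_terms(text: str) -> str:
    def repl(match: re.Match) -> str:
        word = match.group(0)
        swapped = SWAPS[word.lower()]
        return swapped.capitalize() if word[0].isupper() else swapped
    pattern = r"\b(" + "|".join(re.escape(w) for w in SWAPS) + r")\b"
    return re.sub(pattern, repl, text, flags=re.IGNORECASE)

def augment(dataset: list[tuple[str, str]]) -> list[tuple[str, str]]:
    # Original examples plus their counterfactual twins, labels unchanged.
    return dataset + [(swap_gendered_terms(text), label) for text, label in dataset]

data = [("She is the CEO of the company.", "CEO"),
        ("He reports to the board.", "CEO")]
print(augment(data))
```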
Augment the data with better predictors of creditworthiness
Fast-growing financial services technology companies like Zest AI and Upstart assert that it is possible to use the power of machine learning to consider thousands of additional variables in making the lending and underwriting decisions that determine creditworthiness. A model could, for example, determine creditworthiness using the non-traditional, heretofore ineligible payment histories of services used by the everyday American: rental, cell phone, Wi-Fi, and utility payments. In so doing, millions of people would move closer to accessing credit.
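As a simple illustration of what augmenting the data might look like in practice, here is a hypothetical sketch that joins rental and utility payment histories onto a traditional credit feature table. The schemas, column names, and values are invented.

```python
# Hypothetical sketch: join non-traditional payment histories (rent, utilities)
# onto a traditional credit feature table. Schemas and column names are invented.
import pandas as pd

credit = pd.DataFrame({
    "applicant_id": [1, 2, 3],
    "credit_history_months": [84, 0, 12],   # applicant 2 is "credit invisible"
    "late_payments": [1, 0, 0],
})

rent = pd.DataFrame({
    "applicant_id": [1, 2, 2, 3],
    "on_time": [1, 1, 1, 0],
})

utilities = pd.DataFrame({
    "applicant_id": [2, 2, 3],
    "on_time": [1, 1, 1],
})

def on_time_rate(payments: pd.DataFrame, name: str) -> pd.DataFrame:
    return (payments.groupby("applicant_id")["on_time"].mean()
            .rename(f"{name}_on_time_rate").reset_index())

features = (credit
            .merge(on_time_rate(rent, "rent"), on="applicant_id", how="left")
            .merge(on_time_rate(utilities, "utility"), on="applicant_id", how="left"))
print(features)
```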
While this is certainly a best practice, care needs to be taken so that the additional variables considered by machine learning models are not themselves hidden proxies for race. Additional variables currently given weight in some ML models include academic performance, area of study, work history, and schools attended. At face value, these variables seem non-discriminatory and may help open up credit opportunities to people from marginalized groups. However, they can also act as hidden proxies. A credit risk model that considers which university you attended might appear neutral, until you consider that some universities only admit women or that historically Black colleges and universities have higher concentrations of African American students. The same historical inequities that make status quo credit scoring unfair may also apply in these cases.
Hire a diverse team
Crucially, there can be no real solutions or best practices without diversifying not just the data, but the people behind the data. That means building diverse teams of data scientists, computer scientists, engineers, and everyone who works behind the scenes to train models. Without that, AI blind spots are nearly impossible to find. Take one example from Zest AI, which hired a Black data scientist, Kasey Matthews, who noticed that she was often being misclassified as white by the BISG race estimation algorithm. Others at the firm hadn’t noticed this, but because she did, she was able to address the problem and introduce a new race estimation method that significantly reduces misclassification rates. Her algorithm, ZRP, is now being tested by some of the world’s largest lenders to improve race estimation in their fair lending reviews.
Add fairness to your board agenda
Finally, board members and C-suite executives have a fiduciary duty to ensure that their AI models, processes, and outcomes are fair and equitable. In addition to being humane and ethical, fair AI systems positively impact the corporate bottom line. A single fair lending enforcement action can cost a bank nearly 100 million dollars in fines alone, not including litigation costs and other penalties, to say nothing of the cost of a sullied brand and reputation. It would be wise to create teams that audit all AI systems and machine learning data for fairness. The cost to the corporation would likely be significantly less in the long term than the cost of not taking action. In this way, fairness is both ethical and profitable, a win for both corporations and society.
Castleigh Johnson