Machine Learning and AI for the Legal Professional
Mihnea Constantinescu
Deputy Governor | Economic Analysis and Policy | Machine Learning | Digital Transformation
based on the 2024 Latvian Law Institute Guest Lecture (https://www.lti.lv/jaunumi); article originally published in the October 2024 Latvian Supreme Court Bulletin
Introduction
The journey of machine learning and artificial intelligence towards becoming digital extensions of our eyes, ears, and brain has been a long and gradual process, marked by decades of incremental yet unsteady progress. This technological evolution began slowly, with early ML research in the 1950s and 60s laying the theoretical foundation but struggling to produce practical applications. For decades, the field progressed in fits and starts, alternating between periods of optimism and "AI winters" where progress seemed to stall. The development of more sophisticated sensors – our digital eyes and ears – was similarly incremental, with early versions being expensive and limited in capability. Computing power, while growing exponentially as predicted by Moore's Law, took decades to reach the levels necessary for human-useful applications.
Perhaps most significantly, the refinement of algorithms capable of replicating aspects of human thinking was a painstaking process, building upon generations of research across multiple disciplines. It's only in recent years that we've seen these various strands of development converge and mature, finally reaching a stage where AI systems can meaningfully complement human capabilities. Today, after this long gestation period, we have AI that can see (through advanced computer vision), hear and talk (via sophisticated speech recognition), and think (through complex data analysis and decision-making algorithms) alongside us, expanding the boundaries of what we can perceive, understand, and accomplish. AI is increasingly shaping professional practice across fields – legal, financial, public policy, all the way to medicine – from autonomous navigation on the battlefield to predictive justice.
Formally, machine learning (ML) is a subset of artificial intelligence that focuses on developing algorithms and statistical models that enable computer systems to improve their performance on a specific task through learning. These models are designed to recognize patterns in data (sounds, images, or text), with applications in forecasting, in making optimal choices in the face of complex cost-benefit profiles, and in predicting outcomes in individual or group settings.
At its core, machine learning is about parameter estimation. When we say a model "learns," we're referring to the process of adjusting its parameters to better fit the observed data. This process is more akin to a statistical exercise than to human learning. The term "learning" is thus an abuse of terminology in this context, a purposely stretched association resting on the lowest common denominator. But what do machines learn, and how can this be useful? This article will introduce the basics of machine learning and artificial intelligence via legal analogies.
Can Machines Learn?
This section will introduce the usefulness and risks of using a machine learning model through a concrete example: assessing salary fairness. This will illuminate the non-trivial interplay between the technical aspects of a machine learning model, which understands the world as a series of 0s and 1s, and legal practice, where words carry subjective meaning shaped by the listener's moral values.
Machine learning models offer a versatile set of tools for our example of examining salary fairness due to their ability to analyze vast amounts of data and identify patterns that might elude human observers. Where a manual analysis might be limited to a handful of cases or rely on broad generalizations, a well-designed machine learning model can process thousands or even millions of salary data points, accounting for numerous variables such as experience, education, performance metrics, and demographic factors. This comprehensive approach allows for a more nuanced understanding of salary structures across an organization or industry, potentially revealing subtle biases or discrepancies that traditional methods often miss.
The efficiency of these models also makes it feasible to conduct regular, large-scale analyses that would be prohibitively time-consuming and costly if performed solely by human analysts. However, the power of these tools comes with significant risks. If the underlying assumptions of the model are flawed, or if the training data is not representative of the broader population, the results will be useless or, worse, will reinforce existing biases. For instance, a model trained on historical salary data might perpetuate past discriminatory practices if those practices are embedded in the training set. Similarly, if certain groups are underrepresented in the data, the model's conclusions about fairness may be skewed. Rigorous scrutiny of the model's assumptions, careful validation of the data's representativeness, and a critical examination of the results against broader social and ethical considerations all require that the legal professional understands, in broad lines, the technical limitations of a model in light of the existing legal framework.
In machine learning, a model is akin to a legal framework or statute. Just as a law simplifies complex social interactions into a set of rules, a machine learning model simplifies reality into a mathematical representation meant to capture the relevant while purposely ignoring the irrelevant. For instance, a salary prediction model, like our linear regression example below, is similar to a sentencing guideline that considers various factors to determine an appropriate sentence. Already here we see the conceptual limitations of a machine learning model – reality is shaped by the question at hand. If the goal is to understand fairness (defined eventually by the legal professional), then relevant and irrelevant are defined in terms of this goal. It often happens that the irrelevant (excluded from the mathematical model for reasons of efficiency) ends up being relevant but not accounted for.
Variables are the inputs to our model, similar to the facts of a case in law. In our salary fairness example, variables might include years of experience, education level, or job performance metrics. Just as different facts can influence a legal outcome, these variables influence the model's prediction. Along with variables, we also have parameters. These are the components of the model that are computed from data, analogous to precedents in case law. Just as legal precedents are refined over time with new cases, these parameters are adjusted as the model is exposed to more data.
The process of determining the best parameters for a model is called training or learning. This is similar to the process of legal education and gaining courtroom experience, where one learns how to interpret and apply the law to various situations. When a trained model is applied to new data to estimate an outcome, it's making a prediction. This is analogous to applying established law and precedents to a new case to determine its likely outcome.
Let's consider the linear regression model, the simplest of ML models, to predict salary based on years of experience. This model posits a relationship, assumed to be linear (see Figure 1 below), between how many years of experience a worker has and their salary. The mathematical formula already structures the reality of the relationship, designating the number of years of experience as the relevant variable determining salary.
Given we expect more years of experience to be associated with a higher salary, the parameter β₁ should be a positive number, but its value needs to be calculated using actual data – that is, the model needs to be trained. Salary is the dependent variable we're trying to predict; Years of Experience is the independent variable (also called a feature). The key parameters are β₀ and β₁, the values the model needs to identify. Mathematically, the model is expressed as

Salary = β₀ + β₁ × Years of Experience + ε

where ε denotes the prediction error.
The "learning" in machine learning refers to the process of finding the best values for the model's parameters (in our case, β₀ and β₁) to make accurate salary predictions. This process aims to make the model's predictions as close as possible to the actual salaries in our dataset.
The model begins with initial guesses for β₀ and β₁. These could be random numbers or zeros, but most of the time they will be sample averages of Salary and Years of Experience. Using these initial values, the model predicts salaries for all employees in the dataset and compares its predictions to the actual salaries. The difference between predicted and actual salaries is called the "error", denoted in our model as ε.
Based on these errors, the model slightly changes β₀ and β₁ to try to reduce the overall error across the sample. This process is repeated many times, the model aiming each time to get a little better at predicting salaries. It continues until the improvements become very small or a pre-set number of attempts has been made.
The goal of this process is to find the values of β₀ and β₁ that result in the smallest overall error in salary predictions. In technical terms, this is called "optimization" - finding the best solution to a problem. The specific type of optimization used here aims to "minimize" (make as small as possible) the sum of the squared errors[1] for all predictions. By going through this learning process, the model gradually improves its ability to predict salaries based on years of experience. The final values of β₀ and β₁ represent what the model has "learned" about the relationship between experience and salary in the given dataset.
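To make the loop above concrete, here is a minimal sketch in Python. The tiny dataset, the starting values, and the learning rate are all invented for illustration; a real implementation would rely on a statistical library rather than a hand-written loop.

# A minimal sketch of training the linear salary model by gradient descent.
# The data below is entirely made up for illustration.
years  = [0, 2, 4, 6, 8, 10]                       # Years of Experience
salary = [1000, 1100, 1150, 1350, 1400, 1500]      # observed monthly salary (EUR)

b0, b1 = 0.0, 0.0        # initial guesses for the parameters
learning_rate = 0.01

for step in range(10_000):
    # Predict a salary for every employee and measure the errors
    errors = [(b0 + b1 * x) - y for x, y in zip(years, salary)]
    # Gradients of the mean squared error with respect to b0 and b1
    grad_b0 = 2 * sum(errors) / len(years)
    grad_b1 = 2 * sum(e * x for e, x in zip(errors, years)) / len(years)
    # Nudge both parameters in the direction that reduces the error
    b0 -= learning_rate * grad_b0
    b1 -= learning_rate * grad_b1

# With this toy data the loop settles near b0 ≈ 993 and b1 ≈ 51
print(f"learned parameters: b0 = {b0:.0f}, b1 = {b1:.0f}")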
A numerical example will indicate how this may be used. Suppose that at the end of training, the following parameters are obtained: β₀ = 1000 and β₁ = 50.
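Using these trained values, prediction becomes a single line of code; a minimal sketch (the function name is illustrative):

b0, b1 = 1000, 50     # the trained parameters

def predict_salary(years_of_experience):
    # Average monthly salary (EUR) predicted by the linear model
    return b0 + b1 * years_of_experience

print(predict_salary(0))    # 1000: a new graduate with no experience
print(predict_salary(10))   # 1500: a professional with 10 years of experience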
We can then predict Salary as a function of Years of Experience by simply substituting the number of years of experience into the formula and carrying out the calculation. A young graduate with no experience would earn on average 1000 Euros per month (1000 + 50*0), whereas an experienced professional with 10 years' experience would earn on average 1500 Euros per month (1000 + 50*10). Figure 1 shows the sample (as a scatter plot) and the trained model as the dotted line. Observed deviations from the average are the errors made in prediction. For the same value of Years of Experience, the model always returns the same average Salary – whether the graduate holds a formal degree or no degree whatsoever. If, for an employee with formal education and no experience, we observe a salary higher than that of an employee with no formal education and no experience, does this mean the first salary is unfairly high compared to the second?
Machine Learning Limitations
Can we employ this model to judge the fairness of salary allocation in a company? There are many concerns, related both to how the model operates and to the underlying data. In technical jargon, many modeling assumptions must hold for the model parameters to represent the relationship between the two variables. The model's limitations become quickly apparent once we observe that compensation is related to many other variables, such as years of education, tenure with the company, or performance as indicated by yearly evaluations. We are omitting from the mathematical representation important variables that play a role in establishing the fairness of salary.
In the area of machine learning, omitted variable bias is a critical concept that has a clear parallel in legal reasoning. Imagine a judge who, in a complex case, bases their decision solely on one piece of evidence while ignoring other relevant factors. This oversimplification could lead to a miscarriage of justice, much like how omitted variable bias can skew the results of a machine learning model. In our salary fairness context, our model predicts fair compensation based only on years of experience. While experience is undoubtedly relevant, this single-variable approach fails to account for other crucial factors such as education level, job performance, or specific skills – akin to a judge overlooking key testimonies or pieces of evidence.
The consequences of this omission are severe: the model may suggest that a salary disparity based on experience alone is fair when, in reality, it is masking underlying discriminatory practices related to education access or performance evaluation biases. Just as a fair legal process requires consideration of all relevant facts and contexts, a fair algorithm must account for a comprehensive set of variables that influence the outcome. Omitting critical variables not only reduces the model's accuracy but can also perpetuate or even exacerbate existing biases, leading to decisions that appear data-driven and objective but are fundamentally flawed. Therefore, when developing models for sensitive applications like salary fairness analysis, including all relevant variables is very much like a thorough legal process, which ensures all pertinent evidence is presented and considered before reaching a verdict. How do we know which variables are relevant?
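The effect is easy to reproduce. In the hypothetical sketch below, salaries are generated so that both experience and education matter, and the two are correlated; a model fitted on experience alone silently absorbs part of the education premium into the experience coefficient. All numbers are invented for illustration.

import numpy as np

rng = np.random.default_rng(0)
n = 1000
experience = rng.uniform(0, 20, n)
# Degree holders tend to be the more experienced employees in this toy world
education = (experience + rng.normal(0, 5, n) > 10).astype(float)
# Hypothetical pay-setting rule: 40 EUR per year of experience, 300 EUR degree premium
salary = 1000 + 40 * experience + 300 * education + rng.normal(0, 50, n)

# Model A: experience only (education is the omitted variable)
X_a = np.column_stack([np.ones(n), experience])
coef_a, *_ = np.linalg.lstsq(X_a, salary, rcond=None)

# Model B: experience and education
X_b = np.column_stack([np.ones(n), experience, education])
coef_b, *_ = np.linalg.lstsq(X_b, salary, rcond=None)

print("experience only:", coef_a)   # the experience coefficient comes out well above 40
print("both variables:", coef_b)    # recovers roughly 1000, 40, and 300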
The Complexity of Real-World Applications: Omitting Variables and Unknown Causal Relationships
While the single-variable salary model serves as an instructive example, its structure using only one variable is purposely oversimplified to illuminate the challenges faced in real-world machine learning applications. In practice, we often find ourselves in a context like a legal investigation with incomplete evidence and unclear connections between the facts at hand. Unlike the above simplified example, real-world scenarios rarely present themselves with a clear, comprehensive list of all relevant variables.
In a more realistic salary fairness model, we must consider experience, education, and performance metrics as already identified in the economics and sociology literature. Yet we could be overlooking crucial factors like networking opportunities, implicit biases in performance evaluations, or the long-term effects of early career choices. At times, we may know a variable is relevant, yet we may lack the ability to gather data about it. To complicate things further, the causal relationships between these variables are often intertwined, much like the intricate web of circumstances in a complex legal case. A person's education might influence their job performance, which in turn affects their salary, but their family background could have influenced their educational opportunities in the first place. This web of causality is difficult to untangle and even harder to quantify in a model. The risks here are several-fold: we might omit variables that are crucial to fair decision-making, we might inadvertently include variables which encode societal biases into our supposedly objective model, or we might include variables which are irrelevant. These risks are further magnified by the fact that the relevance and causal relationships of variables can and do change over time and contexts, much like how legal precedents may evolve or apply differently in various jurisdictions.
Even well-intentioned attempts to create fair and comprehensive machine learning models can fall short, many times leading to decisions that appear data-driven and impartial but are fundamentally flawed or biased. This reality underlines the critical need for ongoing scrutiny, diverse perspectives in model development, and a humble acknowledgment of the limitations of our models – especially when they are leveraged to reach consequential decisions affecting lives and livelihoods.
As also indicated during the presentation, these issues can come into direct contradiction with key legal principles. The COMPAS solution used a classification model to predict both the likelihood and the degree of recidivism in the US. The variables used in the model were selected on the basis of several academic studies, yet the prediction function inadequately accounted for social context and causality. The algorithm treated each variable as an independent predictor, without considering the complex interplay of social, economic, and systemic factors that contribute to criminal behavior and recidivism. The system might, for example, identify a correlation between unemployment and higher recidivism rates without considering the underlying causes of unemployment in certain communities. The machine learning algorithm learned a pattern, but it was not equipped to understand what causes that pattern. Together with a biased sample, this led to individuals from groups overrepresented in the training data being classified as high-risk more often, regardless of their actual, observed likelihood of reoffending.
Data quality plays an equally important role. Ensuring the dataset is representative and free from sampling bias is vital for a model to produce reliable predictions. If past discriminatory practices influenced hiring, promotion and salary levels, the model may simply “learn” these biases and reflect them in the parameter values. The same holds true for the classification model of recidivism.
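A hypothetical sketch of how a model "learns" such a bias: if historical salaries embed a penalty for one group, a model fitted on those salaries faithfully reproduces the penalty as a learned parameter. Again, every number here is invented for illustration.

import numpy as np

rng = np.random.default_rng(1)
n = 1000
experience = rng.uniform(0, 20, n)
group = rng.integers(0, 2, n).astype(float)    # a protected attribute
# Hypothetical history: identical work, but group 1 was paid 200 EUR less
salary = 1000 + 50 * experience - 200 * group + rng.normal(0, 50, n)

X = np.column_stack([np.ones(n), experience, group])
coef, *_ = np.linalg.lstsq(X, salary, rcond=None)

# Prints roughly [1000, 50, -200]: the historical penalty is now a model parameter,
# and every prediction for group 1 will carry it forward.
print(coef)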
In the legal context, this might involve curating extensive databases of case law, statutes, and legal documents while ensuring they represent a diverse range of legal scenarios and outcomes. Depending on the particular question, it might also involve extending a given dataset with relevant but not easily accessible data (such as income, investments, or tax payments). At times, such datasets may not be publicly available or, even if available, may be prohibitively expensive.
The Legal Frontier
The legal profession is witnessing the fast adoption of a new breed of artificial intelligence tools, the Large Language Models. These digital constructs of language, trained on vast amounts of text, possess a previously unavailable ability to process and generate human-like language. Much like a prodigious legal intern, these models can rapidly sift through and make sense of enormous volumes of text. They have already found application in the legal field, not without significant risks and shortcomings[2].
In contract analysis, LLMs offer the prospect of quick document review. As a digital assistant, an LLM is capable of ingesting a 100-page contract in only a few seconds, highlighting potentially problematic clauses or inconsistencies. But these constructs of language speak better than they think. The model's understanding, while broad, lacks the nuanced comprehension of legal context that a seasoned attorney generally possesses. There is a real risk of overlooking subtle but critical details, and of the model's recommendations inadvertently reflecting biases present in its training data, perpetuating outdated or discriminatory legal practices.
For legal research, these models boost lawyers’ ability to find relevant cases and statutes. They can quickly identify patterns across thousands of legal documents, uncovering precedents that might have been missed in decades of legal work. There is nevertheless a risk of the model misinterpreting legal concepts, leading to flawed research outcomes that undermine entire legal strategies.
The application of these models in case outcome prediction is perhaps the most contentious. By analyzing patterns across past cases, these tools claim to offer insights into likely judgments. This approach treads dangerously close to reducing the complexities of law to mere statistical probabilities. It fails to account for the unique circumstances of each case, the evolving nature of legal interpretation, and the fundamental role of human judgment in the legal process. There is a real possibility that overreliance on such predictions will lead to a self-fulfilling prophecy, where a model's output influences legal strategies in ways that increase the likelihood of its predictions coming true.
In due diligence processes, the ability of these models to rapidly process vast amounts of documentation is an attractive feature. In scenarios like corporate mergers, where time is often of the essence, these tools could significantly expedite the review process. However, many deficiencies are already visible. Users report instances where models have missed critical information buried in complex legalese or failed to flag unusual terms that don't fit neatly into predefined categories. More recent technical solutions (Retrieval-Augmented Generation) are nevertheless addressing many of these issues. There is also the concerning possibility of these models being fooled by deliberately obfuscated language, potentially missing red flags in cleverly worded documents. A large research area is currently developing around adversarial writing, just as it has in image processing, aimed at making certain patterns not easily detectable by machines.
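The intuition behind Retrieval-Augmented Generation is simple: rather than asking the model to answer from its memorized training data, the relevant passages are first retrieved from the actual documents and supplied alongside the question. The toy sketch below illustrates only the retrieval step, using plain word overlap in place of the vector embeddings a real system would use; the clauses and query are invented.

# Toy retrieval step: score each contract clause by word overlap with the query.
# Real systems use vector embeddings; word overlap stands in for them here.
clauses = [
    "The agreement may be terminated by either party with 30 days written notice.",
    "All disputes shall be settled by arbitration in Riga.",
    "The buyer assumes liability for defects discovered after delivery.",
]

def retrieve(query, documents, top_k=1):
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(d.lower().split())), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for score, d in scored[:top_k] if score > 0]

# The retrieved clause would then be placed into the prompt sent to the LLM,
# grounding the answer in the contract text rather than in the model's memory.
print(retrieve("who is liable for defects", clauses))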
As also reiterated during the presentation, at their core, LLMs are sophisticated pattern recognition systems. They do not truly "understand" legal concepts in the way a human lawyer does, but they predict the most likely text based on patterns observed in their training data, much like a lawyer might complete a colleague's sentence based on shared knowledge of legal jargon. This fundamental limitation means they can sometimes generate plausible but incorrect information, especially when dealing with novel or complex legal scenarios.
These models also raise significant ethical and privacy concerns. The vast amount of data required to train them may include sensitive or privileged information, raising questions about confidentiality and data protection.
Advanced language models already offer powerful capabilities that are delivering tangible benefits to legal work, from streamlining research and document review to enhancing contract analysis and drafting. These technologies have demonstrated their ability to increase efficiency, reduce costs, and improve accuracy in various legal tasks. While the concrete advantages are becoming evident, ongoing experimentation and careful evaluation remain a must to fully understand and mitigate potential risks.
These tools are improving rapidly while expanding the list of tasks they can perform at least as well as a legal professional. They should be viewed as sophisticated aids that augment human expertise rather than replace it, much like how legal databases and search engines have become indispensable yet complementary to skilled legal professionals. The legal profession may benefit by actively engaging with these technologies, leveraging their strengths while remaining aware of their shortcomings. This proactive approach will allow for the development of best practices and ethical guidelines that ensure the fundamental principles of justice remain at the center of legal practice. Today's benefits tilt the balance towards the need for innovation in the legal field. Safeguarding the integrity and human-centric nature of the law will ensure that the associated risks are properly managed.
[1] Usually, ML models minimize the sum of squared errors. This is a measure of the model's overall accuracy, calculated by adding up the squares of all the individual errors. We square the errors to ensure that negative and positive errors don't cancel each other out.
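In symbols, for a dataset of N employees with actual salaries y₁, …, y_N and model predictions ŷ₁, …, ŷ_N, the quantity being minimized is SSE = (y₁ − ŷ₁)² + (y₂ − ŷ₂)² + … + (y_N − ŷ_N)².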