AI in Judicial Decision-Making: A Commentary on Recent Judgments (COL/NL)

Recent rulings in Colombia and the Netherlands have grappled with the implications of using AI tools such as ChatGPT in judicial decision-making. In this post I would like to share some thoughts and questions for further discussion on this topic.

Context: Summary of Both Cases

Colombia

The Constitutional Court of Colombia reviewed a case involving a judge in Cartagena who used ChatGPT to draft the reasoning of a judgment in a tutela (a constitutional action for the protection of fundamental rights) concerning the rights of a minor. In the first instance, the judge used ChatGPT to assist in making the decision, asking the AI several questions about legal obligations and precedents regarding fee exemptions for children with autism. The chatbot's responses were included in the judgment, although the judge clarified that the AI's input was used to expedite the drafting process, not to replace judicial reasoning.

The Netherlands

A Dutch lower court judge in Nijmegen admitted to using ChatGPT to gather information while deciding a case. The judge asked the chatbot about solar panels, specifically the average lifespan of solar panels and the current average price of electricity, and used the generated answers in the verdict as an information source to determine compensation in a legal dispute between two homeowners.
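To make concrete what was at stake, the kind of estimate the judge sought can be reduced to simple arithmetic. The sketch below uses entirely hypothetical figures (panel lifespan, installation age, annual yield, electricity price, share of yield lost), none of which come from the actual ruling, to show how such inputs would feed a compensation estimate:

```python
# Hypothetical illustration of a lost-solar-yield estimate.
# Every figure below is an assumption for demonstration purposes,
# not a value taken from the Dutch judgment.

panel_lifespan_years = 25             # assumed average lifespan of a solar panel
years_already_in_use = 5              # assumed age of the installation
annual_yield_kwh = 3_000              # assumed yearly production
electricity_price_eur_per_kwh = 0.30  # assumed average electricity price
yield_loss_fraction = 0.10            # assumed share of production lost

remaining_years = panel_lifespan_years - years_already_in_use
lost_value_eur = (remaining_years * annual_yield_kwh
                  * yield_loss_fraction * electricity_price_eur_per_kwh)
print(f"Estimated compensation: EUR {lost_value_eur:,.2f}")  # EUR 1,800.00
```

The point is not the arithmetic itself but the inputs: each assumed figure is exactly the kind of factual claim that should come from a verifiable source rather than from a chatbot's unsourced answer.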

Commentary on Both Decisions

It's important to highlight that the Dutch case is likely the first court decision to explicitly acknowledge the use of an LLM (large language model).[1] However, the judgment does not address why the judge chose this source over others or how it was used. In contrast, the Colombian case involves a decision and insights from the Constitutional Court about a judge using an LLM for a similar purpose: as an information source.

LLMs like ChatGPT are increasingly being used in various professional settings, including the legal field. However, it's crucial to understand that LLMs are not reliable sources of factual information, but rather sophisticated text generators that produce plausible-sounding responses based on patterns in their training data. While they can be helpful tools for brainstorming, drafting, and summarizing, they should not be treated as authoritative sources of legal information or precedent.

The appeal of LLMs to judges and legal professionals lies in their ability to quickly process and synthesize vast amounts of information, potentially saving time and effort in legal research and writing. However, this convenience comes with significant risks. LLMs can produce confidently stated but entirely fabricated information[2] and may not always be up to date with the latest legal developments, considering their training data cutoff dates. Their outputs can also be biased or inconsistent, reflecting biases in their training data or peculiarities in their generation process.

For these reasons, it's important not only that judges disclose their use of LLMs but also that they demonstrate such use is proportional and appropriate for the case at hand. Judges should explain why they decided to consult chatbots rather than other sources. The Colombian Constitutional Court emphasized that judges may use AI systems in a reasoned and considered manner, ensuring the protection of fundamental rights and adhering to ethical guidelines. However, AI should not replace essential judicial tasks that require human reasoning, such as interpreting facts, evaluating evidence, and making decisions. Using AI for these core tasks could violate judicial autonomy and due process.[3]

As a professor, I emphasize to my students the importance of diligent research methods. While AI tools can be useful, the main research should be conducted using more legitimate and accurate sources. I also stress the importance of disclosing AI use and of consciously analyzing why it's needed and whether it's the most efficient alternative in terms of costs and benefits.

Measuring the threshold at which LLMs transition from assistant tools to authoritative sources is challenging and context-dependent. Factors to consider include the frequency of LLM use, the extent to which outputs are verified against primary sources, and whether LLM-generated content is explicitly cited in legal opinions. A concerning indicator would be judges beginning to prioritize LLM outputs over established legal sources or failing to critically evaluate LLM-generated information. Some advocate for regular audits of judicial opinions for unverified LLM-generated content. Would this help monitor the trend, or would it simply create more work?
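As a purely illustrative sketch of what such an audit might automate, the snippet below flags sentences that assert figures or legal holdings without an accompanying citation. The regular expressions and the notion of a 'claim' here are my own assumptions for demonstration; a real audit would need far more sophisticated detection and, above all, human review:

```python
import re

# Toy triage for auditing opinions: flag sentences that look like factual or
# legal claims but carry no citation marker. Patterns are illustrative
# assumptions, not a validated methodology.

CITATION = re.compile(r"\[\d+\]|\bpar[as]?\.\s*\d+|\bart\.\s*\d+|\bv\.\s", re.IGNORECASE)
CLAIM = re.compile(r"\d|according to|held that|precedent|established", re.IGNORECASE)

def flag_unreferenced_claims(opinion_text: str) -> list[str]:
    """Return sentences that assert something checkable without citing a source."""
    sentences = re.split(r"(?<=[.!?])\s+", opinion_text)
    return [s for s in sentences if CLAIM.search(s) and not CITATION.search(s)]

sample = ("The average lifespan of a solar panel is 25 years. "
          "This duty was established in Ruling T-323 of 2024, pars. 297-299.")
for sentence in flag_unreferenced_claims(sample):
    print("REVIEW:", sentence)  # flags only the first, uncited sentence
```

Even this crude filter makes the trade-off visible: automated screening is cheap, but every flagged sentence still demands human verification, so audits shift work rather than eliminate it.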

To maintain the integrity of the legal system, it's crucial that judges and legal professionals receive thorough training on the capabilities and limitations of LLMs. They should be encouraged to use these tools as supplements to, not substitutes for, traditional legal research methods. Clear guidelines should be established for the appropriate use of LLMs in legal settings, emphasizing the need for human oversight, fact-checking, and transparency about AI assistance. The Colombian Constitutional Court emphasized this aspect when it noted that the principle of transparency imposes a duty on judges to clearly explain the use, scope, and location of AI-generated results in their proceedings or decisions. The principle of responsibility requires that the official using the AI tool be trained in the subject matter, fully understand its risks, be able to account for the origin, suitability, and necessity of AI use, and, most importantly, verify the information provided by the AI.[4] In addition, the Court held that judges using these tools must state the reasons why the AI system should be used, i.e., a necessity and suitability analysis of the system.[5]

In my view, the Court's approach lacked a more direct stance on the use of AI in the judiciary. The Court should have emphasized that promoting AI in the judicial system does not mean relying on generic chatbots, but rather utilizing more sophisticated tools that assist with background work. Regarding legal information and information sources, a thorough cost-benefit analysis should be conducted. This analysis should consider the high energy and water consumption associated with each AI prompt, especially when judges can research the same questions using other information sources that are less costly and more effective.

In addition to the inaccuracy of LLM outputs, recent research has identified a phenomenon called 'model collapse', in which the use of model-generated content in training causes irreversible defects in subsequent models.[6] This effect can make the tails of the original content distribution disappear and may lead models to misrepresent reality over time.[7] This underscores the importance of maintaining diverse, genuinely human-generated data in training sets.
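A minimal simulation conveys the tail-loss dynamic. In the sketch below, a simple Gaussian stands in for a full language model, a deliberate simplification of mine rather than the experimental setup of the cited paper: each 'generation' is fitted only to samples drawn from the previous generation's fit, so estimation errors compound:

```python
import random
import statistics

# Minimal sketch of the 'model collapse' tail-loss dynamic: each generation
# is fitted only to data produced by the previous generation's model.
# A Gaussian stands in for a language model; this is an illustration,
# not the setup of Shumailov et al.

random.seed(42)
mu, sigma = 0.0, 1.0   # generation 0: the 'true' human-data distribution
sample_size = 200

for generation in range(1, 11):
    samples = [random.gauss(mu, sigma) for _ in range(sample_size)]
    mu = statistics.fmean(samples)     # refit the 'model' on generated data
    sigma = statistics.stdev(samples)  # estimation error compounds each round
    print(f"generation {generation:2d}: sigma = {sigma:.3f}")

# Over many generations sigma tends to shrink, so rare (tail) events are
# sampled less and less often and eventually vanish from the 'training data'.
```

Ten generations with 200 samples show only mild drift, but the direction of travel is the point: once models feed on their own output, low-probability content is the first casualty.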

Surprisingly, in the Dutch case, the judge never explained why they decided to use this chatbot, how it was employed, or what specific questions were asked. This is particularly noteworthy given that the Netherlands introduced the Fundamental Rights and Algorithm Impact Assessment (FRAIA) in 2021.[8] The FRAIA is designed to help identify risks to human rights in the use of algorithms and to implement measures to address those risks. In this case, the judge did not appear to use this guide to substantiate why the AI tool was necessary or why it was considered a better source of information than traditional alternatives. This omission raises questions about the decision-making process and about adherence to established guidelines for algorithmic use in judicial contexts.

The Colombian Court, on the other hand, appears more forward-looking, offering an in-depth analysis of AI's potential and issuing extensive guidelines for its use. It stresses that AI should augment rather than replace human judgment, aiming to integrate AI into the judicial system while protecting fundamental rights. The Constitutional Court stated that in the case at hand, due process was not affected because the interaction with the chatbot occurred after the judge's decision. The Court established that AI should be used following guidelines and principles such as transparency, due process, and accountability.

However, it's worth asking why judges spend time and resources on chatbots to find information when more relevant and reliable research tools are available to them. Chatbots may ultimately strip some of the personal touch and character from judicial decisions, which is an important part of the connection between citizens and the judicial system. Efforts should focus on other tools that help with administrative duties such as translation, labeling, and classification; chatbots are certainly not appropriate sources of legal information, especially given that their training data may include incorrect information drawn from the internet.

As a personal conclusion, while AI tools like LLMs have the potential to assist the legal field, their use must be carefully regulated and monitored. Judges and legal professionals should prioritize traditional, reliable sources of information and use AI only as a supplementary tool where appropriate (under criteria of necessity and proportionality).


[1] At least, it is the first case in which the judge openly disclosed the use of ChatGPT.

[2] So-called 'hallucinations'. The term "hallucination" has been criticized by Usama Fayyad, who argues that it is vague and misleadingly personifies large language models. Stening, Tanner (10 November 2023). 'What are AI chatbots actually doing when they "hallucinate"? Here's why experts don't like the term'. Northeastern Global News. Retrieved 4 August 2024.

[3] Constitutional Court of Colombia. Ruling T-323 of 2024. Pars. 295-296.

[4] Ibid. Pars. 297-299.

[5] Ibid. Pars. 371-372.

[6] Ilia Shumailov and others, 'The Curse of Recursion: Training on Generated Data Makes Models Forget' (2024) <arXiv:2305.17493>.

[7] As an example of model collapse, consider a scenario where an AI model is initially trained on accurate historical data about World War II. In subsequent generations, this model is used to generate content that is then incorporated into the training data for newer models. Over time, subtle inaccuracies or biases in the generated content might be amplified. For instance, if the model occasionally generates content mentioning a disproportionately high number of Black German soldiers in World War II (which is historically inaccurate), this misinformation could become more prevalent in later training sets. As this process continues across multiple generations of models, the 'tail' of the distribution representing accurate information about the racial composition of the German army might gradually disappear.

[8] Government of The Netherlands. ‘Fundamental Rights and Algorithms Impact Assessment (FRAIA)’. <https://www.government.nl/documents/reports/2021/07/31/impact-assessment-fundamental-rights-and-algorithms>
