Can AI be trusted for legal tasks?
Dr. Klemens Katterbauer
Research Advisor in AI/Robotics & Sustainability (Hydrogen and CCUS) - AI Legal Enthusiast
Artificial intelligence (AI) techniques are rapidly transforming legal practice. Roughly 75% of attorneys intend to use generative AI in their work, whether for drafting legal memoranda, reviewing documents, drafting contracts, or sorting through mountains of case law. But are these tools trustworthy enough for everyday use?
Evidence shows that large language models sometimes “hallucinate,” or make up facts. In one well-known instance, a New York attorney was sanctioned for citing fake cases generated by ChatGPT in a legal brief. Numerous similar cases have since come to light. Chief Justice Roberts took notice and cautioned attorneys about hallucinations in his 2023 annual report on the judiciary.
Retrieval-augmented generation (RAG) is widely promoted, across industry sectors, as the way to reduce hallucinations in domain-specific settings. Leading legal research companies that rely on RAG have introduced AI-powered legal research products that promise “hallucination-free” legal citations or claim to “avoid” hallucinations. RAG systems aim to provide more reliable and accurate legal information by combining a language model with a database of legal documents. However, providers have not offered concrete evidence for these claims or a clear definition of “hallucination,” which makes judging their reliability in real-world situations difficult.
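To make the architecture concrete, here is a minimal, hypothetical sketch of the RAG pattern described above, written in Python. The names (Document, retrieve, generate) and the keyword-overlap scoring are illustrative stand-ins, not the design of any vendor's product; a production system would use a dense embedding index and a hosted language model.

```python
"""Minimal sketch of a RAG pipeline for legal Q&A (hypothetical names,
not the API of any particular product). Retrieval here is naive keyword
overlap; the language-model call is left as a stub."""

from dataclasses import dataclass

@dataclass
class Document:
    citation: str  # e.g. a case citation string
    text: str

def score(query: str, doc: Document) -> int:
    # Stand-in for embedding similarity: count shared lowercase words.
    return len(set(query.lower().split()) & set(doc.text.lower().split()))

def retrieve(query: str, corpus: list[Document], k: int = 3) -> list[Document]:
    # Step 1: rank documents by (surrogate) similarity to the question.
    return sorted(corpus, key=lambda d: score(query, d), reverse=True)[:k]

def build_prompt(question: str, sources: list[Document]) -> str:
    # Step 2: ground the model's answer in the retrieved sources.
    context = "\n\n".join(f"[{d.citation}]\n{d.text}" for d in sources)
    return ("Answer using only the sources below and cite them.\n\n"
            f"Sources:\n{context}\n\nQuestion: {question}")

def generate(prompt: str) -> str:
    raise NotImplementedError("call the language model here")
```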
To understand where hallucinations arise, it helps to distinguish several types of legal research questions. The first group covers general legal knowledge: doctrine, case holdings, and bar-exam-style questions. The second involves jurisdiction- or time-specific issues, such as circuit splits and recent changes in the law. A third consists of false-premise questions, which imitate users who misunderstand the law. The final group consists of factual recall questions: straightforward, objective facts that require no legal interpretation.
These systems can hallucinate in two ways. First, a response can simply be incorrect: the tool misdescribes the law or makes a factual error. Second, a response can be misgrounded: the tool describes the law accurately but cites a source that does not support, or even contradicts, its assertions.
Given the central role of authoritative sources in legal research and writing, this second kind of hallucination may be even more harmful than the outright fabrication of legal cases. Even if a citation is “hallucination-free” in the narrow sense that the cited source exists, it can still fail to support the claim. Legal AI's primary benefit, after all, is expediting the laborious process of finding pertinent legal materials.
Users can be misled if a tool offers sources that look authoritative but are in fact irrelevant or contradictory. Placing too much faith in the tool's output can then lead to faulty legal decisions and conclusions.
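One way to picture the distinction between the two hallucination types is as an audit step over a tool's citations. The sketch below is purely illustrative: the corpus lookup, source_supports, and the label strings are assumptions, and in practice the “does the source support the claim” check requires human review or a separate verification model.

```python
"""Hedged sketch: auditing a cited authority for the two hallucination
types. `corpus` maps citations to their full text; `source_supports`
is a hypothetical entailment or human-review step."""

def classify_citation(claim: str, citation: str, corpus: dict[str, str]) -> str:
    source_text = corpus.get(citation)           # does the cited case exist?
    if source_text is None:
        return "fabricated citation"             # classic hallucination
    if not source_supports(claim, source_text):  # does it back the claim?
        return "misgrounded citation"            # exists, but does not support
    return "grounded"

def source_supports(claim: str, source_text: str) -> bool:
    raise NotImplementedError("entailment check or human verification")
```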
Retrieval-augmented generation (RAG), which many have hailed as a potential cure for hallucinations, underpins these new legal AI tools. In theory, RAG enables a system to produce the correct response by first retrieving the pertinent source material. In practice, though, the evidence shows that even RAG systems are not immune to hallucinations.
Several issues specific to RAG-based legal AI systems lead to hallucinations. To start, legal retrieval itself is hard. Finding the right (or best) authority can be difficult, as any lawyer will attest. Unlike other fields, law is not built solely on verifiable facts; it develops over time through judgments drafted by judges.
This makes it hard to identify the set of documents that conclusively answer a query, and sometimes hallucinations occur simply because the system's retrieval step fails.
Second, even when retrieval succeeds, the retrieved document may not be relevant. The American legal system lacks uniformity in its rules and precedents across jurisdictions and time periods, so materials that seem relevant because of their semantic similarity to a query may not be relevant for reasons specific to the law. Hallucinations therefore occur when these RAG systems cannot determine which authority is binding. This is especially troublesome because legal research matters most precisely where the law is changing. For example, one system described the “undue burden” standard for abortion restrictions as good law, failing to recognize that the Dobbs decision had overruled it.
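The failure mode is easier to see in code. The sketch below, with made-up field names and filtering rules, shows why ranking purely by semantic similarity can surface an overruled or out-of-jurisdiction case, and how metadata such as jurisdiction and subsequent history might be used to rerank candidates; it is not a description of how any particular product works.

```python
"""Hedged sketch: similarity alone is not enough for legal retrieval.
Field names and the reranking rules are illustrative assumptions."""

from dataclasses import dataclass

@dataclass
class Authority:
    citation: str
    jurisdiction: str   # e.g. "9th Cir." or "US" for the Supreme Court
    decided_year: int
    overruled: bool     # abrogated by later precedent?
    similarity: float   # score from the embedding retriever

def rerank(candidates: list[Authority], forum: str) -> list[Authority]:
    # Drop authorities that are no longer good law, then prefer binding
    # authority in the forum (or the Supreme Court) over merely similar text.
    good_law = [a for a in candidates if not a.overruled]
    return sorted(
        good_law,
        key=lambda a: (a.jurisdiction not in (forum, "US"), -a.similarity),
    )
```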
Third, sycophancy, AI's tendency to agree with a user's false assumptions, poses particular dangers in legal settings. In one example, a system uncritically accepted a question's premise that Justice Ginsburg had dissented in Obergefell, the case establishing a right to same-sex marriage, and responded that her position on international copyright explained her vote. In fact, Justice Ginsburg did not dissent in Obergefell, and copyright was not remotely relevant to the decision. That said, the results are not uniformly bleak: the experiments showed that both systems could, at times, navigate queries built on false premises.
Still, the repercussions can be severe when these systems do accept false user claims, especially for those who hope such tools will give pro se and under-resourced litigants greater access to the legal system.
Ultimately, these challenges underscore the need for thorough and open benchmarking of legal AI technologies. In contrast to other fields, AI in law still lacks rigorous, public evaluation. Most tools offer no systematic access, share little information about their models, and report no evaluation results. This opacity makes it very difficult for lawyers to procure AI products. Paul Weiss, a large law firm, tested a product for almost a year and a half without developing “hard metrics” because the process of verifying the AI system was so complex that it “makes any efficiency gains difficult to measure.” The absence of rigorous evaluation metrics makes responsible adoption challenging, particularly for practitioners with fewer resources than Paul Weiss.
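As a rough illustration of what such benchmarking could look like, the sketch below runs a tool over a labeled query set, using the query types described earlier, and reports an error rate per type. The tool callable, the judge function, and the verdict labels are all hypothetical assumptions; real evaluation would also require expert review of each answer, not just automated bookkeeping.

```python
"""Hedged sketch of a transparent benchmark: per-query-type error rates.
All names and labels are illustrative, not an established benchmark."""

from collections import Counter
from dataclasses import dataclass
from typing import Callable

@dataclass
class BenchmarkItem:
    query: str
    query_type: str  # e.g. "doctrine", "jurisdiction", "false premise", "recall"

def evaluate(tool: Callable[[str], str],
             items: list[BenchmarkItem],
             judge: Callable[[str, str], str]) -> dict[str, float]:
    """judge(query, answer) returns 'correct', 'incorrect', or 'misgrounded'."""
    errors, totals = Counter(), Counter()
    for item in items:
        answer = tool(item.query)
        totals[item.query_type] += 1
        if judge(item.query, answer) != "correct":
            errors[item.query_type] += 1
    # Fraction of answers per query type that were incorrect or misgrounded.
    return {qt: errors[qt] / totals[qt] for qt in totals}
```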
The lack of transparency also threatens attorneys' ability to meet their professional and ethical obligations. The bar associations of Florida, New York, and California have all recently published guidance on attorneys' duty to supervise work product generated by artificial intelligence programs. Furthermore, as of May 2024, over 25 federal judges had issued standing orders directing lawyers to disclose or monitor the use of artificial intelligence. Access to evaluations of specific tools, and transparency about how they were built, would make it easier for attorneys to fulfill these obligations. Otherwise, given the prevalence of hallucinations, attorneys may have to double-check every assertion and citation these tools produce, undermining the very productivity gains legal AI tools are meant to offer.