RAG: Why Does It Matter, What Is It, and Does It Guarantee Accuracy?
Thomas G. Martin
CEO + Founder at LawDroid, Work Smarter with AI / ABA Legal Rebel + Fastcase 50 / Generative AI Speaker, Professor, Author, Philosopher, Coder, Lawyer | Subscribe to newsletter for thoughts on AI + Law
Thanks again to you dear reader! You're dear to my heart.
As for you noobs, make yourself comfortable, the beer is in the fridge.
This newsletter, LawDroid Manifesto, is here to keep you in the loop about the intersection of AI and the law. Please share this article with your friends and colleagues and remember to tell me what you think in the comments below.
If you thought my articles about prompt engineering and hallucinations were nerdy, well then strap yourself in! I'm going to take you into the world of "retrieval augmented generation," or "RAG" for short. If you've been following the discussion about the use of AI in the law, you've probably heard the term and may have wondered what it means. I thought I'd perform a public service by explaining it in detail, because it's something that sometimes gets glossed over. And, if you're talking to AI vendors, you want to understand it. It may sound overly technical, but it's really like giving AI the ability to take an open book exam. Who would score better, someone taking an open book exam or a closed book exam? Makes sense, right?
If this sounds interesting to you (then you get a pocket protector my friend as you are now nerd-certified), please read on…
Why Does RAG Matter?
Let's answer the "Why?" question first, before the "What?"
“Why does retrieval augmented generation matter to me?” is likely the first question that comes to mind, especially for what appears to be a somewhat technical subject. When addressing lawyers’ concerns about hallucinations, retrieval augmented generation (RAG) has been cited by legal research companies as the solution.
Casetext claimed that its use of retrieval augmented generation “eliminat[ed]” hallucinations and that its flagship product CoCounsel was the “world’s first reliable AI legal assistant” (Casetext, 2023). Thomson Reuters, which acquired Casetext, stated that RAG “avoid[ed]” hallucinations (Thomson Reuters, 2023). LexisNexis claimed that RAG guaranteed “hallucination-free” legal citations (LexisNexis, 2023).
However, doubt has recently been cast on these claims by a study, "Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools,"1 written by researchers from Stanford and Yale intent on assessing the accuracy of AI-powered legal research tools. "[W]e find that the AI research tools made by LexisNexis (Lexis+ AI) and Thomson Reuters (Westlaw AI-Assisted Research and Ask Practical Law AI) each hallucinate between 17% and 33% of the time."2 By comparison, the same study measured GPT-4's hallucination rate at 43%.
By the way, for a whirlwind tour of hallucinations, read my other article: "Hallucinations: What Are They, Why Do They Happen, How to Fix Them?"
Given the considerable disparity between legal research providers' RAG-powered claims to be hallucination-free, on the one hand, and testing of those self-same tools revealing hallucination rates between 17% and 33%, on the other, the question for the legal practitioner becomes: What is RAG, and does it work?
What is Retrieval Augmented Generation?
Retrieval Augmented Generation (RAG) is a technique that enhances the output of a large language model (LLM) by incorporating relevant information from external knowledge sources, such as web pages, documents, or databases.3
RAG is Like an Open Book Exam
RAG is like giving an LLM an open book exam. The LLM can “read” relevant material before answering the question. The opposite, directly asking an LLM a question without the advantage of RAG or any further information, is like a closed book exam. The LLM cannot reference any external materials when it develops its answers.
To continue with the analogy for a moment (because it is weirdly appropriate and descriptive of how LLMs work), let’s ask this question: What does a student rely on when writing a closed book exam?
In a closed book exam, a student answers based on what they remember from the class and, more generally, from their education up to that point in time. Likewise, an LLM answers from what it absorbed during training. GPT-4, for example, is reported to have roughly 1.76 trillion parameters.4 Its training data likely includes a diverse range of web pages, books, articles, and other texts, but the specifics have not been made public by OpenAI. This forms the model's "parametric memory" (because those facts are stored in the parameters, or weights, created during training), and it is why you may have heard of different models having knowledge cutoff dates. For example, the knowledge cutoff for GPT-4 is April 2023, and for GPT-4o it is October 2023.
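To make the exam analogy concrete, here is a minimal sketch in Python contrasting the two styles. The ask_llm helper is a hypothetical stand-in for any model API call, and the California statute passage is just an illustrative retrieved source:

```python
# Illustrative only: ask_llm is a hypothetical stand-in for an LLM API call.
def ask_llm(prompt: str) -> str:
    return "[model response would go here]"

question = "What is the statute of limitations for a written contract in California?"

# Closed book: the model must answer from parametric memory alone.
closed_book_prompt = question

# Open book (RAG): the model "reads" a retrieved passage before answering.
retrieved_passage = (
    "Cal. Code Civ. Proc. § 337: an action upon a contract in writing "
    "must be commenced within four years."
)
open_book_prompt = (
    "Using only the following source, answer the question.\n\n"
    f"Source: {retrieved_passage}\n\n"
    f"Question: {question}"
)

print(ask_llm(closed_book_prompt))  # relies on training data; may be stale or wrong
print(ask_llm(open_book_prompt))    # grounded in the supplied source
```

Same model, same question; the only difference is whether the prompt hands the model its "open book" before asking.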
Who Invented RAG?
Like many innovations in AI, RAG was not invented by a single individual, but rather emerged through the collaborative efforts of a number of researchers. The key paper that introduced and formalized the RAG technique was "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks."5
Patrick Lewis, the lead author, who now leads a RAG team at Cohere, acknowledged that while his team coined the term "RAG," the underlying concepts built on previous work in information retrieval, question answering, and language modeling. Lewis also acknowledged that Retrieval Augmented Generation was a horrible name choice: "We definitely would have put more thought into the name had we known our work would become so widespread… We always planned to have a nicer sounding name, but when it came time to write the paper, no one had a better idea."6
From its origin, RAG was conceptualized as a technique to minimize hallucinations and make LLMs more reliable. With the use of RAG, an LLM “is more strongly grounded in real factual knowledge [and that] makes it ‘hallucinate’ less with generations that are more factual, and offers more control and interpretability.”7
How Does RAG Work?
Now that we have a notion of what RAG is, and why it’s important to know something about it, let’s see how it works.
RAG Step-by-Step
At a high level, RAG proceeds in three steps:

1. Retrieve: the user's question is turned into a search query (often an embedding) and run against an external knowledge source to find the most relevant passages.
2. Augment: the retrieved passages are added to the prompt alongside the original question.
3. Generate: the LLM produces its answer drawing on both its trained knowledge and the supplied passages.

This technique is particularly useful for applications like customer support chatbots, internal Q&A systems, and other scenarios where access to current, domain-specific information is crucial.
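To make those three steps concrete, here is a minimal, self-contained sketch. The keyword-overlap retriever, the toy document store, and the generate() stub are all illustrative stand-ins I've made up; a production system would use vector embeddings, a real document index, and an actual model API:

```python
# A minimal RAG sketch: retrieve -> augment -> generate.
DOCUMENTS = [
    "CCP § 337: actions on a written contract must be brought within four years.",
    "CCP § 339: actions on an oral contract must be brought within two years.",
    "Fed. R. Civ. P. 26 governs the scope of discovery in federal cases.",
]

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Step 1 (Retrieve): rank documents by word overlap with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return ranked[:k]

def augment(question: str, passages: list[str]) -> str:
    """Step 2 (Augment): splice the retrieved passages into the prompt."""
    context = "\n".join(f"- {p}" for p in passages)
    return (f"Answer using only the sources below, and cite the section you "
            f"rely on.\nSources:\n{context}\n\nQuestion: {question}")

def generate(prompt: str) -> str:
    """Step 3 (Generate): hypothetical LLM call; substitute your provider's API."""
    return "[LLM answer grounded in the prompt above]"

question = "How long do I have to sue on a written contract?"
print(generate(augment(question, retrieve(question, DOCUMENTS))))
```

Note what this buys you: the model never has to "remember" the statute; it only has to read the passages the retriever hands it, and those passages can be updated without touching the model.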
RAG Benefits
Key benefits of RAG include:

- Grounded accuracy: responses draw on authoritative sources rather than the model's memory alone, reducing hallucinations.
- Freshness: the model can use current information beyond its training cutoff.
- Transparency: retrieved sources can be cited, so users can verify answers.
- Cost-effectiveness: organizations can put their own data to work without retraining the model.
RAG addresses several challenges of LLMs by grounding responses in authoritative, up-to-date sources. It allows organizations to leverage their own data without retraining the entire model, making it a cost-effective approach to improving LLM outputs for specific domains or internal knowledge bases.
RAG Challenges
The main challenges of RAG include:

- Retrieval quality: if the search surfaces irrelevant or incomplete passages, the answer suffers no matter how capable the model is.
- Complex questions: queries that require synthesizing information across many documents can defeat simple one-shot retrieval.
- Context limits: long source documents must be chunked to fit the model's context window, which can strip away important context.
- Residual hallucination: the model can still misread, ignore, or embellish the retrieved text.
To address these challenges, solutions are being explored, including document hierarchies, knowledge graphs, recursive retrieval methods, self-criticism,8 and combining RAG with other techniques like prompt engineering and fine-tuning.9
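To give a flavor of one of these mitigations, here is a hedged sketch of a self-criticism pass: a second model call audits the first answer against the retrieved sources before anything reaches the user. As before, ask_llm is a hypothetical stand-in, and the exact prompts are my own illustrative wording:

```python
# Sketch of a self-criticism pass over a RAG answer.
def ask_llm(prompt: str) -> str:
    return "YES"  # placeholder; substitute a real model call

def answer_with_self_check(question: str, sources: list[str]) -> str:
    context = "\n".join(sources)
    draft = ask_llm(
        f"Sources:\n{context}\n\nUsing only these sources, answer: {question}"
    )
    # Second pass: ask the model to verify its own draft against the sources.
    verdict = ask_llm(
        f"Sources:\n{context}\n\nDraft answer:\n{draft}\n\n"
        "Is every claim in the draft supported by the sources? Answer YES or NO."
    )
    if verdict.strip().upper().startswith("YES"):
        return draft
    return "Unable to verify an answer against the available sources."
```

The design choice here is to fail closed: when the check cannot confirm the draft, the system declines rather than guessing.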
For a comprehensive backgrounder on prompt engineering, read my other article: "Prompt Engineering: What Is It, Why It's Important, and Is It Obsolete?"
Does RAG Guarantee Accuracy?
No, retrieval augmented generation does not guarantee accuracy and it does not eliminate hallucinations. However, RAG does a very good job of minimizing hallucinations and making LLMs reliable enough for many applications.
RAG Improves Reliability
RAG helps minimize hallucinations in LLM responses through several key mechanisms:

- Grounding: the model is directed to answer from retrieved, authoritative passages rather than its parametric memory alone.
- Freshness: retrieval supplies current information, reducing reliance on outdated pre-trained knowledge.
- Constrained generation: prompt instructions can tell the model to decline to answer when the sources are silent.
- Transparency: responses can cite their sources, so users can verify each claim.
By leveraging these mechanisms, RAG significantly reduces the likelihood of hallucinations in LLM responses, improving the overall reliability and trustworthiness of AI-generated content.
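In practice, much of this grounding comes down to how the prompt is assembled. The template below is an illustrative sketch of what such a grounding prompt might look like; the wording is my own assumption, not any vendor's actual system prompt:

```python
# Illustrative grounding template; not a real vendor's system prompt.
GROUNDED_TEMPLATE = """You are a legal research assistant.
Answer the question using ONLY the numbered sources below.
Cite the source number for every claim you make.
If the sources do not answer the question, reply: "Not found in the provided sources."

Sources:
{sources}

Question: {question}"""

def build_grounded_prompt(question: str, passages: list[str]) -> str:
    sources = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return GROUNDED_TEMPLATE.format(sources=sources, question=question)

print(build_grounded_prompt(
    "What is the limitations period for a written contract?",
    ["CCP § 337: four years for actions on a written contract."],
))
```

Requiring numbered citations and an explicit "not found" escape hatch gives the reviewing lawyer something checkable, which is the whole point of the transparency mechanism above.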
Reliable Enough for the Risk?
Any analysis of LLM accuracy and reliability must ask: Reliable enough for what?
The framing of the debate in legal to date has focused almost exclusively on “bet the farm” and “cut the blue wire or red wire”-type uses of LLMs. While requiring 100% accuracy may make sense when asking an LLM to moot questions relating to the calculation of the statute of limitations or sentencing for capital murder, it is not reasonable for low or medium risk applications that also exist within the legal context.
There are many applications of RAG that are well suited to the level of risk the current state of the art presents. To assess whether an LLM is fit for purpose, we must classify the risk of the use case: Is it low risk, medium risk, or high risk?
Here are some working definitions10 I have developed:

- Low risk: errors are easy to spot and carry little consequence if missed.
- Medium risk: errors carry real but correctable consequences, provided a lawyer reviews the output before anyone relies on it.
- High risk: errors could cause serious, hard-to-reverse harm to a client's interests.

LLMs can be used in legal practice for much more than narrow, deep legal research where a mistake spells catastrophe.
Low risk LLM use in a legal practice may include translating text from one language to another, checking for grammar errors, helping to draft an email or letter, drafting an outline for a blog post, formatting legal citations, enabling natural language search of case law, or generating an image. There are a myriad of other low risk uses.
Medium risk LLM use in the legal context may include creating a draft of a contract or pleading, redlining a contract, providing general legal information and answering FAQs (such as, what are the steps to get divorced?, how do I get a restraining order?, how much does it cost to get a will?), summarizing documents (such as deposition transcripts, financial statements, emails and texts), or briefing cases. There are of course many other possibilities.
Sometimes good enough is good enough.
A couple of considerations make LLM usage more palatable for medium and even high risk applications within the legal context:

- Human oversight: lawyers' ethical obligations, and rules requiring them to supervise and certify their work product, mean AI output is reviewed by an attorney who can catch residual errors.
- Shared benchmarks: emerging industry benchmarks for AI accuracy give practitioners objective grounds for deciding which tools are reliable enough for a given task.
Closing Thoughts
While retrieval augmented generation is not a panacea that guarantees 100% accuracy and eliminates all hallucinations, it represents a significant step forward in making LLMs more reliable and grounded in factual, up-to-date information. By augmenting LLMs with external knowledge, RAG helps to guide generation, reduce reliance on outdated pre-trained knowledge, and provide transparency through source citations.
Crucially, the question we must ask is not simply whether RAG delivers perfect accuracy, but rather whether it is reliable enough for the level of risk presented by specific use cases. In the legal context, there is a wide range of potential applications for LLMs, from low-risk tasks like language translation and grammar checking to medium-risk activities such as contract drafting and document summarization. By classifying the risk level of each use case and implementing appropriate human oversight, law firms can safely harness the power of LLMs to streamline workflows and enhance productivity.
Moreover, the legal profession's ethical obligations and federal rules mandating lawyer supervision of AI-generated work product provide an additional layer of quality control. By exercising independent judgment and carefully reviewing the output of RAG-powered LLMs, legal professionals can catch and correct errors, mitigating the impact of any remaining hallucinations.
The development of shared industry benchmarks for AI accuracy represents a critical step towards fostering trust and transparency in the use of LLMs within the legal sector. By establishing objective standards for evaluating the reliability of RAG-powered systems, legal professionals can make informed decisions about when and how to deploy these powerful tools.
As the technology continues to advance, retrieval augmented generation, coupled with human expertise and robust accuracy benchmarks, holds immense promise for transforming the practice of law.
By embracing these innovations while remaining mindful of their limitations, the legal community can harness the power of AI to deliver more efficient, effective, and accessible legal services to clients. Let’s go!