AI Quality: A 12-Point Framework for Legal AI

This is part 1 of a 3-part series on takeaways from the AI Quality Conference, which took place in San Francisco on June 25, 2024.

Just back from the AI Quality Con in Silicon Valley, I'm buzzing with insights on why AI quality is the next big frontier for legal tech. Spoiler alert: it's not just about accuracy!

The AI Quality Challenge in Legal

We've all heard the horror stories, like Mata v. Avianca, where AI-hallucinated case citations made it into a court filing. It's no wonder a recent LexisNexis survey found 86% of legal professionals concerned about AI trustworthiness (LexisNexis 2024 Investing in Legal Research Survey). A study by Stanford's Human-Centered Artificial Intelligence (HAI) institute, published on May 23, 2024, highlighted the need for benchmarking and public evaluation of AI tools in law, even as its methodology drew criticism. But here's the kicker: AI quality is a multi-faceted beast, and understanding its complexity is key to harnessing its power responsibly.

The 12 Pillars of AI Quality for Legal

Buckle up, because we're about to dive into a framework that's going to change how you think about the quality of AI systems and the implications for firms and legal tech:

1. Foundational Model Quality: This is about the core AI model's capabilities. In legal, we need models that can handle complex reasoning and nuanced language. The choice between large, general models and smaller, specialized ones can significantly impact performance and efficiency.

Insight: Consumers today can choose among several high-quality foundational models, and model quality has measurably improved. Vals.AI, a new startup that evaluates language models on a range of legal tasks, tracks the performance gains of flagship closed and open-source models, from OpenAI's GPT-3.5, which kicked off the generative AI craze, to the latest releases of GPT-4o and Anthropic's Claude 3.5 Sonnet, which has set a new standard for model quality. Smaller models are also starting to emerge in legal, such as KL3M by 273 Ventures, touted by its creators as the first large language model family trained from scratch on clean, legally permissible data.
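
To make the model comparison concrete, here is a minimal sketch (not a benchmark) that puts the same legal question to two flagship models. It assumes the official openai and anthropic Python SDKs with API keys set in the environment; the prompt and model names are illustrative.

```python
# Minimal sketch: send the same legal question to two foundation models.
# Assumes the `openai` and `anthropic` SDKs are installed and that
# OPENAI_API_KEY / ANTHROPIC_API_KEY are set in the environment.
from openai import OpenAI
import anthropic

PROMPT = "Summarize the lesson of Mata v. Avianca for AI-assisted legal research."

openai_client = OpenAI()
gpt_reply = openai_client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[{"role": "user", "content": PROMPT}],
)

claude_client = anthropic.Anthropic()
claude_reply = claude_client.messages.create(
    model="claude-3-5-sonnet-20240620",  # illustrative model name
    max_tokens=500,
    messages=[{"role": "user", "content": PROMPT}],
)

print("GPT-4o:\n", gpt_reply.choices[0].message.content)
print("\nClaude 3.5 Sonnet:\n", claude_reply.content[0].text)
```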

2. Data Quality: For legal AI, this means curated, authoritative databases. High-quality data is crucial for preventing hallucinations and ensuring accurate information retrieval. It's the difference between citing a landmark case and a non-existent one.

Challenge: One of the biggest hurdles faced by legal organizations today is leveraging internal data – doing so introduces an entirely new layer of quality with its own set of failure points.
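
As a small illustration of that extra quality layer, here is a hedged sketch of a pre-indexing gate that rejects internal documents missing basic provenance metadata. The required fields and thresholds are hypothetical examples, not a standard.

```python
# Sketch of a pre-indexing quality gate for internal documents.
# The required metadata fields and thresholds are hypothetical examples.
from datetime import date

REQUIRED_FIELDS = ("matter_id", "author", "date", "text")

def passes_quality_gate(doc: dict) -> bool:
    """Return True only if the document carries the metadata needed to
    trace and trust it later (provenance, recency, non-empty text)."""
    if any(not doc.get(field) for field in REQUIRED_FIELDS):
        return False
    if len(doc["text"].split()) < 50:          # too short to be a real document
        return False
    return doc["date"] <= date.today()         # reject documents dated in the future

docs = [
    {"matter_id": "M-1042", "author": "A. Partner", "date": date(2023, 11, 2),
     "text": "Master services agreement between the parties ..." + " terms" * 60},
    {"matter_id": None, "author": "", "date": date(2024, 1, 5), "text": "Draft."},
]

indexable = [d for d in docs if passes_quality_gate(d)]
print(f"{len(indexable)} of {len(docs)} documents passed the gate")
```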

3. Retrieval Quality: This pillar focuses on the AI's ability to find relevant information. In law, where precedent is king, the ability to quickly and accurately retrieve pertinent cases or prior agreements is invaluable.

Key Point: Retrieval is a core component of Retrieval Augmented Generation (RAG) systems, which improve AI quality by adding relevant data as additional context to steer AI towards high-quality outputs.
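
Here is a minimal sketch of the retrieval half of RAG under simplifying assumptions: it ranks a toy corpus with TF-IDF (scikit-learn) and places the top passages into a grounded prompt. A production system would more likely use vector search, but the shape of the pipeline is the same.

```python
# Sketch of the "retrieval" half of RAG: rank a small corpus of passages
# against a question, then build a grounded prompt from the top hits.
# Uses TF-IDF for simplicity; real systems typically use vector search.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

passages = [
    "Mata v. Avianca: sanctions followed the filing of a brief containing fabricated citations.",
    "A force majeure clause excuses performance when extraordinary events prevent it.",
    "Precedent from the controlling jurisdiction binds the lower courts.",
]
question = "What happened when fabricated AI citations were filed with a court?"

vectorizer = TfidfVectorizer().fit(passages + [question])
scores = cosine_similarity(
    vectorizer.transform([question]), vectorizer.transform(passages)
)[0]

top = sorted(zip(scores, passages), reverse=True)[:2]   # keep the 2 best passages
context = "\n".join(p for _, p in top)

prompt = (
    "Answer using only the context below and cite the passage you relied on.\n\n"
    f"Context:\n{context}\n\nQuestion: {question}"
)
print(prompt)
```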

4. Chunking and Indexing: This is about how legal documents are loaded and processed for AI consumption. Proper chunking ensures that context is maintained, which is crucial when dealing with complex legal arguments or lengthy contracts.
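
A hedged sketch of paragraph-aware chunking with overlap is below; the chunk size and overlap are illustrative, and real pipelines often also key chunks to section headings and page numbers.

```python
# Sketch of paragraph-aware chunking with overlap, so that clause-level
# context is not cut off mid-argument. Sizes are illustrative.
def chunk_document(text: str, max_words: int = 200, overlap: int = 1) -> list[str]:
    """Group paragraphs into chunks of at most `max_words`, repeating the
    last `overlap` paragraph(s) at the start of the next chunk."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], []
    for para in paragraphs:
        if current and sum(len(p.split()) for p in current) + len(para.split()) > max_words:
            chunks.append("\n\n".join(current))
            current = current[-overlap:]        # carry trailing context forward
        current.append(para)
    if current:
        chunks.append("\n\n".join(current))
    return chunks

contract = "1. Definitions ...\n\n2. Term and Termination ...\n\n3. Limitation of Liability ..."
for i, chunk in enumerate(chunk_document(contract, max_words=20), start=1):
    print(f"--- chunk {i} ---\n{chunk}\n")
```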

5. Embedding Quality: Vector embeddings are numerical representations of text that capture semantic meaning, allowing the AI to understand and retrieve relevant information based on contextual similarity. High-quality embeddings allow AI to grasp the subtle nuances of legal language, improving overall performance in legal tasks.
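
As a quick illustration, the sketch below embeds three clauses with the sentence-transformers library (a small general-purpose model, chosen only for convenience) and shows that a paraphrased termination clause sits closer in vector space than an unrelated sentence.

```python
# Sketch: embed three legal phrases and compare their cosine similarity.
# Uses sentence-transformers with a small general-purpose model; a
# legal-tuned embedding model would typically capture nuances better.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")
phrases = [
    "The agreement may be terminated for material breach.",
    "Either party can end the contract if the other seriously violates it.",
    "The property is located at 42 Main Street.",
]
vectors = model.encode(phrases)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print("termination vs. paraphrase :", round(cosine(vectors[0], vectors[1]), 3))
print("termination vs. unrelated  :", round(cosine(vectors[0], vectors[2]), 3))
```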

6. Contextual Understanding: Legal AI must understand context to provide relevant answers. This pillar ensures AI systems maintain context, crucial when dealing with intricate legal matters.

Research Note: Studies have observed a "lost in the middle" phenomenon: like humans, LLMs are more likely to recall information positioned at the start or end of an input and tend to overlook content in the middle. As context windows expand into the hundreds of thousands of tokens, and into the millions in the case of Google's Gemini 1.5 Pro, there are tradeoffs to consider between loading large amounts of information into the model's immediate context (akin to working memory) and retrieving information as needed using RAG.
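
One common mitigation, sketched below as an assumption-laden illustration rather than a fixed recipe, is to reorder retrieved passages so the highest-scoring ones sit at the very start and very end of the context, leaving weaker material in the middle where recall tends to be worst.

```python
# Sketch of a common mitigation for "lost in the middle": place the
# highest-ranked passages at the start and end of the context window,
# leaving weaker passages in the middle where recall tends to be worst.
def reorder_for_context(passages_by_score: list[tuple[float, str]]) -> list[str]:
    ranked = [p for _, p in sorted(passages_by_score, reverse=True)]
    front, back = [], []
    for i, passage in enumerate(ranked):
        (front if i % 2 == 0 else back).append(passage)   # alternate ends
    return front + back[::-1]                             # best passages first and last

scored = [(0.91, "A"), (0.72, "B"), (0.55, "C"), (0.40, "D"), (0.31, "E")]
print(reorder_for_context(scored))   # ['A', 'C', 'E', 'D', 'B']
```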

7. Ethical AI: Beyond technical quality, ethical considerations like bias mitigation are critical. This ensures AI systems support fairness and justice, crucial in the legal domain.

8. Evaluation and Metrics: How do we measure success in legal AI? This pillar is about developing robust frameworks to assess AI performance on legal tasks, ensuring we're measuring what truly matters.

Frontier Development: Last week, OpenAI introduced CriticGPT, a model trained to catch errors in ChatGPT's code output. Training small language models to address specific performance failures came up repeatedly at AIQCON and is an emerging focus of providers seeking to improve the performance of their applications.
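
For a sense of what an evaluation loop can look like, here is a deliberately tiny sketch: it runs hand-written legal questions through a placeholder ask_model function (swap in your own model or product) and scores answers by keyword coverage, a crude but illustrative metric.

```python
# Sketch of a tiny evaluation harness. `ask_model` is a placeholder for
# whatever model or product is under test; the scoring rule (keyword
# coverage) is deliberately simple and only illustrative.
eval_set = [
    {"question": "What standard governs summary judgment?",
     "must_mention": ["genuine dispute", "material fact"]},
    {"question": "Name the case where fabricated AI citations led to sanctions.",
     "must_mention": ["Mata", "Avianca"]},
]

def ask_model(question: str) -> str:
    # Placeholder: call your model or product here.
    return "Summary judgment requires no genuine dispute of material fact."

def score(answer: str, must_mention: list[str]) -> float:
    hits = sum(term.lower() in answer.lower() for term in must_mention)
    return hits / len(must_mention)

results = [score(ask_model(item["question"]), item["must_mention"]) for item in eval_set]
print(f"Average keyword coverage: {sum(results) / len(results):.0%}")
```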

9. Integration and Deployment: Legal AI doesn't exist in a vacuum. This pillar focuses on how AI systems integrate with existing legal workflows and technologies, ensuring smooth adoption and maximum utility.

Pro Tip: Be aware that differences in how foundation models are incorporated into products can affect quality. One of the best ways to gauge what a specific product adds is to run the same task through the raw foundation model in your own workflow and compare the results, provided it can be done securely.

10. Privacy and Security: In law, confidentiality isn't just good practice – it's an ethical imperative. This pillar ensures AI systems protect sensitive client information and comply with data protection regulations.

Alternative Solution: Open-source models like Meta's Llama 3, which can be hosted privately and even run locally without an internet connection, offer a balanced tradeoff of quality, value, and security that may appeal to certain firms.
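
As a sketch of what "local" can mean in practice, the snippet below queries a privately hosted Llama 3 through the Ollama Python client, assuming Ollama is running locally and the llama3 model has been pulled; no client text leaves the machine.

```python
# Sketch: query a locally hosted Llama 3 model so that client documents
# never leave the firm's machines. Assumes Ollama is running locally and
# the `llama3` model has been pulled (`ollama pull llama3`).
import ollama

confidential_clause = "The Receiving Party shall not disclose Confidential Information ..."

response = ollama.chat(
    model="llama3",
    messages=[{
        "role": "user",
        "content": f"Flag any unusual obligations in this clause:\n\n{confidential_clause}",
    }],
)
print(response["message"]["content"])
```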

11. Explainability and Transparency: AI decisions in legal contexts must be interpretable. This pillar is about ensuring AI can "show its work," crucial for building trust and meeting ethical standards in legal practice.

Progress: AI products that return citations, giving users the opportunity to check the LLM's output against source documents, are an important first step. As Jean O'Grady recently emphasized to the legal research community, lawyers cannot avoid reading and cite-checking any case recommended by a commercial generative AI research tool. In part to limit their own responsibility for errors made with their products, AI providers are advised to design user experiences with these professional responsibilities at the center.
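
A minimal sketch of that kind of check is below. It assumes (hypothetically) that the tool returns structured citations with the passages it relied on, and flags anything that cannot be matched to a verified source for manual review.

```python
# Sketch of a basic "show your work" check. Assumes the AI tool returns a
# structured answer with its citations and the passages it relied on
# (the field names here are hypothetical).
ai_answer = {
    "text": "Fabricated citations have drawn sanctions.",
    "citations": [
        {"case": "Mata v. Avianca, Inc.", "quote": "submitted non-existent judicial opinions"},
        {"case": "Varghese v. China Southern Airlines", "quote": "a case that does not exist"},
    ],
}

source_documents = {
    "Mata v. Avianca, Inc.": "The court found counsel submitted non-existent judicial opinions ...",
}

for citation in ai_answer["citations"]:
    source = source_documents.get(citation["case"])
    if source is None:
        print(f"NOT IN SOURCES  : {citation['case']}")   # lawyer must verify independently
    elif citation["quote"] not in source:
        print(f"QUOTE NOT FOUND : {citation['case']}")
    else:
        print(f"verified        : {citation['case']}")
```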

12. Continuous Improvement: The law evolves, and so should our AI. This pillar focuses on systems that learn and adapt, staying current with legal developments and improving based on user feedback.
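
One lightweight way to operationalize this, sketched below with an illustrative JSONL format, is to log every piece of user feedback alongside the prompt and response so it can seed future evaluation sets.

```python
# Sketch: append user feedback to a JSONL log so that every thumbs-up /
# thumbs-down grows the evaluation set used to test future releases.
# The file name and fields are illustrative.
import json
from datetime import datetime, timezone

def log_feedback(prompt: str, response: str, rating: str, path: str = "feedback.jsonl") -> None:
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,
        "response": response,
        "rating": rating,          # e.g. "up", "down", or a short comment
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

log_feedback(
    prompt="Summarize the indemnification clause in section 9.",
    response="Section 9 requires the vendor to indemnify the client for third-party IP claims.",
    rating="up",
)
```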

Why This Matters to You

Whether you're a BigLaw partner, in-house counsel, or legal tech enthusiast, understanding these pillars is crucial. They're not just theoretical – they're the building blocks of trustworthy, effective, and ethically sound AI in law.

What's Next?

In my next post, we'll dive deeper into specific implications of AI quality for firms and the legal tech community. Then, we'll wrap up with actionable steps for implementing quality controls in your practice or organization.

Part 2 - Reframing AI Quality for the Legal Mind

Part 3 - Implementing AI Quality in Practice

The legal industry stands at a crossroads with AI. By mastering these 12 pillars of quality, we can build AI systems that don't just assist us – they elevate our entire profession.

Stay tuned, and let's navigate this brave new world together!

#AIinLegal #LegalAI #LegalTech #FutureOfLaw #AIQuality

P.S. Which of these pillars intrigues you most? Drop a comment – I'd love to hear your thoughts!

Woodley B. Preucil, CFA

Senior Managing Director

Laurent Wiesel Very well-written & thought-provoking.

Knut-Magnar Aanestad

Steps ahead on Legal AI and Next level on Contracting | Legal Engineer & Partner at Saga & Owner at Maigon

Love this Laurent Wiesel! I see providers of Legal AI tools refer to quality improvements without having a thought of what that means in a legal setting.

Victoria C. Albrecht

CEO at Springbok AI, makers of #SpringLaw. Making Generative AI work for you.

A small word of caution from the fine print: “Smaller models are also starting to emerge in Legal, such as KL3M by 273 Ventures, touted by its creators as the first large language model family trained from scratch on clean, legally-permissible data.” What most people don’t know (because the devil is in the detail) is that 273’s performance benchmark for their model is actually GPT 2.5. In other words, the quality is far, far worse than GPT-4 or even 3.5 out of the box. The same has been found with countless other companies who tried to fine-tune or build their own large language models, for example Bloomberg. That’s because building large models is extremely hard, extremely expensive, and does not guarantee results. And doing so with a small set of “clean data” is a nice idea but somewhat of a myth. The quality of the model matters, but it’s really important not to rely on a company’s own benchmark claims but rather on cross-market performance assessments at the same level. GPT-4 and Claude tend to outperform across the benchmarks; this will surely change over time. I suspect open-source models like Llama will get better. But industry-specific models are marketing more than anything else.

Ben Wightwick

Workflow & Process Automation | Husband | Father | Chief Commercial Officer @Autologyx | Advisor | Investor | Collaborator | Former HighQ'er | Advocate | Innovator | Listener of music | Not always in that order

Great piece, Laurent. You should read this, Ben Stoneham, MSc

Josè Alemán

Senior Research Analyst - Research and Knowledge Services

Laurent thank you so much for this! This is what we have been waiting for. Some sort of standardization on AI output results. Looking forward to part II and III.
