Beyond Wrappers & RAG: Layout-Aware Models & Fine-Tuning in LegalTech
Jose L Sampedro Mazon
Enterprise Tech | AI LegalTech Founder, Oracle Cloud BD Director, BCG Manager | INSEAD MBA, Columbia JD
When I co-founded my first contract analytics start-up in the pre-GPT days, the focus was more on building a robust pre-processing pipeline grounded in deep domain expertise than on the NLP model itself. We were barely able to scrape enough data to classify provisions using traditional ML, let alone deep learning with RNNs. Nowadays, with pre-trained transformer models, there is a flurry of startups achieving more scalable, generalizable, higher-performance solutions simply by putting a wrapper over GPT-4. And as we add in Retrieval Augmented Generation (RAG), ‘hallucinations’ are gradually dropping too.
Traditional NLP models would also struggle to grasp the rich information provided by the visual layout of legal documents, hidden in plain sight – indentations that reveal hierarchies and relationships between provisions, bold fonts that denote defined terms… visual cues that lawyers intuitively rely on all the time. There were some workarounds (mark-up languages, manually drawn bounding boxes with basic OCR…), but none truly scalable or generalizable. Now we have large pre-trained models like Gemini and Claude 3 that can handle text, images, sound, and video.
LegalTech has come a long way. Yet, with most startups still building solutions based on wrappers over commercially available foundation models supplemented by limited RAG, we are barely scratching the surface.
In this article I argue that AI-powered solutions in LegalTech will need to become more specialized and tailored to reach the performance levels the legal industry expects before we can achieve mass adoption. That means revisiting critical questions around optimal AI model architecture and how we custom-fit models to the use case.
In particular, I will argue that the winning formula in legal document analytics will be the lesser-known ‘Layout-Aware’ AI model architecture, fine-tuned at the parameter level and supplemented by growing, community-sourced, high-quality legal data sets to boost RAG performance – balancing performance, cost, and accuracy.
Optimal Model Architecture: Language vs. Multimodal, Specialized vs. General
Large language models (LLMs) may miss critical visual information in legal documents, affecting their performance. As previously mentioned, legal documents often convey important information through layout and formatting, such as indentation to define the hierarchy of legal provisions and bold font for defined terms used throughout the document. LLMs can struggle with tasks requiring this visual context, leading to potential inaccuracies: they take only embedded text as input, so layout and formatting get stripped away during pre-processing, discarding valuable visual cues.
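A toy sketch makes the point concrete (not any specific library's extraction pipeline): flattening a document to plain text destroys the indentation that encodes a provision hierarchy, while keeping the leading whitespace lets us recover nesting depth.

```python
def flatten(lines):
    """Mimic naive text extraction: strip all layout, keep bare text."""
    return " ".join(line.strip() for line in lines)

def hierarchy(lines, indent_width=4):
    """Recover nesting depth from leading whitespace, a simple visual cue."""
    tree = []
    for line in lines:
        depth = (len(line) - len(line.lstrip(" "))) // indent_width
        tree.append((depth, line.strip()))
    return tree

contract = [
    "1. Confidentiality",
    "    1.1 Definitions",
    "        (a) 'Confidential Information' means...",
    "    1.2 Obligations",
]

print(flatten(contract))    # one run-on string; the hierarchy is gone
print(hierarchy(contract))  # depths survive: 0, 1, 2, 1
```

The same reasoning applies to bold fonts, tables, and page position: once pre-processing throws the cue away, no amount of prompting can bring it back.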
We now have commercially available, general-purpose multimodal AI models like GPT-4o, Gemini, and Claude 3, which go beyond text and images to include audio and video too. However, these models do not process visual and textual data jointly, which may lead to failure to capture the nuanced meaning of the layout in legal documents. Moreover, this versatility comes at the price of unnecessary complexity and computational resources that are not justified for the specific task of contract review.
In contrast, layout-aware models, such as DocLLM, LayoutLM, and LayoutXLM, are specifically designed to capture and leverage the visual layout structure of documents, which is crucial for accurate legal document analysis. They are pre-trained on large datasets of document images and text, allowing them to learn the intricate relationships between text and layout without the need for extensive fine-tuning. These models also employ specialized architectures and attention mechanisms that capture the cross-alignment between text and spatial modalities. Leveraging that spatial information leads to more accurate extraction and analysis of legal information, making them ideal for legal document analytics.
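The core idea can be sketched in a few lines. In the LayoutLM family, each token carries a 2D positional signal (its bounding box, normalized to a 0–1000 grid) alongside its text embedding; the sketch below is illustrative only, with hypothetical names and a hash standing in for a real vocabulary, not the actual LayoutLM implementation.

```python
def normalize_box(box, page_width, page_height):
    """Scale pixel/point coordinates (x0, y0, x1, y1) to a 0-1000 grid,
    as LayoutLM-style models do before embedding positions."""
    x0, y0, x1, y1 = box
    return (
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    )

def token_features(token, box, page_width, page_height):
    """Joint text + layout feature: a token id plus its normalized box.
    A plain-text LLM would see only the first element."""
    token_id = hash(token) % 50_000  # stand-in for a real tokenizer vocabulary
    return (token_id, *normalize_box(box, page_width, page_height))

# A bold, indented defined term on a 612x792-pt (US Letter) page:
feat = token_features("Confidential", (72, 144, 180, 158), 612, 792)
print(feat)  # normalized box portion is (117, 181, 294, 199)
```

Because the model attends over both elements of this joint representation, an indented clause and a top-level clause with identical wording produce different inputs, which is exactly the distinction a lawyer's eye makes.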
Tailoring AI Models for Purpose: RAG, Fine-Tuning, and/or Domain-Specific Foundational Models From Scratch?
Once the optimal model architecture is selected, the approach to tailoring the AI model to our LegalTech solution can be viewed on a spectrum. On one end, we have wrappers over out-of-the-box commercial models like GPT-4, which offer impressive capabilities but may not meet the specific needs of the legal industry. On the other end, we have tailored solutions such as fine-tuning existing models with legal-specific data and RAG, or even creating foundational models from scratch.
Wrappers with Retrieval Augmented Generation
Building wrappers over out-of-the-box commercial models with RAG is likely the most common approach among LegalTechs today. A wrapper is essentially a user-friendly interface that submits engineered prompts to a commercially available AI model on the backend and feeds the inferences it receives back into the document analysis workflow, returning analytics insights to the user. RAG combines retrieval-based and generative-based models to enhance the relevance and accuracy of AI outputs by integrating domain-specific data. Unlike fine-tuning, RAG does not change the model parameters; it adds a legal-centric data set to complement and cross-check foundation model outputs against a ‘trusted’ legal data set (the quality and breadth of which depend on the developer itself).
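The retrieve-then-prompt pattern behind RAG can be sketched minimally: pull the most relevant clauses from a trusted legal data set, then ground the prompt in them before calling the foundation model. The bag-of-words similarity and prompt format below are simplified stand-ins, not a production pipeline.

```python
from collections import Counter
import math
import re

def tokenize(text):
    """Lowercase word tokens; punctuation is discarded."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Bag-of-words cosine similarity between two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * \
           math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

def retrieve(query, corpus, k=2):
    """Return the top-k passages most similar to the query."""
    return sorted(corpus, key=lambda doc: cosine(query, doc), reverse=True)[:k]

def build_prompt(query, corpus, k=2):
    """Augment the user query with retrieved context before generation."""
    context = "\n".join(retrieve(query, corpus, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context above."

clauses = [
    "Indemnification: the Supplier shall indemnify the Customer against all losses.",
    "Termination: either party may terminate with 30 days' written notice.",
    "Governing law: this agreement is governed by the laws of New York.",
]
print(build_prompt("Who bears the indemnification obligations?", clauses, k=1))
```

Production systems swap the word-overlap similarity for dense vector embeddings and a vector store, but the structure is the same: the model's weights are never touched, only its input is enriched.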
RAG has shown improvements in the relevance and accuracy of inferences, as well as in reducing hallucinations. However, the efficacy of RAG depends on access to high-quality legal data sets, and there are technical limits to how much RAG alone can improve performance; coupled with the cost and complexity of integrating RAG effectively, these are significant limiting factors. Bottom line, it seems fair to say that, for the most part, the performance of this approach has not proven convincing enough for widespread adoption among the legal community to date, beyond early experimentation.
Fine-Tuning Existing Models Supplemented by RAG
Fine-tuning involves adapting a pre-trained model to a specific domain by training it on domain-specific data, leading to changes in the model parameters. Fine-tuned models outperform out-of-the-box models in domain-specific tasks, providing higher accuracy and contextual understanding. Fine-tuning pre-trained models on legal-specific data, supplemented by RAG, offers the best balance of performance and cost, and allows customization to specific legal tasks and datasets, making it a practical and scalable solution.
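The contrast with RAG is worth making concrete: fine-tuning updates the model's own parameters on domain data, whereas RAG leaves them untouched. In this toy illustration, a single weight and one gradient step stand in for billions of parameters; the numbers are illustrative only.

```python
def fine_tune_step(w, examples, lr=0.1):
    """One gradient-descent step on squared error for the model y = w * x.
    This is the essence of fine-tuning: the parameter w itself changes."""
    grad = sum(2 * (w * x - y) * x for x, y in examples) / len(examples)
    return w - lr * grad

pretrained_w = 1.0                          # weight learned on general data
legal_examples = [(1.0, 2.0), (2.0, 4.0)]   # domain-specific supervision

tuned_w = fine_tune_step(pretrained_w, legal_examples)
print(pretrained_w, "->", tuned_w)  # 1.0 -> 1.5: the parameter has moved
```

In practice this is done with parameter-efficient techniques (e.g. adapters or low-rank updates) so that only a small fraction of the weights move, which is what keeps the cost manageable relative to training from scratch.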
While not explicitly revealed, rumor has it that leading AI-first startups like Harvey, which recently partnered with Mistral, are adopting this approach to enhance their AI capabilities. By fine-tuning models with legal-specific data and supplementing them with RAG, these startups can achieve higher accuracy and relevance in their AI outputs, meeting the high standards of the legal profession.
Creating a Foundational Model from Scratch
Developing a foundational model tailored to the legal sector is an ambitious and, in my view, impractical approach: the cost and effort involved are prohibitive. Training a foundational model requires acquiring petabytes of legal data (including purchasing from private sources with little interest in selling it), and millions of dollars in computational resources – not just initially but on an ongoing basis as the model is updated and retrained. In theory, a specialized foundational model for legal could bring about performance improvements, but the delta over other approaches (if any) will remain highly uncertain until one is built and tested. It seems highly unlikely that we will see this in the venture scene any time soon.
Overall, as mentioned, most LegalTech startups today appear to be using wrappers of commercial models with RAG to enhance their AI capabilities. However, I believe we will see startups increasingly adopting layout-aware models like DocLLM and LayoutLM to leverage the visual structure of legal documents, and fine-tuning them at a core parameter level for better accuracy and performance. The market is moving towards more specialized AI solutions that can handle the unique challenges of legal document analysis.
Given the sensitivity and criticality of handling legal matters, and the conservative nature of the legal industry, there is certainly a need for higher accuracy and efficiency before such LegalTech products can become mainstream. As of today, fine-tuned layout-aware models supplemented by high-quality RAG look like a desirable and even necessary step towards boosting LegalTech adoption.
That said, the landscape is evolving quickly. Whether you are a builder in LegalTech or a consumer of it, keeping a close eye on the ongoing advancements in AI model architectures and implementations will be crucial to pick the best solutions that balance performance, cost, and accuracy, making them safe, practical and scalable solutions for use in legal services.