A GPT to Help You Create the New eDiscovery Triangle
Jon Neiditz
Insightful Ideation by Hybrid Intelligences for Everybody, + Voices for the Strategically Silent!
I was excited to see the good writeup of Robert Keeling's important study of the effectiveness of GPT versus human review in eDiscovery, not only because it is the first great practical study of that critical topic, but because my year of immersion in GenAI enables me to challenge one of its fundamental assumptions in a way that only increases the study's significance! Back in 2006, when the eDiscovery amendments to the Federal Rules of Civil Procedure stimulated the first wave of eDiscovery tech, there were many such studies. Given the black-box issues discussed below, and because the technology-assisted review (TAR) industry is now so large and well entrenched, I am grateful for a study of this quality even a year and a half after the emergence of GPT-4, and I hope it and this response will be the start of many more.
The Keeling Study is Better than it Knows
The Keeling study highlights GPT-4's ability to analyze and code documents for responsiveness, comparing its performance to that of human reviewers. The study used a representative sample of 1,500 documents from a closed case involving the Anti-Kickback Statute, 500 of which had previously been identified as responsive and 1,000 as non-responsive by human reviewers. GPT-4 was tasked with evaluating each document individually against a set of review instructions mirroring those used by the human attorneys in the original case review. The AI's performance was assessed using a responsiveness scoring scale ranging from -1 to 4, with an explanation required for each responsiveness determination citing specific text from the document.
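To make the setup concrete, here is a minimal sketch of what a single-document scoring call of this kind might look like, assuming the OpenAI Python client; the instruction text, function name and output format are my own placeholders rather than the study's actual prompt or pipeline.

```python
# A hypothetical sketch of a single-document responsiveness call, assuming the
# OpenAI Python client. The instruction text, function name and output format
# are placeholders, not the study's actual prompt or pipeline.
from openai import OpenAI

client = OpenAI()

REVIEW_INSTRUCTIONS = """You are assisting with a document review for
responsiveness to requests concerning alleged Anti-Kickback Statute violations.
Score the document from -1 (clearly non-responsive) to 4 (clearly responsive),
and justify the score by quoting specific text from the document."""

def score_document(document_text: str) -> str:
    """Return the model's responsiveness score and text-cited rationale."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": REVIEW_INSTRUCTIONS},
            {"role": "user", "content": f"Document:\n{document_text}\n\n"
                                        "Return: SCORE (-1 to 4) and RATIONALE "
                                        "quoting supporting text."},
        ],
        temperature=0,  # favor reproducible scoring over creativity
    )
    return response.choices[0].message.content
```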
The study finds that GPT-4 performs well, especially when it "is confident" about the content of the documents. Where GPT-4 is highly confident, it demonstrates strong accuracy in identifying non-responsive and responsive documents. However, in the study's results GPT-4 struggles with documents that are ambiguous or require additional context, such as family-member documents, short responsive messages, or documents with responsive attachments. The study concludes in part that GPT-4, in its current form, is best suited for paring down large volumes of documents, which can then be reviewed using traditional tools and manual human review.
The writeup is admirably transparent about the probable reasons for GPT's failures on the more ambiguous and context-dependent documents:
A large number of these errors can be attributed to the fact that additional information provided to attorneys during the course of the review was not part of the initial review instructions provided to GPT-4. Once the prompt was revised, there was a much smaller number of documents scored in this range, which led to higher overall recall and precision. This result is not unexpected since attorneys are able to follow up and ask clarifying questions as they arise—a uniquely human task that GPT-4 cannot do.
For a year now I have been maximizing transparency, clarity and iterative learning in working with GPT-4, first through dialogue and prompt engineering, then through "custom instructions," and now through the training and fine-tuning enabled in creating GPTs. When I give GPTs tasks, they generally convert them into step-by-step plans and tell me those plans, usually prompting refinement. They ask clarifying questions as ambiguities arise, and are sometimes also required to state the default assumptions they will use if I choose not to provide more information. They ask for examples of good work I might want to share. They pause and wait for confirmation, elaboration or examples. Then they produce the initial response, which is just the beginning of the clarifying dialogue. The purposes have included more relevant responses, more transparency (even if always to some extent fabricated) and more beneficial "uncertainty" as advocated by Stuart Russell. Thanks to Ethan Mollick for initial guidance back in the prompt engineering days.
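For readers who have not yet worked this way, here is a hypothetical example of the kind of custom instructions I am describing, expressed as a reusable system prompt; the wording is illustrative and not my actual GPT configuration.

```python
# Illustrative only: a hypothetical set of "custom instructions" of the kind
# described above, kept as a reusable constant for any GPT-4 session.
CUSTOM_INSTRUCTIONS = """Before performing any task:
1. Restate the task as a step-by-step plan and show it to me for refinement.
2. Ask clarifying questions as ambiguities arise; if I decline to answer,
   state the default assumptions you will proceed on.
3. Ask whether I can share examples of work product I consider good.
4. Pause and wait for confirmation, elaboration or examples before producing
   an initial response, and treat that response as the beginning of a
   clarifying dialogue rather than a final answer."""
```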
Given this focus on generative AI's ability to learn through dialogue, iteration and clarification, I felt as if the above-quoted statement in the Keeling study writeup handed me a skeleton key to the future of eDiscovery. Given the Hybrid Intelligencer's focus since its birth on the "relevance engines" enabled by generative AI's natural language processing (NLP), as against the relatively static, keyword-centric limitations of TAR, I expect the Keeling study, the specific insights that follow here, and the GPT I offer you below to impact first human review and eventually TAR in eDiscovery significantly.
My fundamental question for Robert Keeling and his (and our) partners at Relativity is whether, by not fine-tuning and encouraging GPT-4 to ask for and benefit fully from clarifications, iteration and dialogue throughout the study, you might have been TREATING AN LLM AS IF IT WERE A TAR: a predictive or extractive AI to which you submit an inquiry and receive a predictable response, clarifying or revising the question only when you want a different output, rather than a generative AI designed for dialogue, producing evolving answers every time (even to the same question) based on the evolving "understanding" afforded by NLP. Note in this regard that the methodology did include a second wave of reviews with clarifications, which significantly improved GPT-4's scores, but it did not afford GPT-4 the opportunities the human reviewers enjoyed for iterative learning through dialogue and for seeking clarification as questions arose. By systematically exploring how a fine-tuned generative AI, through its NLP powers and its ability to benefit from clarification and dialogue, can significantly enhance eDiscovery, this Intelligencer edition challenges the assumption of the Keeling study quoted above and may open the door to a new study building on it, with significantly better results for generative AI. It does so below by providing an eDiscovery specialist GPT fine-tuned to seek clarification through dialogue, so you can begin already to explore the extent to which NLP permits a partial bridge to the subtler, informal interaction and learning of humans.
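To make the distinction concrete, here is a minimal sketch of such a dialogic review loop, again assuming the OpenAI Python client; the prompt text, the routing of clarifying questions to a human reviewer and the turn limit are all my own assumptions, not the Keeling study's methodology or any vendor's product.

```python
# A minimal, hypothetical sketch of a dialogic review loop: the model may either
# commit to a responsiveness score or ask a clarifying question, and a human
# reviewer's answer is fed back before any score is recorded.
from openai import OpenAI

client = OpenAI()

DIALOGIC_INSTRUCTIONS = """You are reviewing documents for responsiveness.
If the review instructions leave a genuine ambiguity for this document
(for example, family members, short messages, responsive attachments),
ask ONE clarifying question prefixed with 'QUESTION:'. Otherwise respond
with 'SCORE:' (-1 to 4) and a rationale quoting the document."""

def review_with_dialogue(document_text: str, max_turns: int = 3) -> str:
    """Let the model seek clarification from a human reviewer before scoring."""
    messages = [
        {"role": "system", "content": DIALOGIC_INSTRUCTIONS},
        {"role": "user", "content": f"Document:\n{document_text}"},
    ]
    for _ in range(max_turns):
        reply = client.chat.completions.create(
            model="gpt-4", messages=messages, temperature=0
        ).choices[0].message.content
        if not reply.startswith("QUESTION:"):
            return reply  # the model committed to a score
        # Route the clarifying question to a human; in practice this could be
        # a supervising attorney or a QC queue rather than the console.
        answer = input(f"{reply}\nReviewer answer: ")
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content": answer})
    return "UNRESOLVED: escalate to human review"
```

The point is not the particular code but the shape of the interaction: like a contract attorney, the model is allowed to withhold its score until its questions have been answered.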
Roles for GenAI in a New eDiscovery Triangle
Static vs. Dynamic Approaches in Document Review: Traditional TAR tools rely on predefined keywords, rules, and algorithms to identify relevant documents. This method, while efficient in handling large volumes of data, can require retraining or manual intervention given emerging details, evolving cases, and new areas of relevance. TAR systems are limited by the initial input and cannot dynamically update their understanding based on new information or context. By contrast, LLMs are not confined to initial parameters and can engage in an iterative process, continuously refining their search and analysis based on new data and feedback, improving their creativity and/or accuracy over time.
Seeking Clarification and Context: LLMs can engage in real-time, interactive dialogues with users, allowing for a more nuanced and context-aware approach to document analysis in eDiscovery. In practical terms, this means LLMs can process inquiries, interpret the underlying intent, and respond in a manner that resembles human conversation. This interactive dialogue is particularly valuable in scenarios where the relevance of information is more subtle and context-dependent. With each interaction, the LLM can adjust its search parameters, leading to increasingly relevant results. For instance, if initial search results are not entirely on point, a user can provide feedback or additional context, prompting the LLM to refine its search strategy accordingly.
Analysis Beyond Keywords: One of the most significant advantages of LLMs in eDiscovery is their ability to understand and identify implicit concepts and relationships within the text, going beyond keyword matching. LLMs can uncover relevant documents, and semantic relations among them, that contain none of the specified search terms but are contextually pertinent to the matter (a concrete sketch follows these points).
Streamlining the Review Process: The adoption of LLMs in eDiscovery can significantly streamline the document review process. By efficiently sorting and categorizing documents based on their relevance, LLMs can reduce the time and effort required for manual review.
Making eDiscovery More Accessible and Intuitive: LLMs democratize access to advanced eDiscovery tools, making them more accessible and intuitive for a broader range of legal professionals, including those in smaller practices or with limited technical expertise. The user-friendly nature of LLMs lowers the barriers to entry, allowing more legal professionals to leverage advanced eDiscovery techniques without requiring specialized training or technical skills.
Potential for Significant Cost Reductions: The effectiveness of LLMs at excluding false positives flagged by TAR has the potential to reduce costs associated with both human review and TAR itself.
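As a concrete illustration of the "Analysis Beyond Keywords" point above, here is a minimal sketch of ranking documents by semantic similarity to a plain-language description of the matter rather than by keyword hits; the embedding model name is my assumption, and any comparable sentence-embedding model would serve.

```python
# A hypothetical sketch of semantic (non-keyword) triage: documents are ranked
# by embedding similarity to a plain-language description of the matter.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    """Return unit-normalized embedding vectors for a list of texts."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    vectors = np.array([item.embedding for item in resp.data])
    return vectors / np.linalg.norm(vectors, axis=1, keepdims=True)

def rank_by_relevance(matter_description: str, documents: list[str]) -> list[int]:
    """Indices of documents ordered from most to least semantically relevant."""
    query_vector = embed([matter_description])[0]
    document_vectors = embed(documents)
    scores = document_vectors @ query_vector  # cosine similarity (unit vectors)
    return list(np.argsort(-scores))
```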
Fear and Loathing of GenAI in eDiscovery, or Why Human Review Will Be Changed First
The greatest challenge and irony of building the new eDiscovery triangle, of course, is that the very dialogic superpowers that secure generative AI its corner of the triangle make it appear to be a very opaque black box in an area of big data in which litigants have always been required to "show their work." Moving from negotiating keywords with the other side to trusting dynamic algorithms that even the creators of the foundation models do not fully understand will not be a complete transition in the foreseeable future, given the current state of explainable AI (XAI) for LLMs. For that reason, TAR's corner of the triangle is secure as well, and the uses of generative AI in eDiscovery must be carefully and strategically chosen. The Keeling study is wise, therefore, to focus on the top corner: the even more inscrutable human reviewers and quality controllers. We know even less about how their minds are working at any point than we do about the generative AI's processes, so we can focus instead on measuring quality through outcomes, as the Keeling study does. I therefore expect more replacement of human reviewers than of pre-GenAI TAR by the current wave of GenAI, although there will be many specific opportunities for both. To help you find and make the most of those opportunities, I offer you a GPT.
Introducing a GPT to Open Doors: eDiscovery Ideator
My excitement regarding the Keeling study and where we can go from here led me to create a GPT stocked with knowledge about the potential for generative AI in eDiscovery and designed to keep an intelligent eye open for more. Welcome to eDiscovery Ideator, a GPT designed to provide you with an endless, iterative supply of ideas about how generative AI can help with a case or with your eDiscovery program more generally. N.B.: DO NOT ENTER CLIENT INFORMATION UNTIL ONE OF US IMPLEMENTS THIS IDEATOR ON A SECURE PLATFORM WHERE NO EXFILTRATION HAPPENS, EITHER TO INCORPORATE TRAINING DATA OR TO DO HARM/ABUSE MONITORING.
Maximum Transparency, Explainability and Dialogue
Future Use Cases
For now, at least until it is hosted on a secure platform, eDiscovery Ideator is primarily a source of ideas for what you will be able to do with GenAI. In that spirit, and ignoring some current practical limitations you may be facing, such as context-window (token) limits, let's explore a series of practical scenarios that demonstrate how generative AI could be a game-changer in real-world legal situations.
Scenario 1: Complex Commercial Litigation
Scenario 2: Intellectual Property Dispute
Scenario 3: Regulatory Compliance Investigation
Scenario 4: Personal Injury Case Document Review
Scenario 5: Cross-Border Mergers and Acquisitions Due Diligence
Conclusion: Why a Triangle?
You might well expect GenAI to become a barely visible part of TAR, absorbed into TAR as it will be into so many other areas of technology and the economy until it "vanishes"; in that case, the triangle I have proposed collapses into a line or a binary. Some of you use TAR a lot more than I do, but I respectfully suggest that we might not want that to happen yet in eDiscovery and other areas of data governance, in part because of the need to "show your work" in eDiscovery (and increasingly in privacy, data protection and AI regulation), and in part because of the benefits of a human working with a team of expert GenAIs that I emphasized in connection with planning in the last edition. We may, in short, be better off if we keep the line between predictive, extractive AI and generative AI visible for the moment, so that we can choose how much assured fidelity to databases of truth and how much creativity and "understanding" we need in each inquiry or dialogue. Hence a triangle.
Partner at Redgrave LLP | Speaker | Author
Very interesting comment! To your question, we did not engage in a dialog with the LLM. Instead, we altered/expanded the initial prompt based on our initial results, adding more detail - mostly detail that the underlying case team had provided to the contract attorneys as the review progressed in response to contract attorney questions. In many ways, an ongoing dialogue with the LLM is more "black box" to the other side, so it raises interesting questions of performance/efficacy versus transparency.
VP of Product @ Hanzo
Jon, this is fantastic and I really love seeing the evolution occurring here in E-Discovery.
Attorney, AI Whisperer, Open to work as independent Board member of for-profit corps. Business, Emp. & Lit. experience, all industries. Losey.ai - CEO ** e-DiscoveryTeam.com
I think many of us abandoned long ago the single-training approach for TAR that you mention here. "you might have been TREATING AN LLM AS IF IT WERE A TAR, a predictive or extractive AI to which you submit a single inquiry and receive a single output, rather than a generative AI designed for productive, iterative dialogue." TAR done correctly is almost never "one and done." It is an evolving iterative process. See, e.g., writings on concept drift, aka relevance drift. So there is no big change to start using Gen AI in the multimodal search process. Right Jim Sullivan? Jim Sullivan is doing that now commercially using Relativity. We worked together at NIST TREC Total Recall in 2015 and 2016 to prove these TAR methods. With the new tool of GPT it should be even better.