Understanding the Limitations of GenAI in the Legal Profession
As artificial intelligence (AI) continues to advance, generative AI (GenAI) models like ChatGPT and Claude are becoming increasingly prevalent in various industries, including the legal sector. While these tools offer remarkable capabilities in areas such as legal research, document drafting, and case analysis, it is crucial for attorneys to approach them with a critical mindset and to understand their inherent limitations, particularly in the context of the high stakes and ethical standards of the legal profession.
One of the key advantages of GenAI models is their ability to improve over time through iterative updates and training on new data. With each new release of ChatGPT, performance improves, potentially enabling more accurate responses to legal queries and prompts. Prompts that previously yielded incorrect responses may now be handled appropriately.
However, some inherent limitations remain in generative AI capabilities, and there are certain types of prompts that GenAI systems like ChatGPT still find challenging to answer accurately. These include prompts that require specific counting, pattern recognition, or factual knowledge that may not be accurately represented in the model's training data. The sections below illustrate some of these limitations.
Comparing different GenAI models with control questions
Not all GenAI models respond in the same way. There may be differences in the quality of the responses you get when comparing models such as Copilot, ChatGPT, Llama, and Claude.
I asked ChatGPT, Claude 3 and Perplexity the same question.
"How many countries have ratified the United Nations Convention on Contracts for the International Sale of Goods?"
For this question, ChatGPT and Perplexity both replied with 97, while Claude 3 said it did not have definitive information on the exact number, stating only that the CISG has been widely adopted, with over 90 countries party to the convention as of 2022. This highlights both a degree of agreement between various GenAI models and a degree of inconsistency.
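If you want to run this kind of control question yourself, the sketch below sends the same prompt to two providers through their official Python clients so the answers can be compared side by side. The model names are illustrative only and change frequently; this assumes API keys are set in the OPENAI_API_KEY and ANTHROPIC_API_KEY environment variables.

```python
# Minimal sketch: send one control question to two GenAI providers and
# compare the answers. Requires the official `openai` and `anthropic`
# Python packages. Model names below are illustrative and change often.
from openai import OpenAI
import anthropic

QUESTION = (
    "How many countries have ratified the United Nations Convention "
    "on Contracts for the International Sale of Goods?"
)

openai_client = OpenAI()
openai_reply = openai_client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[{"role": "user", "content": QUESTION}],
)
print("OpenAI:", openai_reply.choices[0].message.content)

anthropic_client = anthropic.Anthropic()
claude_reply = anthropic_client.messages.create(
    model="claude-3-opus-20240229",  # illustrative model name
    max_tokens=300,
    messages=[{"role": "user", "content": QUESTION}],
)
print("Claude:", claude_reply.content[0].text)
```

Running the same control question against several models, and repeating it periodically, is a simple way to spot both the disagreements and the improvements described in this article.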
Attorneys have reported instances where ChatGPT initially provides incorrect legal information but then apologizes and offers the correct answer after further questioning, as if it had known it all along.
There have been suggestions for implementing a feature that allows ChatGPT to explicitly state "I don't know the answer" instead of attempting to generate a response based on its training data, which may lead to incorrect or made-up legal information. Such a feature could enhance trust and reliability in the system's outputs for legal applications. The response from Claude 3 regarding the number of countries in the previous example suggests improvements are being made in this area.
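No such built-in switch exists today, but the behaviour can be approximated at the prompt level. The sketch below, again using the OpenAI Python client with an illustrative model name, adds a system instruction telling the model to admit uncertainty rather than guess; this reduces, but does not eliminate, fabricated answers.

```python
# Approximating an explicit "I don't know" mode with a system
# instruction. This lowers, but does not eliminate, the chance of a
# fabricated answer, so verification is still required.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[
        {
            "role": "system",
            "content": (
                "You are a careful legal research assistant. If you are "
                "not confident in a factual answer, reply exactly: "
                "'I don't know the answer.' Never guess."
            ),
        },
        {
            "role": "user",
            "content": "How many countries have ratified the CISG?",
        },
    ],
)
print(response.choices[0].message.content)
```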
GenAI has improved over time
Since the launch of ChatGPT 3, users have posted stories online about questions that were initially answered incorrectly. Many of these questions are now getting accurate responses, showcasing significant improvement in the model's performance over time.
For example:
When earlier GenAI models were asked "Give me three quotes that Shakespeare said about Beyonce", a typical response was three fabricated quotes, presented as if they were genuine.
As GenAI models improve, so do their responses. A typical response to this question is now:
Since William Shakespeare lived from 1564 to 1616, and Beyoncé was born in the late 20th century, there are no quotes from Shakespeare about Beyoncé. However, if you’re looking for a fun way to blend Shakespeare’s iconic style with a nod to Beyoncé, I can certainly craft a fictional quote that might imagine how Shakespeare would have praised her if he could have! Let me know if you'd like that.
More examples:
Prompt: "Explain the principles of the Time Travel Device invented by Nikola Tesla."
This is a prompt that ChatGPT previously answered incorrectly; it now answers correctly, stating that Nikola Tesla never actually invented a time travel device.
Prompt: "Explain how the colors of the rainbow are related to flavors of ice cream."
One user originally reported this question as being answered incorrectly; however, ChatGPT now responds correctly, saying that the colors of the rainbow and ice cream flavors are not inherently related.
Prompt: "Tell me about the Great Llama Invasion of 1995."
As a final example of how ChatGPT has improved over time, the response to this question now correctly states that the Great Llama Invasion of 1995 does not refer to an actual event, whereas previous versions provided a made-up answer.
The importance of verifying critical information
Attorneys must verify the information GenAI tools provide, particularly in situations where legal advice or representation is at stake. They should be cautious and not treat these models as infallible sources of legal knowledge: the models are ultimately based on the data they were trained on, which may contain biases, errors, or incomplete information regarding complex legal principles and precedents.
In the legal profession, it is crucial to encourage critical thinking and fact-checking when using AI-generated information. Law firms and legal departments should establish clear guidelines and protocols for the responsible use of GenAI tools, ensuring that outputs are thoroughly reviewed and verified before being relied upon or presented as legal advice. When implemented properly by users experienced with Generative AI, a process that combines it with a verification procedure can still be more efficient than traditional methods.
Many language models like ChatGPT can provide citations or references for the information they generate. This feature can be particularly useful in the legal context, as it allows attorneys to check the sources and authorities cited by the model against established legal principles, precedents, and authoritative sources. Bear in mind, however, that the citations themselves can be fabricated, so each one should be verified against a primary source before it is relied upon.
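As a simple illustration of building verification into a workflow, the sketch below extracts anything that looks like a U.S. reporter citation from a model's output and flags it for manual checking. The regex is a rough heuristic assumed for illustration, not a complete legal-citation parser, and it does not replace checking each authority in Westlaw, Lexis, or a court database.

```python
# Flag citation-like strings in model output for manual verification.
# The pattern is an illustrative simplification covering forms such as
# "410 U.S. 113" or "550 F.3d 1023", not a full citation grammar.
import re

CITATION_RE = re.compile(
    r"\b\d{1,4}\s+(?:U\.S\.|S\. Ct\.|F\.(?:2d|3d|4th)?|"
    r"F\. Supp\.(?: 2d| 3d)?)\s+\d{1,4}\b"
)

def flag_citations_for_review(model_output: str) -> list[str]:
    """Return every citation-like string found in the model's answer.

    Each hit should be verified against a primary source before the
    answer is relied upon or presented as legal advice."""
    return CITATION_RE.findall(model_output)

draft = (
    "The leading case is Smith v. Jones, 123 F.3d 456 (9th Cir. 1997), "
    "later cited in 530 U.S. 428."
)
for cite in flag_citations_for_review(draft):
    print("VERIFY:", cite)
```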
Other limitations
Generative AI, like ChatGPT, is optimized for shorter responses. It may produce nonsensical information or abruptly cut off its response when tasked with lengthy outputs. Additionally, its context window is limited, which means it might lose track of earlier parts of a long conversation.
Requesting Generative AI to produce extensive content, like a ten-page document, can cause it to stall or stop mid-response. Ideally, responses should arrive within a few seconds. If the AI takes too long, consider breaking the task into smaller segments to improve performance and accuracy, as sketched below.
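One practical way to break up a long drafting task is to request the document section by section rather than in a single prompt. The sketch below assumes the OpenAI Python client; the outline headings and model name are placeholders for illustration.

```python
# Draft a long document one section at a time so each response stays
# short enough to avoid truncation, then stitch the parts together.
from openai import OpenAI

client = OpenAI()

# Hypothetical outline for a long memorandum.
sections = [
    "Introduction and scope of the engagement",
    "Summary of the relevant facts",
    "Analysis of the governing law",
    "Conclusions and recommended next steps",
]

draft_parts = []
for heading in sections:
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[
            {"role": "system",
             "content": "You are drafting one section of a legal memorandum."},
            {"role": "user",
             "content": f"Draft only the section titled: {heading}"},
        ],
    )
    draft_parts.append(response.choices[0].message.content)

full_draft = "\n\n".join(draft_parts)
print(full_draft)
```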
Consistency
Do not expect GenAI to give exactly the same response to exactly the same prompt every time. ChatGPT and other LLMs generate text by sampling from a probability distribution over possible next words, so their responses to an identical prompt vary with sampling settings (such as temperature), conversation context, and prior user interactions. The sketch below demonstrates this.
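To see this for yourself, the sketch below (again assuming the OpenAI Python client and an illustrative model name) sends the identical prompt several times. Lowering the temperature reduces the variation between runs, but does not guarantee identical outputs.

```python
# Send the identical prompt three times. Even at temperature 0 the
# wording of the replies may still differ from run to run.
from openai import OpenAI

client = OpenAI()
PROMPT = "In one sentence, what is consideration in contract law?"

for i in range(3):
    response = client.chat.completions.create(
        model="gpt-4o",   # illustrative model name
        temperature=0,    # minimizes, but does not eliminate, randomness
        messages=[{"role": "user", "content": PROMPT}],
    )
    print(f"Run {i + 1}: {response.choices[0].message.content}")
```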
Conclusion
By understanding the limitations of GenAI systems and adopting a responsible approach to their use, attorneys can harness the potential of these tools while mitigating the risks associated with relying too heavily on their outputs without proper verification and critical evaluation.
If GenAI were aviation, we would be standing at Kitty Hawk right now, clapping and cheering that our flying machine just travelled over 20 meters without touching the ground.
GenAI will continue to improve and if we believe the CEO of OpenAI, Sam Altman, it will continue to improve exponentially.