The hyped and overlooked potential of chatbots
Recent developments
Quote of the week
It is the responsibility of intellectuals to speak the truth and expose lies. ― Noam Chomsky
What to make of all this
Most people are overestimating what chatbots can do. They forget they are working with a prediction model that is still very much in its experimental phase. This leads to all sorts of problems, one of them being legal professionals who still use ChatGPT for legal research. I was completely dumbfounded that this is still happening, even after the Avianca case got a lot of attention in the news and recent research revealed that 69% of legal prompts in ChatGPT 3.5 led to hallucinated responses.
Meanwhile, we see large law firms like Clifford Chance proudly announce that they are embracing AI with Copilot. We should take these messages with a grain of salt. While purchasing Copilot comes with Microsoft's assurance that your company's data won't be used to further train the model, the model will still hallucinate.
To get good results from a large language model (LLM) when researching or reviewing and drafting documents, you need to apply what is called Retrieval-Augmented Generation (RAG). With this technique you tell the LLM where to look for specific data. Storing that data in a vector database enables the LLM to retrieve only the relevant text, which enriches the output even further, especially if the system lets you see the sources that were used for a specific answer.
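To make the mechanics concrete, here is a minimal sketch of the RAG loop in Python. The toy bag-of-words embedder, the in-memory "vector database", and the generate() stub are stand-ins for a real embedding model, vector store, and LLM call; only the shape of the pipeline is the point.

```python
# Minimal RAG sketch: embed documents, retrieve the most relevant one for a
# question, and build a prompt grounded in that retrieved text.
from collections import Counter
import math

documents = [
    "The lease terminates automatically after a breach of clause 4.2.",
    "Clause 7 caps liability at twelve months of fees.",
    "Governing law for this agreement is the law of the Netherlands.",
]

def embed(text: str) -> Counter:
    """Toy embedding: a bag of lowercase words (stand-in for a vector model)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words 'vectors'."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# "Vector database": pre-computed embeddings for every document.
index = [(doc, embed(doc)) for doc in documents]

def retrieve(question: str, top_k: int = 1) -> list[str]:
    """Return the documents most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]

def generate(question: str) -> str:
    """Build the grounded prompt; in practice this string is sent to the LLM."""
    context = "\n".join(retrieve(question))
    return f"Answer using only these sources:\n{context}\n\nQuestion: {question}"

print(generate("What is the liability cap?"))
```

Because the model only sees the retrieved passages, those same passages can be shown to the user as the sources behind the answer.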
Many law firms have yet to adopt RAG. That makes the news about Gemini 1.5's ability to deliver high-quality outputs with a context window of 1 million tokens super exciting. Expanding the context window (roughly the maximum amount of text, measured in tokens rather than characters, that the model can take into account in a single interaction) to such a size while keeping the results accurate is really difficult. Gemini 1.5 manages to meet this challenge, passing what experts dub the 'needle in the haystack' test. Essentially, you can feed the entirety of all seven Harry Potter books into the system, which it processes in just a few moments. Then, by asking a specific question about a particular paragraph in one of the books, you get a detailed answer drawn from that exact passage. Just imagine the potential applications of this feature in the legal field, and what happens when you combine such a huge context window with RAG. The sky is the limit.
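For readers curious what such a test actually looks like, here is a minimal sketch of a needle-in-the-haystack setup. The needle sentence, the filler text, and the check are illustrative; a real evaluation sends the prompt to the model under test and verifies that the answer contains the hidden fact.

```python
# Needle-in-the-haystack sketch: hide one specific fact in a very long
# context and ask the model to find it.
import random

needle = "The secret clause number for this exercise is 42-B."
filler_sentence = "This sentence is routine filler text with no relevant content."
haystack = [filler_sentence] * 20000

# Hide the needle at a random depth in the long context.
haystack.insert(random.randint(0, len(haystack)), needle)
long_context = " ".join(haystack)

prompt = (
    long_context
    + "\n\nQuestion: What is the secret clause number mentioned above?"
)

# In a real evaluation you would send `prompt` to the model under test and
# check whether its answer contains "42-B"; here we only report the size.
print(f"Context length: {len(long_context.split())} words")
```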
Unfortunately, most people have not been focusing on this breakthrough. Instead, they have been worried about Gemini refusing to generate images of white people, an overcorrection of the model's tendency to be biased against non-white people.
The technique that caused this problem is called prompt transformation. It was originally developed to enhance the quality of generated images without requiring prompt engineering skills. Essentially, when a user provides a brief, simple prompt for image generation, the LLM expands it into a detailed description of the desired scene. This enriched prompt is then passed to the image model to generate a more refined result.
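A minimal sketch of the idea, with an illustrative expansion instruction and placeholder functions rather than Gemini's actual implementation:

```python
# Prompt transformation sketch: expand a short user prompt into a detailed
# scene description before handing it to the image model.
EXPANSION_INSTRUCTION = (
    "Rewrite the user's request as a detailed scene description: "
    "setting, lighting, style, and composition. Keep the user's intent."
)

def expand_prompt(user_prompt: str) -> str:
    """In a real system an LLM performs this rewrite; here we just wrap it."""
    return f"{EXPANSION_INSTRUCTION}\nUser request: {user_prompt}"

def generate_image(detailed_prompt: str) -> None:
    """Placeholder for the call to the image generation model."""
    print("Image model receives:\n" + detailed_prompt)

generate_image(expand_prompt("a cat on a rooftop"))
```

The trouble starts when that rewrite step silently injects assumptions the user never asked for.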
The problem, of course, is that wrong assumptions can be made in this process, especially if the prompts are steered in a particular political direction, as Gemini's were. When you are generating an image of, say, a car or a house, this should not cause much of an uproar. But when you ask for an image of the founders of the U.S. and get back non-white faces, even though the founders were all white, it becomes a problem. In this case it fueled the culture war over chatbots we have already been seeing on social media.
There are several solutions to this problem. One is to store user behavior and use that information to transform prompts to match the user's profile: if someone leans more to the left, simply generate a more liberal response, and vice versa. Of course, this solution only deepens the filter bubble we all find ourselves in.
A much better solution is to be transparent about what has been added to or changed in your prompt. And while we're at it, let's also be transparent about the data that was used to generate the answer, much like the food industry is transparent about what's in our food.
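A minimal sketch of what that transparency could look like, with hypothetical field names and illustrative values rather than any vendor's actual API:

```python
# Transparent answer sketch: return the user's original prompt, the prompt
# the system actually sent, and the sources used, alongside the answer.
from dataclasses import dataclass, field

@dataclass
class TransparentAnswer:
    original_prompt: str      # exactly what the user typed
    transformed_prompt: str   # what the system actually sent to the model
    sources: list[str] = field(default_factory=list)  # data used for the answer
    answer: str = ""

def respond(user_prompt: str) -> TransparentAnswer:
    # Illustrative values; a real system fills these from its own
    # transformation step, retrieval step, and model output.
    transformed = f"{user_prompt} (answer formally and cite the retrieved clauses)"
    sources = ["Clause 7 caps liability at twelve months of fees."]
    return TransparentAnswer(
        original_prompt=user_prompt,
        transformed_prompt=transformed,
        sources=sources,
        answer="Liability is capped at twelve months of fees (clause 7).",
    )

result = respond("What is the liability cap?")
print(result.transformed_prompt)
print(result.sources)
```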
In a few weeks Google will hopefully have fixed this issue and we will all have forgotten about it. But I promise you, the 1 million token context window will have a much bigger and more lasting impact. OpenAI and the other competitors definitely have some catching up to do.