A bit about hallucinations
Venkat Ramakrishnan
Chief Quality Officer | Software Testing Technologist | Keynote Speaker | Corporate Storyteller
While LLMs are hot, their hallucinations are stark. To a casual user of LLMs, these might seem like minor mistakes, pardonable the way we pardon human slips, more so since LLMs are so polite and human-like in their conversations these days. But those small slips could turn dangerous as we come to depend more and more on LLMs for our lives and work, and even more so when we automate our work (without human oversight) based on LLM outputs.
A while back, when the Baltimore bridge collapsed after a ship collided with one of its piers, I was tracking the incident online that evening in my time zone, and I asked one of the LLMs to give me more information about the Baltimore bridge. That LLM is connected to the internet in real time (unlike ChatGPT 3.5, whose information is not real-time). Initially, the LLM didn't seem to have any clue about the incident and said nothing about the collision. Eventually, two hours later, it mentioned that the bridge had collapsed, but gave a date of collapse three months in the past! Surprised, I asked it, 'Are you sure?' It then apologized and gave the correct date and time of the collapse.
It may not sound like a big issue, but imagine if no human had been overseeing the output and an automated script had taken that date and time as input to act on in some fashion, say, out of my fertile imagination, a shipping company's insurance processing. You can appreciate the financial and reputational distress that company would have been put through!
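To make that concrete, here is a minimal sketch of the kind of guardrail such a pipeline would need. Everything in it is assumed for illustration: query_llm stands in for a real LLM API call, and the seven-day plausibility window is an arbitrary choice for a breaking-news event. The point is simply that an LLM-supplied fact should never drive an action without an independent sanity check or a human fallback.

```python
from datetime import datetime, timedelta, timezone

def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM API call."""
    return "2024-01-02"  # imagine the model hallucinating a date months in the past

def parse_incident_date(answer: str) -> datetime:
    # Assumes the prompt asked for a bare ISO-8601 date (e.g. "2024-03-26").
    return datetime.fromisoformat(answer.strip()).replace(tzinfo=timezone.utc)

def plausible(event_date: datetime, max_age_days: int = 7) -> bool:
    """For breaking news, reject dates in the future or older than a few days."""
    now = datetime.now(timezone.utc)
    return now - timedelta(days=max_age_days) <= event_date <= now

answer = query_llm("When did the Baltimore bridge collapse? Reply with only an ISO date.")
incident_date = parse_incident_date(answer)
if plausible(incident_date):
    print(f"Proceed with automated processing for {incident_date.date()}")
else:
    print(f"Implausible date {incident_date.date()}: route to a human reviewer")
```

Even a check this crude would have caught the three-months-old date from my Baltimore example and routed it to a human, instead of straight into an insurance workflow.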
This is an example of an LLM hallucination. It is one type; there are many others. I am not kidding when I say that organisations are relying more and more on automation driven by LLMs' outputs. We should be really worried about the quality of these outputs. As a Software Tester and someone who cares about Quality, I am.
Language processing is not simple. I was casually going through the various Python libraries for text processing, and I was not at all impressed by the quality of their outputs. I appreciate and deeply respect the effort that has gone into these open-source projects, many of them built by top-notch university departments and their students, but the results are very disappointing.
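To give a flavour of why the task is hard (this is not a test of any particular library), here is a naive pure-Python sentence splitter of the sort that lurks inside many text pipelines; the sample sentence is my own. Ordinary abbreviations alone are enough to break it.

```python
import re

def naive_sentence_split(text: str) -> list[str]:
    # Split wherever '.', '!' or '?' is followed by whitespace.
    return re.split(r"(?<=[.!?])\s+", text)

text = "Dr. Smith arrived at 5 p.m. on Mar. 26. The bridge had already collapsed."
for sentence in naive_sentence_split(text):
    print(repr(sentence))

# Two sentences come out as five fragments, because 'Dr.', 'p.m.' and 'Mar.'
# all look like sentence endings to the regex:
#   'Dr.'
#   'Smith arrived at 5 p.m.'
#   'on Mar.'
#   '26.'
#   'The bridge had already collapsed.'
```

Real libraries do better than this, of course, but the long tail of abbreviations, quotations, numbers and multilingual text is exactly where their output quality still falls short.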
I am actively working on how to expose these kinds of problems, and I would be glad to join forces with others who are doing the same. If you are interested, give me a buzz and let's talk about it.