ChatGPT failures aren't "hallucinations"
As Large Language Models (LLMs) have gained popular attention, much has been written about their potential benefits and risks. In these discussions, it has also been noted that LLM-generated texts may include information that, while confidently stated, is completely incorrect. LLMs will generate references, article titles, quotes, dates, places, URLs, and other items that are nonexistent.
When explaining these faulty constructions, many commentators have taken to saying that the LLM is "hallucinating". But what exactly does this mean?
Clinically, hallucinations are sensory perceptions that occur in the absence of actual external phenomena. In medical resources and popular descriptions, hallucinations are presented as disorders arising from causes such as neurological disturbances or drugs. To assert that LLMs are hallucinating is to imply that they have formed a flawed model of the world due to an abnormal influence. But this is not what is occurring when an LLM manufactures a flawed statement.
Referring to LLM failures as hallucinations obscures the extent to which LLMs are truly disconnected from reality, downplays the power of communicative statements, and gives LLMs an unprecedented 'benefit of the doubt'.
Large language models have no connection to reality. They do not have any access to, or any way of processing, primary data, sources, or facts. No matter what prompt you provide, an LLM will always respond with the text that seems most likely to exist given the set of documents it was trained on. LLMs are designed to generate plausible texts, not accurate ones. An LLM-generated text containing a manufactured quote or a non-existent article reference is not an abnormality; it is simply a case in which a reader grounded in the real world can more easily detect the tool's 100% focus on text plausibility and 0% concern for accuracy.
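To make the "plausibility, not accuracy" point concrete, here is a minimal, purely illustrative Python sketch of next-token sampling. The prompt, token names, and probability scores are invented for illustration and do not come from any real model; the point is that every step ranks plausibility, and no step consults a source of truth.

```python
# Toy illustration only: an LLM picks its next token by how probable it is
# given the prompt and its training data. Nothing below checks the output
# against facts or sources -- there is no such step to check.
import math
import random

# Hypothetical scores for the next word after the prompt
# "The capital of Australia is" (values invented for illustration).
candidate_logits = {
    "Canberra": 2.1,   # plausible and happens to be correct
    "Sydney": 2.0,     # plausible but wrong -- and nearly as likely
    "Melbourne": 1.4,
}

def sample_next_token(logits, temperature=1.0):
    """Sample a token in proportion to its softmax probability.

    Note what is absent: no lookup of reality, only a ranking of plausibility.
    """
    scaled = {tok: value / temperature for tok, value in logits.items()}
    total = sum(math.exp(v) for v in scaled.values())
    probs = {tok: math.exp(v) / total for tok, v in scaled.items()}

    r = random.random()
    cumulative = 0.0
    for tok, p in probs.items():
        cumulative += p
        if r < cumulative:
            return tok
    return tok  # fallback for floating-point rounding

print(sample_next_token(candidate_logits))
```

In this sketch the wrong answer is sampled almost as often as the right one, and the procedure has no way of telling the difference; that is the sense in which a confident, false statement is normal output rather than a malfunction.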
Referring to LLM failures as hallucinations also conflates perceptions and statements. Perceptions matter to the individual, but it is when they become statements that they acquire social, legal, and practical power. You are free to believe whatever you wish about your neighbor, but if you make a public statement accusing them of a crime and materially damage their reputation, you may be guilty of libel. When perceptions become statements they gain the power to change the world. LLM failures are significant not because they are 'in their minds', but because they are in their statements.
Since hallucinations are due to abnormal conditions and are fundamentally internal, characterizing LLM failures as hallucinations has the effect of encouraging us to give this technology 'the benefit of the doubt'. When we are told someone is hallucinating, we can nod understandingly and hope they get the help they need. At the same time, we doubt the veracity of anything they say and impose limits on their public speaking until they recover their normal faculties. It is this logic that commentators characterizing LLM failures as hallucinations are encouraging us to adopt for the technology.
However, this logic can't meaningfully be applied to large language models. If we accept that LLMs are internally hallucinating, we should discount any statements they make and prevent them from engaging in influential communication, at least until they recover. But what does it mean to have a communication tool that we don't allow to make statements? Moreover, disconnection from reality is not an abnormal state for LLMs. It is how they are made, so they cannot recover. Referring to LLM failures as hallucinations is at best a misleading distraction and at worst a fundamental error that will lead us to incorrectly assess the true usefulness of this new technology.
When an LLM generates a flawed statement, it is failing. Like a calculator that doesn't add properly, it loses its justification for existence. Being honest about this both removes the sense of panic associated with the 'risks' and allows us to more clearly discover the true benefits of the technology.
Epilogue
I was unable to get either ChatGPT or Google Bard to draft a blog post about LLM failures as hallucinations. No matter the prompt, the resulting texts referred to LLMs "making mistakes" and never used the words flaws, failures, or errors. Mmmmmm.....