How Often Does AI Hallucinate?
John Andrews
Creative Problem Solver | Retail Co-Innovation Leader | Marketing Technologist
It Depends On The Platform.
Generative AI makes things up sometimes. Vectara, an AI search platform, has created an open-source chatbot Hallucination Leaderboard on GitHub. Hallucination rates ranged from about 3% for OpenAI's GPT-4 to over 27% for Google's PaLM 2 Chat.
This is predictably disconcerting for users and for businesses seeking to integrate these models into their workflows. We are used to computer systems being precise and the information they provide being reliable, so when we get incorrect, false, or just plain made-up details, we are understandably uncomfortable.
Like humans, generative platforms aren't precision systems. They are the sum of the content on which they have been trained, which itself isn't always correct or factual and is sometimes intentionally misleading. If we have learned anything from social media, it is that 'facts' are open to interpretation. Even fact-checking is open to interpretation, depending on the entity doing the checking. And there is simply the need to dig deeper into AI-provided information, as the infamous legal brief filled with hallucinated case citations demonstrated.
Vectara's approach is an encouraging step toward building safeguards and checks around the information generative AI creates. How often do the platforms return hallucinations from sets of known data, and how skewed is the data being returned?
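To make that measurement concrete, here is a minimal, hypothetical sketch of such a harness: each model summarizes known documents, a factual-consistency scorer grades each summary against its source, and the hallucination rate is the fraction of summaries falling below a threshold. The `summarize` and `consistency_score` functions are stand-in stubs, not Vectara's actual pipeline or any vendor's API, and the 0.5 threshold is an arbitrary assumption.

```python
# Minimal sketch of a hallucination-rate harness in the spirit of
# Vectara's leaderboard. Both stub functions are hypothetical
# placeholders, not Vectara's real pipeline or any vendor API.

def summarize(model_name: str, document: str) -> str:
    """Stub: ask the named model to summarize the document.
    A real harness would call the model's API here."""
    return document[:100]  # placeholder output

def consistency_score(document: str, summary: str) -> float:
    """Stub: 1.0 means the summary is fully supported by the document,
    lower means it introduces unsupported claims. A real harness would
    use a trained factual-consistency classifier here."""
    return 1.0 if summary in document else 0.0  # placeholder logic

def hallucination_rate(model_name: str, documents: list[str],
                       threshold: float = 0.5) -> float:
    """Fraction of summaries judged hallucinated, i.e., whose
    consistency score falls below the threshold."""
    flagged = sum(
        1 for doc in documents
        if consistency_score(doc, summarize(model_name, doc)) < threshold
    )
    return flagged / len(documents)

if __name__ == "__main__":
    docs = ["The meeting is on Tuesday at 3pm in Room 204."]
    print(f"hallucination rate: {hallucination_rate('some-model', docs):.1%}")
```

A published leaderboard is essentially this loop run over many documents and many models, and the choice of scorer and threshold itself becomes a point of debate, which is exactly the question below.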
I imagine we'll see more models like this developing quickly, which creates another dynamic: who decides what is a hallucination and what is not? As with the factual debates in the social media realm, who is the arbiter of truth, and on what information is that truth based?
Cambridge Dictionary recently picked 'hallucinate' as its word of the year, citing the word's new AI sense. Some people object to the term because it anthropomorphizes AI, but it has taken hold in the common lexicon. At this point, most users are aware that AI isn't infallible and will even agree with false premises when challenged. Most platforms now label this behavior, and newer AI products like Snapchat's My AI make light of it by apologizing in advance.
The bottom line is that AI is developing at a blistering pace, and it will not only hallucinate but also be managed in ways that may or may not align with the facts. As AI platforms begin to interact with one another, this phenomenon will morph in unexpected ways. In that sense, they are becoming more human than not.
Web3 Educator | Storyteller | Digital Growth Strategist
10mo
These are profound thoughts. I hope developers are looking into checking these errors to prevent the machines from morphing into us.
Web3 Educator | Storyteller | Digital Growth Strategist
10mo
Very interesting topic.
Storytelling; Analytics; Measurement; Multimedia Content Strategy | Teaching all of the above
11mo
We're dealing with this quite a bit in higher ed. You have to know what the truth is before you can tell whether that's what you're getting.
Meeting customer needs globally by managing Professional Services, Technical Support, and Customer Success Management | Recognizing the Voice of the Customer and enhancing the Customer Experience
11mo
Hallucination is not a purely random or uncontrolled event. If you are working on putting an LLM generative AI component into your workflow, you need to know not only the source of the model but also the context window size (how much the model can process in a single query), the training/tuning methods, and other factors. You can design your solution so that hallucination is NOT a factor. That said, one risk of relying on a supplier like OpenAI for a product like ChatGPT-4 is that they won't divulge some of the key information you need for a more deterministic design, can change the tuning at any time, and can become unavailable or overloaded at any time.
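As a concrete illustration of the kind of design this comment describes, here is a minimal, hypothetical sketch of grounding a query in retrieved context while respecting a fixed context-window budget. The 4-characters-per-token estimate, the 4,096-token budget, and the prompt wording are all illustrative assumptions, not any specific model's real tokenizer, limit, or API.

```python
# Minimal sketch: build a grounded prompt that fits a context window.
# The token estimator and the 4,096-token budget are illustrative
# assumptions, not any specific model's real tokenizer or limit.

def estimate_tokens(text: str) -> int:
    """Rough heuristic: roughly 4 characters per token for English text."""
    return max(1, len(text) // 4)

def build_grounded_prompt(question: str, passages: list[str],
                          budget_tokens: int = 4096) -> str:
    """Pack as many retrieved passages as fit in the budget, and
    instruct the model to answer ONLY from them. Constraining the
    model to supplied context is one common way to reduce (not
    eliminate) hallucination."""
    instructions = (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say 'I don't know.'\n\n"
    )
    remaining = budget_tokens - estimate_tokens(instructions + question)
    context_parts = []
    for passage in passages:
        cost = estimate_tokens(passage)
        if cost > remaining:
            break  # stop before overflowing the context window
        context_parts.append(passage)
        remaining -= cost
    context = "\n---\n".join(context_parts)
    return f"{instructions}Context:\n{context}\n\nQuestion: {question}"

if __name__ == "__main__":
    prompt = build_grounded_prompt(
        "When is the meeting?",
        ["The meeting is on Tuesday at 3pm in Room 204.",
         "Parking is available in Lot B."],
    )
    print(prompt)
```

The design choice here is to make the model's job verification-friendly: when the answer must come from supplied context, a reviewer (human or automated) can check the output against that context rather than against the open world.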
Logistics Ambassador who is Logistically Obsessed | Co-Founder MonarKonnect
11 个月You know I read these post have a great thought go to comment and there is John's comments which echos my thoughts lol