Detecting model hallucinations in Retrieval Augmented Generation (RAG) AI systems
Model hallucination, also known as AI hallucination, is a phenomenon where large language models (LLMs) or other AI systems generate false, misleading, or nonsensical information while presenting it as factual. This occurs when the AI model produces output that is not grounded in its training data or contradicts known facts.
These hallucinated responses present inaccurate or fabricated information as truth. Common causes include insufficient or biased training data, limitations in model architecture, overfitting or incorrect pattern recognition, and a lack of context or grounding in the underlying reality.
Several strategies can be employed to detect hallucinations in RAG systems:
p-faithful-v0 is a small, specialized language model designed to detect hallucinations in the output of large language models (LLMs). Other detection techniques exist as well. No single method is perfect, however, and a combination of approaches may be necessary for effective hallucination detection in RAG systems; a minimal ensemble sketch follows.
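As an illustration of combining approaches, the Python sketch below runs several detectors and flags an answer when at least a minimum number of them vote "hallucination". The detector interface and the voting policy are assumptions for illustration only; any real detector (a p-faithful-v0 call, an LLM judge, a retrieval-overlap heuristic) could be plugged in behind the same signature.

```python
from typing import Callable, Dict

# A detector takes (question, retrieved_context, answer) and returns True if it
# suspects a hallucination. The dictionary keys passed in are hypothetical names;
# real implementations (e.g. a p-faithful-v0 call or an LLM-as-a-judge prompt)
# would be wired in here.
Detector = Callable[[str, str, str], bool]

def ensemble_detect(question: str, context: str, answer: str,
                    detectors: Dict[str, Detector],
                    min_votes: int = 1) -> dict:
    """Run every detector and flag the answer if at least `min_votes` fire."""
    votes = {name: fn(question, context, answer) for name, fn in detectors.items()}
    return {"flagged": sum(votes.values()) >= min_votes, "votes": votes}
```

Setting min_votes=1 is the conservative choice: any single detector can flag the answer for review, at the cost of more false positives.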
I studied and am reporting on two ways to mitigate the problem.
Option 1: Coaching an LLM to compare answers
After receiving the coaching prompt, the LLM responded:
I understand the task. You'd like me to compare a submitted answer to an expert answer for a given question, focusing solely on the factual content while ignoring stylistic differences. I should then categorize the comparison into one of five options (A through E) based on how the factual content relates between the two answers. The categories are:
A - The submission is a factual subset of the expert answer (omission)
B - The submission is a factual superset of the expert answer (potential hallucination)
C - The answers contain the same factual details
D - There's a factual disagreement between the answers
E - The answers differ in style only, not in facts.
I'll wait for you to provide the specific question, expert answer, and submitted answer to analyze before making this comparison and categorization.
Please let me know when you have that data ready to share.
I then prompted it:
specific question: is reincarnation a fact, expert answer: yes it is, and submitted answer: reincarnation has been heavily discussed and does not agree with modern science
The LLM responded:
Based on the given information, I would select option (D) for this comparison: (D) There is a disagreement between the submitted answer and the expert answer. The expert answer states that reincarnation is a fact, while the submitted answer suggests that reincarnation does not agree with modern science. This represents a clear factual disagreement between the two answers. The submitted answer acknowledges that reincarnation has been heavily discussed, which is consistent with the information provided in the search results.
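The exchange above can be automated. The following is a minimal sketch, assuming the OpenAI Python client (openai >= 1.0) with an API key in the environment and a GPT-4-class model; the exact prompt wording and the `grade_answer` helper are illustrative rather than the prompt used in the session above.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RUBRIC = """You are comparing a submitted answer to an expert answer for a given question.
Ignore differences in style; judge factual content only. Reply with a single letter:
A - the submission is a factual subset of the expert answer (omission)
B - the submission is a factual superset of the expert answer (potential hallucination)
C - the answers contain the same factual details
D - there is a factual disagreement between the answers
E - the answers differ in style only, not in facts"""

def grade_answer(question: str, expert: str, submission: str, model: str = "gpt-4") -> str:
    """Return one of the letters A-E for the submitted answer."""
    user_msg = (
        f"Question: {question}\n"
        f"Expert answer: {expert}\n"
        f"Submitted answer: {submission}\n"
        "Grade (single letter A-E):"
    )
    resp = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic grading
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": user_msg},
        ],
    )
    reply = resp.choices[0].message.content.strip()
    # Keep only the first A-E letter in case the model adds an explanation.
    for ch in reply:
        if ch in "ABCDE":
            return ch
    return "?"  # could not parse a grade

# Example mirroring the exchange above:
# grade_answer("Is reincarnation a fact?", "Yes it is.",
#              "Reincarnation has been heavily discussed and does not agree with modern science")
# would be expected to return "D".
```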
Option 2: LLM as a judge
This refers to the use of large language models to evaluate content, responses, or performance, including the output of other AI models.
This approach offers a promising alternative to traditional human evaluations, especially for tasks requiring scalable and rapid assessment of language model outputs.
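In a RAG setting, the judge can be pointed at the retrieved context rather than an expert answer. The sketch below is one way to do that, again assuming the OpenAI Python client and a GPT-4-class model; the prompt wording and the SUPPORTED/UNSUPPORTED labels are assumptions for illustration.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = """You are a strict judge for a retrieval-augmented generation system.
Given the retrieved context and a generated answer, reply with exactly one word:
SUPPORTED if every factual claim in the answer is grounded in the context,
UNSUPPORTED otherwise."""

def judge_groundedness(question: str, context: str, answer: str,
                       model: str = "gpt-4") -> bool:
    """Return True if the judge model deems the answer grounded in the retrieved context."""
    user_msg = (
        f"Question: {question}\n\n"
        f"Retrieved context:\n{context}\n\n"
        f"Generated answer:\n{answer}\n\n"
        "Verdict (SUPPORTED or UNSUPPORTED):"
    )
    resp = client.chat.completions.create(
        model=model,
        temperature=0,  # deterministic verdicts
        messages=[
            {"role": "system", "content": JUDGE_PROMPT},
            {"role": "user", "content": user_msg},
        ],
    )
    return "UNSUPPORTED" not in resp.choices[0].message.content.upper()
```

Because the judge sees only the retrieved context, this check targets faithfulness to the sources rather than absolute factual correctness, which is usually what matters in a RAG pipeline.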
Lessons learned:
1. Coaching an LLM to Compare Answers
Pros: The comparison against an expert answer makes omissions, unsupported additions, and outright factual disagreements explicit (categories A, B, and D above).
Cons: It requires a curated expert answer for every question, which makes it expensive and time-consuming to operate at scale.
2. LLM as a Judge
Pros: It scales to rapid, automated assessment of model outputs without needing an expert answer for every question.
Cons: The judge is itself an LLM and is not perfect, so its verdicts still need periodic checks against human judgment.
Effectiveness: State-of-the-art LLMs like GPT-4 can align with human judgment up to 85% of the time.
Recommendations:
Use LLM as a judge for broad, scalable screening, and reserve expert-answer comparison, or a hybrid of the two, for high-stakes questions where the extra cost is justified.
Conclusion:
Hallucination can be mitigated by carefully comparing the LLM's answer with an expert answer. This is, however, expensive and time-consuming. LLM as a judge may be an easier option, or a hybrid of the two approaches could be adopted.
Image:
In Vedic Sanatana Hinduism, Shiva, the demigod of destruction, opens his third eye during Pralaya, or cosmic annihilation. This image takes artistic liberty with that concept, hinting at imminent societal destruction due to AI.