Why and how often hallucination occurs in LLMs

An interesting recent paper by Adam Kalai of Microsoft and Santosh S. Vempala of Georgia Tech gives a theoretical analysis of how often we should expect an ideal large language model (LLM) to "hallucinate" falsehoods. Simply stated, the rate of hallucination is very close to the proportion of facts that appear exactly once in the training data.

Their result follows from the concept of calibration in statistical estimation. If you repeatedly predict the probability of different events, you are "well calibrated" if, among the occasions on which you state a 20% probability, the event occurs 20% of the time, and so on. An LLM is trained to predict a probability distribution over all possible textual completions, and it generates its response by sampling a completion from that distribution. Because LLMs are trained (at least during pre-training) with a proper scoring rule, usually log-likelihood, they learn to be well-calibrated predictors of text completions.
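
To make the notion of calibration concrete, here is a minimal sketch in Python (synthetic data and illustrative names of my own choosing, not anything from the paper) that bins a predictor's stated probabilities and compares each bin's average prediction with how often the event actually occurred:

import numpy as np

def calibration_table(predicted_probs, outcomes, n_bins=10):
    # Group predictions into equal-width probability bins and compare the
    # mean predicted probability in each bin to the observed event frequency.
    predicted_probs = np.asarray(predicted_probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (predicted_probs >= lo) & (predicted_probs < hi)
        if in_bin.any():
            rows.append((lo, hi,
                         predicted_probs[in_bin].mean(),  # what was predicted
                         outcomes[in_bin].mean(),         # what actually happened
                         int(in_bin.sum())))
    return rows

# Synthetic, perfectly calibrated predictor: each event really does occur
# with the probability that was stated for it.
rng = np.random.default_rng(0)
stated = rng.uniform(size=10_000)
happened = rng.uniform(size=10_000) < stated
for lo, hi, mean_pred, freq, n in calibration_table(stated, happened):
    print(f"[{lo:.1f}, {hi:.1f})  predicted {mean_pred:.2f}  observed {freq:.2f}  (n={n})")

For a well-calibrated predictor, the predicted and observed columns roughly agree in every bin; training on log-likelihood pushes an LLM toward the same property over text completions.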

A well-calibrated LLM ought to arrive at a conditional probability

P( never-before-seen claim | prompt )

that is itself well calibrated, where a "never-before-seen claim" is one that did not appear in the training corpus. Since far more fictional sentences are possible than factual ones, a never-before-seen claim will almost always be false, so this probability is very close to the frequency of hallucination.

Applying a mathematical theorem from I.J. Good (1953), the basis of the Good-Turing estimator, they show that this probability is nearly equal to the proportion of claims that appear exactly once in the training data.
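
The intuition behind the singleton count can be seen with a small Good-Turing-style simulation. The sketch below is my own toy illustration, not the paper's construction: it draws "facts" from a heavy-tailed distribution, computes the fraction of facts seen exactly once, and checks that this matches how often a freshly drawn fact has never been seen before.

from collections import Counter
import random

random.seed(0)

# A synthetic world of 50,000 distinct "facts" with Zipf-like popularity.
facts = list(range(50_000))
weights = [1.0 / (rank + 1) for rank in facts]

# The "training corpus": 200,000 draws from that world.
corpus = random.choices(facts, weights=weights, k=200_000)
counts = Counter(corpus)

# Good-Turing: the fraction of facts seen exactly once estimates the
# probability that the next fact drawn has never been seen before.
singleton_fraction = sum(1 for c in counts.values() if c == 1) / len(corpus)

# Empirical check: draw fresh facts and see how often they are genuinely new.
fresh = random.choices(facts, weights=weights, k=50_000)
observed_new = sum(1 for f in fresh if f not in counts) / len(fresh)

print(f"singleton fraction (Good-Turing estimate): {singleton_fraction:.3f}")
print(f"observed rate of never-before-seen facts:  {observed_new:.3f}")

The two printed numbers come out close, which is the Good-Turing observation in miniature: the singleton proportion estimates the probability mass of claims the model has never seen, and that is the mass on which a calibrated generator is pushed to guess.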

Their insight applies directly to hallucinated citations, biographies, legal cases, and the like, and helps explain why some kinds of facts are more prone to hallucination than others. For example, with enough training data, genuinely unique medical diagnoses might be quite rare. And in domains where each solution is a chain of reasoning steps, and the logical rule used at each step occurs many times in the training data (even though the high-level problem is unique), the hallucination rate per step might be manageable.
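
As a rough back-of-envelope on that last point (my own, not from the paper), if each step independently hallucinates at a small per-step rate, the chance that a multi-step chain contains at least one hallucinated step is 1 - (1 - rate)^steps. The snippet below simply tabulates that expression for a few values:

# Per-step rate eps, chain of k steps, assuming independent errors:
# P(at least one hallucinated step) = 1 - (1 - eps) ** k
for eps in (0.001, 0.01, 0.05):
    for k in (5, 20, 100):
        p_any = 1 - (1 - eps) ** k
        print(f"per-step rate {eps:.3f}, {k:3d} steps -> P(any hallucination) = {p_any:.3f}")

With a per-step rate of 0.001, even a 100-step chain stays below a 10% chance of any hallucinated step, which is the sense in which per-step rates driven by frequently seen rules can remain manageable.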
