BSer, Searcher, Researcher: Validating Generative AI Texts (#GPT and other #LLMs)
Damien Riehl
Lawyer + Speaker + Writer + Builder + Mediocre Coder + Musician + VP Solutions Champion
I've been thinking a lot about this — #GPT and #LLMs hallucinating both (1) propositions and (2) citations. Users (especially lawyers) need to know that the text is trustworthy. ("Is this BS?") And Users need to know textual provenance. ("Where did this come from?")
To address this problem, there are (at least) three options:
1. Bullshitter
2. Searcher
3. Researcher
1. BULLSHITTER. LLM hallucinations gone wild. Unchecked chaos.
2. SEARCHER. LLM generates text. Run queries to substantiate (or debunk) the text.
This is like a senior partner saying "I'm pretty sure there's a case out there that says X. Find it!"
Good luck.
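To make #2 concrete, here's a minimal Python sketch of the Searcher pattern: the LLM has already generated sentences, and we try to substantiate each one against an index after the fact. The index, the cases, and the keyword matching are toy assumptions standing in for a real research backend, not anyone's actual implementation.

```python
# "Searcher" sketch: sentences already exist; search an index afterward
# to substantiate (or debunk) each one.
# The Authority records and keyword matching below are toy assumptions,
# not a real legal-research backend.
from dataclasses import dataclass

@dataclass
class Authority:
    citation: str
    holding: str
    good_law: bool  # False if overruled (e.g., Plessy)

# Toy index standing in for a real case-law database.
INDEX = [
    Authority("Brown v. Board of Education, 347 U.S. 483 (1954)",
              "separate educational facilities are inherently unequal", True),
    Authority("Plessy v. Ferguson, 163 U.S. 537 (1896)",
              "separate but equal facilities are constitutional", False),
]

def substantiate(sentence: str) -> str:
    """Search the toy index; flag unsupported text and bad law."""
    words = [w.strip(".,").lower() for w in sentence.split() if len(w) > 3]
    matches = [a for a in INDEX if any(w in a.holding for w in words)]
    if not matches:
        return f"UNSUPPORTED (possible hallucination): {sentence!r}"
    if all(not a.good_law for a in matches):
        return f"ONLY BAD LAW FOUND: {sentence!r} -> {matches[0].citation}"
    good = next(a for a in matches if a.good_law)
    return f"SUPPORTED: {sentence!r} -> {good.citation}"

# Sentences an LLM might have generated:
for s in ["Separate facilities are inherently unequal.",
          "The moon is made of binding precedent."]:
    print(substantiate(s))
```

Even this toy shows the rabbit holes: the search only runs after the text exists, so it inherits whatever the model made up.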
3. RESEARCHER. Atomize the <PROPOSITION> + <CITATION> graph. Give each <PROPOSITION> and <CITATION> a unique identifier. The user's query then assembles its answer from that most-common ground truth (non-hallucinated).
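To make #3 concrete, here's a minimal Python sketch of the Researcher pattern as described above: a ground-truth graph of atomized propositions and citations, each with a deterministic unique identifier, that answers only from edges it already contains. The class, IDs, and sample records are illustrative assumptions, not the author's actual system.

```python
# "Researcher" sketch: a ground-truth graph of atomized <PROPOSITION> and
# <CITATION> nodes, each with a unique identifier. Answers come only from
# edges already in the graph, so they cannot be hallucinated.
# All IDs, propositions, and citations here are illustrative assumptions.
import uuid

class TruthGraph:
    def __init__(self):
        self.propositions = {}  # prop_id -> proposition text
        self.citations = {}     # cite_id -> citation string
        self.edges = set()      # (prop_id, cite_id) = "proposition supported by citation"

    @staticmethod
    def _pid(text: str) -> str:
        # Deterministic ID: the same proposition always maps to the same node.
        return f"prop-{uuid.uuid5(uuid.NAMESPACE_URL, text)}"

    def add(self, proposition: str, citation: str) -> None:
        prop_id = self._pid(proposition)
        cite_id = f"cite-{uuid.uuid5(uuid.NAMESPACE_URL, citation)}"
        self.propositions[prop_id] = proposition
        self.citations[cite_id] = citation
        self.edges.add((prop_id, cite_id))

    def support_for(self, proposition: str) -> list[str]:
        """Citations for an already-atomized proposition; empty if the graph has never seen it."""
        prop_id = self._pid(proposition)
        return [self.citations[c] for (p, c) in self.edges if p == prop_id]

graph = TruthGraph()
graph.add("Separate educational facilities are inherently unequal.",
          "Brown v. Board of Education, 347 U.S. 483 (1954)")

print(graph.support_for("Separate educational facilities are inherently unequal."))  # [Brown]
print(graph.support_for("The moon is made of binding precedent."))                   # []
```

The design choice that matters is the deterministic identifier: because every proposition hashes to the same node ID, a query can only assemble answers from propositions and citations the graph already holds, which is what keeps the output non-hallucinated.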
Which to choose?
#1 BULLSHITTER is a nonstarter.
#2 SEARCHER seems obvious. But many rabbit holes.
e.g., the sentence is hallucinated BS, so there is no authority to find
e.g., the sentence recites bad law (e.g., Plessy, Roe v. Wade) that has since been overruled
#3 RESEARCHER is harder to build. But most trustworthy. Built atop ground truth.
I would expect that most Researchers will turn out to be an adversarial network of a BSer and a Searcher wearing a trenchcoat.
Product Manager | Agile | SaaS | B2B
2y · Good thoughts. I've had the experience of feeding three sequential compliance prompts to ChatGPT, and the third returned no result. That was a bit odd because that item does exist, but it made me think that a Confidence Threshold of some sort may be helpful. Zero result may be better than Bullshit. As you well know, the Users of these future systems may again be less-experienced staff, who are at greatest risk of grabbing onto an attractive answer.
Law + people + messy reality + ways of working + organisations + software + data
2y · Nicely done, Damien. Succinct and compelling.