A Series of Unfortunate Decisions
When a person asks a question of an LLM, the LLM responds. But there’s a good chance of some error in the answer. Depending on the model or the question, it could be a 10% chance, or 20%, or much higher.
The inaccuracy could be a hallucination (a fabricated answer) or a wrong answer or a partially correct answer.
So a person can ask many different types of questions & receive many different types of answers, some of which are correct & some of which are not.
In this chart, the arrow out of the LLM represents a correct answer. Askew arrows represent errors.
Today, when we use LLMs, most of the time a human checks the output after every step. But startups are pushing the limits of these models by asking them to chain work.
Imagine I ask an LLM chain to make a presentation about the best cars to buy for a family of 5. First, I ask for a list of those cars, then for a slide on cost, another on fuel economy, & yet another on color selection.
The AI must plan what to do at each step. It starts by finding the car names. Then it searches the web, or its memory, for the necessary data, & then it creates each slide.
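To make that chain concrete, here is a minimal sketch in Python. Everything in it is illustrative: `call_llm` is a hypothetical stand-in for whatever model API you use, and the prompts simply mirror the example above.

```python
# Minimal sketch of the chain described above. `call_llm` is a hypothetical
# placeholder for a real model API (OpenAI, Anthropic, a local model, etc.).

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call."""
    raise NotImplementedError

def build_deck() -> list[str]:
    # Step 1: plan, then find the cars. The rest of the chain depends on this answer.
    cars = call_llm("List the 5 best cars to buy for a family of 5.")

    # Steps 2-4: one call per slide, each conditioned on the first answer.
    slides = []
    for topic in ("cost", "fuel economy", "color selection"):
        slides.append(call_llm(f"Write a slide on {topic} for these cars:\n{cars}"))
    return slides
```

Note that every later call inherits whatever the first call got wrong; nothing in the chain checks the list of cars before building slides on top of it.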
As AI chains these calls together, the universe of potential outcomes explodes.
If the LLM errs at the first step, returning 4 cars that exist, 1 hallucinated car, & a boat, then the remaining effort is wasted. The error compounds from the first step & the deck is useless.
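To put a number on the compounding, assume (purely for illustration) that each step in a five-step chain is right 90% of the time and that errors are independent:

```python
# Illustrative arithmetic: with 90% per-step accuracy and independent errors,
# a 5-step chain produces a fully correct result only ~59% of the time.
per_step_accuracy = 0.9
steps = 5
chain_accuracy = per_step_accuracy ** steps
print(f"{chain_accuracy:.0%}")  # prints "59%"
```

At 80% per-step accuracy, the same chain succeeds only about a third of the time, which is why minimizing the per-step error rate matters so much.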
As we build more complex workloads, managing errors will become a critical part of building products.
Design patterns for this are early. I imagine it this way:
At the end of every step, another model validates the output of the AI. Perhaps this is a classical ML classifier that checks the output of the LLM. It could also be an adversarial model, like the discriminator in a GAN, that tries to find errors in the output.
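Here is a sketch of that pattern, with every name hypothetical: a `validate` function (which could wrap a classical classifier, a second LLM acting as a judge, or a GAN-style discriminator) gates each step, and the chain retries or stops rather than passing a bad output downstream.

```python
# Sketch of step-level validation. `call_llm` and `validate` are hypothetical:
# `validate` could wrap a classical ML classifier, a second LLM acting as a
# judge, or a discriminator trained to spot bad outputs.

def call_llm(prompt: str) -> str:
    raise NotImplementedError

def validate(output: str) -> bool:
    raise NotImplementedError

def run_step(prompt: str, max_retries: int = 2) -> str:
    """Run one step of the chain; retry if the validator rejects the output."""
    for _ in range(max_retries + 1):
        output = call_llm(prompt)
        if validate(output):
            return output
    # Fail loudly instead of letting the error compound through later steps.
    raise RuntimeError(f"Step failed validation after {max_retries + 1} attempts")
```

The design choice is to surface the failure at the step where it happens, instead of discovering it only after the whole deck is built.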
The effectiveness of the overall chained AI system will depend on minimizing the error rate at each step. Otherwise, AI systems will make a series of unfortunate decisions & their work won’t be very useful.
Chief CareBot | Scaling Patient Care 8x with AI | Chief Product Officer @ DR | Connect to scale with AI
6 months ago: Getting a second Agent to validate the first Agent's output is an easy way to reduce hallucinations.
Co-founder and CEO at AIMon | Helping you build more deterministic LLM Apps
6 months ago: Great post Tomasz Tunguz. While validation checks are essential for LLMs, the lack of scalable and cost-effective solutions is a major hurdle. In the evaluation phase, engineers make limited use of expensive API calls to LLMs for checks like hallucinations, but this doesn't scale to production deployments. At AIMon, we are addressing this problem by building lightweight solutions that validate LLM outputs with low latency and accuracy on par with GPT-4 (based on industry-standard benchmarks). I am happy to grab a slot and chat more about this topic.
Technophile & Software Creator | C-Suite Network Liaison
6 months ago: Factors like model limitations or ambiguous questions can contribute to inaccuracies, ranging from minor misunderstandings to more significant errors. It's important to approach AI-generated responses with a critical mindset and cross-reference information when necessary.
Managing Partner @Upekkha (SF/India) | 100+ SaaS Founders → Vertical AI Acceleration | Weekly Notes: India × Global Markets x AI.
6 months ago: Until hallucination is fixed, no enterprise will adopt AI in production. They will be stuck in the pilot-to-production phase.
SaaS Company? I’ll rewrite your vague landing page into a clear, conversion-focused page in 7 business days.
6 months ago: Good point on LLM errors. But what's the game plan for handling these misfires?