Why Infinite Context Is Still Not Enough
LLMs are racing to support ever-larger contexts. Gemini 1.5 Pro leads with a one-million-token window, followed by Claude 2.1, GPT-4, and GPT-4o. The promise of these models lies in their potential to process vast amounts of data, from lengthy videos and audio files to extensive codebases and voluminous novels. This is a truly remarkable achievement, and it rests on an immense amount of research (MoE, Ring Attention, hardware advances). The capability that gets touted, though, is that model ABC can identify a single instance of something in a full video. That is very impressive, but in reality it is much less useful, applying to only a small set of use cases.
Going Beyond the Needle in the Haystack
Despite their advanced capabilities, current LLMs hit a critical stumbling block: they excel at pinpointing precise details within huge inputs (finding the needle in the haystack) yet struggle to generate coherent long-form content from that same extensive context. This gap is a substantial hurdle for applications that demand lengthy, well-articulated, and well-structured outputs.
Many use cases need not just a long context but also a long, coherent output. Here is one.
Simple Case Study:
Let's try a simple task: annotating and searching through an AP Art History syllabus PDF. The PDF contains exactly 250 artworks, each with an image and a description.
Task 1: Identify all the artworks (i.e., all 250 of them).
Gemini begins to list them but comes to a sudden halt after 134. Although the input can take 1M tokens, the maximum output is capped at 8K tokens, and in practice the output is often even shorter due to fine-tuning or a bias toward brief responses. Listing 250 artworks would take no more than 2K tokens, but the model has been trained to stop after a few, as if it gets "tired." GPT-4o is no better: it produces a partial list, stops there, and, when asked to continue, messes up the numbering.
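One possible workaround is a continuation loop: feed each partial answer back and ask the model to resume from the next item number. Below is a minimal sketch, assuming an OpenAI-compatible chat API; the model name and prompts are placeholders, not the exact setup from the test above.

```python
# Sketch: work around the output-token cap with a continuation loop.
# Model name and prompts are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

def list_all_items(document_text: str, expected_count: int) -> list[str]:
    """Accumulate a numbered list across multiple truncated responses."""
    messages = [
        {"role": "system",
         "content": "List every artwork in the document as numbered lines."},
        {"role": "user", "content": document_text},
    ]
    items: list[str] = []
    while len(items) < expected_count:
        reply = client.chat.completions.create(
            model="gpt-4o",  # any long-context chat model
            messages=messages,
        ).choices[0].message.content or ""
        new_items = [ln.strip() for ln in reply.splitlines() if ln.strip()]
        if not new_items:  # no progress: stop instead of looping forever
            break
        items.extend(new_items)
        # Feed the partial answer back and restate the next expected item
        # number; this is what keeps the numbering from drifting.
        messages.append({"role": "assistant", "content": reply})
        messages.append({
            "role": "user",
            "content": f"Continue from item {len(items) + 1}. "
                       "Do not repeat earlier items.",
        })
    return items[:expected_count]
```

Even this is brittle: as the GPT-4o test shows, models often re-number or repeat items when asked to continue, so restating the next expected number in the prompt is doing real work here.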
Task 2: Identify all the artworks (out of the 250) that contain a rose or a flower.
Again, the results are incorrect: a mix of hallucinations and some true positives.
The Takeaway:
Long context is still a work in progress for many use cases that go beyond finding a single needle in the haystack. We need better metrics, evaluations, and benchmarks to assess the quality of long-context results. RULER is a good attempt in this direction, but we have to think beyond the needle, or even multiple needles, in the haystack: how can models construct accurate, long, coherent, high-quality answers? This is not an easy problem and will require significant engineering effort across the board. In the meantime, RAG or a divide-and-conquer approach is probably a better bet than a single large-context query.
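To make the divide-and-conquer alternative concrete, here is a minimal sketch: split the text into overlapping chunks, ask the same question of each chunk, and merge the per-chunk answers. The chunk size, overlap, prompts, and model name are assumptions for illustration, not a tested recipe.

```python
# Sketch of divide-and-conquer over a long document instead of one
# giant-context query. Chunking parameters and prompts are assumptions.
from openai import OpenAI

client = OpenAI()

def chunk(text: str, size: int = 8000, overlap: int = 500) -> list[str]:
    """Split text into overlapping windows so items on a boundary survive."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def find_flower_artworks(document_text: str) -> list[str]:
    found: list[str] = []
    for piece in chunk(document_text):
        reply = client.chat.completions.create(
            model="gpt-4o",  # any chat model; each call sees one small chunk
            messages=[
                {"role": "system",
                 "content": "From the excerpt, list artworks whose "
                            "description mentions a rose or flower, one per "
                            "line. If none, reply NONE."},
                {"role": "user", "content": piece},
            ],
        ).choices[0].message.content or ""
        if reply.strip().upper() != "NONE":
            found.extend(ln.strip() for ln in reply.splitlines() if ln.strip())
    # Deduplicate while preserving order; overlapping chunks can repeat hits.
    return list(dict.fromkeys(found))
```

Each call stays comfortably within the output cap, and a hallucination or a miss is confined to a single chunk rather than corrupting one monolithic answer.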