Long Context vs RAG: The Final Take.
Abdelhadi Azzouni
HeyCloud | PacketAI | Forbes u30 | PhD | Building tools for devops and cloud engineers.
Will large-context LLMs kill the need for RAG architecture?
Context
Google released Gemini 1.5 last week, and it spurred a large debate online on whether this is the end of RAG. The reason is that Gemini 1.5 has a very large context window (input size): 1M multimodal tokens, and up to 10M text tokens. On the surface, one might think: if the LLM can take all my data at once, why bother building a RAG system?
Let’s analyse both sides of the debate:
Arguments for RAG's Continued Significance
Arguments for Long Context
How I think about it: it’s a tradeoff
I know it’s a boring conclusion, but like many things in life, it’s a tradeoff. RAG itself won’t be dead soon, but 90% of small-scale use cases won’t need it anymore. Most datasets can fit in 1M tokens, and even if inference on 1M tokens is expensive, the cost of building a RAG system for a small project is usually not worth it.
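As a rough sanity check on “fits in 1M tokens”, here is a minimal sketch that counts the tokens in a folder of text files. It leans on tiktoken’s cl100k_base encoding as a stand-in tokenizer (Gemini uses its own tokenizer, so treat the count as an approximation), and the ./docs folder is a hypothetical example:

```python
# Rough check: does a document set fit in a long-context window?
# tiktoken's cl100k_base encoding is a stand-in; Gemini's own tokenizer
# will count somewhat differently, so treat the result as approximate.
from pathlib import Path

import tiktoken

CONTEXT_WINDOW = 1_000_000  # Gemini 1.5's advertised text context size

def total_tokens(folder: str) -> int:
    enc = tiktoken.get_encoding("cl100k_base")
    return sum(
        len(enc.encode(p.read_text(errors="ignore")))
        for p in Path(folder).rglob("*.txt")
    )

if __name__ == "__main__":
    n = total_tokens("./docs")  # hypothetical folder of plain-text files
    verdict = "fits" if n <= CONTEXT_WINDOW else "does not fit"
    print(f"{n:,} tokens -> {verdict} in a 1M-token window")
```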
In addition, LLM-native retrieval is actually very similar to an internal RAG. During inference, LLMs use key-value caching (the KV cache) to store per-token keys and values so that attention can look back over them without recomputing. Instead of using cosine similarity to "retrieve" the most relevant chunks, the model uses self-attention to attend to the most relevant tokens. Both reuse pre-computed embeddings.
This reduces the cost of inference, but we still don’t have a rigorous cost comparison of KV caching versus an external RAG.
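To make the analogy concrete, here is a toy numpy sketch of the two retrieval mechanisms (my illustration with random vectors, not how any particular model is implemented): RAG does a hard top-k selection over chunk embeddings, while attention does a soft, softmax-weighted selection over every cached token.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64
chunks = rng.standard_normal((1000, d))  # pre-computed chunk embeddings (RAG index)
keys = rng.standard_normal((1000, d))    # pre-computed per-token keys (KV cache)
query = rng.standard_normal(d)

# External RAG: hard top-k selection by cosine similarity.
cos = chunks @ query / (np.linalg.norm(chunks, axis=1) * np.linalg.norm(query))
top_k = np.argsort(cos)[-5:][::-1]       # the 5 most similar chunks

# Self-attention: soft selection -- every cached token gets a weight.
scores = keys @ query / np.sqrt(d)
weights = np.exp(scores - scores.max())
weights /= weights.sum()                 # softmax over all cached tokens

print("RAG retrieves chunks:", top_k)
print("attention's 5 heaviest tokens:", np.argsort(weights)[-5:][::-1])
```

The structural difference: RAG discards everything outside the top k before the model ever sees it, while attention spends at least some compute on every cached token, which is one reason long context costs more per query.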
For large-scale, production use cases, I think RAG will definitely stay dominant, primarily for security, control, and cost reasons.
The RAM vs Hard Drive Analogy
A good way to think about this tradeoff is the analogy to memory layers in a computer. RAM is the natural place to store data the processor needs immediately. However, since RAM is expensive, we extend it with external storage (a hard drive) that is far larger but a bit more complex to manage. A small programme can be loaded into RAM in its entirety and executed; once your programme needs more data than RAM can hold, you have to reach for the hard drive.
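The analogy maps onto a simple routing rule, sketched below. This is a hedged illustration, not a real API: count_tokens is a crude whitespace proxy, and the two branches just return labels standing in for "stuff everything into the prompt" and "query a RAG index".

```python
# A minimal routing sketch of the RAM / hard-drive tradeoff.
# The helpers are deliberately crude stand-ins, not a real LLM or RAG API.

CONTEXT_BUDGET = 1_000_000  # "RAM": tokens the model can take in one prompt

def count_tokens(text: str) -> int:
    return len(text.split())  # crude whitespace proxy for a real tokenizer

def answer(question: str, corpus: list[str]) -> str:
    if sum(count_tokens(doc) for doc in corpus) <= CONTEXT_BUDGET:
        # Everything fits in "RAM": load the whole corpus into the prompt.
        return f"[long context] {question!r} answered over {len(corpus)} inline docs"
    # Corpus exceeds the window: go to the "hard drive", i.e. an external RAG index.
    return f"[RAG] {question!r} answered over top-k retrieved chunks"
```

In practice the threshold would also fold in latency and per-token cost, not just whether the corpus physically fits in the window.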
In conclusion
My quick takes: