Long Context vs RAG: The Final Take.
Abdelhadi Azzouni
HeyCloud | PacketAI | Forbes u30 | PhD | Building tools for devops and cloud engineers.
Will large-context LLMs kill the need for RAG architecture?
Context
Google released Gemini 1.5 last week, and it spurred a large debate online on whether this is the end of RAG. The reason is that Gemini 1.5 has a very large context window (input size): 1M multimodal tokens, and up to 10M text tokens. On the surface, one might think: if the LLM can take all my data at once, why bother building a RAG system?
Let’s analyse both sides of the debate:
Arguments for RAG's Continued Significance
Arguments for Long Context
How I think about it: it’s a tradeoff
I know it’s a boring conclusion, but like many things in life, it’s a tradeoff. RAG itself won’t be dead soon, but 90% of small-scale use cases won’t need it anymore. Most datasets can fit in 1M tokens, and even if inference on 1M tokens is expensive, the cost of building a RAG system for a small project is usually not worth it.
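As a rough sanity check on “fits in 1M tokens”, here is a minimal sketch that counts the tokens in a folder of text files. It leans on tiktoken’s cl100k_base encoding as a stand-in tokenizer (Gemini uses its own tokenizer, so treat the count as an approximation), and the ./docs folder is a hypothetical example:

```python
# Rough check: does a document set fit in a long-context window?
# tiktoken's cl100k_base encoding is a stand-in; Gemini's own tokenizer
# will count somewhat differently, so treat the result as approximate.
from pathlib import Path

import tiktoken

CONTEXT_WINDOW = 1_000_000  # Gemini 1.5's advertised text context size

def total_tokens(folder: str) -> int:
    enc = tiktoken.get_encoding("cl100k_base")
    return sum(
        len(enc.encode(p.read_text(errors="ignore")))
        for p in Path(folder).rglob("*.txt")
    )

if __name__ == "__main__":
    n = total_tokens("./docs")  # hypothetical folder of plain-text files
    verdict = "fits" if n <= CONTEXT_WINDOW else "does not fit"
    print(f"{n:,} tokens -> {verdict} in a 1M-token window")
```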
In addition, LLM-native retrieval is actually very similar to an internal RAG. During inference, LLMs use key-value caching (the KV cache) to store per-token keys and values so that attention can look back over them without recomputing. Instead of using cosine similarity to "retrieve" the most relevant chunks, the model uses self-attention to attend to the most relevant tokens. Both reuse pre-computed embeddings.
This reduces the cost of inference, but we still don’t have a rigorous cost comparison of KV caching versus an external RAG.
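To make the analogy concrete, here is a toy numpy sketch of the two retrieval mechanisms (my illustration with random vectors, not how any particular model is implemented): RAG does a hard top-k selection over chunk embeddings, while attention does a soft, softmax-weighted selection over every cached token.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64
chunks = rng.standard_normal((1000, d))  # pre-computed chunk embeddings (RAG index)
keys = rng.standard_normal((1000, d))    # pre-computed per-token keys (KV cache)
query = rng.standard_normal(d)

# External RAG: hard top-k selection by cosine similarity.
cos = chunks @ query / (np.linalg.norm(chunks, axis=1) * np.linalg.norm(query))
top_k = np.argsort(cos)[-5:][::-1]       # the 5 most similar chunks

# Self-attention: soft selection -- every cached token gets a weight.
scores = keys @ query / np.sqrt(d)
weights = np.exp(scores - scores.max())
weights /= weights.sum()                 # softmax over all cached tokens

print("RAG retrieves chunks:", top_k)
print("attention's 5 heaviest tokens:", np.argsort(weights)[-5:][::-1])
```

The structural difference: RAG discards everything outside the top k before the model ever sees it, while attention spends at least some compute on every cached token, which is one reason long context costs more per query.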
For large-scale, production use cases, I think RAG will definitely stay dominant, primarily for security, control, and cost reasons.
The RAM vs Hard Drive Analogy
A good way to think about this tradeoff is the analogy to memory layers in a computer. RAM is the natural place to store data the processor needs immediately. However, since RAM is expensive, we extend it with external storage (a hard drive) that is far larger but a bit more complex to manage. A small programme can be loaded into RAM in its entirety and executed; once your programme needs more data than RAM can hold, you have to reach for the hard drive.
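The analogy maps onto a simple routing rule, sketched below. This is a hedged illustration, not a real API: count_tokens is a crude whitespace proxy, and the two branches just return labels standing in for "stuff everything into the prompt" and "query a RAG index".

```python
# A minimal routing sketch of the RAM / hard-drive tradeoff.
# The helpers are deliberately crude stand-ins, not a real LLM or RAG API.

CONTEXT_BUDGET = 1_000_000  # "RAM": tokens the model can take in one prompt

def count_tokens(text: str) -> int:
    return len(text.split())  # crude whitespace proxy for a real tokenizer

def answer(question: str, corpus: list[str]) -> str:
    if sum(count_tokens(doc) for doc in corpus) <= CONTEXT_BUDGET:
        # Everything fits in "RAM": load the whole corpus into the prompt.
        return f"[long context] {question!r} answered over {len(corpus)} inline docs"
    # Corpus exceeds the window: go to the "hard drive", i.e. an external RAG index.
    return f"[RAG] {question!r} answered over top-k retrieved chunks"
```

In practice the threshold would also fold in latency and per-token cost, not just whether the corpus physically fits in the window.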
In conclusion
My quick takes: