Phased Approach | Reports of the death of RAG have been greatly exaggerated


Is it just me, or does every feature released by OpenAI and Anthropic lead to breathless headlines in the AI press proclaiming the end of RAG?

The latest feature to trigger these headlines is prompt caching, released by Anthropic for Claude. My theory is that most of these AI YouTubers and writers have never actually used RAG for anything more than chatting with a PDF, and so have no idea what its business and enterprise applications are.

So today I want to talk about this new feature from Anthropic and why it is great, but I also want to talk about what RAG is, what it does, and how features like these are at best complementary to RAG in an actual business context.




But before we dive into why prompt caching isn’t the silver bullet that some might think it is, let’s take a step back and talk about Retrieval-Augmented Generation (RAG)—what it really is and why it’s so crucial, especially for businesses dealing with large databases of documents.

What Exactly is RAG?

At its core, RAG is a hybrid approach that combines two powerful capabilities: retrieval and generation. Imagine you’re running a company with a vast knowledge base—think thousands of documents, technical manuals, customer interactions, or legal contracts. Finding the right piece of information quickly and accurately is a massive challenge. This is where RAG comes into play.

RAG uses a retrieval mechanism to sift through all that data and pull out the most relevant chunks of information based on the query. It doesn’t stop there, though. Once the relevant data is retrieved, it hands that over to a language model (the generation part) to craft a coherent, contextually relevant response. This means the AI isn’t just spitting out pre-existing text—it’s creating a nuanced answer based on the most pertinent data available.
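To make that concrete, here is a minimal sketch of what a RAG pipeline can look like in Python. The embedding model, the tiny in-memory document store, and the Claude model name are illustrative assumptions rather than a recommended production setup.

```python
# Minimal RAG sketch: embed documents, retrieve the chunks most relevant
# to a query, then hand them to an LLM to generate a grounded answer.
# The embedding model, documents, and model name are illustrative only.
import numpy as np
from sentence_transformers import SentenceTransformer
import anthropic

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

documents = [
    "Refunds are processed within 14 days of a return being received.",
    "Enterprise contracts renew automatically unless cancelled 60 days in advance.",
    "The API rate limit is 1,000 requests per minute per organisation.",
]
doc_vectors = embedder.encode(documents, normalize_embeddings=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k document chunks most similar to the query."""
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_vectors @ q  # cosine similarity, since vectors are normalised
    return [documents[i] for i in np.argsort(scores)[::-1][:k]]

def answer(query: str) -> str:
    """Generate a response grounded in the retrieved chunks."""
    context = "\n\n".join(retrieve(query))
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # placeholder model name
        max_tokens=300,
        messages=[{
            "role": "user",
            "content": f"Answer using only this context:\n{context}\n\nQuestion: {query}",
        }],
    )
    return response.content[0].text

print(answer("How long do refunds take?"))
```

In a real deployment the three hard-coded documents would be replaced by a proper vector store holding millions of chunks, but the shape of the pipeline stays the same: retrieve first, then generate.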




Why is RAG So Important for Businesses?

Now, let’s connect the dots to why this matters in a business context. If you’re managing a large-scale operation, your data isn’t just big—it’s massive. We’re talking millions of tokens worth of information, spread across various systems and formats. In such environments, RAG becomes indispensable because:

  1. Scalability: RAG is built to handle the vast amounts of data that businesses accumulate. It doesn’t just rely on a static, pre-defined context; it actively retrieves the latest, most relevant information whenever it’s needed, making it perfect for dynamic, ever-evolving data landscapes.
  2. Accuracy: Businesses can’t afford to get things wrong. Whether it’s providing customer support, making legal decisions, or analysing technical data, accuracy is key. RAG’s retrieval component ensures that the generated responses are based on the most up-to-date and relevant data, which is critical in high-stakes environments.
  3. Contextual Understanding: In a business setting, understanding the context is everything. RAG’s ability to pull from a vast pool of data and generate responses that are contextually aware means that the AI can deliver insights that are not just accurate but also highly relevant to the specific query or problem at hand.

So, while prompt caching is an exciting development, it's important to understand its true capabilities and limitations—especially when comparing it to something as robust as Retrieval-Augmented Generation (RAG).

What Does Claude’s Prompt Caching Really Do?

Claude’s prompt caching is a feature designed to make your interactions with AI models more efficient, particularly in scenarios where you need to repeatedly access the same information within a short period. The basic idea is this: if you’re working with a large document or a complex set of instructions, you can cache that information so that it doesn’t have to be reprocessed every time you interact with the model. This can dramatically reduce costs—by up to 90%—and improve response times by up to 85%, according to Anthropic.



Here’s how it works: when you cache a prompt, Claude stores that information and keeps it ready for future use. If your session remains active, the cache persists, allowing for quick and cost-effective responses to subsequent queries. However, there’s a catch—this cache will only stay alive for five minutes of idle time. This means that if you don’t use the cached data within five minutes, the cache is cleared, and you’ll have to reload the information, which incurs additional costs and time.
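For illustration, here is a minimal sketch of what this looks like with the Anthropic Python SDK: a large, stable document is marked as cacheable so repeated questions against it reuse the cached prefix instead of reprocessing it. The file name and model name are placeholders, older SDK versions exposed this behind a prompt-caching beta flag, and very small prompts fall below the minimum cacheable length, so caching only pays off on genuinely large, stable content.

```python
# Sketch of Claude prompt caching: mark a large, stable system block as
# cacheable so repeated calls within the cache lifetime reuse it.
# File name and model name are placeholders.
import anthropic

client = anthropic.Anthropic()

big_reference_document = open("technical_manual.txt").read()  # large, stable corpus

def ask(question: str) -> str:
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=500,
        system=[
            {"type": "text",
             "text": "You are an assistant answering questions about the manual below."},
            {
                "type": "text",
                "text": big_reference_document,
                # This block is cached; later calls within the cache window
                # read it from the cache rather than paying to reprocess it.
                "cache_control": {"type": "ephemeral"},
            },
        ],
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text

print(ask("What is the recommended maintenance interval?"))
print(ask("Which parts are covered by the warranty?"))  # hits the cache if asked soon enough
```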

The Limitations of Prompt Caching in Business Contexts

Now, this feature is fantastic for specific, short-term tasks where you’re dealing with stable, reusable content over a brief period—like debugging a codebase or analysing a single document. But when we talk about large-scale enterprise environments, things get a bit more complicated.

In a business setting, you’re often dealing with vast amounts of data—millions of tokens spread across countless documents, databases, and knowledge repositories. The 200,000-token context window that Claude offers is substantial, but it’s still limited. In real-world applications, you’ll often need to pull from multiple sources that far exceed this limit. This is where the limitations of prompt caching become evident.

Why RAG Remains Essential

RAG excels in scenarios where you need to gather information from multiple, disparate sources. Instead of trying to fit everything into a single context window, RAG allows the model to retrieve relevant chunks of data from a vast knowledge base and combine them into the prompt to generate a comprehensive, contextually relevant response. This makes RAG incredibly powerful for enterprises where the data landscape is both vast and varied.

While Claude’s prompt caching can be a game-changer for tasks that require multiple prompts on the same corpus of data, it simply can’t handle the scale and complexity that RAG is built for. For instance, if you’re interacting with a large database of customer service logs, legal documents, or technical manuals, you’re likely dealing with far more information than can be cached or processed in a single context window. RAG’s ability to retrieve and integrate multiple pieces of data from various sources ensures that you get a complete and accurate response, regardless of the size or complexity of the data involved.

The Bottom Line: Complementary, Not a Replacement

So, while prompt caching is a valuable tool—especially for reducing costs and speeding up interactions in specific scenarios—it’s not a replacement for RAG. Instead, it serves as a complementary feature that can enhance the efficiency of your AI systems when used appropriately. In cases where you’re working with a single document or a stable set of data points, prompt caching can save time and money. But for the larger, more complex tasks that define enterprise AI applications, RAG remains indispensable.

In essence, prompt caching is a smart way to optimise performance in the short term, but when it comes to handling the sprawling, interconnected data environments typical of large businesses, RAG’s comprehensive retrieval and generation capabilities are still the gold standard. So, next time you hear that prompt caching is the end of RAG, remember that it’s just one piece of a much larger puzzle in the world of AI.
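One way the two can work together in practice: keep the large, stable parts of the prompt (detailed instructions, schemas, few-shot examples) cached, and feed the per-query chunks returned by your retrieval layer in the uncached user turn. The sketch below assumes the same Anthropic SDK as above; the instruction text and helper names are hypothetical, and in a real system the cached block would be long enough to clear the minimum cacheable length.

```python
# Sketch of prompt caching and RAG used together: stable instructions are
# cached, while per-query retrieved chunks go in the uncached user turn.
# Names and contents are illustrative.
import anthropic

client = anthropic.Anthropic()

# In practice this would be a long block (instructions, schemas, examples)
# so that it exceeds the minimum cacheable prompt length.
STABLE_INSTRUCTIONS = "You are a support assistant. Answer only from the provided context."

def answer_with_rag_and_cache(query: str, retrieved_chunks: list[str]) -> str:
    context = "\n\n".join(retrieved_chunks)  # produced by your retrieval layer
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=400,
        system=[{
            "type": "text",
            "text": STABLE_INSTRUCTIONS,
            "cache_control": {"type": "ephemeral"},  # reused across queries
        }],
        messages=[{
            "role": "user",
            "content": f"Context:\n{context}\n\nQuestion: {query}",
        }],
    )
    return response.content[0].text
```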

Why are people always Ragging on RAG?

I have a couple of theories on this.

Theory 1: As I said above, the AI press is filled with a lot of hobbyists who don't actually use these tools in business. They can load a VERY LARGE PDF into a vector database and ask it questions, and then two months later they can load the same PDF directly into the context window of an LLM, so they miss the fact that businesses are not just doing small POCs but have millions of documents.

Theory 2: Wishful thinking. RAG is very hard work. It's not just a vector DB and a LangChain integration; there are so many variables and tweaks needed to get a RAG pipeline to work consistently. People in the industry would love it if one of the Gen AI companies could magically solve this issue so that they didn't have to worry about embedding models, chunk sizes, semantic search algorithms, graph database integrations, etc.
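Just to give a flavour of how many knobs there are, here is a hypothetical configuration object for a RAG pipeline. Every field name and default below is illustrative; the point is simply that each one is a decision that has to be made and then tuned.

```python
# Hypothetical RAG pipeline configuration: each field is a knob that
# typically has to be chosen and tuned before results become consistent.
from dataclasses import dataclass
from typing import Optional

@dataclass
class RagConfig:
    embedding_model: str = "all-MiniLM-L6-v2"  # which embedding model to use
    chunk_size: int = 512                      # tokens per chunk
    chunk_overlap: int = 64                    # overlap between adjacent chunks
    top_k: int = 8                             # chunks retrieved per query
    search_mode: str = "hybrid"                # dense, keyword, or hybrid search
    reranker: Optional[str] = None             # optional cross-encoder reranker
    graph_store: Optional[str] = None          # optional graph database integration
```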

If you made it this far in the Newsletter


As always, if you are curious about how the topics discussed in this newsletter relate to your business or projects that you are working on, please send a message and I'd be happy to have a chat and advise.


