What Is RAG? Let's Dive Deeper This Time!
As Large Language Models (LLMs) have revolutionized the world with their impressive capabilities, a crucial limitation has become apparent: their knowledge is static, limited to what they were trained on. In today's fast-paced world, that knowledge rapidly becomes outdated.
Retrieval Augmented Generation (RAG) tackles two significant challenges associated with LLMs: keeping their knowledge up-to-date and providing accurate sources to support their responses.
How does it work?
RAG systems follow three core steps: retrieval, augmentation, and generation.
First, given an input query, a RAG system fetches relevant information from knowledge sources such as document corpora, web pages, or databases. The fetched context is then combined with the original input to form an augmented prompt. Finally, the language model draws on both its internal knowledge and the freshly retrieved context to generate the output text.
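To make the three steps concrete, here is a minimal, runnable sketch in Python. It is only a sketch: TF-IDF retrieval (via scikit-learn) stands in for a production vector store, the documents are toy examples, and `call_llm` at the end is a hypothetical placeholder for whatever LLM client you use.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy knowledge source; a real system would index a large corpus.
documents = [
    "Saturn has 146 confirmed moons, the most of any planet in the solar system.",
    "Jupiter is the largest planet and has 95 confirmed moons.",
    "The IPCC publishes regular assessment reports on climate change.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Step 1, retrieval: rank documents by TF-IDF cosine similarity."""
    vectorizer = TfidfVectorizer()
    doc_vectors = vectorizer.fit_transform(docs)
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, doc_vectors)[0]
    ranked = sorted(zip(scores, docs), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:k]]

def augment(query: str, context: list[str]) -> str:
    """Step 2, augmentation: combine retrieved context with the query."""
    context_block = "\n".join(f"- {chunk}" for chunk in context)
    return (
        "Answer the question using the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}"
    )

query = "Which planet has the most moons?"
prompt = augment(query, retrieve(query, documents))
print(prompt)
# Step 3, generation: send the augmented prompt to your LLM of choice.
# response = call_llm(prompt)  # call_llm is a hypothetical placeholder
```

Swapping the TF-IDF retriever for dense embeddings and a vector database improves step 1, but the overall retrieve-augment-generate shape stays the same.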
While the concept may seem straightforward, its impact has been significant. A standalone language model is confined to training data that may be months or years old; RAG systems keep LLMs current with the rapidly changing fields around us.
As a result, knowledge-intensive question-answering systems, analysis tools, and dialogue agents can now benefit greatly from these powerful language models and operate effectively in fast-moving domains.
Example 1: Which planet has the most moons in our solar system?
Suppose we ask an LLM, and it responds that Jupiter has the largest number of moons - a response that was correct when its training data was collected, but is now outdated.
This doesn't mean LLMs lack intelligence; on the contrary, they possess deep internal knowledge and can decide which information is relevant based on their training.
When an LLM is enhanced with RAG, it retrieves relevant information from trusted sources like NASA websites or scientific journals, combines it with the user's query, and generates a response.
In this case, the LLM would accurately state that Saturn has the most moons, backing its answer with the latest authoritative data.
Example 2: Let's learn more about climate change!
When exploring complex topics like climate change, RAG technology ensures that LLMs generate responses based on reliable external data, rather than solely relying on their training data.
The LLM is instructed to prioritize the external data over its own internal knowledge, ensuring that the answer is grounded in credible sources.
For our query, it may collect data from peer-reviewed scientific articles or reports published by well-known organizations like the Intergovernmental Panel on Climate Change (IPCC).
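In practice, that "prioritize the external data" behavior usually comes down to a grounding instruction placed in the prompt. A minimal sketch follows; the wording is illustrative, not any standard template:

```python
# Illustrative grounding instruction; the exact wording is an assumption.
GROUNDING_INSTRUCTION = (
    "Answer ONLY from the provided context. "
    "If the context does not contain the answer, say you don't know "
    "rather than guessing, and cite the source of each claim."
)

def grounded_prompt(query: str, context: str) -> str:
    """Prepend the grounding instruction to the retrieved context and query."""
    return f"{GROUNDING_INSTRUCTION}\n\nContext:\n{context}\n\nQuestion: {query}"

print(grounded_prompt(
    "How much has the planet warmed since pre-industrial times?",
    "IPCC AR6 (2021): observed warming of about 1.1 °C above 1850-1900 levels.",
))
```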
Evolution of RAG: from Naive to Modular
Compared with the earliest research, today's RAG systems have evolved from simple pipelines into sophisticated architectures, and they offer a wide range of design options.
Naive RAG
The basic retrieve-then-generate pipeline described above: index the documents, retrieve the chunks most similar to the query, and insert them into the prompt as-is.

Advanced RAG
Adds pre-retrieval and post-retrieval optimizations, such as query rewriting and reranking of the retrieved passages, to improve the quality of the context.
Modular RAG
Breaks the pipeline into interchangeable modules (search, memory, routing, and so on) that can be composed and orchestrated flexibly for the task at hand.
A fun webinar on the RAG comparison test is coming on 9 May.
RAG or Fine-tuning?
There are insightful discussions about whether retrieval augmentation via RAG or fine-tuning an LLM is the better approach. However, the relationship between the two is not zero-sum: LLMs work even better when the two approaches complement each other.
An effective approach can be to first fine-tune the LLM on domain-specific data and skills, allowing it to specialize in that area. At inference time, a RAG system can then supply the refined model with fresh, real-time information, creating a dynamic learning environment.
Some forward-thinking researchers are exploring blended approaches that iteratively combine the strengths of both worlds: offline fine-tuning and online retrieval.
This can lead to a synergistic effect: fine-tuning helps the model use context more effectively, while RAG keeps the specialized model supplied with up-to-date external knowledge, creating a cycle of ongoing learning and improvement.
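As a high-level sketch of that cycle (every function body below is a trivial stand-in; a real system would call a training library for the offline step and a vector store for the online step):

```python
def fine_tune(base_model: str, domain_corpus: list[str]) -> str:
    """Offline step: specialize the model on domain data (stand-in)."""
    return f"{base_model}-tuned-on-{len(domain_corpus)}-docs"

def retrieve(query: str, knowledge_base: list[str]) -> list[str]:
    """Online step: fetch fresh context (naive keyword match as a stand-in)."""
    words = query.lower().split()
    return [doc for doc in knowledge_base if any(w in doc.lower() for w in words)]

def answer(query: str, model: str, knowledge_base: list[str]) -> str:
    """Ground the specialized model in freshly retrieved knowledge."""
    context = retrieve(query, knowledge_base)
    # A real system would send model + context + query to an inference API.
    return f"[{model}] answers {query!r} using context {context}"

tuned = fine_tune("base-llm", ["domain doc A", "domain doc B"])
print(answer("what changed today?", tuned, ["Pricing rules changed today."]))
```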
RAG evaluation
We can assess various aspects of a RAG system, such as context relevance, faithfulness of the output to its sources, answer relevance, noise robustness, information synthesis, and adaptive reasoning.
This provides insights into their overall proficiency in dynamically retrieving and integrating external knowledge to enhance task performance.
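As a toy illustration of two of these signals, the sketch below scores them with simple token overlap; real evaluation frameworks such as RAGAS use LLM judges and embedding similarity instead:

```python
import string

def tokens(text: str) -> set[str]:
    """Lowercase, split on whitespace, strip surrounding punctuation."""
    return {w.strip(string.punctuation) for w in text.lower().split()}

def overlap(a: str, b: str) -> float:
    """Fraction of a's tokens that also appear in b (a crude 0..1 signal)."""
    a_t, b_t = tokens(a), tokens(b)
    return len(a_t & b_t) / max(len(a_t), 1)

def context_relevance(query: str, context: str) -> float:
    # How much of the query does the retrieved context cover?
    return overlap(query, context)

def faithfulness(answer: str, context: str) -> float:
    # How much of the answer is (lexically) supported by the context?
    return overlap(answer, context)

ctx = "Saturn has 146 confirmed moons, the most of any planet."
print(context_relevance("Which planet has the most moons?", ctx))  # ~0.83
print(faithfulness("Saturn has the most moons.", ctx))             # 1.0
```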
The future...
While RAG systems have mainly focused on text-based tasks, there's growing interest in extending them to support other modalities like image, audio, and video.
The fundamental drivers will be technical advances in areas such as retrieval quality, dense embedding approaches, augmentation techniques, knowledge grounding, model composability, and hybrid paradigms that combine RAG with other methods.
With evaluation frameworks bringing RAG systems to maturity, the emergence of critical breakthroughs in machine intelligence should not come as a surprise.
This issue is brought to you in partnership with Rockset.
Rockset is the search and analytics database built for the cloud, with real-time indexing and full-featured SQL on JSON, time series, geospatial and vector data.
They also have an amazing YouTube channel with tutorials and learning materials, from real-time analytics to building personal AI assistants. Check it out here.
Generative AI Innovator | AI Team Builder | Helping businesses transform with cutting-edge AI solutions
8 months ago
Retrieval Augmented Generation (RAG) is a game-changer for AI, ensuring that models remain up-to-date and accurate by integrating real-time information from trusted sources. At Processica, we've been exploring similar avenues to maintain the relevance of AI systems. Our work in pre- and post-validation frameworks underscores the importance of rigorous QA in AI development, helping mitigate risks and improve reliability. Check out our articles on AI QA and validation techniques for more insights on these critical processes: https://www.dhirubhai.net/pulse/adapting-ai-models-strategic-choice-between-rag-babenko-ph-d--gpeze/?trackingId=YL4VvlttTmW86TBMHZdJTQ%3D%3D
Here is a simple and practical micro benchmark to compare a few RAG chatbots … https://www.dhirubhai.net/posts/jay-jiebing-yu-phd-7b97a8_ai-genai-llm-activity-7207724913632137216--uS9. You may be surprised at the results.
Assistant Technical Manager | AWS Certified Solutions Architect – Associate
8 months ago
RAG systems combine context retrieval with language models to provide up-to-date and accurate knowledge. Their modular architecture allows for customization and optimization. Evaluation goes beyond traditional accuracy metrics to assess factors like context relevance and adaptive reasoning, driving further development and improvement.
Engineering Manager @Gen Yazılım Ltd.
10 months ago
RAG is still taking its very first baby steps. The frustration around it originates from two facts: 1) Every data, information, or knowledge corpus is different and needs expert filtering, classification, clustering, indexing, and vectorization according to its "meaning" in the field, which has nothing to do with any AI, since an AI can only track contexts (the relationships of numbers with each other), not meanings. 2) Assume a RAG is constructed flawlessly; then the AI dealing with it must be "non-intrusive", meaning the model should simply resolve the prompt and the context according to the RAG, and when generating anything it should use the RAG again. Otherwise, if it takes over with its training biases, it will warp the "reality" of the RAG into the "reality" of its training. So in a RAG + AI system, the AI must assume the roles of context resolver and context reconstructor without its own "opinions" or "traits". As a result, an AI dealing with big data via RAG must have a transform layer whose attention is biased toward the RAG and the prompt context rather than its training data. How to achieve this? Well, I ask the magicians out there to help us mortals.