What Is RAG? Let's Dive Deeper This Time!
As Large Language Models (LLMs) have revolutionized the world with their impressive capabilities, a crucial limitation has become apparent: their knowledge is static, limited to what they were trained on. In today's fast-paced world, that knowledge rapidly becomes outdated.
Retrieval Augmented Generation (RAG) tackles two significant challenges associated with LLMs: keeping their knowledge up-to-date and providing accurate sources to support their responses.
How does it work?
RAG systems follow three core steps: retrieval, augmentation, and generation.
First, given an input query, a RAG system fetches relevant information from knowledge sources such as document corpora, web pages, or databases. The fetched context is then combined with the original input to form an augmented prompt. Finally, the language model draws on both its internal knowledge and the freshly retrieved context to generate the output text.
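To make the three steps concrete, here is a minimal, runnable sketch in Python. It is only a sketch: TF-IDF retrieval (via scikit-learn) stands in for a production vector store, the documents are toy examples, and `call_llm` at the end is a hypothetical placeholder for whatever LLM client you use.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy knowledge source; a real system would index a large corpus.
documents = [
    "Saturn has 146 confirmed moons, the most of any planet in the solar system.",
    "Jupiter is the largest planet and has 95 confirmed moons.",
    "The IPCC publishes regular assessment reports on climate change.",
]

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Step 1, retrieval: rank documents by TF-IDF cosine similarity."""
    vectorizer = TfidfVectorizer()
    doc_vectors = vectorizer.fit_transform(docs)
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, doc_vectors)[0]
    ranked = sorted(zip(scores, docs), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:k]]

def augment(query: str, context: list[str]) -> str:
    """Step 2, augmentation: combine retrieved context with the query."""
    context_block = "\n".join(f"- {chunk}" for chunk in context)
    return (
        "Answer the question using the context below.\n\n"
        f"Context:\n{context_block}\n\n"
        f"Question: {query}"
    )

query = "Which planet has the most moons?"
prompt = augment(query, retrieve(query, documents))
print(prompt)
# Step 3, generation: send the augmented prompt to your LLM of choice.
# response = call_llm(prompt)  # call_llm is a hypothetical placeholder
```

Swapping the TF-IDF retriever for dense embeddings and a vector database improves step 1, but the overall retrieve-augment-generate shape stays the same.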
While the concept may seem straightforward, its impact has been significant. A standalone language model is confined to training data that may be months or years old; RAG systems keep LLMs current with the rapidly changing fields around us.
As a result, knowledge-intensive question-answering systems, analysis tools, and dialogue agents can now benefit greatly from these powerful language models and operate effectively in fast-moving domains.
Example 1: Which planet has the most moons in our solar system?
Suppose we ask an LLM, and it responds that Jupiter has the largest number of moons - a response that was correct when its training data was collected, but is now outdated.
This doesn't mean LLMs lack intelligence; on the contrary, they possess deep internal knowledge and can decide which information is relevant based on their training.
When an LLM is enhanced with RAG, it retrieves relevant information from trusted sources like NASA websites or scientific journals, combines it with the user's query, and generates a response.
In this case, the LLM would accurately state that Saturn has the most moons, backing its answer with the latest authoritative data.
Example 2: Let's learn more about climate change!
When exploring complex topics like climate change, RAG technology ensures that LLMs generate responses based on reliable external data, rather than solely relying on their training data.
The LLM is instructed to prioritize the external data over its own internal knowledge, ensuring that the answer is grounded in credible sources.
For our query, it may collect data from peer-reviewed scientific articles or reports published by well-known organizations like the Intergovernmental Panel on Climate Change (IPCC).
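In practice, that "prioritize the external data" behavior usually comes down to a grounding instruction placed in the prompt. A minimal sketch follows; the wording is illustrative, not any standard template:

```python
# Illustrative grounding instruction; the exact wording is an assumption.
GROUNDING_INSTRUCTION = (
    "Answer ONLY from the provided context. "
    "If the context does not contain the answer, say you don't know "
    "rather than guessing, and cite the source of each claim."
)

def grounded_prompt(query: str, context: str) -> str:
    """Prepend the grounding instruction to the retrieved context and query."""
    return f"{GROUNDING_INSTRUCTION}\n\nContext:\n{context}\n\nQuestion: {query}"

print(grounded_prompt(
    "How much has the planet warmed since pre-industrial times?",
    "IPCC AR6 (2021): observed warming of about 1.1 °C above 1850-1900 levels.",
))
```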
Evolution of RAG: from Naive to Modular
Compared with the earliest research, today's RAG systems have evolved from simple pipelines into sophisticated architectures, and they offer a wide range of design options.
Naive RAG
The basic retrieve-then-generate pipeline described above: index the documents, retrieve the chunks most similar to the query, and insert them into the prompt as-is.

Advanced RAG
Adds pre-retrieval and post-retrieval optimizations, such as query rewriting and reranking of the retrieved passages, to improve the quality of the context.
Modular RAG
Breaks the pipeline into interchangeable modules (search, memory, routing, and so on) that can be composed and orchestrated flexibly for the task at hand.
A fun webinar on the RAG comparison test is coming on 9 May.
RAG or Fine-tuning?
There are insightful discussions about whether retrieval augmentation via RAG or fine-tuning an LLM is the better approach. However, the relationship between the two is not zero-sum: LLMs work even better when the two approaches complement each other.
An effective approach can be to first fine-tune the LLM on domain-specific data and skills, allowing it to specialize in that area. At inference time, a RAG system can then supply the refined model with fresh, real-time information, creating a dynamic learning environment.
Some forward-thinking researchers are exploring blended approaches that iteratively combine the strengths of both worlds: offline fine-tuning and online retrieval.
This can lead to a synergistic effect: fine-tuning helps the model use context more effectively, while RAG keeps the specialized model supplied with up-to-date external knowledge, creating a cycle of ongoing learning and improvement.
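As a high-level sketch of that cycle (every function body below is a trivial stand-in; a real system would call a training library for the offline step and a vector store for the online step):

```python
def fine_tune(base_model: str, domain_corpus: list[str]) -> str:
    """Offline step: specialize the model on domain data (stand-in)."""
    return f"{base_model}-tuned-on-{len(domain_corpus)}-docs"

def retrieve(query: str, knowledge_base: list[str]) -> list[str]:
    """Online step: fetch fresh context (naive keyword match as a stand-in)."""
    words = query.lower().split()
    return [doc for doc in knowledge_base if any(w in doc.lower() for w in words)]

def answer(query: str, model: str, knowledge_base: list[str]) -> str:
    """Ground the specialized model in freshly retrieved knowledge."""
    context = retrieve(query, knowledge_base)
    # A real system would send model + context + query to an inference API.
    return f"[{model}] answers {query!r} using context {context}"

tuned = fine_tune("base-llm", ["domain doc A", "domain doc B"])
print(answer("what changed today?", tuned, ["Pricing rules changed today."]))
```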
RAG evaluation
We can assess various aspects of a RAG system, such as context relevance, faithfulness of the output to its sources, answer relevance, noise robustness, information synthesis, and adaptive reasoning.
This provides insights into their overall proficiency in dynamically retrieving and integrating external knowledge to enhance task performance.
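As a toy illustration of two of these signals, the sketch below scores them with simple token overlap; real evaluation frameworks such as RAGAS use LLM judges and embedding similarity instead:

```python
import string

def tokens(text: str) -> set[str]:
    """Lowercase, split on whitespace, strip surrounding punctuation."""
    return {w.strip(string.punctuation) for w in text.lower().split()}

def overlap(a: str, b: str) -> float:
    """Fraction of a's tokens that also appear in b (a crude 0..1 signal)."""
    a_t, b_t = tokens(a), tokens(b)
    return len(a_t & b_t) / max(len(a_t), 1)

def context_relevance(query: str, context: str) -> float:
    # How much of the query does the retrieved context cover?
    return overlap(query, context)

def faithfulness(answer: str, context: str) -> float:
    # How much of the answer is (lexically) supported by the context?
    return overlap(answer, context)

ctx = "Saturn has 146 confirmed moons, the most of any planet."
print(context_relevance("Which planet has the most moons?", ctx))  # ~0.83
print(faithfulness("Saturn has the most moons.", ctx))             # 1.0
```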
The future...
While RAG systems have mainly focused on text-based tasks, there's growing interest in extending them to support other modalities like image, audio, and video.
The fundamental drivers will be technical advances in areas such as retrieval quality, dense embedding approaches, augmentation techniques, knowledge grounding, model composability, and hybrid paradigms that combine RAG with other methods.
With evaluation frameworks bringing RAG systems to maturity, the emergence of critical breakthroughs in machine intelligence should not come as a surprise.
This issue is brought to you in partnership with Rockset.
Rockset is the search and analytics database built for the cloud, with real-time indexing and full-featured SQL on JSON, time series, geospatial and vector data.
They also have an amazing YouTube channel with tutorials and learning materials, from real-time analytics to building personal AI assistants. Check it out here.
Generative AI Innovator | AI Team Builder | Helping businesses transform with cutting-edge AI solutions
8 months ago
Retrieval Augmented Generation (RAG) is a game-changer for AI, ensuring that models remain up-to-date and accurate by integrating real-time information from trusted sources. At Processica, we've been exploring similar avenues to maintain the relevance of AI systems. Our work in pre- and post-validation frameworks underscores the importance of rigorous QA in AI development, helping mitigate risks and improve reliability. Check out our articles on AI QA and validation techniques for more insights on these critical processes: https://www.dhirubhai.net/pulse/adapting-ai-models-strategic-choice-between-rag-babenko-ph-d--gpeze/?trackingId=YL4VvlttTmW86TBMHZdJTQ%3D%3D
Here is a simple and practical micro benchmark to compare a few RAG chatbots … https://www.dhirubhai.net/posts/jay-jiebing-yu-phd-7b97a8_ai-genai-llm-activity-7207724913632137216--uS9. You may be surprised at the results.
Assistant Technical Manager | AWS Certified Solutions Architect – Associate
8 months ago
RAG systems combine context retrieval with language models to provide up-to-date and accurate knowledge. Their modular architecture allows for customization and optimization. Evaluation goes beyond traditional accuracy metrics to assess factors like context relevance and adaptive reasoning, driving further development and improvement.
Engineering Manager @Gen Yazılım Ltd.
10 months ago
RAG is still taking its very first baby steps. The frustration around it originates from two facts: 1) Every data, information, or knowledge corpus is different and needs expert filtering, classification, clustering, indexing, and vectorization according to its "meaning" in the field, which has nothing to do with any AI, since an AI can only track contexts (the relationships of numbers with each other), not meanings. 2) Assume a RAG is constructed flawlessly; then the AI dealing with it must be "non-intrusive", meaning the model should simply resolve the prompt and the context according to the RAG, and when generating anything it should use the RAG again. Otherwise, if it takes over with its training biases, it will warp the "reality" of the RAG into the "reality" of its training. So in a RAG + AI system, the AI must assume the roles of context resolver and context reconstructor without its own "opinions" or "traits". As a result, an AI dealing with big data via RAG must have a transform layer whose attention is biased toward the RAG and the prompt context rather than its training data. How to achieve this? Well, I ask the magicians out there to help us mortals.