I see RAG everywhere

“Search” is dead, long live “Ask”

See how "search anything" is changing to "ask anything"

Before we talk about what RAG is, let me begin with one of the most important problems of LLMs: their knowledge is static. Once training is done, there is no great way to teach them new knowledge.

But ultimately that might not be necessary anyway. As long as they can read and reason about the text they are given, they don't need to update their core knowledge constantly. They can rely on a knowledge base separate from their core knowledge, which they can look up whenever they need to.

However, at the moment their knowledge is static. For example, GPT-4o inside ChatGPT is not continuously updated with the latest news; at best it can do a Bing search and give you some of the latest results.

So let's start with the static knowledge problem:

Static Knowledge

LLMs face significant challenges related to their static knowledge base, high retraining costs, and the limited scope for updates, making it difficult to keep them up-to-date with new information.

  • Static Knowledge: Once an LLM is trained, its knowledge is fixed based on the data available at the time of training. This means the model cannot learn or integrate new information that emerges after the training period. As a result, the model's responses may become outdated or incorrect as time progresses and new information becomes available.
  • Retraining Costs: Updating the model to include new knowledge requires retraining, which involves reprocessing vast amounts of data and adjusting billions of parameters.
  • Limited Scope for Updates: Fine-tuning can provide some level of update by training the model on new, smaller datasets. However, this approach has limitations. It can only incorporate a limited amount of new information and may not comprehensively update the model's entire knowledge base. Additionally, fine-tuning can sometimes lead to overfitting to the new data, reducing the model's overall performance.
  • Lack of Access to Private Data: LLMs do not have access to your private data, company documents, reports, or proprietary databases unless explicitly provided during their training. The model’s training data consists of publicly available information and licensed data, meaning it doesn't inherently know about confidential or internal information specific to individuals or organisations. This limitation means that the model cannot generate responses based on private or sensitive data that was not part of its training corpus.

RAG as Solution

A simple overview of a RAG system

Retrieval-Augmented Generation (RAG) is an AI technique that enhances the capabilities of language models by combining two core components: retrieval and generation. Here's a simplified explanation:

  1. Retrieval: This component searches for and retrieves relevant information from a large database or knowledge source. When a user asks a question, the retrieval system looks through a vast collection of documents, articles, or other data sources to find the most relevant pieces of information.
  2. Generation: This component involves a language model (like GPT-4) that can generate human-like text. It takes the information retrieved by the first component and uses it to generate a coherent and accurate response to the user's question.

How RAG Works

  1. User Query: A user asks a question or makes a request.
  2. Retrieval Step: The system searches its database for the most relevant documents or information related to the query.
  3. Generation Step: The language model takes the retrieved information and formulates a response that answers the user's query, integrating the retrieved facts into the generated text.
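The three steps above can be sketched in a few lines of Python. This is a minimal, self-contained toy: the document corpus, the stopword list, and the bag-of-words scoring are all stand-ins — a real system would use learned embeddings for retrieval and send the assembled prompt to an LLM instead of just printing it.

```python
import math
import re
from collections import Counter

# Tiny illustrative corpus standing in for the knowledge base.
DOCUMENTS = [
    "Wind and solar power reduce greenhouse gas emissions.",
    "Renewable energy decreases dependence on fossil fuels.",
    "The capital of France is Paris.",
]

STOPWORDS = {"what", "are", "the", "of", "is", "a", "and", "on", "in"}

def vectorize(text: str) -> Counter:
    # Bag-of-words term counts; a production system would use learned embeddings.
    return Counter(t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS)

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    # Retrieval step: rank documents by similarity to the query, keep the top k.
    q = vectorize(query)
    return sorted(DOCUMENTS, key=lambda d: cosine(q, vectorize(d)), reverse=True)[:k]

def build_prompt(query: str, context: list[str]) -> str:
    # Generation step: in a real RAG system this prompt is sent to an LLM.
    return (
        "Answer the question using only the context below.\n"
        f"Context: {' '.join(context)}\n"
        f"Question: {query}"
    )

query = "What are the benefits of renewable energy?"
context = retrieve(query)
print(build_prompt(query, context))
```

Notice that the model never has to "know" the answer in advance: the grounding facts travel inside the prompt, which is the whole trick behind RAG.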

Example:

Imagine you ask an AI system, "What are the benefits of renewable energy?"

  • Retrieval: The system searches a database of articles, studies, and documents about renewable energy.
  • Generation: Using the retrieved information, the language model generates a response like: "Renewable energy sources, such as wind and solar power, offer numerous benefits. They reduce greenhouse gas emissions, decrease dependence on fossil fuels, and can lead to lower energy costs in the long run."

ChatGPT becoming a RAG

ChatGPT is moving toward becoming a RAG by reading your documents in the cloud. Your questions can now also be answered based on documents in your Google Drive or Microsoft OneDrive. We can also say it acts as a RAG when it uses Bing search: it searches (retrieval) and tries to answer your question based on the search results.

ChatGPT wants to have RAG abilities with your private data

Notion becoming a RAG

Notion not only answers your questions based on its own documents; it now also integrates with Google Drive and Slack and tries to bring you answers from these sources too. So Notion is becoming a big RAG as well.

Notion also wants more of your data through integrations

Gemini as Google Tools RAG

Google Gemini integrates all their products to bring you answers

Gemini is in a perfect position to bring you answers from Google's own tools

Perplexity as RAG of Internet

Perplexity wants to be the RAG of the internet.

Perplexity wants to be the RAG of the entire internet

RAG provides

  • Enhanced Accuracy: By combining retrieval with generation, the system can provide more accurate and relevant answers, as it grounds its responses in real, up-to-date information.
  • Dynamic Knowledge Integration: The system can adapt to new information without needing to be retrained from scratch, making it more flexible and up-to-date.
  • Better Handling of Specific Queries: RAG systems are particularly effective for answering specific, detailed questions that require precise information, improving the user experience significantly.
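The "dynamic knowledge integration" point is easy to see in code: new facts enter the system by being indexed, not by retraining the model. Below is a toy sketch of that idea — the term-count "embedding" and the two example documents are illustrative stand-ins for a real embedding model and a real corpus.

```python
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": term counts (a real system would call an embedding model).
    return Counter(re.findall(r"[a-z]+", text.lower()))

index: list[tuple[str, Counter]] = []  # the retrieval store

def add_document(text: str) -> None:
    # New knowledge enters the system the moment it is indexed; no retraining.
    index.append((text, embed(text)))

def best_match(query: str) -> str:
    # Return the stored document with the highest term overlap with the query.
    q = embed(query)
    return max(index, key=lambda item: sum(q[t] * item[1][t] for t in q))[0]

add_document("The Eiffel Tower is in Paris.")
add_document("GPT-4o was released in May 2024.")  # a fact newer than a model's training cutoff
print(best_match("When was GPT-4o released?"))
```

Updating the index is a cheap append, which is exactly why RAG sidesteps the retraining costs discussed earlier.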

Building RAG is not hard

A simple overview of data flow in a simple RAG

For example, using the LlamaIndex framework you can build one in a small code snippet. And LlamaIndex is not as sophisticated as, say, React; the real magic lies in the embeddings and in how the LLM reads the retrieved text and generates answers. LlamaIndex is a thin wrapper around LLM API calls.

Read more about this at: https://docs.llamaindex.ai/en/stable/getting_started/concepts/
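To give a sense of how small that snippet is, here is roughly what the LlamaIndex starter flow looks like. This is a sketch, not a definitive implementation: the `"data"` folder is a hypothetical directory of your files, the question is made up, and running it requires `pip install llama-index` plus an OpenAI API key in the environment (the default backend at the time of writing).

```python
# Assumes: `pip install llama-index` and OPENAI_API_KEY set in the environment.
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()   # "data" is a hypothetical folder of your files
index = VectorStoreIndex.from_documents(documents)       # chunk, embed, and store the documents
query_engine = index.as_query_engine()                   # retrieval + generation in one object
response = query_engine.query("What are the benefits of renewable energy?")
print(response)
```

Five lines of orchestration — which is the point: the framework is thin, and the heavy lifting happens inside the embedding and LLM calls it wraps.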

Building great RAG is very hard

Building great production RAG systems involves addressing challenges related to:

  • Data retrieval accuracy
  • Integration complexity
  • Performance and scalability
  • Data privacy and security
  • Knowledge integration
  • Cost and resource management
  • Evaluation and benchmarking
  • Handling ambiguity and context

These challenges require careful consideration and sophisticated solutions to create effective and efficient RAG systems.

In my next article I will try to share my findings and experience on building “Advanced” RAG.

Dimitri Graur

AI lead @Kapernikov

4 months ago

Great article! Nice image too :) Do you mind if I use it for a post? Looking forward for the advanced RAG article. I've been playing around with it for a while but it seems that the "vanilla" implementation quickly needs to get more and more complex. Wondering if there are some architectures we know are the most optimal as well as retrieval methods we can generalize to any kinds of data.

Andy Hibbert

Advisor | Investor | Experienced CEO | Founder Super Reel Travel, Car & Away, Karshare, FlySpace

9 months ago

Very interesting as ever Yusuf E. . It would be great to catch up with you about using tools to index video and then using some AI components to drive content to users like TikTok. Good to understand how you can avoid funnelling preferences too much, to avoid losing inspiration for wider holiday inspiration… be great to understand how systems learn about users and how that can be “feathered” to broaden the preferences so to speak.
