I see RAG everywhere

“Search” is dead, long live “Ask”

See how "search anything" is changing to "ask anything"

Before we talk about what RAG is, let me begin with one of the most important problems of LLMs: their knowledge is static. Once training is done, there is no great way to teach them new knowledge.

But ultimately that might not be necessary anyway. As long as they can read and reason about the text they are given, they don't need to update their core knowledge constantly. They can rely on a knowledge base separate from their core knowledge, which they can look up whenever they need to.

However, at the moment their knowledge is static. For example, GPT-4o inside ChatGPT is not continuously updated with the latest news; at best it can do a Bing search and give you some of the latest results.

So let's start with the static knowledge problem:

Static Knowledge

LLMs face significant challenges related to their static knowledge base, high retraining costs, and the limited scope for updates, making it difficult to keep them up-to-date with new information.

  • Static Knowledge: Once an LLM is trained, its knowledge is fixed based on the data available at the time of training. This means the model cannot learn or integrate new information that emerges after the training period. As a result, the model's responses may become outdated or incorrect as time progresses and new information becomes available.
  • Retraining Costs: Updating the model to include new knowledge requires retraining, which involves reprocessing vast amounts of data and adjusting billions of parameters.
  • Limited Scope for Updates: Fine-tuning can provide some level of update by training the model on new, smaller datasets. However, this approach has limitations. It can only incorporate a limited amount of new information and may not comprehensively update the model's entire knowledge base. Additionally, fine-tuning can sometimes lead to overfitting to the new data, reducing the model's overall performance.
  • Lack of Access to Private Data: LLMs do not have access to your private data, company documents, reports, or proprietary databases unless explicitly provided during their training. The model’s training data consists of publicly available information and licensed data, meaning it doesn't inherently know about confidential or internal information specific to individuals or organisations. This limitation means that the model cannot generate responses based on private or sensitive data that was not part of its training corpus.

RAG as Solution

A simple overview of a RAG system

Retrieval-Augmented Generation (RAG) is an AI technique that enhances the capabilities of language models by combining two core components: retrieval and generation. Here's a simplified explanation:

  1. Retrieval: This component searches for and retrieves relevant information from a large database or knowledge source. When a user asks a question, the retrieval system looks through a vast collection of documents, articles, or other data sources to find the most relevant pieces of information.
  2. Generation: This component involves a language model (like GPT-4) that can generate human-like text. It takes the information retrieved by the first component and uses it to generate a coherent and accurate response to the user's question.

How RAG Works

  1. User Query: A user asks a question or makes a request.
  2. Retrieval Step: The system searches its database for the most relevant documents or information related to the query.
  3. Generation Step: The language model takes the retrieved information and formulates a response that answers the user's query, integrating the retrieved facts into the generated text.
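The three steps above can be sketched in a few lines of Python. This is a minimal, self-contained toy: the document corpus, the stopword list, and the bag-of-words scoring are all stand-ins — a real system would use learned embeddings for retrieval and send the assembled prompt to an LLM instead of just printing it.

```python
import math
import re
from collections import Counter

# Tiny illustrative corpus standing in for the knowledge base.
DOCUMENTS = [
    "Wind and solar power reduce greenhouse gas emissions.",
    "Renewable energy decreases dependence on fossil fuels.",
    "The capital of France is Paris.",
]

STOPWORDS = {"what", "are", "the", "of", "is", "a", "and", "on", "in"}

def vectorize(text: str) -> Counter:
    # Bag-of-words term counts; a production system would use learned embeddings.
    return Counter(t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS)

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    # Retrieval step: rank documents by similarity to the query, keep the top k.
    q = vectorize(query)
    return sorted(DOCUMENTS, key=lambda d: cosine(q, vectorize(d)), reverse=True)[:k]

def build_prompt(query: str, context: list[str]) -> str:
    # Generation step: in a real RAG system this prompt is sent to an LLM.
    return (
        "Answer the question using only the context below.\n"
        f"Context: {' '.join(context)}\n"
        f"Question: {query}"
    )

query = "What are the benefits of renewable energy?"
context = retrieve(query)
print(build_prompt(query, context))
```

Notice that the model never has to "know" the answer in advance: the grounding facts travel inside the prompt, which is the whole trick behind RAG.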

Example:

Imagine you ask an AI system, "What are the benefits of renewable energy?"

  • Retrieval: The system searches a database of articles, studies, and documents about renewable energy.
  • Generation: Using the retrieved information, the language model generates a response like: "Renewable energy sources, such as wind and solar power, offer numerous benefits. They reduce greenhouse gas emissions, decrease dependence on fossil fuels, and can lead to lower energy costs in the long run."

ChatGPT becoming a RAG

ChatGPT is moving toward becoming a RAG by reading your documents in the cloud. Your questions can now also be answered based on documents in your Google Drive or Microsoft OneDrive. We can also say it acts as a RAG when it uses Bing search: it searches (retrieval) and tries to answer your question based on the search results.

ChatGPT wants to have RAG abilities with your private data

Notion becoming a RAG

Notion not only answers your questions based on its own documents; it now also integrates with Google Drive and Slack and tries to bring you answers from these sources too. So Notion is becoming a big RAG as well.

Notion also wants more of your data through integrations

Gemini as Google Tools RAG

Google Gemini integrates all their products to bring you answers

Gemini is in a perfect position to bring you answers from Google's own tools

Perplexity as RAG of Internet

Perplexity wants to be the RAG of the internet.

Perplexity wants to be the RAG of the entire internet

RAG provides

  • Enhanced Accuracy: By combining retrieval with generation, the system can provide more accurate and relevant answers, as it grounds its responses in real, up-to-date information.
  • Dynamic Knowledge Integration: The system can adapt to new information without needing to be retrained from scratch, making it more flexible and up-to-date.
  • Better Handling of Specific Queries: RAG systems are particularly effective for answering specific, detailed questions that require precise information, improving the user experience significantly.
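The "dynamic knowledge integration" point is easy to see in code: new facts enter the system by being indexed, not by retraining the model. Below is a toy sketch of that idea — the term-count "embedding" and the two example documents are illustrative stand-ins for a real embedding model and a real corpus.

```python
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": term counts (a real system would call an embedding model).
    return Counter(re.findall(r"[a-z]+", text.lower()))

index: list[tuple[str, Counter]] = []  # the retrieval store

def add_document(text: str) -> None:
    # New knowledge enters the system the moment it is indexed; no retraining.
    index.append((text, embed(text)))

def best_match(query: str) -> str:
    # Return the stored document with the highest term overlap with the query.
    q = embed(query)
    return max(index, key=lambda item: sum(q[t] * item[1][t] for t in q))[0]

add_document("The Eiffel Tower is in Paris.")
add_document("GPT-4o was released in May 2024.")  # a fact newer than a model's training cutoff
print(best_match("When was GPT-4o released?"))
```

Updating the index is a cheap append, which is exactly why RAG sidesteps the retraining costs discussed earlier.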

Building RAG is not hard

A simple overview of data flow in a simple RAG

For example, using the LlamaIndex framework you can build one in a small code snippet. And LlamaIndex is not as sophisticated as, say, React; the real magic lies in the embeddings and in how the LLM reads the retrieved text and generates answers. LlamaIndex is a thin wrapper around LLM API calls.

Read more about this at: https://docs.llamaindex.ai/en/stable/getting_started/concepts/
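To give a sense of how small that snippet is, here is roughly what the LlamaIndex starter flow looks like. This is a sketch, not a definitive implementation: the `"data"` folder is a hypothetical directory of your files, the question is made up, and running it requires `pip install llama-index` plus an OpenAI API key in the environment (the default backend at the time of writing).

```python
# Assumes: `pip install llama-index` and OPENAI_API_KEY set in the environment.
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()   # "data" is a hypothetical folder of your files
index = VectorStoreIndex.from_documents(documents)       # chunk, embed, and store the documents
query_engine = index.as_query_engine()                   # retrieval + generation in one object
response = query_engine.query("What are the benefits of renewable energy?")
print(response)
```

Five lines of orchestration — which is the point: the framework is thin, and the heavy lifting happens inside the embedding and LLM calls it wraps.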

Building great RAG is very hard

Building great production RAG systems involves addressing challenges related to:

  • Data retrieval accuracy
  • Integration complexity
  • Performance and scalability
  • Data privacy and security
  • Knowledge integration
  • Cost and resource management
  • Evaluation and benchmarking
  • Handling ambiguity and context

These challenges require careful consideration and sophisticated solutions to create effective and efficient RAG systems.

In my next article I will try to share my findings and experience on building “Advanced” RAG.

Dimitri Graur

AI lead @Kapernikov

4 months ago

Great article! Nice image too :) Do you mind if I use it for a post? Looking forward for the advanced RAG article. I've been playing around with it for a while but it seems that the "vanilla" implementation quickly needs to get more and more complex. Wondering if there are some architectures we know are the most optimal as well as retrieval methods we can generalize to any kinds of data.

Andy Hibbert

Advisor | Investor | Experienced CEO | Founder Super Reel Travel, Car & Away, Karshare, FlySpace

9 months ago

Very interesting as ever Yusuf E. . It would be great to catch up with you about using tools to index video and then using some AI components to drive content to users like TikTok. Good to understand how you can avoid funnelling preferences too much, to avoid losing inspiration for wider holiday inspiration… be great to understand how systems learn about users and how that can be “feathered” to broaden the preferences so to speak.
