Retrieval Augmented Generation
LLM supported by RAG architecture

Anyone in the industry who has used ChatGPT for business purposes has probably thought: "This is truly impressive! GPT answers my questions well. Now, how can I put this to work for my own use? Can I train it on my own data?"

Looking into this, one soon runs into the cost and complexity of training a large language model, which raises the question of whether the effort is feasible or even advisable. Most of us are not ready to compete directly with OpenAI.

A group of Meta AI researchers (Lewis et al., 2021) introduced a method known as Retrieval Augmented Generation (RAG) to tackle knowledge-intensive tasks. RAG combines an information retrieval component with a text generation model. Because much of the knowledge lives in an external retrieval source, RAG can be fine-tuned and its knowledge adjusted efficiently, without retraining the entire model.

RAG takes an input and retrieves a set of relevant supporting documents from a given source. These documents are concatenated as context with the original input and fed to the text generation component, which produces the final output. This makes RAG valuable when facts change over time, which addresses a key limitation of language models: their knowledge is fixed at training time. Instead of being retrained, the model reaches up-to-date information through retrieval and uses it to generate more accurate outputs.

The process of implementing RAG involves several steps:

Candidate Selection: The retrieval system identifies a set of text snippets that are potential candidates due to their relevance to the input context or query.

Scoring and Ranking: Each candidate snippet is assigned a score based on factors such as relevance and accuracy. The retrieval system arranges the candidate snippets in order of their scores.

Input Combination: The top-rated candidate snippets are combined with the original input context or query, creating an extended input that encompasses both retrieved text and the original input.

Generation Process: The extended input is fed into the generative model, which utilizes both the retrieved text snippets and the original input to generate the final text output.
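The four steps above can be sketched in plain Python. This is a minimal illustration, not a production design: it assumes a bag-of-words cosine score in place of a real retriever, and the `generate` function is a hypothetical stub standing in for an actual LLM call. All function names, the stopword list, and the sample snippets are made up for this example.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"what", "is", "the", "a", "of", "on", "to", "are", "our"}


def tokenize(text):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def select_candidates(query, snippets):
    """Step 1: keep snippets that share at least one token with the query."""
    q = set(tokenize(query))
    return [s for s in snippets if q & set(tokenize(s))]


def rank(query, candidates):
    """Step 2: score each candidate against the query and sort best-first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(c))), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def build_prompt(query, ranked, top_k=2):
    """Step 3: combine the top-k snippets with the original query."""
    context = "\n".join(f"- {text}" for _, text in ranked[:top_k])
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


def generate(prompt):
    """Step 4: hypothetical stub; a real system would call an LLM here."""
    return f"[model output for a prompt of {len(prompt)} characters]"


snippets = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The office is closed on public holidays.",
    "A refund is issued to the original payment method.",
]
query = "What is the refund policy?"
ranked = rank(query, select_candidates(query, snippets))
prompt = build_prompt(query, ranked)
print(generate(prompt))
```

In this toy run the snippet about office hours shares no content words with the query, so it never reaches the prompt. A real retriever would use embeddings rather than exact word overlap, but the flow of the four steps stays the same.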

Is it possible to construct such a system?

Leading cloud service providers like Microsoft and Amazon offer RAG solutions.


Azure ML RAG

RAG with Azure Machine Learning:

In Azure Machine Learning, RAG is enabled through integration with Azure OpenAI Service, which supplies the large language models and the vectorization step. The integration supports Faiss and Azure Cognitive Search as vector stores, along with open-source tools such as LangChain for data chunking. Implementing RAG means formatting the data so it can be searched efficiently, so that only the relevant pieces are sent to the language model, which keeps token consumption down. The data must also be updated regularly for RAG to stay effective.
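To make the chunking idea concrete, here is a small plain-Python sketch. It is not the LangChain or Azure API; `chunk_words` is a hypothetical helper, and the chunk size and overlap values are arbitrary assumptions. It splits a document into overlapping word windows so that each piece is small enough to index and to send to the model on its own.

```python
def chunk_words(text, chunk_size=50, overlap=10):
    """Split text into chunks of at most chunk_size words, with overlapping windows."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


# A synthetic 120-word document: w0 w1 ... w119
document = " ".join(f"w{i}" for i in range(120))
chunks = chunk_words(document, chunk_size=50, overlap=10)
print(len(chunks), [len(c.split()) for c in chunks])
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some words twice.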


AWS RAG

RAG with Amazon SageMaker:

External data used to augment prompts can come from sources such as document repositories, databases, or APIs. Documents and user queries are first converted into a compatible format so that a relevance search can be run. Embedding language models turn both into numerical representations that can be compared. The RAG system uses these embeddings to find the relevant context, combines it with the user query, and feeds the result to the foundation model. Knowledge libraries and their embeddings can be updated asynchronously.
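The sketch below illustrates the embedding and update idea in plain Python. It does not use any SageMaker API: `embed` is a toy hashing function standing in for a real embedding model, and `KnowledgeLibrary` is a hypothetical in-memory store invented for this example. The point it shows is that a document can be re-embedded and replaced without touching the foundation model.

```python
import math
import re
import zlib

DIM = 256  # size of the toy embedding vector (arbitrary choice)


def embed(text):
    """Toy embedding: hash each word into one of DIM buckets, then L2-normalize."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class KnowledgeLibrary:
    """Hypothetical in-memory store mapping document ids to (text, embedding)."""

    def __init__(self):
        self.items = {}

    def upsert(self, doc_id, text):
        """Add or refresh a document; only this document is re-embedded."""
        self.items[doc_id] = (text, embed(text))

    def search(self, query, top_k=1):
        """Return the top_k (score, doc_id, text) tuples by cosine similarity."""
        q = embed(query)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), doc_id, text)
            for doc_id, (text, vec) in self.items.items()
        ]
        return sorted(scored, reverse=True)[:top_k]


library = KnowledgeLibrary()
library.upsert("pricing", "The basic plan costs 10 dollars per month.")
library.upsert("support", "Support is available by email on weekdays.")

query = "How much does the basic plan cost per month?"
score, doc_id, text = library.search(query)[0]
prompt = f"Context: {text}\n\nQuestion: {query}"

# Later, the pricing document changes; the foundation model is untouched.
library.upsert("pricing", "The basic plan costs 12 dollars per month.")
```

After the second `upsert`, the same query retrieves the new price. In a real deployment this refresh would typically run as a background job, which is what updating the library asynchronously amounts to.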

The process is much the same on platforms like AWS, Azure, and IBM, and open-source tools like Haystack can achieve comparable results.

The era of generative AI has unlocked numerous capabilities for existing systems. One notable advancement is the pairing of vector databases with retrieval augmented generation. This overview only scratches the surface of what is possible, such as building AI agents that can process text, images, video, or audio. RAG and vector databases address the context-window limits of language models, letting them reason over stored historical knowledge that would not fit in a single prompt.
