RAG in generative AI

Pre-trained LLMs may not perform optimally for specific business needs. These limitations, together with the trade-offs involved in model fine-tuning, gave rise to Retrieval-Augmented Generation (RAG) as an alternative approach to enhancing performance.


Model fine-tuning: a process in which a pre-trained model is further trained on a new dataset rather than being trained from scratch.
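To make the idea concrete, here is a toy sketch (a tiny linear model, not a real LLM; all data and numbers are made up) of continuing training from existing weights instead of re-initializing them:

```python
import numpy as np

rng = np.random.default_rng(0)

def train(w, X, y, lr=0.1, steps=200):
    # Plain gradient descent on mean squared error.
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

# "Pre-training": fit y = 3x from scratch on a larger dataset.
X_pre = rng.normal(size=(100, 1))
w_pre = train(np.zeros(1), X_pre, 3 * X_pre[:, 0])

# "Fine-tuning": continue from the pre-trained weights on a small,
# slightly different dataset (y = 3.5x) -- no re-initialization.
X_ft = rng.normal(size=(20, 1))
w_ft = train(w_pre, X_ft, 3.5 * X_ft[:, 0])
```

The fine-tuned weights start close to the solution, so only a small dataset and few steps are needed to adapt the model to the new task.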

RAG can be particularly useful for applications that require secure interaction with internal knowledge bases or enterprise data sources, such as Q&A chatbots; for such applications it is often a more suitable approach than an out-of-the-box LLM.



Retrieval-Augmented Generation (RAG) is a technique that retrieves data from outside a foundation model and uses it to augment your prompts. These prompts are the natural-language texts that ask the large language model (LLM) to perform a specific task.

The key components of RAG are:

1. Retrieval - Relevant content is retrieved from external knowledge bases or other data sources based on the user's query.

2. Augmentation - The retrieved contextual information is then appended to the original user query, creating an augmented query to serve as the input to the foundation model.

3. Generation - The foundation model then generates a response based on the augmented query.

This approach thus enhances the performance of pre-trained LLMs, which may not perform optimally for specific business needs out of the box.
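The three steps can be sketched as follows (a toy word-overlap retriever stands in for real semantic search, and a placeholder function stands in for the foundation-model call; all names and documents are illustrative):

```python
import re

# Toy knowledge base.
docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "The office is open Monday to Friday, 9am to 5pm.",
    "Support tickets are answered within 24 hours.",
]

def tokens(text):
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query, docs, k=1):
    # 1. Retrieval: rank documents by word overlap with the query
    #    (a crude stand-in for semantic search over embeddings).
    ranked = sorted(docs, key=lambda d: len(tokens(query) & tokens(d)), reverse=True)
    return ranked[:k]

def augment(query, context):
    # 2. Augmentation: prepend the retrieved context to the user query.
    return "Context:\n" + "\n".join(context) + "\n\nQuestion: " + query

def generate(prompt):
    # 3. Generation: a real system would send `prompt` to a foundation model.
    return f"[model response to a {len(prompt)}-character augmented prompt]"

query = "refund policy for returns"
context = retrieve(query, docs)
answer = generate(augment(query, context))
```

Here the retriever selects the refund-policy document, and the augmented prompt carries that context into the (placeholder) generation step.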

See Images 1 and 2.

Image 1


Image 2


Embedding and its Relevance to RAG

Embedding refers to transforming data (text, images, audio) into numerical representations in a high-dimensional vector space using machine learning algorithms.

Embedding allows for:

- Understanding semantics

- Learning complex patterns

- Using the vector representation for applications like search, classification, and natural language processing
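As an illustrative sketch, even a crude bag-of-words "embedding" over a fixed vocabulary (real systems use learned encoders such as transformer-based sentence embedders) shows how vector geometry can capture relatedness:

```python
import numpy as np

# Purely illustrative fixed vocabulary and count-based vectors.
vocab = ["cat", "dog", "pet", "stock", "market", "price"]

def embed(text):
    words = text.lower().split()
    return np.array([words.count(w) for w in vocab], dtype=float)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

a = embed("the cat is a pet")
b = embed("a dog is a pet")
c = embed("the stock market price fell")
```

Here `cosine(a, b)` is higher than `cosine(a, c)`, reflecting that the first two sentences are about pets while the third is about finance.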


End-to-End RAG Architecture

- Data is extracted from various sources (e.g., documents, PDFs, HTML) and converted into numerical representations (embeddings).

- The embedded data is then used to build a semantic index, which is stored in a knowledge base (e.g., vector database, graph database, SQL database).

- Relevant information is obtained from the knowledge base based on the user's query. (Retrieval)

- The user's query is converted into a vector representation, and a semantic search is performed to find the most relevant information.

- An LLM enhances the retrieved information, combining it with the user's query to generate additional relevant content. (Augmentation)

- In the final step, the LLM uses the augmented query to generate a coherent and informative response to the user's query. (Generation)

Similarity Measures

Image 3
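Common measures for comparing a query vector against indexed vectors include the dot product, cosine similarity, and Euclidean distance. A minimal sketch with made-up embeddings:

```python
import numpy as np

# A tiny semantic index: each row is a (made-up) document embedding.
index = np.array([
    [0.9, 0.1, 0.0],   # doc 0
    [0.0, 1.0, 0.2],   # doc 1
    [0.8, 0.2, 0.1],   # doc 2
])
query = np.array([1.0, 0.0, 0.0])

# Dot product: rewards both alignment and vector magnitude.
dot = index @ query

# Cosine similarity: angle only, scale-invariant.
cos = dot / (np.linalg.norm(index, axis=1) * np.linalg.norm(query))

# Euclidean distance: smaller means more similar.
euc = np.linalg.norm(index - query, axis=1)

best = int(np.argmax(cos))  # doc 0 points closest to the query's direction
```

Vector databases typically let you choose the measure per index; cosine similarity is a common default for text embeddings because it ignores vector length.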



