RAG in generative AI
Rima Modak
Data Science Engineer | Python and C++ | AI/ML, Cloud and IT | Certified in CCNA, Azure, Oracle | GHC 24
Pre-trained LLMs may not perform optimally for specific business needs. These limitations, together with the trade-offs involved in model fine-tuning, gave rise to Retrieval-Augmented Generation (RAG) as an alternative approach to enhancing performance.
Model fine-tuning: a process where a pre-trained model is further trained on a new dataset without starting from scratch.
RAG is particularly useful for applications that must interact securely with internal knowledge bases or enterprise data sources, such as Q&A chatbots, and for these use cases it is often a more suitable approach than out-of-the-box LLMs.
Retrieval-Augmented Generation (RAG) is a technique that retrieves data from outside a foundation model and uses it to augment your prompts. These prompts are natural-language texts that ask the large language model (LLM) to perform a specific task.
The key components of RAG are:
1. Retrieval - Relevant content is retrieved from external knowledge bases or other data sources based on the specifics of the user query.
2. Augmentation - The retrieved contextual information is then appended to the original user query, creating an augmented query to serve as the input to the foundation model.
3. Generation - The foundation model then generates a response based on the augmented query.
Thus, this approach helps to enhance the performance of pre-trained LLMs, which may not perform optimally for specific business needs out of the box.
See Images 1 and 2 for an illustration of this workflow.
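To make the augmentation step concrete, here is a minimal sketch of how retrieved context might be appended to a user query before it is sent to the model. The retrieved_chunks list and the prompt template are illustrative placeholders, not a prescribed format.

```python
# Illustrative sketch: building an augmented prompt from retrieved context.
# The retrieved_chunks below stand in for content returned by the retrieval step.
retrieved_chunks = [
    "Employees accrue 1.5 vacation days per month of service.",
    "Unused vacation days may be carried over for up to one year.",
]

user_query = "How many vacation days do I earn per month?"

# Augmentation: prepend the retrieved context to the original query.
augmented_prompt = (
    "Answer the question using only the context below.\n\n"
    "Context:\n"
    + "\n".join(f"- {chunk}" for chunk in retrieved_chunks)
    + f"\n\nQuestion: {user_query}"
)

print(augmented_prompt)  # This augmented prompt is what the foundation model receives.
```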
Embedding and its Relevance to RAG
Embedding refers to transforming data (text, images, audio) into numerical representations in a high-dimensional vector space using machine learning algorithms.
Embedding allows for:
- Understanding semantics
- Learning complex patterns
- Using the vector representation for applications like search, classification, and natural language processing
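As an illustration, the snippet below uses the sentence-transformers library (one of several options) to embed short texts into vectors; the model name is just a commonly used small model, not a requirement.

```python
# Illustrative sketch using sentence-transformers (pip install sentence-transformers).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # a small, widely used embedding model

texts = [
    "RAG augments prompts with retrieved context.",
    "Embeddings map text into a vector space.",
]

# Each text becomes a fixed-length vector; semantically similar texts land close together.
embeddings = model.encode(texts)
print(embeddings.shape)  # e.g., (2, 384) for this model
```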
End-to-End RAG Architecture
- Extracting data from various sources (e.g., documents, PDFs, HTML) and converting it into numerical representations (embeddings).
- The embedded data is then used to build a semantic index, which is stored in a knowledge base (e.g., vector database, graph database, SQL database).
- Obtaining relevant information from the knowledge base in response to the user's query. (Retrieval)
- The user's query is converted into a vector representation, and a semantic search is performed to find the most relevant information.
- Combining the retrieved information with the user's query to form an augmented prompt that gives the LLM additional relevant context. (Augmentation)
- In the final step, the LLM uses the augmented prompt to generate a coherent and informative response to the user's query. (Generation) A minimal end-to-end sketch of this pipeline appears below.

Similarity Measures

Semantic search depends on a similarity measure to compare the query vector with stored document vectors; cosine similarity is the most common choice, alongside dot product and Euclidean distance.
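The sketch below ties the architecture steps together, using cosine similarity as the measure for retrieval. The tiny in-memory document list, the embedding model name, and the call_llm stub are all hypothetical placeholders standing in for a real vector database and a foundation-model API.

```python
# Illustrative end-to-end RAG sketch: index, retrieve (cosine similarity), augment, generate.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

# Indexing: embed documents and keep them in a toy in-memory "knowledge base".
documents = [
    "The refund window is 30 days from the date of purchase.",
    "Support is available Monday through Friday, 9am to 5pm.",
    "Premium plans include priority support and a 60-day refund window.",
]
doc_vectors = model.encode(documents)

def cosine_similarity(a, b):
    """Cosine similarity: dot product of the vectors divided by the product of their norms."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def retrieve(query, k=2):
    """Retrieval: embed the query and return the k most similar documents."""
    q = model.encode([query])[0]
    scores = [cosine_similarity(q, d) for d in doc_vectors]
    top = np.argsort(scores)[::-1][:k]
    return [documents[i] for i in top]

def call_llm(prompt):
    """Hypothetical stand-in for a foundation-model API call (e.g., a hosted LLM endpoint)."""
    return f"[LLM response to a {len(prompt)}-character augmented prompt]"

# Augmentation + generation.
query = "How long do I have to request a refund?"
context = "\n".join(retrieve(query))
augmented = f"Context:\n{context}\n\nQuestion: {query}"
print(call_llm(augmented))
```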