Retrieval Augmented Generation with LLMs: How?

A well-designed Retrieval Augmented Generation (RAG) component is key when implementing Generative AI use cases. The quality of the output depends on the quality of the input, that is, the user query (prompt) sent to the language model. It is also important that the RAG system is scalable and delivers high processing performance.

So, how do you build a RAG?

In its simplest form, you can just attach the data to the user query before sending it to the LLM for processing. For example, if your user query should be answered from the data in a document, you could include all the text from that document as part of the query; that text then acts as the context the LLM uses while generating the output. This may be fine if you have just one document. But would it work when your dataset is large and diverse? The answer may reside in hundreds of policy documents, or in other unstructured formats like video and audio. Every LLM has a limited context window, which defines the maximum length of input (query plus context) it will accept. Larger queries can also be expensive, since pricing for some of the available LLMs is based on token count (roughly, the number of words in the request and response).
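
To make the naive approach concrete, here is a minimal sketch of stuffing an entire document into the prompt. The call_llm function is a placeholder for whatever LLM client you actually use; it is not part of any specific library.

```python
# A minimal sketch of the "attach the whole document" approach.
# call_llm is a placeholder for whatever LLM client you actually use
# (OpenAI, Vertex AI, Bedrock, ...); it is not a real library function.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def answer_from_single_document(document_text: str, user_query: str) -> str:
    # The entire document becomes the context, which only works while the
    # document fits inside the model's context window.
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{document_text}\n\n"
        f"Question: {user_query}"
    )
    return call_llm(prompt)
```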


There are many reference architectures evolving to solve this issue. One such solution involves converting the data into "embeddings". An embedding is a vector representation of a word (or a piece of text) that enables efficient searching when hosted in a vector database. Vectors can be plotted in a multi-dimensional semantic feature space: words or passages with similar embeddings sit close to each other in that space and represent the same context. The idea is to identify the very small subset of the data that is most relevant to the user query, i.e. the chunks whose embeddings are most similar to the embedding of the user query, and use it as the real context.
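
As an illustration of what "closer in semantic space" means, here is a small sketch that measures cosine similarity between embedding vectors; the vectors and example texts are made up purely for illustration.

```python
# Cosine similarity is a common way to measure how "close" two embeddings are.
# The vectors below are made up purely for illustration.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query_vec = np.array([0.12, 0.85, 0.03])   # e.g. "How many leave days do I get?"
close_vec = np.array([0.10, 0.80, 0.05])   # a chunk about the leave policy
far_vec   = np.array([0.90, 0.02, 0.40])   # a chunk about expense reports

print(cosine_similarity(query_vec, close_vec))  # high score -> same context
print(cosine_similarity(query_vec, far_vec))    # low score  -> different context
```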

The following are the high-level steps used to build the retrieval index for a RAG system.

Step 1: Chunking: Split the data into smaller segments, or "chunks". Chunks can be created with a simple technique such as splitting into equal pieces of a preset size, or with a more sophisticated technique based on data classification.
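
Here is a minimal sketch of the fixed-size approach, with an overlap so that sentences cut at a chunk boundary still appear intact in a neighbouring chunk; the sizes and the input file name are arbitrary example values.

```python
# Fixed-size chunking with overlap. Real pipelines often split on sentence or
# paragraph boundaries instead; chunk_size and overlap are example values.

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
    return chunks

document_text = open("policy_document.txt").read()  # hypothetical input file
chunks = chunk_text(document_text)
print(f"{len(chunks)} chunks created")
```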

Step 2: Embedding: Generate embeddings for these chunks. Different embedding models are available, including many open-source libraries (OpenAI also has its own embedding models, accessible over its API). You may want to test a few against your data set to see which one provides the best search results.
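
As one example, the sketch below uses the open-source sentence-transformers library; the model name is just a commonly used default, not a recommendation for your particular data.

```python
# Embedding chunks with the open-source sentence-transformers library.
# "all-MiniLM-L6-v2" is a small, commonly used model, shown only as an example.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

chunks = [
    "Employees accrue 20 days of paid leave per year.",
    "Expense reports must be filed within 30 days of purchase.",
]
embeddings = model.encode(chunks)   # one vector per chunk
print(embeddings.shape)             # (2, 384) for this particular model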

Step 3: Vector Store: Once you have the embeddings for the chunks, they are stored in a vector store (vector database).
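
A minimal sketch of this step, using FAISS as one example of an in-process vector index (a managed vector database would expose a similar "add/upsert" style API):

```python
# Storing chunk embeddings in a FAISS index, shown as one example of a vector store.
import faiss
import numpy as np

dimension = 384                               # must match the embedding model's output size
index = faiss.IndexFlatL2(dimension)          # exact nearest-neighbour search over L2 distance

# Stand-in vectors; in practice these are the chunk embeddings from the previous step.
chunk_embeddings = np.random.rand(10, dimension).astype("float32")
index.add(chunk_embeddings)
print(index.ntotal)                           # number of vectors stored in the index
```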

The above steps should be repeated whenever data changes or new data arrives, so the context stays current as the data evolves.

The following steps are used before a prompt or user query is sent to the LLM.

Step 1: User Query Embedding: Generate the embedding of the user query, using the same embedding model that was used for the chunks.

Step 2: Search: Search the vector store to retrieve the chunks whose embeddings best match the query embedding.

Step 3: Merge: Merge the retrieved chunks with the user query (prompt). The enriched prompt then includes not only the user query but also the context that the LLM should use when generating the result.
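
Putting the three query-time steps together, here is a sketch that reuses the sentence-transformers model, the FAISS index, and the chunk list from the earlier sketches; the prompt template is illustrative.

```python
# Query-time flow: embed the query, search the vector store, merge the results
# into an enriched prompt. model, index and chunks come from the earlier sketches.

def build_prompt(user_query: str, model, index, chunks: list[str], top_k: int = 3) -> str:
    # Step 1: embed the user query with the same model used for the chunks
    query_vec = model.encode([user_query]).astype("float32")
    # Step 2: retrieve the top_k chunks whose embeddings are closest to the query
    _, ids = index.search(query_vec, top_k)
    retrieved = [chunks[i] for i in ids[0] if i != -1]
    # Step 3: merge the retrieved chunks and the user query into one prompt
    context = "\n\n".join(retrieved)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {user_query}"
    )
```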
