Next-Gen AI: The Power of RAG
Dr Rabi Prasad Padhy
Vice President, Data & AI | Generative AI Practice Leader
RAG (Retrieval-Augmented Generation) is an emerging AI technique designed to improve the output of large language models (LLMs) by retrieving and incorporating information from outside their training data before generating a response. RAG eliminates the need both to build a costly LLM from scratch and to send sensitive data to the cloud. This "data on demand" approach offers a secure and cost-effective alternative.
A typical AI request (called an inference) involves six basic steps:
Step 1: Input data preparation. This could involve normalization, tokenization (for text), resizing images, or converting the data into a specific format.
For RAG specifically, preparation also typically means splitting source documents into chunks and converting each chunk into a vector embedding so it can be searched later, as in the sketch below.
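A minimal sketch of that RAG-specific preparation, assuming the open-source sentence-transformers library; the chunking parameters, file name, and embedding model are all illustrative choices:

```python
from sentence_transformers import SentenceTransformer  # assumed dependency

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split a document into overlapping character chunks for retrieval."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# Illustrative model; any sentence-embedding model would work here.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
chunks = chunk_text(open("policy_manual.txt").read())  # hypothetical source file
embeddings = embedder.encode(chunks)  # one vector per chunk, ready for indexing
```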
Step 2: Model loading. The model, which has already been trained on a data set and has learned patterns it can apply to new data, is loaded so it can serve requests.
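One common way to load a model for local inference, assuming the Hugging Face transformers library; the checkpoint name is purely illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint; any instruction-tuned causal LM could stand in.
model_name = "mistralai/Mistral-7B-Instruct-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
```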
Step 3: Inference execution. The prepared input data is fed into the model. This is where RAG shines: in addition to relying on the patterns the model learned during training, the application searches authorized external sources, such as internal databases or document stores, for content relevant to the request (see the retrieval sketch below).
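The retrieval half of this step can be sketched as a brute-force cosine-similarity search over the embeddings computed in step 1. `retrieve` is a hypothetical helper; a production system would typically delegate this to a vector database, but the underlying math is the same:

```python
import numpy as np

def retrieve(query_vec, chunk_vecs, chunks, top_k=3):
    """Return the top_k chunks most similar to the query (cosine similarity)."""
    chunk_vecs = np.asarray(chunk_vecs, dtype=float)
    query_vec = np.asarray(query_vec, dtype=float)
    sims = chunk_vecs @ query_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )
    best = np.argsort(sims)[::-1][:top_k]  # indices of the highest-scoring chunks
    return [chunks[i] for i in best]
```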
Step 4: Output generation. The nature of this output depends on the task. In this step, RAG takes the most relevant retrieved documents, places them alongside the user's request in the prompt, and guides the LLM to generate a response tailored to the specific task.
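A sketch of the "guides the LLM" part: the retrieved chunks are folded into the prompt so the model answers from them rather than from memory alone. The prompt wording is illustrative, not a fixed recipe:

```python
def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Ground the LLM by placing retrieved context ahead of the question."""
    context = "\n\n".join(retrieved_chunks)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```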
Step 5: Post-processing. The raw output from the model may undergo post-processing to convert it into a more interpretable or useful form.
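For a causal LLM, the simplest post-processing is stripping the echoed prompt from the decoded output; a minimal sketch under that assumption:

```python
def postprocess(raw_output: str, prompt: str) -> str:
    """Strip the echoed prompt and stray whitespace from a causal LM's output."""
    answer = raw_output[len(prompt):] if raw_output.startswith(prompt) else raw_output
    return answer.strip()
```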
Step 6: Result interpretation and action. Finally, the post-processed output is interpreted within the context of the application, leading to an action or decision.
For example, in a medical diagnosis application, the output might be interpreted by a healthcare professional to inform a treatment plan.
In a RAG-augmented inference, RAG most affects steps 3 and 4. In step 3, the application searches whatever external data it has been given access to (internal company databases, external documents, etc.) in addition to the training data the model was built on. Then, in step 4, RAG passes the top-matched documents from the retrieval step to the LLM, which generates the response for the specific use case (e.g., question answering, summarization). The sketch below shows how the pieces above compose into that flow.
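Tying the earlier sketches together into one end-to-end flow (chunk_text, embedder, retrieve, build_prompt, and postprocess are the illustrative helpers defined above, not a standard API):

```python
# End-to-end RAG inference, wiring together the sketches above.
question = "What is our parental leave policy?"          # example query
query_vec = embedder.encode([question])[0]               # step 1: prepare input
top_chunks = retrieve(query_vec, embeddings, chunks)     # step 3: retrieval
prompt = build_prompt(question, top_chunks)              # step 4: grounded prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
raw = tokenizer.decode(output_ids[0], skip_special_tokens=True)
answer = postprocess(raw, prompt)                        # step 5: post-processing
print(answer)                                            # step 6: interpret / act
```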
RAG: Pros & Cons
Pros:
- Answers can draw on current, domain-specific information without retraining or fine-tuning the model.
- Sensitive data stays within the organization's authorized sources instead of being baked into model weights or shipped off for training.
- Grounding responses in retrieved documents makes outputs easier to verify and reduces hallucinations.
Cons:
- Answer quality depends heavily on retrieval quality; poor chunking, embeddings, or search surfaces irrelevant context.
- Retrieval adds latency and moving parts (embedding models, vector indexes) to every request.
- The external knowledge base needs ongoing curation, re-indexing, and access control.