RAG (Retrieval-Augmented Generation) For Dummies. Demystifying A Key Design Pattern For Developing Enterprise AI Applications
Hassan Syed
Architect | Cloud Advisor | Azure Certified Solution Expert | Generative AI | Enterprise Systems Expert | IoT Solutions | Big Data | Digital Transformation Leader | Integration Architect | Hands-on| Mentor
RAG for Dummies - Building AI Apps with Your Data and the Power of LLMs
Disclaimer: This book cover doesn't exist (we do need one!)
My eight-year-old daughter avoids taking those "For Dummies" books to school. She thinks her classmates would make fun of her (well, I had similar thoughts at college too :) ).
But let's be real: when it comes to foreseeing the future of AI-powered systems in our organisations, we adults often feel like dummies ourselves.
So let's take a step beyond ChatGPT and dive into a popular design pattern for building smart enterprise applications that combine the power of AI (LLMs or SLMs) with the relevance of your enterprise data (current and accurate).
The Challenge: Find and Win the Construction Tenders Lightning Fast
Walk with me through this use case: a simulated story of a detective, but in sales.
Imagine you're a Sales Opportunity Detective given this brief:
Find the latest building construction tenders released by the state government in the last 7 days.
You've got your magnifying glass and detective hat on, ready to sift through piles of digital documents scattered across websites. However, you have a secret weapon that others do not have (yet): a Retrieval-Augmented Generation (RAG) application.
In summary, your RAG app will do three key things: retrieve the relevant data, augment your query with it, and generate the answer.
Let's break down each step.
1. Retrieval: The Data Detective
The first step is like sending our detective out to gather clues.
In your organisation, for another use case, this could mean extracting relevant content from your company portals, document folders and drives, databases, and so on. Your RAG app needs a smart search mechanism to find the relevant content in your enterprise data stores or in a store dedicated to the app.
2. Augmentation: The Clue Filter
Now that our detective has gathered a pile of clues (tenders), the next step is sorting through evidence, keeping only the relevant clues and organising them neatly. This is the augmentation phase. Here’s what happens:
3. Generation: Getting Hold of All the Info on Our Target
With the clues sorted, it’s time for RAG to tell the story. The generation phase is where the magic happens:
This step is like our detective writing a final report, detailing everything clearly so you can understand it easily.
The Detective’s Report
Here’s an example of what the final report might look like:
Latest Tenders for Building Construction by New South Wales Government
Tender 1: Construction of New School Building
Description: This tender invites bids for the construction of a new school building in Sydney.
Deadline: June 15, 2024
Requirements: Experience in educational building projects, compliance with local regulations.
Contact: tender.nsw.gov.au
Tender 2: Hospital Wing Extension
Description: This tender calls for the extension of the west wing of a hospital in Newcastle.
Deadline: June 17, 2024
Requirements: Previous hospital construction experience, detailed project plan.
Contact: tender.nsw.gov.au
Summing up the system flows in this RAG Application Demo
The Technical Side
The reference diagram from Microsoft below gives you a good idea of the mechanics at a high level. Next, you will see details on the options for implementing an effective local search engine.
Below is a sample technical implementation using an OpenAI LLM; a code sketch follows the steps. The hard part is implementing effective local search over the enterprise data. If that part is done correctly, the other bits are ordinary programming challenges.
1. Document Indexing:
- Use an indexing tool like Elasticsearch, Apache Lucene, or other search technologies to index your documents. This process involves creating a searchable index of your documents that can be quickly queried.
2. Embedding-based Retrieval:
- Use embeddings to represent your documents and queries in a vector space. You can leverage pre-trained models like Sentence-BERT or other embedding models to create vector representations of your documents.
- Store these embeddings in a vector database such as FAISS, Pinecone, or Milvus.
3. Query Processing:
- When a user submits a query, convert the query into an embedding using the same model you used to embed your documents.
- Perform a similarity search in your vector database to find the most relevant documents or snippets based on the query embedding.
4. Context Augmentation:
- Retrieve the top relevant snippets or documents from your local source based on the similarity search.
- Combine the user query with these retrieved snippets to create an augmented context.
5. Sending to OpenAI:
- Send the augmented context (user query + relevant local snippets) to the OpenAI API for response generation.
- The OpenAI model can now use this enriched context to generate a more accurate and relevant response.
6. Response Delivery:
- Receive the response from OpenAI and deliver it to the user through your application interface.
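To make steps 2 and 3 concrete, here is a minimal sketch of embedding-based retrieval, assuming the sentence-transformers and faiss libraries are available; the model name, sample documents, and the retrieve helper are illustrative choices, not the only way to do it.

```python
# Minimal sketch of steps 1-3: index a few documents as embeddings and
# retrieve the most similar ones for a query. The model name, documents,
# and the `retrieve` helper are illustrative assumptions.
import faiss
from sentence_transformers import SentenceTransformer

# Illustrative enterprise snippets; in practice these come from your
# document indexing pipeline (step 1).
documents = [
    "To apply for a local business permit, fill out form XYZ and submit it to the local council.",
    "Tender: construction of a new school building in Sydney, closing June 15, 2024.",
    "Tender: extension of the west wing of a hospital in Newcastle, closing June 17, 2024.",
]

# Step 2: embed the documents into a vector space and store them in a FAISS index.
model = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = model.encode(documents, convert_to_numpy=True).astype("float32")
index = faiss.IndexFlatL2(doc_embeddings.shape[1])
index.add(doc_embeddings)

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Step 3: embed the query with the same model and return the top-k most similar snippets."""
    query_embedding = model.encode([query], convert_to_numpy=True).astype("float32")
    _, indices = index.search(query_embedding, top_k)
    return [documents[i] for i in indices[0]]

print(retrieve("latest building construction tenders"))
```

A flat L2 index is fine for a demo; a production setup would typically use a managed vector store and an approximate-nearest-neighbour index instead.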
Example Workflow in Detail (see the code sketch after these steps):
1. User Query: “How do I apply for a local business permit?”
2. Embedding Creation:
- Convert the query to an embedding using a model like Sentence-BERT.
3. Similarity Search:
- Use the query embedding to search for the top relevant document embeddings in your vector database (FAISS, Pinecone, etc.).
- Retrieve the top-k documents or snippets that are most similar to the query embedding.
4. Context Augmentation:
- Combine the query with the retrieved snippets: “User query: How do I apply for a local business permit? Retrieved snippet: To apply for a local business permit, you need to fill out form XYZ and submit it to the local council along with the required documents.”
5. Send to OpenAI:
- Send the augmented text to OpenAI: “How do I apply for a local business permit? To apply for a local business permit, you need to fill out form XYZ and submit it to the local council along with the required documents.”
6. Receive and Deliver Response:
- OpenAI generates a detailed response based on the augmented context.
- Response: “To apply for a local business permit, first, fill out form XYZ, which you can download from the local council’s website. Ensure you have all the required documents, such as proof of identity and business registration. Submit these to the local council office. You can find more details on their official website or contact their support desk.”
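Here is a minimal sketch of steps 4-6 for this workflow, reusing the retrieve helper from the previous sketch. The OpenAI model name and the prompt wording are assumptions; any chat-capable model would work the same way.

```python
# Minimal sketch of steps 4-6: augment the query with retrieved snippets and
# ask an OpenAI model to answer from that context. Assumes the `retrieve`
# function from the previous sketch and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer(query: str) -> str:
    # Step 4: combine the user query with the retrieved snippets.
    snippets = retrieve(query)
    augmented_context = "\n".join(snippets)

    # Step 5: send the augmented context to the OpenAI API for generation.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "Answer the user's question using only the provided context."},
            {"role": "user",
             "content": f"Context:\n{augmented_context}\n\nQuestion: {query}"},
        ],
    )
    # Step 6: return the generated answer to the application layer.
    return response.choices[0].message.content

print(answer("How do I apply for a local business permit?"))
```

Instructing the model to answer only from the provided context is what keeps the response grounded in your enterprise data rather than the model's general knowledge.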
The Summary: Why RAG is Awesome
Here’s why it’s so valuable:
So, there is no doubt we are going to see a large number of RAG applications appearing on the enterprise horizon. The sooner you start in this space, the more beneficial it will be for your organisation in the never-ending market competition.