RAG (Retrieval-Augmented Generation) For Dummies: Demystifying a Key Design Pattern for Developing Enterprise AI Applications

RAG for Dummies - Building AI apps with your data and the power of LLMs

Disclaimer: This book cover doesn't exist (we do need one!)

My eight-year-old daughter avoids taking those "For Dummies" books to school. She thinks her classmates would make fun of her (well, I had similar thoughts at college too :) ).

But let's be real: when it comes to foreseeing the future of AI-powered systems in our organisations, we adults often feel like dummies ourselves.

So, let's take a step beyond ChatGPT and dive into a popular design pattern for building smart enterprise applications: one that combines the power of AI (LLMs or SLMs) with the relevance of your enterprise data, current and accurate.

The Challenge: Find and Win Construction Tenders Lightning Fast

Walk with me through this use case: a simulated story of a detective, but in sales.

Imagine you're a Sales Opportunity Detective with a mission:

Find the latest building construction tenders released by the state government in the last 7 days.

You've got your magnifying glass and detective hat on, ready to sift through piles of digital documents scattered across websites. However, you have a secret weapon that others don't have (yet): a Retrieval-Augmented Generation (RAG) application.

In summary, your RAG app does three key things:

  1. Retrieval: Fetches data from your data sources, the internet in this case.
  2. Augmentation: Filters and processes the data, keeping only what's relevant.
  3. Generation: Creates a human-readable, actionable response by sending your query, combined with the retrieved and filtered content, as a single prompt to the AI/LLM, which generates a humanised, smart response.

Let's break down each step.

1. Retrieval: The Data Detective

The first step is like sending our detective out to gather clues.

  • Scours the internet, visiting government websites, databases, and relevant portals.
  • Collects all the tenders published in the last seven days.

In your organisation, for another use case, this could mean extracting relevant content by searching your company portals, document folders and drives, databases, and so on. Your RAG app will need a smart search mechanism to find the relevant content in your enterprise or app-dedicated data stores.


2. Augmentation: The Clue Filter

Now that our detective has gathered a pile of clues (tenders), the next step is sorting through evidence, keeping only the relevant clues and organising them neatly. This is the augmentation phase. Here’s what happens:

  • The RAG app reviews all the gathered data and refines it by filtering out irrelevant information.
  • In this case, it keeps only the tenders related to building construction.
  • It structures the data, making sure the important details are highlighted.


3. Generation: The Storyteller

With the clues sorted, it’s time for RAG to tell the story. The generation phase is where the magic happens:

  • RAG takes the refined information and feeds it into a large language model (LLM).
  • The language model then crafts a clear and detailed report, summarising the tenders.

This step is like our detective writing a final report, detailing everything clearly so you can understand it easily.

The Detective’s Report

Here’s an example of what the final report might look like:


Latest Tenders for Building Construction by New South Wales Government

Tender 1: Construction of New School Building

Description: This tender invites bids for the construction of a new school building in Sydney.

Deadline: June 15, 2024

Requirements: Experience in educational building projects, compliance with local regulations.

Contact: tender.nsw.gov.au

Tender 2: Hospital Wing Extension

Description: This tender calls for the extension of the west wing of a hospital in Newcastle.

Deadline: June 17, 2024

Requirements: Previous hospital construction experience, detailed project plan.

Contact: tender.nsw.gov.au


Summing up the system flow in this RAG application demo

The Technical Side

The reference diagram from Microsoft below gives a good high-level idea of the mechanics. After that, you will see details on the options for implementing an effective local search engine.

(Diagram c/o Microsoft. Source: https://learn.microsoft.com/en-us/azure/search/retrieval-augmented-generation-overview)

Below is a sample technical implementation using an OpenAI LLM. The hard part is implementing an effective local search over the enterprise data; if that part is done correctly, the other bits are normal programming challenges.

1. Document Indexing:

- Use an indexing tool like Elasticsearch, Apache Lucene, or other search technologies to index your documents. This process involves creating a searchable index of your documents that can be quickly queried.
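
If you go the classic keyword-search route, the sketch below shows what indexing could look like with the official Elasticsearch Python client. This is a minimal illustration, assuming a local Elasticsearch instance; the index name ("tenders") and the sample documents are made up for this example.

```python
# Minimal keyword-indexing sketch (assumed setup: Elasticsearch running
# locally; index name and documents are illustrative).
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

docs = [
    {"title": "Construction of New School Building",
     "body": "Tender for a new school building in Sydney."},
    {"title": "Hospital Wing Extension",
     "body": "Tender for extending the west wing of a hospital in Newcastle."},
]

# Index each document; Elasticsearch builds an inverted index for fast search.
for i, doc in enumerate(docs):
    es.index(index="tenders", id=i, document=doc)

# A simple full-text query against the indexed documents.
resp = es.search(index="tenders", query={"match": {"body": "school construction"}})
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["title"])
```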

2. Embedding-based Retrieval:

- Use embeddings to represent your documents and queries in a vector space. You can leverage pre-trained models like Sentence-BERT or other embedding models to create vector representations of your documents.

- Store these embeddings in a vector database such as FAISS, Pinecone, or Milvus.
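
Here is a minimal sketch of this step, assuming the sentence-transformers and faiss libraries; the model name ("all-MiniLM-L6-v2") and the sample documents are illustrative choices, and FAISS stands in for any vector store.

```python
# Embedding-based indexing sketch: encode documents with a Sentence-BERT
# model and store the vectors in a FAISS index (illustrative setup).
import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # a common embedding model

documents = [
    "Tender for construction of a new school building in Sydney.",
    "Tender for extension of the west wing of a hospital in Newcastle.",
    "Road resurfacing works tender for regional highways.",
]

# Encode documents and L2-normalise so inner product equals cosine similarity.
doc_vectors = model.encode(documents).astype("float32")
faiss.normalize_L2(doc_vectors)

index = faiss.IndexFlatIP(doc_vectors.shape[1])  # exact inner-product search
index.add(doc_vectors)
```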

3. Query Processing:

- When a user submits a query, convert the query into an embedding using the same model you used to embed your documents.

- Perform a similarity search in your vector database to find the most relevant documents or snippets based on the query embedding.
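
Continuing the sketch above (reusing the same model and the FAISS index), query processing could look like this:

```python
# Embed the query with the SAME model used for the documents, then search.
query = "construction tenders for school buildings"

query_vector = model.encode([query]).astype("float32")
faiss.normalize_L2(query_vector)

k = 2  # number of nearest documents to retrieve
scores, ids = index.search(query_vector, k)

top_snippets = [documents[i] for i in ids[0]]
for score, snippet in zip(scores[0], top_snippets):
    print(f"{score:.3f}  {snippet}")
```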

4. Context Augmentation:

- Retrieve the top relevant snippets or documents from your local source based on the similarity search.

- Combine the user query with these retrieved snippets to create an augmented context.
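
One simple way to assemble the augmented context, continuing the same sketch; the prompt template here is just one illustrative format:

```python
# Stitch the user query and retrieved snippets into a single prompt string.
def build_prompt(query: str, snippets: list[str]) -> str:
    context = "\n".join(f"- {s}" for s in snippets)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

prompt = build_prompt(query, top_snippets)
```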

5. Sending to OpenAI:

- Send the augmented context (user query + relevant local snippets) to the OpenAI API for response generation.

- The OpenAI model can now use this enriched context to generate a more accurate and relevant response.
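
With the current openai Python SDK, the call could look like the sketch below; the model name is an assumption, so substitute whichever model you have access to.

```python
# Send the augmented prompt to OpenAI for generation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```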

6. Response Delivery:

- Receive the response from OpenAI and deliver it to the user through your application interface.

Example Workflow in Detail

1. User Query: “How do I apply for a local business permit?”

2. Embedding Creation:

- Convert the query to an embedding using a model like Sentence-BERT.

3. Similarity Search:

- Use the query embedding to search for the top relevant document embeddings in your vector database (FAISS, Pinecone, etc.).

- Retrieve the top-k documents or snippets that are most similar to the query embedding.

4. Context Augmentation:

- Combine the query with the retrieved snippets: “User query: How do I apply for a local business permit? Retrieved snippet: To apply for a local business permit, you need to fill out form XYZ and submit it to the local council along with the required documents.”

5. Send to OpenAI:

- Send the augmented text to OpenAI: “How do I apply for a local business permit? To apply for a local business permit, you need to fill out form XYZ and submit it to the local council along with the required documents.”

6. Receive and Deliver Response:

- OpenAI generates a detailed response based on the augmented context.

- Response: “To apply for a local business permit, first, fill out form XYZ, which you can download from the local council’s website. Ensure you have all the required documents, such as proof of identity and business registration. Submit these to the local council office. You can find more details on their official website or contact their support desk.”
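
Putting it all together, an end-to-end helper that reuses the pieces sketched in the previous section might look like this (again, a sketch under the same assumptions, not a production implementation):

```python
# End-to-end RAG helper: retrieve, augment, generate.
# Reuses model, index, documents, build_prompt, and client from the
# snippets above; the corpus and model name remain illustrative.
def answer(query: str, k: int = 3) -> str:
    q_vec = model.encode([query]).astype("float32")
    faiss.normalize_L2(q_vec)
    _, ids = index.search(q_vec, k)
    prompt = build_prompt(query, [documents[i] for i in ids[0]])
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

print(answer("How do I apply for a local business permit?"))
```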

The Summary: Why RAG is Awesome

Here’s why it’s so valuable:

  • Accuracy: By pulling in current data from your own sources, RAG ensures you're working with the most up-to-date information.
  • Relevance: The augmentation phase filters out irrelevant data, saving you time and effort.
  • Clarity: The generation phase (the LLM's reasoning and content-humanisation powers) presents the information in a clear, readable format, making complex data easy to understand.
  • Cost Optimisation: By filtering and passing on only a limited, relevant slice of data, we can reduce LLM costs.
  • Scalability: Following this pattern, the solution can quickly be scaled up and enhanced by adding more data sources.

So, there is no doubt we are going to see a large number of RAG applications appearing on the enterprise horizon. The sooner you start in this space, the better positioned your organisation will be in the never-ending market competition.



