Retrieval-Augmented Generation (RAG) Workflow

Why I created my own RAG Diagram:

1. Highlight the importance of data preparation – To illustrate that unstructured data must be prepared and embedded before it can be queried in a RAG system.

2. Emphasize dedicated query tools – To show the need for creating dedicated tools for each document, such as:

  • Summary tool – For high-level overviews.
  • Vector tool – For similarity searches.
  • Keyword search tool – For text-based lookups.

3. Showcase LLM function calling – To contrast the LLM’s ability to call tools dynamically with simple RAG systems, where the orchestration framework handles all tool invocation.

4. Lay the groundwork for agentic workflows – This diagram sets the stage for agentic workflows, which may be illustrated in a follow-up diagram.

RAG Workflow (2024)


Key Components and Flow


1. Documents or Data Sources

The process starts with preparing the data. The orchestration framework ingests unstructured data (text, PDFs, images, etc.) and integrates structured data sources (APIs, relational DBs, etc.).

  • Unstructured data (documents) → Embedded and stored in vector DBs.
  • Structured data (SQL, APIs) → Queried directly during retrieval (without embedding in advance).

This forms the knowledge base that the system can query later.
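
A minimal sketch of this preparation step in Python; the folder path, source names, and connection strings are illustrative placeholders, not part of any specific framework.

```python
# Ingest unstructured documents from disk; structured sources are only
# registered here, since they are queried live at retrieval time.
from pathlib import Path

def load_unstructured(folder: str) -> list[str]:
    # Real pipelines would also parse PDFs, images, etc.
    return [p.read_text(encoding="utf-8") for p in Path(folder).glob("*.txt")]

documents = load_unstructured("./knowledge_base")  # embedded later (Step 1)

structured_sources = {                   # queried directly, never pre-embedded
    "sales_db": "postgresql://user:pass@host/sales",
    "metrics_api": "https://api.example.com/metrics",
}
```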


2. Embedding Model & Vector Database

The embedding process allows data to be stored and searched efficiently in the vector database.

  • Documents are split into chunks by the orchestration framework (Step 1).
  • Each chunk is passed to the embedding model (Step 2), producing a high-dimensional vector representation (Step 3).
  • The orchestration layer inserts these embeddings into the vector DB, ensuring efficient similarity searches (Step 4), as sketched below.
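
A compact sketch of Steps 1–4, using sentence-transformers and FAISS as stand-ins for whatever embedding model and vector DB the orchestration framework actually wires up; `documents` comes from the ingestion sketch above.

```python
# Steps 1-4: chunk, embed, and insert into the vector DB.
import faiss
from sentence_transformers import SentenceTransformer

def chunk(text: str, size: int = 500) -> list[str]:
    # Naive fixed-size chunking; real frameworks split on sentences or tokens.
    return [text[i:i + size] for i in range(0, len(text), size)]

chunks = [c for doc in documents for c in chunk(doc)]  # Step 1: chunking
model = SentenceTransformer("all-MiniLM-L6-v2")        # Step 2: embedding model
vectors = model.encode(chunks)                         # Step 3: vector representations

index = faiss.IndexFlatL2(vectors.shape[1])            # Step 4: insert into vector DB
index.add(vectors)
```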

Next, the dedicated query tools for each document can be created, enabling e.g. search or summarization per document (see the sketch below). For images, a single VectorIndex can handle multiple images (no need for per-image tools).
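
One way to build those dedicated per-document tools, loosely following LlamaIndex conventions; treat the import paths and class names as a sketch, since they vary across library versions.

```python
# Dedicated query tools for a single document (summary + vector search).
from llama_index.core import SummaryIndex, VectorStoreIndex
from llama_index.core.tools import QueryEngineTool

def make_tools(doc_nodes, doc_name: str) -> list[QueryEngineTool]:
    summary_tool = QueryEngineTool.from_defaults(   # high-level overviews
        query_engine=SummaryIndex(doc_nodes).as_query_engine(),
        name=f"{doc_name}_summary_tool",
        description=f"High-level overview questions about {doc_name}.",
    )
    vector_tool = QueryEngineTool.from_defaults(    # similarity search
        query_engine=VectorStoreIndex(doc_nodes).as_query_engine(),
        name=f"{doc_name}_vector_tool",
        description=f"Specific-fact lookups in {doc_name} via similarity search.",
    )
    return [summary_tool, vector_tool]
```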


3. User Query

A user submits a query in plain text (Step 5). This could be something like:

  • "Summarize MetaGPT’s architecture."
  • "Show me similar diagrams."

The system needs to enhance the query by retrieving relevant data.


4. Embedding the User Query

The user query is passed to the embedding model for conversion into a vector embedding (Step 6).

The result is an embedded query that allows the system to perform similarity searches (Step 7).
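
Continuing the sentence-transformers/FAISS sketch, Steps 5–7 reduce to embedding the query with the same model and running a nearest-neighbor search:

```python
# Steps 5-7: embed the user query and use it for similarity search.
user_query = "Summarize MetaGPT's architecture."  # Step 5: plain-text query

query_vector = model.encode([user_query])         # Step 6: embedded query

distances, ids = index.search(query_vector, 3)    # Step 7: similarity search
retrieved_chunks = [chunks[i] for i in ids[0]]
```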


5. Retrieval Process (Triggered by Orchestration Layer or Function Calling)

The embedded query is passed to an LLM (Step 8):

  • The LLM evaluates available tools and may invoke one or more tools to retrieve missing context (Steps 9 & 10).
  • Multiple tools can be invoked in parallel if the LLM deems their data relevant.
  • The LLM typically queries all relevant tools rather than selectively choosing one (see the sketch after this list).
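
This tool-evaluation step can be sketched with the OpenAI chat-completions tool-calling API; the tool name and schema below are illustrative, mirroring the per-document vector tool from Step 2.

```python
# Steps 8-10: the LLM sees the available tools and decides what to invoke.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "metagpt_vector_tool",  # illustrative tool name
        "description": "Similarity search over the MetaGPT paper.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(  # Step 8: hand the query to the LLM
    model="gpt-4o",
    messages=[{"role": "user", "content": user_query}],
    tools=tools,
)

# Steps 9-10: the LLM may request one or several tool calls, possibly in parallel.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```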


6. Combining Retrieved Data with Embedded Query

The retrieved data and the embedded query are combined to form a context-rich prompt (Step 11). This augmented prompt contains the most relevant documents or data that match the user's query.
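
Step 11 itself is plain prompt assembly; the template wording below is just one possible phrasing.

```python
# Step 11: merge retrieved chunks and the user query into one augmented prompt.
context_block = "\n\n".join(retrieved_chunks)

augmented_prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context_block}\n\n"
    f"Question: {user_query}"
)
```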


7. Passing the Context-Rich Prompt to the LLM

The enriched prompt is sent to the LLM (Step 12):

  • The LLM generates an informed, accurate response using the retrieved context.
  • If the available context is insufficient, the LLM may dynamically invoke additional tools through function calling to retrieve more data.
  • This process continues until the LLM has gathered enough information to confidently generate a complete answer.

The final generation step ensures the output is grounded in external knowledge rather than relying solely on the LLM’s internal knowledge.
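
That iterative behavior is essentially a loop around the chat API: generate, and while the LLM keeps requesting tools, execute them and feed the results back. In this sketch, `run_tool` is a hypothetical dispatcher mapping a tool name to the matching query tool.

```python
# Step 12 onward: loop until the LLM answers without requesting more tools.
messages = [{"role": "user", "content": augmented_prompt}]

while True:
    resp = client.chat.completions.create(model="gpt-4o",
                                          messages=messages, tools=tools)
    msg = resp.choices[0].message
    if not msg.tool_calls:            # context was sufficient: final answer
        final_answer = msg.content
        break
    messages.append(msg)              # keep the tool request in the history
    for call in msg.tool_calls:       # retrieve the missing context
        result = run_tool(call.function.name,           # hypothetical dispatcher
                          **json.loads(call.function.arguments))
        messages.append({"role": "tool", "tool_call_id": call.id,
                         "content": str(result)})
```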


8. Final Output

The LLM produces the final response, which is returned to the user. The output reflects the combined knowledge from the vector DB, relational DBs, APIs, and the other data sources used.


Additional Notes on Orchestration and Function Calling:

  • Basic RAG: The orchestration framework (e.g. LangChain or LlamaIndex) retrieves data from all relevant sources by default.
  • Advanced (Function Calling): The LLM can dynamically invoke query tools if it determines that additional data is needed. This process is adaptive, allowing parallel queries to multiple tools (vector DB, API, SQL); see the sketch below.
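
The advanced path is essentially an agent loop. Here is a hedged sketch using LlamaIndex’s ReActAgent (the exact API varies across versions), reusing the per-document tools built in Step 2; `doc_nodes` stands for that document’s parsed nodes.

```python
# Advanced path: an agent that decides its own tool calls.
from llama_index.core.agent import ReActAgent

agent = ReActAgent.from_tools(
    make_tools(doc_nodes, "MetaGPT"),  # doc_nodes: parsed nodes for the paper
    verbose=True,                      # log each tool-selection step
)
print(agent.chat("Summarize MetaGPT's architecture."))
```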



