Build a multi-agent RAG system with Granite

Written by: Kelly Abuelsaad

Artificial intelligence (AI) agents are generative AI (genAI) systems or programs capable of autonomously designing and executing task workflows using available tools. Can you build agentic workflows without extremely large, costly large language models (LLMs)?

The answer is yes. In this tutorial, we will demonstrate how to build a multi-agent RAG system that runs entirely locally.

Agentic RAG overview

Retrieval-Augmented Generation (RAG) is an effective way of providing an LLM with additional datasets from various data sources without the need for expensive fine-tuning. Similarly, agentic RAG leverages an AI agent's ability to plan and execute subtasks along with the retrieval of relevant information to supplement an LLM's knowledge base. This allows for the optimization and greater scalability of RAG applications.

The future of agentic RAG is multi-agent RAG, where several specialized agents collaborate to achieve optimal latency and efficiency. We will demonstrate this by combining a small, efficient model, Granite 3.1, with a modular agent architecture. We will use multiple specialized "mini agents" that collaborate to achieve tasks through adaptive planning and tool calling. Like humans, a team of agents, or a multi-agent system, often outperforms the heroic efforts of an individual, especially when the agents have clearly defined roles and effective communication.

For the orchestration of this collaboration, we can use AutoGen (AG2) as the core framework to manage workflows and decision-making, alongside other tools like Ollama for local LLM serving and Open WebUI for interaction. Notably, every one of these components is open source. Together, these tools enable you to build an AI system that is both powerful and privacy-conscious, all without leaving your laptop.

Multi-agent architecture: When collaboration beats competition

Our Granite Retrieval Agent relies on a modular architecture in which each agent has a specialized role. Much like humans, agents perform best when they have targeted instructions and just enough context to make an informed decision. Too much extraneous information, such as an unfiltered chat history, can create a "needle in a haystack" problem, where it becomes increasingly difficult to separate signal from noise.

In this agentic AI architecture, the agents work together sequentially to achieve the goal. Here is how the generative AI system is organized:

Planner Agent: Creates the initial high-level plan once, at the beginning of the workflow. For example, if a user asks, “What are comparable open source projects to the ones my team is using?” the agent will put together a step-by-step plan that may look something like this: “1. Search team documents for open source technologies. 2. Search the web for open source projects similar to the ones found in step 1.” If any of these steps fail or produce insufficient results, they can later be adapted by the Reflection Agent.

Research Assistant: The Research Assistant is the workhorse of the system. It takes in and executes instructions such as, “Search team documents for open source technologies.” For step 1 of the plan, it uses the initial instruction from the Planner Agent. For subsequent steps, it also receives curated context from the outcomes of previous steps.

For example, if tasked with “Search the web for similar open source projects,” it will also receive the output from the previous document search step. Depending on the instruction, the Research Assistant can use tools like web search or document search, or both, to fulfill its task.

Summarizer Agent: The Summarizer Agent condenses the Research Assistant’s findings into a concise, relevant response. For example, if the Research Assistant finds detailed meeting notes stating, “We discussed the release of Tool X that uses Tool Y underneath,” the Summarizer Agent extracts only the relevant snippet, such as "Tool Y is being used," and reformulates it to directly answer the original instruction. This may seem like a small detail, but it yields higher-quality results and keeps the model on task, especially as one step builds upon the output of another.

Critic Agent: The Critic Agent is responsible for deciding whether the output of the previous step satisfactorily fulfilled the instruction it was given. It receives two pieces of information: the single-step instruction that was just executed and the output of that instruction from the Summarizer Agent. Having a Critic Agent weigh in on the conversation brings clarity around whether the goal was achieved, which is needed for planning the next step.

Reflection Agent: The Reflection Agent is our executive decision maker. It decides what step to take next, whether that is advancing to the next planned step, pivoting to recover from a failed step or giving the thumbs up that the goal has been completed. Much like a real-life CEO, it makes its best decisions when it has a clear goal in mind and is presented with concise findings on the progress that has or has not been made toward that goal. The output of the Reflection Agent is either the next step to take or the instruction to terminate if the goal has been reached. We present the Reflection Agent with the following items:

  • The goal.
  • The original plan.
  • The last step executed.
  • The output of the Summarizer and Critic Agents from the last step.
  • A concise sequence of previously executed instructions (just the instructions, not their output).

Presenting these items in a structured format makes it clear to our decision maker what has been done so that it can decide what needs to happen next.
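
To make this concrete, here is a minimal sketch in Python of how such a structured prompt might be assembled. The function and field names are illustrative assumptions, not taken from the agent's actual source code.

def build_reflection_prompt(goal, plan, last_step, summary, critique, executed_steps):
    # Present only concise, structured context (no raw chat history),
    # so the decision maker sees signal rather than noise.
    history = "\n".join(f"- {step}" for step in executed_steps)
    return (
        f"Goal: {goal}\n"
        f"Original plan: {plan}\n"
        f"Last step executed: {last_step}\n"
        f"Summarized outcome: {summary}\n"
        f"Critic verdict: {critique}\n"
        f"Previously executed steps:\n{history}\n\n"
        "Reply with the next step to take, or TERMINATE if the goal is met."
    )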

Report Generator: Once the goal is achieved, the Report Generator synthesizes all findings into a cohesive output that directly answers the original query. While each step in the process generates targeted outputs, the Report Generator ties everything together into a final report.
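
To give a feel for how this roster maps onto AG2, here is a simplified sketch using AutoGen's ConversableAgent class, pointed at the Granite model served by Ollama (set up in Step 1 below). The system messages and helper functions are illustrative assumptions; the actual orchestration lives in granite_autogen_rag.py.

from autogen import ConversableAgent

# Ollama exposes an OpenAI-compatible endpoint; the api_key is a placeholder.
llm_config = {"config_list": [{
    "model": "granite3.1-dense:8b",
    "base_url": "http://localhost:11434/v1",
    "api_key": "ollama",
}]}

def make_agent(name, system_message):
    return ConversableAgent(name=name, system_message=system_message,
                            llm_config=llm_config, human_input_mode="NEVER")

planner = make_agent("planner", "Break the user's goal into numbered research steps.")
researcher = make_agent("researcher", "Execute one instruction, using tools as needed.")
summarizer = make_agent("summarizer", "Condense findings to what answers the instruction.")
critic = make_agent("critic", "Judge whether the output fulfilled the instruction.")
reflector = make_agent("reflector", "Choose the next step, adapt the plan, or say TERMINATE.")

def ask(agent, content):
    # Each agent sees only the curated context it is handed, not a shared history.
    return agent.generate_reply(messages=[{"role": "user", "content": content}])

plan = ask(planner, "What are comparable open source projects to the ones my team is using?")
# The control loop then alternates researcher -> summarizer -> critic -> reflector
# until the reflector signals TERMINATE, and the Report Generator writes the final answer.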

Leveraging open source tools

Building an agentic AI application from scratch can be difficult for beginners, so we will use a set of open source tools.

The following architecture diagram illustrates how the Granite Retrieval Agent integrates multiple tools for agentic RAG.

Open WebUI: The user interacts with the system through an intuitive chat interface hosted in Open WebUI. This interface acts as the primary point for submitting queries (such as “Fetch me the latest news articles pertaining to my project notes”) and viewing the outputs.

Python-based agent (AG2 Framework): At the core of the system is a Python-based agent built using AutoGen (AG2). This agent coordinates the workflow by breaking down tasks and dynamically calling tools to execute steps.

The agent has access to two primary tools:

  • Document search tool: Fetches relevant information from a vector database containing uploaded project notes or documents stored as embeddings. This vector search leverages the built-in document retrieval APIs inside Open WebUI, rather than setting up an entirely separate data store.
  • Web search tool: Performs web-based searches to gather external knowledge and real-time information. In this case, we are using SearXNG as our metasearch engine. (A sketch of both tools follows this list.)
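
For illustration, here is a hedged sketch of these two tools in Python. The SearXNG call uses its real JSON API (enabled in Step 3 below), while the document search is a placeholder for Open WebUI's built-in retrieval API; the function names and the SEARXNG_URL constant are assumptions for this sketch, not the agent's actual code.

import requests

SEARXNG_URL = "http://localhost:8888/search"  # port mapping is set up in Step 3

def web_search(query: str, max_results: int = 5) -> list[dict]:
    # Query the local SearXNG instance and return trimmed result snippets.
    resp = requests.get(SEARXNG_URL, params={"q": query, "format": "json"}, timeout=30)
    resp.raise_for_status()
    return [{"title": r.get("title"), "url": r.get("url"), "snippet": r.get("content")}
            for r in resp.json().get("results", [])[:max_results]]

def document_search(query: str) -> list[str]:
    # Placeholder: the real agent calls Open WebUI's retrieval API against
    # the embedded project documents instead of raising here.
    raise NotImplementedError("see granite_autogen_rag.py for the actual call")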

Ollama: IBM Granite 3.1 serves as the language model powering the system. It is hosted locally using Ollama, ensuring fast inference, cost efficiency and data privacy.

Other common open source agent frameworks not covered in this tutorial include LangChain, LangGraph and crewAI.

Step 1. Install Ollama

Installing Ollama is as simple as running the following command in your terminal. The full installation instructions can also be found in Ollama's README file on GitHub.

On macOS:

brew install ollama        

On Linux:

curl -fsSL https://ollama.com/install.sh | sh        

Now, run Ollama and pull the Granite 3.1 LLM. Another open source model option is Llama 3.

ollama serve
ollama pull granite3.1-dense:8b        

You are now up and running with Ollama and Granite.
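
As an optional sanity check, you can query Ollama's REST API from Python (it listens on port 11434 by default); the prompt here is arbitrary.

import requests

# Ask the locally served Granite model for a short completion.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "granite3.1-dense:8b",
          "prompt": "Say hello in one sentence.",
          "stream": False},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])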

Step 2. Install Open WebUI

In your terminal, install and run Open WebUI.

pip install open-webui
open-webui serve        
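
If you want to confirm the server came up before moving on, a quick request to its default port works; this check is a convenience, not part of the agent.

import requests

# Open WebUI serves on port 8080 by default; expect HTTP 200 once it is ready.
print(requests.get("http://localhost:8080", timeout=10).status_code)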

Step 3. Set up SearXNG for web search

SearXNG is a metasearch engine that aggregates results from multiple search engines. We include it in this architecture because it requires no SaaS API key and can run directly on your laptop.

For more in-depth instructions on how to run SearXNG, refer to the Open WebUI documentation detailing integration with SearXNG. Here is a quick walk-through:

1. Create configuration files for SearXNG.

 mkdir ~/searxng
 cd ~/searxng        

2. Create a new file in the ~/searxng directory called settings.yml and copy this code into the file.

 # see https://docs.searxng.org/admin/settings/settings.html#settings-use-default-settings
 use_default_settings: true

 server:
   secret_key: "ultrasecretkey"  # change this!
   limiter: false
   image_proxy: true
   port: 8080
   bind_address: "0.0.0.0"

 ui:
   static_use_hash: true

 search:
   safe_search: 0
   autocomplete: ""
   default_lang: ""
   formats:
     - html
     - json

3. Create a new file in the ~/searxng directory called uwsgi.ini. You can populate it with the values from the example uwsgi.ini in the SearXNG GitHub repository.

4. Run the SearXNG docker image in your terminal.

 docker pull searxng/searxng
 docker run -d --name searxng -p 8888:8080 -v ~/searxng:/etc/searxng --restart always searxng/searxng:latest

Note: SearXNG and Open WebUI both default to port 8080, so we map SearXNG to port 8888 on the local machine.

This agent uses the SearXNG API directly, so you do not need to follow the steps in the Open WebUI documentation to set up SearXNG in the Open WebUI interface. That is only necessary if you want to use SearXNG through Open WebUI independently of this agent.
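
To confirm the JSON API is working (this is the same endpoint the agent's web search tool uses), you can issue a quick query from Python; the search terms are arbitrary.

import requests

# The 'json' entry under search.formats in settings.yml enables this format.
resp = requests.get("http://localhost:8888/search",
                    params={"q": "open source RAG", "format": "json"},
                    timeout=30)
resp.raise_for_status()
print(len(resp.json().get("results", [])), "results returned")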

Step 4. Import the agent into Open WebUI

  1. In your browser, go to http://localhost:8080/ to access Open WebUI. If it is your first time opening the Open WebUI interface, register a username and password. This information is kept entirely local to your machine.
  2. After logging in, click the icon on the lower left-hand side where your username is. From the pop-up menu, click?Admin panel.
  3. At the top of the menu, click?Functions.
  4. At the top right, click the?+?sign to add a new function.
  5. Give the function a name, such as "Granite RAG Agent," and a description, both plain strings.
  6. Paste the contents of granite_autogen_rag.py into the text box provided, replacing any existing content.
  7. Click?Save?at the bottom of the screen.
  8. Back on the?Functions?page, make sure the agent is toggled to?Enabled.
  9. Click the gear icon next to the enablement toggle to customize any settings such as the inference endpoint, the SearXNG endpoint or the model ID.

Now, your brand-new "Granite RAG Agent" shows up as a model in the Open WebUI interface. You can select it and provide it with user queries.

Summary

A multi-agent setup enables the creation of practical, usable tools by getting the most out of moderately sized, open source models like Granite 3.1. This agentic RAG architecture, built with fully open source tools, can serve as a launching point for designing and customizing your own agents and AI algorithms, or be used out of the box for a wide array of use cases.

Acknowledgements

A tremendous thank you to Anna Gutowska for her refinement and editing of this article's content.

Next steps

Explore this demo project in the GitHub repository.

Try out Granite in the IBM Granite Playground.

Read more about Granite models.

