Build a multi-agent RAG system with Granite
IBM watsonx
Watsonx is an enterprise-ready AI and data platform designed to multiply the impact of AI across your business.
Written by: Kelly Abuelsaad
Artificial intelligence (AI) agents are generative AI (genAI) systems or programs capable of autonomously designing and executing task workflows using available tools. Can you build agentic workflows without needing extremely large, costly large language models (LLMs)?
The answer is yes. In this tutorial, we will demonstrate how to build a multi-agent RAG system locally.
Agentic RAG overview
Retrieval-Augmented Generation (RAG) is an effective way of providing an LLM with additional datasets from various data sources without the need for expensive fine-tuning. Similarly, agentic RAG leverages an AI agent’s ability to plan and execute subtasks along with the retrieval of relevant information to supplement an LLM's knowledge base. This allows for the optimization and greater scalability of RAG applications.
The future of agentic RAG is multi-agent RAG, where several specialized agents collaborate to achieve optimal latency and efficiency. We will demonstrate this by combining a small, efficient model like Granite 3.1 with a modular agent architecture. We will use multiple specialized "mini agents" that collaborate to achieve tasks through adaptive planning and tool calling. Like humans, a team of agents, or a multi-agent system, often outperforms the heroic efforts of an individual, especially when the agents have clearly defined roles and effective communication.
For the orchestration of this collaboration, we can use AutoGen (AG2) as the core framework to manage workflows and decision-making, alongside other tools like Ollama for local LLM serving and Open WebUI for interaction. Notably, every one of these components is open source. Together, these tools enable you to build an AI system that is both powerful and privacy-conscious, all without leaving your laptop.
Multi-agent architecture: When collaboration beats competition
Our Granite Retrieval Agent relies on a modular architecture in which each agent has a specialized role. Much like humans, agents perform best when they have targeted instructions and just enough context to make an informed decision. Too much extraneous information, such as an unfiltered chat history, can create a “needle in the haystack” problem, where it becomes increasingly difficult to decipher signal from noise.
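To make the idea of "just enough context" concrete, here is a minimal sketch of how a prompt for one step might be assembled from the instruction plus only the most recent summarized findings, rather than the full chat history. The names StepResult, build_step_prompt and max_context are hypothetical and chosen for illustration; they are not taken from the Granite Retrieval Agent code.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    """Outcome of one completed step (hypothetical structure for illustration)."""
    instruction: str
    summary: str


def build_step_prompt(instruction: str, previous: list[StepResult], max_context: int = 2) -> str:
    """Compose a prompt from the instruction plus only the most recent summaries."""
    # Guard against max_context == 0, because previous[-0:] would return everything.
    recent = previous[-max_context:] if max_context > 0 else []
    lines = [f"Instruction: {instruction}"]
    if recent:
        lines.append("Context from previous steps:")
        for result in recent:
            lines.append(f"- {result.instruction} -> {result.summary}")
    return "\n".join(lines)


history = [
    StepResult("Search team documents for open source technologies.", "Tool Y is being used."),
]
print(build_step_prompt("Search the web for similar open source projects.", history))
```

Capping the context at a few short summaries is one simple way to keep the signal-to-noise ratio high for a small model; the right cap is a tuning choice.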
In this agentic AI architecture, the agents work together sequentially to achieve the goal. Here is how the generative AI system is organized:
Planner Agent: Creates the initial high-level plan once, at the beginning of the workflow. For example, if a user asks, “What are comparable open source projects to the ones my team is using?” then the agent will put together a step-by-step plan that may look something like this: “1. Search team documents for open source technologies. 2. Search the web for similar open source projects to the ones found in step 1.” If any of these steps fails or provides insufficient results, the Reflection Agent can adapt the plan later.
Research Assistant: The Research Assistant is the workhorse of the system. It takes in and executes instructions such as, “Search team documents for open source technologies.” For step 1 of the plan, it uses the initial instruction from the Planner Agent. For subsequent steps, it also receives curated context from the outcomes of previous steps.
For example, if tasked with “Search the web for similar open source projects,” it will also receive the output from the previous document search step. Depending on the instruction, the Research Assistant can use tools like web search or document search, or both, to fulfill its task.
Summarizer Agent: The Summarizer Agent condenses the Research Assistant’s findings into a concise, relevant response. For example, if the Research Assistant finds detailed meeting notes stating, “We discussed the release of Tool X that uses Tool Y underneath,” then the Summarizer Agent extracts only the relevant snippets such as, "Tool Y is being used," and reformulates it to directly answer the original instruction. This may seem like a small detail, but it can help give higher quality results and keep the model on task, especially as one step builds upon the output of another step.
Critic Agent: The Critic Agent is responsible for deciding whether the output of the previous step satisfactorily fulfilled the instruction it was given. It receives two pieces of information: the single step instruction that was just executed and the output of that instruction from Summarizer Agent. Having a Critic Agent weigh in on the conversation brings clarity around whether the goal was achieved, which is needed for the planning of the next step.
Reflection Agent: The Reflection Agent is our executive decision maker. It decides what step to take next, whether that is proceeding to the next planned step, changing course to make up for mishaps or giving the thumbs up that the goal has been completed. Much like a real-life CEO, it makes its best decisions when it has a clear goal in mind and is presented with concise findings on the progress that has or has not been made toward that goal. The output of the Reflection Agent is either the next step to take or the instruction to terminate if the goal has been reached. We present the Reflection Agent with the following items:
Presenting these items in a structured format makes it clear to our decision maker what has been done so that it can decide what needs to happen next.
Report Generator: Once the goal is achieved, the Report Generator synthesizes all findings into a cohesive output that directly answers the original query. While each step in the process generates targeted outputs, the Report Generator ties everything together into a final report.
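The sequential hand-off between these agents can be sketched as a plain control loop. This is an illustrative outline only, not the AG2 implementation: each agent is modeled as an ordinary Python callable (in practice each would wrap an LLM call), and the names run_workflow and max_steps, as well as the shape of the history entries, are assumptions made for this example.

```python
from typing import Callable, Optional


def run_workflow(
    goal: str,
    planner: Callable[[str], list[str]],
    researcher: Callable[[str, str], str],
    summarizer: Callable[[str, str], str],
    critic: Callable[[str, str], bool],
    reflector: Callable[[str, list[str], list[dict]], Optional[str]],
    reporter: Callable[[str, list[dict]], str],
    max_steps: int = 10,
) -> str:
    """Run plan -> research -> summarize -> critique -> reflect until done."""
    plan = planner(goal)  # the plan is created once, up front
    history: list[dict] = []
    context = ""  # curated output of the last successful step
    step = plan[0] if plan else None

    # max_steps is a safety cap so a confused reflector cannot loop forever.
    while step is not None and len(history) < max_steps:
        raw = researcher(step, context)   # tool calls (web/document search) happen here
        summary = summarizer(step, raw)   # condense findings to what answers the step
        satisfied = critic(step, summary) # did the output fulfill the instruction?
        history.append({"step": step, "summary": summary, "satisfied": satisfied})
        if satisfied:
            context = summary             # only good results feed the next step
        step = reflector(goal, plan, history)  # next instruction, or None to stop

    return reporter(goal, history)        # final report over all findings
```

With stub callables in place of real agents, the loop can be exercised without a model, which is a convenient way to test the orchestration logic separately from prompt quality.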
Leveraging open source tools
For beginners, it can be difficult to build an agentic AI application from scratch. Hence, we will use a set of open source tools.
The following architecture diagram illustrates how the Granite Retrieval Agent integrates multiple tools for agentic RAG.
Open WebUI: The user interacts with the system through an intuitive chat interface hosted in Open WebUI. This interface acts as the primary point for submitting queries (such as “Fetch me the latest news articles pertaining to my project notes”) and viewing the outputs.
Python-based agent (AG2 Framework): At the core of the system is a Python-based agent built using AutoGen (AG2). This agent coordinates the workflow by breaking down tasks and dynamically calling tools to execute steps.
The agent has access to two primary tools: document search and web search.
Ollama: The IBM Granite 3.1 LLM serves as the language model powering the system. It is hosted locally using Ollama, ensuring fast inference, cost efficiency and data privacy.
Other common open source agent frameworks not covered in this tutorial include LangChain, LangGraph and crewAI.
Step 1. Install Ollama
Installing Ollama is as simple as running the following command in your terminal. The full installation instructions can also be found in Ollama's README file on GitHub.
On macOS:
brew install ollama
On Linux:
curl -fsSL https://ollama.com/install.sh | sh
Now, run Ollama and pull the Granite 3.1 LLM. Because ollama serve keeps running in the foreground, run the pull command in a second terminal. Another open source model option is Llama 3.
ollama serve
ollama pull granite3.1-dense:8b
You are now up and running with Ollama and Granite.
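As a quick sanity check, you can call the local model from Python. The sketch below assumes Ollama is listening on its default address (http://localhost:11434) and uses its /api/generate endpoint with streaming disabled; the helper names build_generate_request and generate are made up for this example.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
DEFAULT_MODEL = "granite3.1-dense:8b"  # the model pulled in the step above


def build_generate_request(prompt: str, model: str = DEFAULT_MODEL) -> urllib.request.Request:
    """Build a non-streaming generate request for the local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = DEFAULT_MODEL, timeout: float = 120.0) -> str:
    """Send the prompt to Ollama and return the generated text."""
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["response"]
```

With ollama serve running, calling generate("In one sentence, what is retrieval-augmented generation?") should return the model's answer as a string. If the call fails with a connection error, the server is most likely not running.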
Step 2. Install Open WebUI
In your terminal, install and run Open WebUI.
pip install open-webui
open-webui serve
Step 3. Set up SearXNG for web search
SearXNG is a metasearch engine that aggregates retrieved information from multiple search engines. The reason for its inclusion in this architecture is that it requires no SaaS API key, as it can run directly on your laptop.
For more in-depth instructions on how to run SearXNG, refer to the Open WebUI documentation, which details the integration with SearXNG. Here is a quick walk-through:
1. Create configuration files for SearXNG.
mkdir ~/searxng
cd ~/searxng
2. Create a new file in the ~/searxng directory called settings.yml and copy this code into the file.
# see https://docs.searxng.org/admin/settings/settings.html#settings-use-default-settings
use_default_settings: true

server:
  secret_key: "ultrasecretkey" # change this!
  limiter: false
  image_proxy: true
  port: 8080
  bind_address: "0.0.0.0"

ui:
  static_use_hash: true

search:
  safe_search: 0
  autocomplete: ""
  default_lang: ""
  formats:
    - html
    - json
3. Create a new file in the ~/searxng directory called uwsgi.ini. You can populate it with the values from the example uwsgi.ini in the SearXNG GitHub repository.
4. Run the SearXNG docker image in your terminal.
docker pull searxng/searxng
docker run -d --name searxng -p 8888:8080 -v ~/searxng:/etc/searxng --restart always searxng/searxng:latest
Note: SearXNG and Open WebUI both default to port 8080, so the command above maps SearXNG to port 8888 on the local machine.
This agent uses the SearXNG API directly, so you do not need to follow the steps in the Open WebUI documentation to set up SearXNG in the Open WebUI interface. Those steps are only necessary if you also want to use SearXNG through Open WebUI, independently of this agent.
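To illustrate what calling the SearXNG API directly can look like, here is a minimal sketch of a web search helper. It assumes the container is reachable at http://localhost:8888 (the port mapping used above) and that the json format is enabled in settings.yml, as in the configuration shown earlier. The function names are hypothetical and this is not the agent's actual tool code.

```python
import json
import urllib.parse
import urllib.request

# Assumption: SearXNG is mapped to host port 8888, as in the docker run command.
SEARXNG_URL = "http://localhost:8888/search"


def build_search_url(query: str, base_url: str = SEARXNG_URL) -> str:
    """Build a search URL that asks SearXNG for JSON output."""
    params = urllib.parse.urlencode({"q": query, "format": "json"})
    return f"{base_url}?{params}"


def extract_results(payload: dict, limit: int = 5) -> list[dict]:
    """Keep only the fields an agent typically needs from each search hit."""
    return [
        {
            "title": hit.get("title", ""),
            "url": hit.get("url", ""),
            "snippet": hit.get("content", ""),
        }
        for hit in payload.get("results", [])[:limit]
    ]


def web_search(query: str, limit: int = 5, timeout: float = 30.0) -> list[dict]:
    """Query the local SearXNG instance and return trimmed results."""
    with urllib.request.urlopen(build_search_url(query), timeout=timeout) as response:
        payload = json.loads(response.read().decode("utf-8"))
    return extract_results(payload, limit)
```

Trimming each hit to a title, URL and short snippet keeps the search output compact before it is handed to the model. If the request is rejected, check that json is listed under formats in settings.yml.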
Step 4. Import the agent into Open WebUI
Now, your brand new "Granite Agent" shows up as a model in the Open WebUI interface. You can select it and provide it with user queries.
Summary
A multi-agent setup enables the creation of practical, usable tools by getting the most out of moderately sized, open source models like Granite 3.1. This agentic RAG architecture, built with fully open source tools, can serve as a launching point to design and customize your own agents and AI algorithms, or be used out of the box for a wide array of use cases.
Acknowledgements
A tremendous thank you to Anna Gutowska for her refinement and editing of this article's content.
Next steps
Explore this demo project in the GitHub repository.
Read more about Granite models.