From Reasoning to Action: Understanding AI Agents With a Simple Program
Zahir Shaikh
Lead (Generative AI / Automation) @ T-Systems | Specializing in Automation, Large Language Models (LLMs), LlamaIndex, LangChain | Expert in Deep Learning, Machine Learning, NLP, Vector Databases | RPA
Artificial Intelligence (AI) continues to evolve, and one of the most exciting developments is the concept of AI Agents. These agents operate autonomously to perform tasks, leveraging the power of tools, memory, and reasoning abilities to achieve their goals. In this article, we’ll dive deep into what AI Agents are, their critical components, and how they work, with practical examples to demonstrate their applications.
1. What are AI Agents?
An AI agent is an autonomous system that uses AI to interact with its environment, perceive information, and make decisions to achieve specific tasks. Unlike traditional models, these agents can access tools, use memory, plan their actions, and reflect on their decisions in real time. They go beyond mere question-answering, aiming to execute complex tasks by reasoning, fetching data, and dynamically adjusting based on new inputs.
2. Building Blocks of AI Agents
AI agents consist of several key components that work together to enable their autonomy and decision-making. Let’s explore each of these in detail:
2.1 Memory
AI agents rely on memory to store and retrieve information. This is typically divided into:
- Short-term memory: the working context of the current conversation or task, such as recent messages and intermediate results.
- Long-term memory: information persisted across sessions, often in a vector store, so it can be retrieved later.
Using Retrieval-Augmented Generation (RAG), AI agents can retrieve relevant external knowledge from large databases or documents, enabling them to access data that may not be part of their initial training.
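As a minimal sketch of this idea with LlamaIndex (assuming an OpenAI API key is configured and a hypothetical ./knowledge_base folder of documents exists), external documents can be indexed once and then retrieved on demand:

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Ingest external documents and embed them into a vector index
documents = SimpleDirectoryReader("./knowledge_base").load_data()
index = VectorStoreIndex.from_documents(documents)

# Retrieve the passages most relevant to a question the base model was never trained on
retriever = index.as_retriever(similarity_top_k=3)
for node in retriever.retrieve("What does our knowledge transition document cover?"):
    print(node.text[:200])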
2.2 Tools
AI agents are equipped with various tools to interact with the environment and perform actions. Some of the most common tools include:
- Web search engines (e.g., DuckDuckGo, Google) for fetching live information
- API callers for interacting with external services and databases
- Code execution and calculators for performing computations
- Retrievers and query engines for searching indexed documents (RAG)
These tools provide AI agents with the ability to perform real-world tasks, making them more dynamic and capable of handling complex scenarios.
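For example, any plain Python function can be exposed as a tool; here is a minimal sketch using LlamaIndex's FunctionTool (the multiply function is purely illustrative):

from llama_index.core.tools import FunctionTool

def multiply(a: float, b: float) -> float:
    """Multiply two numbers and return the result."""
    return a * b

# The function name, type hints, and docstring become the tool schema that the
# LLM uses to decide when to call the tool and with which arguments.
multiply_tool = FunctionTool.from_defaults(fn=multiply)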
2.3 Planning
AI agents utilize planning mechanisms to break down complex problems into manageable tasks. Key components of planning include:
- Task decomposition: splitting a high-level goal into smaller, ordered sub-tasks
- Reasoning strategies such as chain-of-thought or ReAct, which interleave thinking with acting
- Reflection: reviewing intermediate results and revising the plan when an action does not produce the expected outcome
2.4 Action
Finally, the Action component involves executing the planned steps, whether that’s calling an API, performing a search, or running computations. AI agents can dynamically decide when and how to act, adjusting based on the outcomes of previous actions.
3. Why AI Agents?
While Large Language Models (LLMs) like GPT-3 and GPT-4 are powerful, they have limitations in reasoning, accessing real-time information, and performing complex tasks that require decision-making and planning. LLMs are restricted to the data they were trained on, lacking access to live data, databases, and domain-specific knowledge.
AI agents address these limitations by combining LLMs with external tools and reasoning capabilities, enabling them to handle more complex tasks dynamically.
Limitations of LLMs:
- Knowledge cutoff: they only know what was in their training data and cannot see recent events
- No direct access to live data sources such as the web, internal databases, or domain-specific documents
- No built-in way to take actions such as calling APIs or running code
- Multi-step tasks that require planning and intermediate decisions are error-prone
How AI Agents Overcome These Limitations:
- Tools (search, APIs, code execution) give the LLM access to live and domain-specific data
- Retrieval (RAG) grounds answers in external documents rather than training data alone
- Planning and reasoning loops let the agent decompose tasks and adjust after each step
- Memory carries context across steps and sessions
In short, AI agents extend the capabilities of LLMs by enabling real-time data retrieval, autonomous reasoning, and dynamic actions, making them ideal for complex workflows like research, automation, and decision-making.
4. Four Core Components of an AI Agent
Let’s explore the core components of an AI agent using LlamaIndex, where each element adds distinct capabilities (refer to the program below for how each component is used):
4.1 FunctionTool
FunctionTool enables AI agents to execute dynamic functions (e.g., database searches, API calls) within their workflows, extending them beyond simple text responses.
4.2 FunctionCallingAgent
FunctionCallingAgent allows agents to invoke multiple tools dynamically during task execution, solving complex problems through interactions like API calls and searches.
4.3 AgentRunner
AgentRunner orchestrates the execution of agents, ensuring tasks are managed efficiently by coordinating multiple tools or agents in parallel or sequence.
4.4 FunctionCallingAgentWorker
FunctionCallingAgentWorker is responsible for executing individual tools in an agent’s workflow, handling specific tasks like API calls or data retrieval, enabling efficient task execution.
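To make the relationship between these last two components concrete, here is a minimal sketch (reusing the search_tool and llm objects defined in the full program in section 7 below): the worker executes the individual reasoning and tool-calling steps, while the runner manages the overall task.

from llama_index.core.agent import FunctionCallingAgentWorker, AgentRunner

# The worker knows how to execute one step of reasoning and tool calling...
agent_worker = FunctionCallingAgentWorker.from_tools([search_tool], llm=llm, verbose=True)
# ...and the runner wraps it to manage task state, memory, and the overall loop.
agent = AgentRunner(agent_worker)
print(agent.chat("What is Retrieval-Augmented Generation?"))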
5. Agentic RAG: AI Agents with Retrieval-Augmented Generation (RAG)
5.1 What is RAG?
Retrieval-Augmented Generation (RAG) is a technique that combines the generative capabilities of large language models (LLMs) with the retrieval of relevant information from external knowledge sources (e.g., documents, databases, or APIs). In RAG, instead of relying solely on pre-trained knowledge, the model queries external data to augment its responses, making the results more accurate, up-to-date, and context-specific.
5.2 What is the RAG Pipeline?
The RAG pipeline is a structured process in which external data is ingested, processed, and then combined with an LLM to generate accurate responses. The key steps include:
- Ingestion: loading raw documents and splitting them into smaller chunks (nodes)
- Indexing: embedding the chunks and storing them in a vector index or vector database
- Retrieval: finding the chunks most relevant to the user's query
- Synthesis: passing the retrieved context to the LLM so it can generate the final, grounded answer
5.3 What is Ingestion in the RAG Pipeline?
Ingestion refers to the process of converting external documents or data into a format that can be queried by the agent. Typically, this involves:
- Loading documents (PDFs, web pages, database records) with a reader
- Splitting them into smaller chunks or nodes that fit within the model's context window
- Embedding each chunk and storing the resulting vectors in an index or vector database
Once the ingestion is complete, the agent can retrieve the relevant pieces of information and use them in conjunction with the LLM for response generation.
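Here is a minimal sketch of this ingestion-plus-query flow with LlamaIndex (assuming a local kt.pdf file, as used in the full program in section 7 below):

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter

# Ingestion: load the document and split it into ~1024-token chunks (nodes)
documents = SimpleDirectoryReader(input_files=["kt.pdf"]).load_data()
nodes = SentenceSplitter(chunk_size=1024).get_nodes_from_documents(documents)

# Indexing: embed the nodes and store them in a vector index
index = VectorStoreIndex(nodes)

# Retrieval + synthesis: the query engine fetches relevant nodes and passes them to the LLM
query_engine = index.as_query_engine()
print(query_engine.query("Summarize the knowledge transition process."))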
6. What is the ReAct Agent?
The ReAct Agent (short for Reasoning and Acting) represents a distinct approach to building AI agents that can dynamically reason through complex tasks by engaging in a loop of Thought, Action, and Observation.
6.1 Observation & Action in ReAct Agent
At each step, the agent first reasons about what to do next (a Thought), then takes an Action, typically a tool call, and finally records the Observation returned by that action before deciding on the next step. This thought-action-observation loop enables the ReAct Agent to adjust its course mid-execution, allowing it to handle complex tasks with multiple decision points. It is ideal for scenarios where the agent needs to gather information incrementally and act based on intermediate results.
6.2 What is the ReAct Prompt?
The ReAct Prompt is a structured way to guide the agent's decision-making process by combining natural language instructions with reasoning steps. It serves as both an input format and a strategy for breaking the agent's interactions down into a logical flow of thought, action, and observation. Various open-source frameworks, such as LangChain and CrewAI, provide their own ReAct prompt templates.
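The exact wording differs between frameworks, but a ReAct-style prompt instructs the model to produce an interleaved trace along these lines (an illustrative pattern, not the verbatim template of any particular library):

Thought: reason about what to do next
Action: the name of a tool to use (e.g., search)
Action Input: the arguments to pass to that tool
Observation: the result returned by the tool
... (Thought / Action / Observation repeat until enough information is gathered) ...
Thought: I can answer without using any more tools
Answer: the final response returned to the user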
6.3 Input and Output in ReAct Agent
The input to a ReAct agent is the user's query, which the framework wraps in the ReAct prompt along with the descriptions of the available tools. The output is the reasoning trace of thoughts, actions, and observations (printed when verbose=True in the program below) together with the final answer returned to the caller.
7. Agentic RAG in Action (you will need Google Colab and an OpenAI API key to run this program)
# Install all dependencies
!pip install llama-index
!pip install duckduckgo-search
!pip install openai transformers
# Import necessary libraries
from llama_index.core.tools import FunctionTool
from duckduckgo_search import DDGS
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core.agent import FunctionCallingAgent, FunctionCallingAgentWorker, AgentRunner
from llama_index.core import SimpleDirectoryReader, Settings
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core import SummaryIndex, VectorStoreIndex
from llama_index.core.tools import QueryEngineTool
from llama_index.core.query_engine.router_query_engine import RouterQueryEngine
from llama_index.core.selectors import LLMSingleSelector
from llama_index.core.agent import ReActAgent
# ----- Define the Search Tool -----
def search(query: str) -> str:
    """
    Search the web with DuckDuckGo and return the result snippets as one string.

    Args:
        query: User prompt

    Returns:
        context (str): Search results based on the user query
    """
    ddgs = DDGS()  # Use DuckDuckGo Search for results
    results = ddgs.text(query, max_results=4)
    context = ""
    for result in results:
        context += result["body"] + "\n"
    return context
# Create a FunctionTool from the search function
search_tool = FunctionTool.from_defaults(fn=search)
# ----- Set up OpenAI LLM and OpenAI Embedding Model -----
openai_api_key = "" # Set your OpenAI API key here
# Use OpenAI for the LLM (GPT-3.5-turbo or GPT-4, based on your model access)
llm = OpenAI(api_key=openai_api_key, model="gpt-3.5-turbo") # Replace with your model name
# Use OpenAI for embeddings
embed_model = OpenAIEmbedding(api_key=openai_api_key)
# Register the LLM and embedding model in the global Settings
Settings.llm = llm
Settings.embed_model = embed_model
# ----- Function Calling Agent Example -----
query = "Who won the FIFA World Cup 2022 in Qatar?"
# Function calling directly with the tools
function_call = llm.predict_and_call(
    [search_tool],
    user_msg=query,
    allow_parallel_tool_calls=True
)
print("Function Call Response:", function_call.response)
# ----- Function Calling Agent and AgentRunner -----
# FunctionCallingAgent Example
agent = FunctionCallingAgent.from_tools(
    [search_tool],
    llm=llm,
    verbose=True,
    allow_parallel_tool_calls=True
)
response = agent.chat(query)
print("FunctionCallingAgent Response:", response)
# AgentRunner Example
agent_worker = FunctionCallingAgentWorker.from_tools(
    [search_tool],
    llm=llm,
    verbose=True,
    allow_parallel_tool_calls=True
)
agent_runner = AgentRunner(agent_worker)
runner_response = agent_runner.query(query)
print("AgentRunner Response:", runner_response.response)
# ----- Agentic RAG -----
# Load your documents (example with a PDF)
documents = SimpleDirectoryReader(input_files=["/content/kt.pdf"]).load_data()
# Split the document into nodes for processing
splitter = SentenceSplitter(chunk_size=1024)
nodes = splitter.get_nodes_from_documents(documents)
# Build summary and vector indices
summary_index = SummaryIndex(nodes)
vector_index = VectorStoreIndex(nodes)
# Set up query engines for summary and vector retrieval
summary_query_engine = summary_index.as_query_engine(response_mode="tree_summarize", use_async=True)
vector_query_engine = vector_index.as_query_engine()
# Create tools from the query engines
summary_tool = QueryEngineTool.from_defaults(
    query_engine=summary_query_engine,
    description="Useful for summarization questions over the loaded document."
)
vector_tool = QueryEngineTool.from_defaults(
    query_engine=vector_query_engine,
    description="Useful for retrieving specific facts and details from the loaded document."
)
# Set up a RouterQueryEngine with an LLM-based selector that routes each query to the most suitable tool
query_engine = RouterQueryEngine(
    selector=LLMSingleSelector.from_defaults(),
    query_engine_tools=[summary_tool, vector_tool],
    verbose=True
)
# Example queries with Agentic RAG, retrieving data from the uploaded PDF
response_summary = query_engine.query("About knowledge transition")
print("Summary Tool Response:", response_summary)
response_vector = query_engine.query("Explain technical details from knowledge article?")
print("Vector Tool Response:", response_vector)
# ----- ReAct Agent Example -----
# Define a ReAct agent that uses the search tool and LLM
react_agent = ReActAgent.from_tools([search_tool], llm=llm, verbose=True)
# Example ReAct agent query
react_response = react_agent.chat("Tell me about Generative AI.")
print("ReAct Agent Response:", react_response)