How to Build an AI Agent Without Using Any Libraries: A Step-by-Step Guide
Zahiruddin Tavargere
Senior Principal Software Engineer@Dell | Opinions are my own
Video Tutorial
If you prefer video tutorials, a video version of this walkthrough is also available.
The Context
Try asking ChatGPT or Claude a question that depends on live data, such as the current time or the weather in a city, and you will get a response explaining that the model cannot access real-time information.
We know LLMs cannot access real-time information, so let's use this limitation as the problem statement for our basic AI agent.
Before we begin - let's understand the fundamental building blocks of an agent.
An agent essentially comprises three primary components: the model, the tools, and the reasoning loop.
The model refers to the Large Language Model (LLM) at the core of the AI agent. This is the foundation of the agent's intelligence and capabilities. The LLM is trained on vast amounts of text data, allowing it to understand and generate human-like text. It provides the agent with knowledge, language understanding, and the ability to process complex instructions.
Tools are specific functions or capabilities that the AI agent can use to interact with its environment or perform tasks.
The reasoning loop is the iterative process that allows the agent to make decisions, solve problems, and accomplish tasks.
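To make the reasoning loop concrete, here is a minimal sketch of the idea: the model decides on an action, the agent executes it, and the result is fed back in until the model chooses to answer the user directly. The function names (`decide`, `reasoning_loop`) and the dict format are illustrative assumptions, not part of any specific framework.

```python
# A minimal sketch of a reasoning loop. `decide` stands in for an LLM call
# that returns a dict like {"action": "...", "args": "..."}.
def reasoning_loop(user_input, decide, tools, max_steps=5):
    observation = user_input
    for _ in range(max_steps):
        decision = decide(observation)           # model picks an action
        if decision["action"] == "respond_to_user":
            return decision["args"]              # final answer to the user
        tool = tools[decision["action"]]         # otherwise, look up a tool
        observation = tool(decision["args"])     # feed the result back in
    return "Step limit reached."
```

The step limit is a common safeguard so a confused model cannot loop forever.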
In the context of our use case, here is how the three components work together.
In this tutorial, we'll create a more advanced LLM agent that uses the concept of "tools" to perform various tasks. This approach allows for greater flexibility and easier expansion of the agent's capabilities.
Prerequisites
Implementing the Tool-based Agent
Let's start by creating the tool interface. This defines the methods that every tool must implement.
import datetime
import requests
from zoneinfo import ZoneInfo
from abc import ABC, abstractmethod


class Tool(ABC):
    @abstractmethod
    def name(self) -> str:
        pass

    @abstractmethod
    def description(self) -> str:
        pass

    @abstractmethod
    def use(self, *args, **kwargs):
        pass
Now, let's implement some tools:
class TimeTool(Tool):
    def name(self):
        return "Time Tool"

    def description(self):
        return ("Provides the current time for a given city's timezone like "
                "Asia/Kolkata, America/New_York etc. If no timezone is "
                "provided, it returns the local time.")

    def use(self, *args, **kwargs):
        format = "%Y-%m-%d %H:%M:%S %Z%z"
        current_time = datetime.datetime.now()
        # Guard against the agent calling the tool with no argument.
        input_timezone = args[0] if args else None
        if input_timezone:
            print("TimeZone", input_timezone)
            current_time = current_time.astimezone(ZoneInfo(input_timezone))
        return f"The current time is {current_time.strftime(format)}."
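The timezone conversion at the heart of TimeTool can be exercised on its own, without the agent, using only the standard library:

```python
import datetime
from zoneinfo import ZoneInfo

# Convert "now" into a named IANA timezone, as TimeTool does.
fmt = "%Y-%m-%d %H:%M:%S %Z%z"
now_utc = datetime.datetime.now(datetime.timezone.utc)
now_kolkata = now_utc.astimezone(ZoneInfo("Asia/Kolkata"))
print(now_kolkata.strftime(fmt))  # e.g. "... IST+0530"
```

Note that `zoneinfo` is standard library from Python 3.9 onward; on systems without bundled timezone data you may need the `tzdata` package.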
class WeatherTool(Tool):
    def name(self):
        return "Weather Tool"

    def description(self):
        return "Provides weather information for a given location"

    def use(self, *args, **kwargs):
        location = args[0].split("weather in ")[-1]
        # `userdata` comes from google.colab (imported in the main script);
        # outside Colab, read the key from an environment variable instead.
        api_key = userdata.get("OPENWEATHERMAP_API_KEY")
        url = (f"https://api.openweathermap.org/data/2.5/weather"
               f"?q={location}&appid={api_key}&units=metric")
        response = requests.get(url)
        data = response.json()
        if data["cod"] == 200:
            temp = data["main"]["temp"]
            description = data["weather"][0]["description"]
            return (f"The weather in {location} is currently {description} "
                    f"with a temperature of {temp}°C.")
        else:
            return f"Sorry, I couldn't find weather information for {location}."
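The formatting logic can be checked offline against a sample payload shaped like OpenWeatherMap's response; the values below are invented, and `format_weather` is a hypothetical helper that mirrors what WeatherTool does after the HTTP call:

```python
# Sample payload mimicking the fields WeatherTool reads (values invented).
sample = {
    "cod": 200,
    "main": {"temp": 21.5},
    "weather": [{"description": "scattered clouds"}],
}

def format_weather(location, data):
    # Same branching as WeatherTool.use, minus the network request.
    if data["cod"] == 200:
        temp = data["main"]["temp"]
        description = data["weather"][0]["description"]
        return (f"The weather in {location} is currently {description} "
                f"with a temperature of {temp}°C.")
    return f"Sorry, I couldn't find weather information for {location}."

print(format_weather("London", sample))
```

Separating the parsing from the request like this also makes the tool easy to unit-test without an API key.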
Now let's implement the all-important agent.
import requests
import json
import ast


class Agent:
    def __init__(self):
        self.tools = []
        self.memory = []
        self.max_memory = 10

    def add_tool(self, tool: Tool):
        self.tools.append(tool)

    def json_parser(self, input_string):
        # Models often return single-quoted, Python-style dicts;
        # literal_eval tolerates that, and the JSON round-trip normalizes it.
        python_dict = ast.literal_eval(input_string)
        json_string = json.dumps(python_dict)
        json_dict = json.loads(json_string)
        if isinstance(json_dict, dict):
            return json_dict
        raise ValueError("Invalid JSON response")

    def process_input(self, user_input):
        self.memory.append(f"User: {user_input}")
        self.memory = self.memory[-self.max_memory:]
        context = "\n".join(self.memory)
        tool_descriptions = "\n".join(
            f"- {tool.name()}: {tool.description()}" for tool in self.tools
        )
        response_format = {"action": "", "args": ""}
        prompt = f"""Context:
{context}

Available tools:
{tool_descriptions}

Based on the user's input and context, decide if you should use a tool or respond directly.
Sometimes you might have to use multiple tools to solve the user's input. You have to do that in a loop.
If you identify an action, respond with the tool name and the arguments for the tool.
If you decide to respond directly to the user, make the action "respond_to_user" with args as your response, in the following format.

Response Format:
{response_format}
"""
        response = self.query_llm(prompt)
        self.memory.append(f"Agent: {response}")
        response_dict = self.json_parser(response)

        # Dispatch to the tool whose name matches the model's chosen action.
        for tool in self.tools:
            if tool.name().lower() == response_dict["action"].lower():
                return tool.use(response_dict["args"])
        return response_dict

    def query_llm(self, prompt):
        api_key = userdata.get("OPENAI_API_KEY")
        headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        }
        data = {
            "model": "gpt-4o-mini-2024-07-18",
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 150,
        }
        response = requests.post(
            "https://api.openai.com/v1/chat/completions",
            headers=headers,
            data=json.dumps(data),
        )
        final_response = response.json()["choices"][0]["message"]["content"].strip()
        print("LLM Response ", final_response)
        return final_response

    def run(self):
        print("LLM Agent: Hello! How can I assist you today?")
        user_input = input("You: ")
        while True:
            if user_input.lower() in ["exit", "bye", "close"]:
                print("See you later!")
                break
            response = self.process_input(user_input)
            if isinstance(response, dict) and response["action"] == "respond_to_user":
                print("Response from Agent: ", response["args"])
                user_input = input("You: ")
            else:
                # A tool result: feed it back to the agent as the next input.
                user_input = response
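The json_parser step exists because models frequently return single-quoted, Python-style dicts rather than strict JSON. The round-trip can be demonstrated in isolation (the standalone `parse_action` below mirrors the method's logic):

```python
import ast
import json

def parse_action(input_string):
    # ast.literal_eval tolerates single quotes; json round-trip normalizes.
    python_dict = ast.literal_eval(input_string)
    parsed = json.loads(json.dumps(python_dict))
    if isinstance(parsed, dict):
        return parsed
    raise ValueError("Invalid JSON response")

# A single-quoted "dict" like an LLM might emit:
print(parse_action("{'action': 'Time Tool', 'args': 'Asia/Kolkata'}"))
```

Passing this string to `json.loads` directly would fail, since JSON requires double quotes.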
Finally, let's create the main script to run our tool-based agent:
# Colab-specific: `userdata` stores API keys. Outside Colab,
# read the keys from environment variables instead.
from google.colab import userdata


def main():
    agent = Agent()
    # Add tools to the agent
    agent.add_tool(TimeTool())
    agent.add_tool(WeatherTool())
    agent.run()


if __name__ == "__main__":
    main()
How It Works
The flow is straightforward: process_input builds a prompt containing the recent conversation memory plus the name and description of every registered tool; the LLM replies with a small dict naming either a tool or "respond_to_user"; the agent dispatches to the matching tool and feeds the tool's output back in as the next input, looping until the model chooses to respond directly.
Adding New Tools
To add a new tool to the agent, simply create a new class that inherits from Tool and implement the required methods. Then, add an instance of your new tool to the agent using the add_tool method.
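For example, a hypothetical CalculatorTool (not part of the article's original code) could be added this way. The Tool base class is repeated here only so the snippet runs standalone:

```python
from abc import ABC, abstractmethod
import ast
import operator

class Tool(ABC):  # same interface as defined earlier in the article
    @abstractmethod
    def name(self) -> str: ...
    @abstractmethod
    def description(self) -> str: ...
    @abstractmethod
    def use(self, *args, **kwargs): ...

class CalculatorTool(Tool):
    def name(self):
        return "Calculator Tool"

    def description(self):
        return "Evaluates a simple arithmetic expression like '2 + 3 * 4'"

    def use(self, *args, **kwargs):
        # Safely evaluate +, -, *, / via the AST instead of eval().
        ops = {ast.Add: operator.add, ast.Sub: operator.sub,
               ast.Mult: operator.mul, ast.Div: operator.truediv}
        def ev(node):
            if isinstance(node, ast.BinOp):
                return ops[type(node.op)](ev(node.left), ev(node.right))
            if isinstance(node, ast.Constant):
                return node.value
            raise ValueError("Unsupported expression")
        return f"The result is {ev(ast.parse(args[0], mode='eval').body)}."

# Registering it would be: agent.add_tool(CalculatorTool())
print(CalculatorTool().use("2 + 3 * 4"))  # → The result is 14.
```

Because the prompt includes each tool's description, no other change is needed: the LLM can start choosing the new tool as soon as it is registered.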
Conclusion
You can further enhance this agent by, for example, adding more tools, making the JSON parsing more robust, or refining the prompt that drives the reasoning loop.