How to Build an AI Agent Without Using Any Libraries: A Step-by-Step Guide


Video Tutorial

If you prefer video tutorials, a video version of this guide is available: The Adaptive Engineer - How to Build an AI Agent.

The Context

Ask ChatGPT or Claude a question like "What is the current time in New York?" or "What is the weather in Bengaluru right now?" and you will get a response along the lines of "I don't have access to real-time information."

We know LLMs cannot access real-time information - so let's use this as a problem statement for our basic AI Agent.

Before we begin - let's understand the fundamental building blocks of an agent.

The Adaptive Engineer - Primary components of an Agent

An agent essentially comprises three primary components - the Model, the Tools, and the Reasoning Loop.

The model refers to the Large Language Model (LLM) at the core of the AI agent. This is the foundation of the agent's intelligence and capabilities. The LLM is trained on vast amounts of text data, allowing it to understand and generate human-like text. It provides the agent with knowledge, language understanding, and the ability to process complex instructions.

Tools are specific functions or capabilities that the AI agent can use to interact with its environment or perform tasks.

The reasoning loop is the iterative process that allows the agent to make decisions, solve problems, and accomplish tasks.
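Conceptually, the reasoning loop looks something like this. This is a minimal sketch with illustrative names (reasoning_loop, decide and tools are placeholders introduced here for explanation); the actual agent built below implements the same pattern with its own methods:

# Minimal sketch of a reasoning loop (illustrative names, not the final agent code).
def reasoning_loop(decide, tools, user_input):
    observation = user_input
    while True:
        decision = decide(observation)               # ask the LLM what to do next
        if decision["action"] == "respond_to_user":  # the LLM has a final answer
            return decision["args"]
        tool = tools[decision["action"]]             # otherwise look up the named tool...
        observation = tool(decision["args"])         # ...run it and feed its result back in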

In the context of our use case, the three components work together as follows: the LLM (model) reads the user's question and decides which tool to call, the tool (a time or weather lookup) fetches the real-time data, and the reasoning loop feeds the tool's result back to the LLM until it can answer the user.

In this tutorial, we'll create a more advanced LLM agent that uses the concept of "tools" to perform various tasks. This approach allows for greater flexibility and easier expansion of the agent's capabilities.

Prerequisites

  • Basic Python programming knowledge
  • Understanding of LLMs and their capabilities
  • Python 3.9 or higher (the zoneinfo module used below is in the standard library from 3.9 onwards)

Implementing the Tool-based Agent

Let's start by creating the tool interface. This abstract base class defines the methods that every tool must implement.

import datetime
import requests
from zoneinfo import ZoneInfo
from abc import ABC, abstractmethod

class Tool(ABC):
    @abstractmethod
    def name(self) -> str:
        pass

    @abstractmethod
    def description(self) -> str:
        pass

    @abstractmethod
    def use(self, *args, **kwargs):
        pass
        

Now, let's implement some tools:

class TimeTool(Tool):
    def name(self):
        return "Time Tool"

    def description(self):
        return "Provides the current time for a given city's timezone like Asia/Kolkata, America/New_York etc. If no timezone is provided, it returns the local time."

    def use(self, *args, **kwargs):
        format = "%Y-%m-%d %H:%M:%S %Z%z"
        current_time = datetime.datetime.now()
        # The agent passes the tool argument positionally; guard against an empty call.
        input_timezone = args[0] if args else None
        if input_timezone:
            print("TimeZone", input_timezone)
            current_time = current_time.astimezone(ZoneInfo(input_timezone))
        return f"The current time is {current_time.strftime(format)}."

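As a quick check, the tool can be called directly once the class above is defined:

# Quick manual check of the time tool in isolation.
clock = TimeTool()
print(clock.use("Asia/Kolkata"))  # e.g. "The current time is 2025-... IST+0530."
print(clock.use())                # falls back to the local time (no timezone suffix)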

class WeatherTool(Tool):
    def name(self):
        return "Weather Tool"

    def description(self):
        return "Provides weather information for a given location"

    def use(self, *args, **kwargs):
        # The agent may pass the raw user text, so strip a leading "weather in " if present.
        location = args[0].split("weather in ")[-1]
        # userdata is Google Colab's secrets store; it is imported in the main script below.
        api_key = userdata.get("OPENWEATHERMAP_API_KEY")
        url = f"https://api.openweathermap.org/data/2.5/weather?q={location}&appid={api_key}&units=metric"
        response = requests.get(url)
        data = response.json()
        if data["cod"] == 200:
            temp = data["main"]["temp"]
            description = data["weather"][0]["description"]
            return f"The weather in {location} is currently {description} with a temperature of {temp}°C."
        else:
            return f"Sorry, I couldn't find weather information for {location}."
        
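You can sanity-check this tool on its own as well. This assumes the OPENWEATHERMAP_API_KEY secret is configured and that from google.colab import userdata has already been run (it is part of the main script further below):

# Quick manual check of the weather tool in isolation.
weather = WeatherTool()
print(weather.name())                    # Weather Tool
print(weather.use("weather in London"))  # e.g. "The weather in London is currently ... °C."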

Now let's implement the all-important agent.

import requests
import json
import ast

class Agent:
    def __init__(self):        
        self.tools = []
        self.memory = []
        self.max_memory = 10

    def add_tool(self, tool: Tool):
        self.tools.append(tool)

    def json_parser(self, input_string):
        # The LLM often returns a Python-style dict (single quotes), so parse it with
        # ast.literal_eval and round-trip it through json to normalise it.
        python_dict = ast.literal_eval(input_string)
        json_string = json.dumps(python_dict)
        json_dict = json.loads(json_string)

        if isinstance(json_dict, dict):
            return json_dict

        raise ValueError("Invalid JSON response")


    def process_input(self, user_input):
        self.memory.append(f"User: {user_input}")
        self.memory = self.memory[-self.max_memory:]

        context = "\n".join(self.memory)
        tool_descriptions = "\n".join([f"- {tool.name()}: {tool.description()}" for tool in self.tools])
        response_format = {"action":"", "args":""}

        prompt = f"""Context:
        {context}

        Available tools:
        {tool_descriptions}

        Based on the user's input and context, decide if you should use a tool or respond directly.
        Sometimes you might have to use multiple tools to solve the user's request. You have to do that in a loop.
        If you identify an action, respond with the tool name and the arguments for the tool.
        If you decide to respond directly to the user, make the action "respond_to_user" with args as your response, in the following format.

        Response Format: 
        {response_format}

        """

        response = self.query_llm(prompt)
        self.memory.append(f"Agent: {response}")

        response_dict = self.json_parser(response)

        # Check if any tool can handle the input
        for tool in self.tools:
            if tool.name().lower() == response_dict["action"].lower():
                return tool.use(response_dict["args"])

        return response_dict

    def query_llm(self, prompt):        
        api_key = userdata.get("OPENAI_API_KEY")
        headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}"
        }
        data = {
            "model": "gpt-4o-mini-2024-07-18",
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 150
        }
        response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, data=json.dumps(data))
        final_response = response.json()['choices'][0]['message']['content'].strip()
        print("LLM Response ", final_response)
        return final_response

    def run(self):
        print("LLM Agent: Hello! How can I assist you today?")
        user_input = input("You: ")

        while True:
            if user_input.lower() in ["exit", "bye", "close"]:
                print("See you later!")
                break

            response = self.process_input(user_input)
            if isinstance(response, dict) and response["action"] == "respond_to_user":
                print("Response from Agent: ", response["args"])
                user_input = input("You: ")
            else:
                # A tool was used; feed its result back into the agent as the next input
                # so the LLM can turn it into a reply for the user.
                user_input = response
        

Finally, let's create the main script to run our tool-based agent:

from google.colab import userdata

def main():    
    agent = Agent()

    # Add tools to the agent
    agent.add_tool(TimeTool())    
    agent.add_tool(WeatherTool())
    agent.run()   

if __name__ == "__main__":
    main()
        
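Note that userdata is Google Colab's secrets store. If you want to run the same script outside Colab, one option (my assumption, not part of the original code) is to replace that import with a tiny shim backed by environment variables:

# Drop-in stand-in for google.colab.userdata when running outside Colab.
import os

class userdata:
    @staticmethod
    def get(key):
        # Expects OPENAI_API_KEY and OPENWEATHERMAP_API_KEY to be exported in the shell.
        return os.environ[key]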

How It Works

  1. The Agent class manages the overall operation: it keeps a short conversation memory, builds the prompt, and routes actions to tools.
  2. Each tool is implemented as a separate class inheriting from the Tool abstract base class.
  3. On every turn, the agent sends the LLM the conversation context, the descriptions of the available tools, and the required response format.
  4. The LLM replies with an action dictionary: either the name of a tool to use (with its arguments) or "respond_to_user" with a direct answer.
  5. If the action names a tool, the agent runs it and feeds the result back into the loop; otherwise it prints the answer to the user. An example of such an action dictionary is shown below.
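For instance, a time question might produce an LLM reply like the string below. This is a made-up, illustrative reply in the requested format, not captured output; json_parser turns it into the dict the agent routes on:

# A made-up LLM reply in the format the prompt asks for.
raw_reply = "{'action': 'Time Tool', 'args': 'Asia/Kolkata'}"

agent = Agent()
parsed = agent.json_parser(raw_reply)   # -> {'action': 'Time Tool', 'args': 'Asia/Kolkata'}
print(parsed["action"], parsed["args"])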

Adding New Tools

To add a new tool to the agent, simply create a new class that inherits from Tool and implement the required methods. Then, add an instance of your new tool to the agent using the add_tool method.
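For example, a hypothetical calculator tool (not part of the original repository) could look like this:

class CalculatorTool(Tool):
    def name(self):
        return "Calculator Tool"

    def description(self):
        return "Evaluates a basic arithmetic expression like '2 + 3 * 4' and returns the result."

    def use(self, *args, **kwargs):
        expression = args[0] if args else ""
        try:
            # Simplified sketch: eval with empty builtins. Do not evaluate untrusted input like this in production.
            result = eval(expression, {"__builtins__": {}}, {})
            return f"The result of {expression} is {result}."
        except Exception:
            return f"Sorry, I couldn't evaluate '{expression}'."

# Then register it inside main():
# agent.add_tool(CalculatorTool())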

Conclusion

You can further enhance this agent by:

  • Implementing more sophisticated tools
  • Adding error handling and input validation
  • Improving the LLM prompts for better tool selection
  • Implementing a more advanced memory system (a minimal sketch of this idea follows)
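As one illustration of the last point, the list-plus-slicing memory inside Agent could be replaced with a small dedicated class. The sketch below is my own illustration of that idea, not code from the repository:

# Minimal sketch of a bounded conversation memory.
from collections import deque

class Memory:
    def __init__(self, max_items=10):
        self.items = deque(maxlen=max_items)  # old entries fall off automatically

    def add(self, entry):
        self.items.append(entry)

    def context(self):
        return "\n".join(self.items)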

Code Repository

GitHub - zahere-dev/basic-ai-agent: Experimental basic AI Agent without using any libraries.



My previous articles

https://www.dhirubhai.net/newsletters/7162380998746234880/

