OpenAI Agents SDK: A Step-by-Step Guide to building your first agent

OpenAI Agents SDK: A Step-by-Step Guide to building your first agent

Key Features and Benefits

I will guide you through the OpenAI Agents SDK, a powerful tool designed to simplify the creation of AI agents. The SDK offers several key features that make building and managing agents more efficient:

Simplified Multi-Agent Handoff: Easily manage interactions between multiple agents, allowing for seamless task transitions.

Built-in Guardrails: Implement safety and control mechanisms to ensure responsible agent behavior.

Easy Lifecycle Hooks: Access and interact with agents at different stages of their execution, enabling custom actions and monitoring.

Seamless Tool Integration: Integrate tools like web search and computer use effortlessly, expanding agent capabilities.

By the end of this guide, you'll understand the fundamental concepts and be ready to start building your own AI agents.

What We Will Cover

We'll start with the basics, creating a simple agent, and then gradually introduce more advanced features. Here’s a roadmap of what we'll cover:

  • Creating a Basic Agent: We'll begin by building a fundamental agent without tools to understand the core SDK components.
  • Adding Advanced Features: We'll then enhance our agent with context management and tool integration to create more realistic and capable agents.
  • Integrating Multiple Tools: You'll learn how to equip your agents with multiple tools and manage the workflow between them.
  • Running and Monitoring Agents: We'll explore how to execute agents, track their progress, and analyze execution traces using the OpenAI platform.

Building Your First Agent: The Basics

Let's start by creating a very basic agent using the openai-agents-sdk. This example will demonstrate the ease of setting up an agent and running it with minimal code.

First, ensure you have the openai-agents-sdk installed. You can install it using pip:

pip install openai-agents-sdk        

You'll also need to set your OpenAI API key. It's recommended to use environment variables for this. Create a .env file in your project directory and add your API key:

OPENAI_API_KEY=your_openai_api_key        

Now, let's create a Python file named most_basic_agent.py with the following code:

from agents import Agent, Runner
from dotenv import load_dotenv
from agents import set_default_openai_key
import os

load_dotenv()

openai_api_key = os.environ.get("OPENAI_API_KEY")
set_default_openai_key(openai_api_key)
agent = Agent(
  name="Assistant", instructions="You are sassy code instructor", model="gpt-4o"
)

result = Runner.run_sync(agent, "Write a haiku about recursion in programming.")

print(result.final_output)        

In this code:

We import the necessary classes Agent and Runner from the agents module.

We use dotenv to load environment variables, and set_default_openai_key to set the API key for OpenAI requests.

We create an Agent instance named "Assistant". We provide it with instructions: "You are sassy code instructor" and specify the model as gpt-4o.

We use Runner.run_sync() to execute the agent synchronously with the input prompt: "Write a haiku about recursion in programming."

Finally, we print the final_output of the agent.

This basic agent doesn't use any tools and is designed for a simple task. You can visualize this basic agent setup as follows:

To run this agent, navigate to your project directory in the terminal and execute:

uv run most_basic_agent.py        

After running, the agent will generate a haiku about recursion and print it to the console.

Tracing Agent Execution

One of the powerful features of the OpenAI Agents SDK is built-in tracing. When you run an agent, execution details are automatically logged and can be viewed on the OpenAI platform.

To access these traces, go to platform.openai.com/traces. You'll need to be logged in to your OpenAI account. Here, you can find detailed information about each agent run, including:

Workflow: The sequence of steps the agent took.

Handoffs: If multiple agents are involved, you can see how tasks are transferred.

Tools: Which tools were used during execution.

Execution Time: Timestamps and durations for each step.

Inputs and Outputs: The prompts and responses at each stage.

For example, after running most_basic_agent.py, you can find a trace on the platform.

Enhancing Agents with Context and Tools

While the basic agent is a good starting point, real-world applications often require agents that can maintain context and utilize tools. Let's explore how to build a more advanced agent.

Consider an agent designed to be a "Founder Knowledge Assistant." This agent should be able to answer questions about company founders using a specialized database of founder articles and web search when necessary.

To implement this, we'll need:

Tools: Functions that allow the agent to access information. We'll use two tools:

search_founder_articles: To search a database of articles about founders.

tavily_search: To perform web searches using the Tavily search engine.

Context Management: A mechanism to maintain conversation history and track relevant information across interactions. We'll use a custom AgentContext class for this.

Let's look at the code for agent_with_state_or_context.py:

founder_agent = Agent[AgentContext](
    name="Founder Knowledge Assistant",
    instructions=FOUNDER_AGENT_INSTRUCTIONS,
    tools=[
        search_founder_articles,
        tavily_search,
    ],
    model="gpt-4o",
)


# --- Running the Agent ---
async def run_agent_with_query(query: str, context: AgentContext, verbose_logging: bool = False):
    """Run the agent with a query, using the provided context"""
    result = await Runner.run(
        starting_agent=founder_agent,
        input=query,
        context=context,
    )

    print("\n=== RESPONSE ===")
    print(result.final_output)

    if context.last_tool_used == "founder_articles":
        print("\n[Source: Retrieved from founder articles database]")
    elif context.last_tool_used == "tavily_search":
        print("\n[Source: Retrieved from web search]")


async def interactive_agent_loop(verbose_logging: bool = False):
    """Run the agent in an interactive loop with persistent context"""
    context = AgentContext()

    print("\n=== Founder Knowledge Assistant ===")
    print("Ask questions about founders. Type 'exit', 'quit', or 'q' to end the conversation.")
    print("Your context and conversation history will be maintained until you exit.")

    while True:
        query = input("\nYour question: ")
        if query.lower() in ["exit", "quit", "q"]:
            print("Ending conversation. Goodbye!")
            break

        await run_agent_with_query(query, context, verbose_logging)

        if verbose_logging:
            print("\n=== CURRENT CONTEXT STATE ===")
            print(f"Recent searches: {context.recent_searches}")
            print(f"Number of documents retrieved: {len(context.recent_documents)}")


async def main():
    await interactive_agent_loop(verbose_logging=True) # Set to True for verbose logging

if __name__ == "__main__":
    asyncio.run(main())        

Key improvements in this code:

AgentContext Class: This dataclass is used to maintain state across interactions. It tracks recent_searches, recent_documents, and last_tool_used.

Tool Integration: The founder_agent is now created with a list of tools: search_founder_articles and tavily_search. These are defined using the @function_tool decorator, making them easily accessible to the agent.

Complex Instructions: The FOUNDER_AGENT_INSTRUCTIONS provide detailed guidelines to the agent on how to use the tools, prioritize information sources, and format responses.

Interactive Loop: The interactive_agent_loop function sets up a conversational interface where the agent maintains context across multiple turns.

When you run agent_with_state_or_context.py, you can interact with the "Founder Knowledge Assistant." For example, asking "who is the ceo of openai" will trigger the agent to use the tools, first checking the founder articles database and then resorting to web search if needed. The agent will then provide a concise answer with source links, demonstrating its ability to use tools and maintain context.

Building Agents with Multiple Tools and Handoffs

The OpenAI Agents SDK truly shines when you need to build agents that utilize multiple tools and manage complex workflows. The "Founder Knowledge Assistant" example already demonstrates using two tools. Let's reiterate how the agent decides which tool to use.

In the FOUNDER_AGENT_INSTRUCTIONS, we explicitly guide the agent to:

Prioritize search_founder_articles: First, attempt to retrieve information from the founder articles database.

Evaluate Document Quality: Assess the retrieved documents for relevance, completeness, accuracy, and credibility.

Use tavily_search as fallback: If the founder articles are insufficient or outdated, use web search (tavily_search) for more current information.

This logic is embedded in the agent's instructions, allowing it to make informed decisions about tool selection. The SDK's "handoff" mechanism, while not explicitly demonstrated in a multi-agent handoff in this example, is implicitly used when the agent decides to switch from using search_founder_articles to tavily_search based on the quality of results.

Running Agents and Interactive Loops

We've already seen how to run a basic agent using Runner.run_sync(). For more complex agents and interactive sessions, the Runner.run() method (used asynchronously) and interactive loops are essential.

Runner.run() vs. Runner.run_sync():

Runner.run_sync(): Executes the agent workflow synchronously, blocking the execution until completion. Suitable for simple, non-interactive tasks.

Runner.run(): Executes the agent workflow asynchronously, allowing for non-blocking operations and better handling of I/O-bound tasks. Necessary for interactive applications and complex workflows.

Interactive Agent Loop: The interactive_agent_loop function in agent_with_state_or_context.py demonstrates how to create a persistent conversational session. It initializes an AgentContext once and reuses it across multiple queries, maintaining conversation history and context.

Within the interactive_agent_loop, the run_agent_with_query function is called for each user input. This function uses Runner.run() to execute the agent asynchronously and then prints the response and source information. The loop continues until the user types "exit," "quit," or "q."

This interactive loop structure is fundamental for building conversational AI applications where context persistence is crucial.

Conclusion: Unleashing the Power of Agent SDK

The OpenAI Agents SDK provides a robust and user-friendly platform for building sophisticated AI agents. By simplifying multi-agent interactions, offering built-in guardrails and lifecycle hooks, and enabling seamless tool integration, the SDK significantly streamlines agent development.

Key Takeaways:

Ease of Use: The SDK makes it surprisingly easy to create agents, even with advanced features like tool integration and context management.

Powerful Features: Built-in tracing, lifecycle hooks, and the "handoff" mechanism provide developers with fine-grained control and observability.

Context Management: The ability to maintain context across interactions is crucial for building engaging and coherent conversational agents.

Tool Integration: Agents can be easily extended with tools to access external information and perform actions, greatly expanding their capabilities.

As you continue to explore the OpenAI Agents SDK, consider these points:

Experiment with Different Tools: Explore and integrate various tools to tailor your agents to specific tasks and domains.

Refine Agent Instructions: Carefully craft agent instructions to guide behavior, tool selection, and response formatting.

Leverage Tracing for Debugging: Utilize the OpenAI platform's tracing tools to understand agent execution and troubleshoot issues.

Explore Lifecycle Hooks: Implement lifecycle hooks to add custom logic at different stages of agent execution, such as logging, monitoring, or dynamic context updates.

The OpenAI Agents SDK empowers you to build the next generation of intelligent AI agents. Start experimenting, building, and exploring the possibilities!

Shane Maley

The best way to grow your business is by sharing your expertise with Apprendo

1 周

so helpful, thanks @hai !

Chris Bartholomew

Want to build a RAG pipeline in few clicks? Contact me.

1 周

Nice work, Hai N.. Did you see anything in these announcements about an MCP-equivalent from OpenAI?

要查看或添加评论,请登录

Hai N.的更多文章

社区洞察