LLM Agents: Overview and implementation
What exactly do we mean by an agent? And how do we implement one? That is what we will explore in this article.
LLM agents are a popular idea that has emerged from the capabilities of LLMs. Let's start by defining what an LLM agent is, and then show how to implement one.
Definition of an LLM Agent
LLM Agent = LLM + State + Tools
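Spelled out as a data structure, the definition might look like this. A minimal sketch in TypeScript; the type names are mine, not from any library:

```typescript
// A minimal sketch of the definition above. All names are illustrative.
type Tool = {
  name: string;
  description: string; // shown to the LLM so it knows when to ask for the tool
  run: (args: unknown) => Promise<string>; // executed by our code, never by the LLM
};

type Agent = {
  llm: (prompt: string) => Promise<string>; // any text-generation backend
  state: Record<string, unknown>; // memory that persists between steps
  tools: Tool[];
};
```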
Agents can be triggered by manual user input or by external events. External events can for example be a change in a database, a new message in a monitored email inbox, or a time-based trigger.
An autonomous agent is an agent that is triggered by an event other than a user's direct request. It gets particularly interesting when the output of the agent is used to trigger a future invocation of itself or another agent. This can be done by maintaining state in a database and scheduling triggers with something like a cron job based on the state and how it changes.
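As a sketch, such a self-triggering setup only needs a scheduled job that reads the state, runs the agent if there is work to do, and writes the updated state back. A minimal illustration, with the database and the agent itself stubbed out by in-memory stand-ins:

```typescript
// In-memory stand-ins for what would be a database and the actual agent.
const db = { pendingTasks: ["summarize yesterday's inbox"] };

async function loadState() { return db; }
async function saveState(_state: typeof db) { /* would persist to a database */ }
async function runAgent(task: string): Promise<string[]> {
  console.log(`Agent invoked for task: ${task}`);
  return []; // a real agent could return follow-up tasks here
}

// A cron job works the same way; setInterval keeps the sketch self-contained.
setInterval(async () => {
  const state = await loadState();
  const task = state.pendingTasks.shift();
  if (task === undefined) return; // nothing to do on this tick
  const followUps = await runAgent(task); // the output can trigger future invocations
  state.pendingTasks.push(...followUps);
  await saveState(state);
}, 60_000); // check once per minute
```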
Types of agents
The above definition of an agent is very general. It can be applied to many different types of agents. In this section, we will explore some of the most common types of agents.
Stateless agents
Stateless agents do not have a state. They are triggered by an event and respond with a message or use a tool. They do not remember anything about previous events. They are useful for simple tasks that do not require any memory or ability to perform a sequence of actions.
Example: Agent that plays music based on a user's request.
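A minimal sketch of a stateless agent: a single LLM call with a tool available, and no memory carried between requests. This uses the OpenAI Node SDK; the playMusic tool definition is illustrative:

```typescript
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Stateless: every request starts from a blank slate, with no history.
async function handleRequest(userMessage: string) {
  const response = await client.chat.completions.create({
    model: "gpt-4",
    messages: [{ role: "user", content: userMessage }],
    tools: [{
      type: "function",
      function: {
        name: "playMusic",
        description: "Play a song by name",
        parameters: {
          type: "object",
          properties: { songName: { type: "string" } },
          required: ["songName"],
        },
      },
    }],
  });
  return response.choices[0].message; // either text or a tool call request
}
```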
Workflow agents
Workflow agents have a state and can perform a sequence of actions. The sequence itself is fixed and does not depend on the state or the input; how each tool is used, however, does. Workflow agents are useful for tasks that require a sequence of actions but no reasoning about which action to take next.
Example: A lead generation agent that scrapes LinkedIn, updates a Google Sheet, and sends personalized messages to the leads.
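The fixed sequence could be sketched like this. All helpers are hypothetical placeholders for the actual integrations; note that only the message drafting involves the LLM:

```typescript
// Hypothetical placeholders for the actual integrations.
declare function scrapeLeads(query: string): Promise<{ name: string; profile: string }[]>;
declare function appendToSheet(rows: object[]): Promise<void>;
declare function draftMessage(lead: { name: string; profile: string }): Promise<string>; // LLM call
declare function sendMessage(lead: { name: string }, text: string): Promise<void>;

// The sequence of steps is hard-coded; only the content of each step
// depends on the input and the state.
async function leadGenWorkflow(query: string) {
  const leads = await scrapeLeads(query); // step 1: always runs
  await appendToSheet(leads);             // step 2: always runs
  for (const lead of leads) {
    const text = await draftMessage(lead); // step 3: LLM personalizes per lead
    await sendMessage(lead, text);
  }
}
```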
RAG is a workflow agent
A RAG (retrieval-augmented generation) system can be seen as a workflow agent. It typically performs a query rewrite, retrieves relevant documents, and then generates a response. More involved RAG systems have many more steps. This is the core of Sana AI, which I'm involved in building at Sana Labs. We have a very advanced RAG system and are working on adding more general agent capabilities.
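Viewed as a workflow agent, a basic RAG pipeline is just three fixed steps. A sketch with hypothetical helpers, not Sana's actual implementation:

```typescript
// Hypothetical helpers; a real system would back these with an LLM call,
// a vector index, and a generation prompt respectively.
declare function rewriteQuery(query: string): Promise<string>;
declare function retrieveDocuments(query: string): Promise<string[]>;
declare function generateAnswer(query: string, docs: string[]): Promise<string>;

// The steps are fixed; what each step does depends only on the query.
async function rag(query: string): Promise<string> {
  const rewritten = await rewriteQuery(query);
  const docs = await retrieveDocuments(rewritten);
  return generateAnswer(query, docs);
}
```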
General recursive agents
A general recursive agent has a set of tools from which it recursively selects and uses one tool at a time, until it decides it is done with its goal or task. The agent has a state that is updated after each tool use, and at each step it decides which tool to use based on its current state. The tools are used in sequence, although in the most general formulation they can also be used in parallel.
General recursive agents are useful for tasks that require more flexibility and complex reasoning to perform multi-step actions. (But they don't really work, yet.)
Example: A human is a general agent. A general LLM agent could be a virtual assistant, capable of performing a wide range of tasks.
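The core loop of a general recursive agent can be sketched in a few lines. Here llmDecide and tools are hypothetical stand-ins for the model call and the tool registry:

```typescript
type Decision =
  | { done: true; answer: string }
  | { done: false; tool: string; args: unknown };

// Hypothetical: asks the LLM to pick the next tool, or to finish, given the state.
declare function llmDecide(state: string[]): Promise<Decision>;
declare const tools: Record<string, (args: unknown) => Promise<string>>;

async function generalAgent(task: string): Promise<string> {
  const state: string[] = [task]; // the state grows with each observation
  while (true) {
    const decision = await llmDecide(state);   // LLM picks the next step from the current state
    if (decision.done) return decision.answer; // the agent decides it has reached its goal
    const output = await tools[decision.tool](decision.args); // we run the tool, not the LLM
    state.push(`${decision.tool} -> ${output}`); // update the state with the tool result
  }
}
```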
Why don't we see more general agents in practice?
Today's state-of-the-art LLMs (GPT-4) are simply not good enough in their reasoning capabilities to perform correct tool use when the state is complex or when the task requires a clever combination of tools.
Most agents we encounter in practice are therefore stateless or workflow agents. The release of GPT-5 will likely change this.
Implementation
Let's build a general agent to demonstrate the fundamental concepts. We will do this using the OpenAI Assistant API, which gives nice abstractions for the LLM, managing state, and tool selection. Running the tools is, of course, a separate concern.
LLM agents use tools, but the tools are external
A common confusion is that the agent itself somehow runs the tools. This is not the case. The agent is simply an LLM that knows what tools it has available, described in text in its prompt. The LLM can only ask for a tool to be used. A tool is of course just software that is being run. We as engineers build the system that actually executes the tools and returns the tool output to the LLM agent.
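In code this means keeping a plain map from tool names to functions; when the LLM asks for a tool, we look it up, run it ourselves, and send the output back. A sketch, with stand-in tool implementations:

```typescript
// The LLM only ever sees tool names and descriptions as text.
// This map, and the execution, live entirely in our own code.
const toolImplementations: Record<string, (args: any) => Promise<string>> = {
  getCurrentWeather: async ({ location }) => `Sunny in ${location}`, // stand-in
  playMusic: async ({ songName }) => `Playing ${songName}`,          // stand-in
};

async function executeToolCall(name: string, args: any): Promise<string> {
  const tool = toolImplementations[name];
  if (!tool) throw new Error(`LLM requested unknown tool: ${name}`);
  return tool(args); // the output is returned to the LLM as text
}
```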
The OpenAI Assistant API
Think of the Assistant API as the Text Generation API and Function Calling API combined with a state. This state is called Threads in the Assistant API.
Practical use of agents in production typically involves combining these APIs manually and managing state in a database to get more control of the setup. But for this article, we will use the Assistant API to keep it simple and focus on demonstrating the core concepts.
Code
See the GitHub repository (https://github.com/ViktorQvarfordt/LLM-Agent-Demo) for the full, runnable code. It's about 150 lines of code to orchestrate the agent and to build a CLI for interacting with it.
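The heart of that orchestration looks roughly like this. A condensed sketch using the OpenAI Node SDK: error handling is omitted, the assistant is assumed to be created beforehand with its tool descriptions, and details of the beta Assistants API may have changed since writing:

```typescript
import OpenAI from "openai";

// Our own tool dispatcher, as in the sketch above.
declare function executeToolCall(name: string, args: any): Promise<string>;

const client = new OpenAI();

// The thread is the agent's state: all messages and tool results live in it.
const thread = await client.beta.threads.create();
await client.beta.threads.messages.create(thread.id, {
  role: "user",
  content: "Play music that fits the mood of the weather",
});

let run = await client.beta.threads.runs.create(thread.id, {
  assistant_id: process.env.ASSISTANT_ID!, // assistant created separately
});

// Poll until the run finishes or asks us to execute tools.
while (run.status === "queued" || run.status === "in_progress" || run.status === "requires_action") {
  if (run.status === "requires_action") {
    const calls = run.required_action!.submit_tool_outputs.tool_calls;
    const tool_outputs = await Promise.all(
      calls.map(async (call) => ({
        tool_call_id: call.id,
        output: await executeToolCall(call.function.name, JSON.parse(call.function.arguments)),
      })),
    );
    run = await client.beta.threads.runs.submitToolOutputs(thread.id, run.id, { tool_outputs });
  } else {
    await new Promise((resolve) => setTimeout(resolve, 1000)); // wait before polling again
    run = await client.beta.threads.runs.retrieve(thread.id, run.id);
  }
}

// Read the assistant's final message from the thread (newest first).
const messages = await client.beta.threads.messages.list(thread.id);
console.log(messages.data[0].content);
```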
The agent is hooked up with some simple tools: getCurrentLocation, getCurrentWeather, playMusic. Consider the user request "Play music that fits the mood of the weather". To handle it, the agent needs to know the current location, with which it can get the current weather, which it can use to select a fitting song and play it with the playMusic tool.
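The tools are described to the assistant as JSON schemas, roughly like this (a sketch consistent with the transcript below; the exact definitions in the repo may differ):

```typescript
// Tool descriptions as given to the assistant. The LLM sees only these
// names and descriptions; the implementations live in our own code.
const tools = [
  {
    type: "function" as const,
    function: {
      name: "getCurrentLocation",
      description: "Get the user's current location",
      parameters: { type: "object", properties: {} },
    },
  },
  {
    type: "function" as const,
    function: {
      name: "getCurrentWeather",
      description: "Get the current weather at a location",
      parameters: {
        type: "object",
        properties: { location: { type: "string" } },
        required: ["location"],
      },
    },
  },
  {
    type: "function" as const,
    function: {
      name: "playMusic",
      description: "Play a song by an artist",
      parameters: {
        type: "object",
        properties: {
          songName: { type: "string" },
          artistName: { type: "string" },
        },
        required: ["songName", "artistName"],
      },
    },
  },
];
```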
Example output:
> Play music that fits the mood of the weather
<< Agent requests function calls: [ getCurrentLocation({}) ]
>> Submitting function outputs: [ "Stockholm, SE" ]
<< Agent requests function calls: [ getCurrentWeather({"location":"Stockholm, SE"}) ]
>> Submitting function outputs: [ "❄️ -7°C" ]
<< Agent requests function calls: [ playMusic({"songName":"Winter Winds","artistName":"Mumford & Sons"}) ]
>> Submitting function outputs: [ "Playing Winter Winds by Mumford & Sons" ]
< I've set "Winter Winds" by Mumford & Sons to play, which should match the chilly and snowy mood of the weather in Stockholm. Enjoy the music! 🎶
Here we can see the agent reasoning about the weather and selecting a song that fits its mood. The agent is general in the sense that the tool calls happen over multiple steps and depend on both the input and the state.
Conclusion
We've seen how LLM agents can be implemented and what they can be used for. We introduced the distinction between stateless agents, workflow agents, and general agents, based on their capabilities and complexity.
We also saw that state-of-the-art LLMs (GPT-4) are not good enough to build general agents. But this will likely change with the release of GPT-5 later this year.
Prediction: 2024 is the year of the agents. We will start seeing real agents in production for a range of use cases.