Building an AI agent
Ashwin Limaye
Building something new | Former [Startup CPO, Product@Alphabet, McKinsey] | CS@IITBombay
Lately, our news and LinkedIn feeds are full of talk of “AI agents”, promising sliced-bread levels of improvement to everything software. Intrigued (and definitely a tad skeptical), Rafal Kapela and I embarked on our own agent-building journey to see what all the fuss was about. This essay chronicles our experience: both technical observations and some product "aha" moments we encountered along the way.
What is an agent?
First, let's demystify the term "agent." Simply put, it's a new (or fancy, if you will) way to say “a software program that creates a guided and controlled conversation between a Large Language Model (LLM) like ChatGPT and other software”. If you've ever used ChatGPT to write an article or an email, asking it to write something and then refining its output through successive chat messages (a.k.a. prompts), congratulations! You've already experienced the power of an "agentic flow."
Building an agent = writing a program that creates a structured back-and-forth, whereby the program takes the LLM's output, processes it, and uses it to formulate the next prompt to the LLM, or a step not involving an LLM, such as performing calculations, searching the web, or accessing external data (these are called “tools”). So it’s just a new way to write software, and because we have an LLM in the mix, it is more powerful, less predictable, and more conducive to writing blog posts about and creating startups out of ;)
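For readers who do write code, here's a minimal sketch of that back-and-forth in Python. The call_llm and search_web helpers are hypothetical placeholders (not any particular library's API); the point is just the shape of the control flow: the program prompts the LLM, inspects the output, optionally runs a tool, and folds the result into the next prompt.

```python
# A minimal, hypothetical agent loop -- not any specific library's API.
# call_llm() and search_web() are placeholders standing in for a real
# LLM client and a real tool.

def call_llm(prompt: str) -> str:
    """Placeholder: send a prompt to an LLM and return its text reply."""
    raise NotImplementedError

def search_web(query: str) -> str:
    """Placeholder: run a web search (a 'tool') and return a summary."""
    raise NotImplementedError

def run_agent(task: str, max_turns: int = 5) -> str:
    context = f"Task: {task}"
    for _ in range(max_turns):
        reply = call_llm(
            f"{context}\n\nIf you need facts, answer with 'SEARCH: <query>'. "
            "Otherwise answer with 'FINAL: <result>'."
        )
        if reply.startswith("SEARCH:"):
            # The program, not the LLM, performs the tool step...
            results = search_web(reply.removeprefix("SEARCH:").strip())
            # ...and uses the result to formulate the next prompt.
            context += f"\n\nSearch results: {results}"
        elif reply.startswith("FINAL:"):
            return reply.removeprefix("FINAL:").strip()
    return "Gave up after too many turns."
```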
If you’ve never written software, or dealt with a software dev team, here’s one way to visualize an agent: imagine a highly motivated intern with a large rote memory and unwavering focus. They are eager to please but need explicit, step-by-step instructions to complete any task; how detailed those instructions are depends on how complex the task is. Oh, and their judgment is still under development, so you wouldn't trust them with critical decision-making (yet!).
Our toy project: An Essay-Writing Agent
For our foray into agent creation, we chose a familiar task: writing an essay. It felt like a manageable first project, seemingly just a few coding steps away from our everyday ChatGPT interactions. You know, like creating emails with our POV on some long article a co-worker shared on a mailing list that also has our manager on it.
To build our agent, we broke the essay-writing process into discrete stages:
Thinking in Graphs
As it turns out, our boxes-and-arrows workflow diagram isn't just a visual aid; it defines the program structure of an agent perfectly. And LangGraph, an open-source framework for building agent applications, uses exactly this kind of graph to represent agent workflows and make them easy to code up.
For those interested in a bit more detail, here are the core components of these graphs:
Using LangGraph, we could translate our diagram above into working Python code, with each stage represented as a node. Each node is just a function that deals with:
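To give a feel for what this looks like in practice, here's a rough LangGraph sketch. The stage names (plan, draft, critique) and the state fields are illustrative stand-ins rather than our exact workflow, and the LLM calls are stubbed out, but the node-and-edge structure is how LangGraph graphs are assembled.

```python
# A rough LangGraph sketch. The stage names and state fields are
# illustrative stand-ins, not our exact essay workflow.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class EssayState(TypedDict):
    topic: str
    outline: str
    draft: str
    critique: str

# Each node is a plain function: it reads the current state and
# returns a dict of updates to merge back into that state.
def plan(state: EssayState) -> dict:
    # In the real agent this is an LLM call that produces an outline.
    return {"outline": f"Outline for: {state['topic']}"}

def draft(state: EssayState) -> dict:
    return {"draft": f"Essay based on: {state['outline']}"}

def critique(state: EssayState) -> dict:
    return {"critique": "Feedback on the draft goes here."}

graph = StateGraph(EssayState)
graph.add_node("plan", plan)
graph.add_node("draft", draft)
graph.add_node("critique", critique)

graph.set_entry_point("plan")        # where execution starts
graph.add_edge("plan", "draft")      # the arrows from the workflow diagram
graph.add_edge("draft", "critique")
graph.add_edge("critique", END)

app = graph.compile()
result = app.invoke({"topic": "Why AI agents are (and aren't) a big deal"})
```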
From Concept to Code: The Messy Reality
With our LangGraph agent designed, it seemed straightforward to bring it to life through code. And yes, there was sample code available online too, so we didn’t have to start from scratch. But we chose to, because it seemed simple and we knew we’d want to experience the pain in small steps. Also, we wanted to be able to create an actual web app where anyone could run this agent, not just a proof-of-concept that ran on our own machines (for that, there are Jupyter notebooks with working code available a short Perplexity search away. But what’s the fun in that!). So here we are.
And as with any software project, this too came with its fair share of setup grunt work and hurdles. From a cold start (i.e., a Mac mini with only the default software dev packages), we went through installing Homebrew, pip, three versions of Python, some click-and-shoot Docker container setups, and LLM API keys, and ended up with something that worked. As a side note, Anthropic and OpenAI are hugely easier to use than Google’s Gemini. It’s almost like Google doesn’t want developers to quickly get going. For folks intending to get into this seriously, we highly recommend going through the basic documentation of the LangGraph library. Some concepts (like reducers) can be tricky and are very important to understand before starting to work on AI agents, even at a basic level [1], [2].
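For reference, the API-key part of that setup mostly comes down to environment variables: the provider SDKs and their LangChain integrations typically look for keys under these standard names. A minimal sketch:

```python
# Minimal sketch: the provider SDKs and LangChain integrations typically
# read API keys from environment variables with these standard names.
import os
from getpass import getpass

for var in ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY"):
    if not os.environ.get(var):
        os.environ[var] = getpass(f"Enter {var}: ")
```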
Further on, we wired up LangSmith (a platform for debugging and monitoring agents) and LangGraph Studio (a visual tool for building and running LangGraph graphs, currently available only on Mac). While powerful, these tools came with learning curves of their own. We think it’s going to be worth it. LangSmith is quite flaky though, and every time it updates we half expect the code to break. But, as with any new project, these issues should settle down with time.
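For what it's worth, hooking an existing LangGraph app up to LangSmith is mostly a matter of a few environment variables; the graph code itself stays unchanged. A sketch (project name is just an example):

```python
# Enable LangSmith tracing for an existing LangGraph/LangChain app.
# Tracing is driven by environment variables; the graph code is unchanged.
import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = "<your LangSmith API key>"
os.environ["LANGCHAIN_PROJECT"] = "essay-agent"  # project name shown in the LangSmith UI
```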
Showtime!
After all the above setup, we finally had a running agent! With much anticipation, we hit the "execute" button in LangGraph Studio. The graph sprang to life, nodes lighting up sequentially as our agent diligently progressed through each stage of the essay-writing process.
Seeing our creation come alive was satisfying, but the real work was just beginning.
The first essays produced by our agent, while structurally sound, were far from Pulitzer-worthy. This is where the iterative nature of agent development comes into play. We experimented with different prompts, fiddled with agent state variables and steps, and explored competing LLM providers and configurations. Each tweak aimed to enhance the quality of our agent's output, focusing on three key areas:
Each node in our graph is a playground for experimentation, offering opportunities to inject domain knowledge, refine instructions, and leverage the ever-evolving capabilities of LLMs.
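As an example of the kind of knob-turning involved, a node can be parameterized by model and prompt so that providers and instructions can be swapped without touching the graph structure. A hypothetical sketch (the prompt text and model choices are ours, not anything prescribed by the libraries), using LangChain's chat-model integrations:

```python
# Hypothetical sketch: swap LLM providers / prompts per node without
# changing the graph. The packages are LangChain's provider integrations.
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic

CRITIQUE_PROMPT = (
    "You are a strict writing coach. Point out the three weakest parts "
    "of this draft and suggest concrete fixes:\n\n{draft}"
)

def make_critique_node(llm):
    """Build a critique node bound to a particular model and prompt."""
    def critique(state: dict) -> dict:
        reply = llm.invoke(CRITIQUE_PROMPT.format(draft=state["draft"]))
        return {"critique": reply.content}
    return critique

# Two candidate configurations to compare against each other:
critique_openai = make_critique_node(ChatOpenAI(model="gpt-4o", temperature=0.3))
critique_claude = make_critique_node(
    ChatAnthropic(model="claude-3-5-sonnet-20240620", temperature=0.3)
)
```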
What’s next?
We have in mind a few other aspects of making this agent more useful, like:
References:
[1] Low-level LangGraph concepts: https://langchain-ai.github.io/langgraph/concepts/low_level/#reducers
[2] LangGraph: Context Objects in State: https://langchain-ai.github.io/langgraph/how-tos/state-context-key/
That’s all for now! Here’s a picture of Niki and Nora playing cat-in-the-bag, as a thank you for reading this far.