Building an AI agent

Lately, our news and LinkedIn feeds are full of talk of “AI agents”, promising sliced-bread-level improvements to everything software. Intrigued (and definitely a tad skeptical), Rafal Kapela and I embarked on our own agent-building journey, to see what all the fuss was about. This essay chronicles our experience, both technical observations and some product "aha" moments we encountered along the way.

What is an agent?

First, let's demystify the term "agent." Simply put, it's a new (or fancy, if you will) way to say “a software program that creates a guided, controlled conversation between a Large Language Model (LLM) like ChatGPT and other software”. If you've ever used ChatGPT to write an article or an email by asking it to write something, and then refining its output through successive chat messages (a.k.a. prompts), congratulations! You've already experienced the power of an "agentic flow."

Building an agent = writing a program that creates a structured back-and-forth, whereby the program takes the LLM's output, processes it, and uses it to formulate the next prompt to the LLM, or a step not involving an LLM, such as performing calculations, searching the web, or accessing external data (these are called “tools”). So it’s just a new way to write software, and because we have an LLM in the mix, it is more powerful, less predictable, and more conducive to writing blog posts about and creating startups out of ;)

If you’ve never written software, or dealt with a software dev team, here’s one way to visualize an agent: imagine a highly motivated intern with a large rote memory and unwavering focus. They are eager to please but need explicit, step-by-step instructions to complete any task; how detailed depends on how complex the task is. Oh, and their judgment is still under development, so you wouldn't trust them with critical decision-making (yet!).

Our toy project: An Essay-Writing Agent

For our foray into agent creation, we chose a familiar task: writing an essay. It felt like a manageable first project, seemingly just a few coding steps away from our everyday ChatGPT interactions. You know, like creating emails with our POV on some long article a co-worker shared on a mailing list that also has our manager on it.

To build our agent, we broke the essay-writing process into discrete stages:

  1. Plan: Given the essay topic, come up with the key points to make.
  2. Research: Gather relevant information and supporting evidence for these points.
  3. Draft: Create the first draft of the essay.
  4. Review: Evaluate the draft for clarity, coherence, accuracy, style, etc.
  5. Edit: Incorporate feedback and refine the essay.
  6. Iterate: Review again, and Edit again, until satisfied.


Flow diagram for writing an essay
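The stages above can be sketched as a short program. Here is a minimal, hypothetical sketch in plain Python: the `llm()` helper is just a stand-in for a real LLM API call, and the "good enough" check is a deliberately naive placeholder (more on stopping later).

```python
def llm(prompt: str) -> str:
    """Stand-in for a real LLM API call (e.g. OpenAI or Anthropic)."""
    return f"<LLM output for: {prompt[:40]}...>"

def write_essay(topic: str, max_rounds: int = 3) -> str:
    plan = llm(f"List the key points to make in an essay on: {topic}")  # 1. Plan
    research = llm(f"Gather supporting evidence for: {plan}")           # 2. Research
    draft = llm(f"Write a first draft from: {plan} {research}")         # 3. Draft
    for _ in range(max_rounds):                                         # 6. Iterate
        review = llm(f"Review this draft: {draft}")                     # 4. Review
        if "looks good" in review:  # naive "good enough" check
            break
        draft = llm(f"Edit this draft using the feedback: {review}")    # 5. Edit
    return draft

print(write_essay("Are AI agents overhyped?"))
```

Even this toy version shows the shape of the thing: a loop of prompts, where each step's output feeds the next step's input.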

Thinking in Graphs

As it turns out, our boxes-and-arrows workflow diagram isn't just a visual aid; it defines the program structure of an agent perfectly. And LangGraph, an open source framework for building agent applications, uses exactly this kind of graph to represent agent workflows and makes them easy to code up.

For those interested in the details, here are the core components of these graphs:

  • Nodes: Each node in a graph represents a distinct step in the agent's workflow (e.g., "Plan," "Research," "Review"). START and END are special nodes that… well, tell the program where to start and end.
  • Edges: Edges connect these nodes, defining the flow of information and sequence of actions. For example, in the graph above, the “Draft” step is followed by the “Review” step, and typically, the output of the "Draft" node would be fed as one of the inputs to the "Review" node as well.
  • State: State refers to all the things we want to keep track of, e.g., the user's input topic, the “current draft” of the essay, a list of snippets from an internet search, etc. Back to our intern analogy, this is everything you want the intern to be tracking, either in their head or on paper.

Using LangGraph, we could translate our diagram above into working Python code, with each stage represented as a node. Each node is just a function that deals with:

  • Input: The information required to execute the task (e.g., essay topic, research keywords).
  • Output: The result of the task (e.g., an outline, a paragraph of text).
  • State: The ability to store intermediate results, and track overall progress of our agent.
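To make "each node is just a function" concrete, here is a hypothetical sketch (not our actual code) of two nodes sharing state. In LangGraph proper the state is a typed dictionary and each node returns only the keys it wants to update; we simulate that merge with a plain loop:

```python
from typing import TypedDict

class EssayState(TypedDict, total=False):
    topic: str     # user input
    outline: str   # written by the Plan node
    draft: str     # updated by the Draft and Edit nodes

def plan_node(state: EssayState) -> EssayState:
    # In the real agent, this would prompt an LLM with state["topic"].
    return {"outline": f"Key points for '{state['topic']}'"}

def draft_node(state: EssayState) -> EssayState:
    return {"draft": f"Draft expanding on: {state['outline']}"}

state: EssayState = {"topic": "Are AI agents overhyped?"}
for node in (plan_node, draft_node):  # edges: Plan -> Draft
    state.update(node(state))         # merge each node's output into state
```

The framework's job is essentially the loop at the bottom: walk the graph's edges, call each node with the current state, and merge the result back in.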

From Concept to Code, and the Messy Reality

With our LangGraph agent designed, it seemed straightforward to bring it to life through code. And yes, there was sample code available online too, so we didn’t have to start from scratch. But we chose to, because it seemed simple and we knew we’d want to experience the pain in small steps. Also, we wanted to be able to create an actual web app where anyone could run this agent, not just a proof-of-concept that ran on our own machines (for that, there are Jupyter notebooks with working code available a short Perplexity search away. But what’s the fun in that!). So here we are.

And as with any software project, this too came with its fair share of setup grunt work and hurdles. From a cold start (i.e. a Mac mini with only the default software dev packages), we went through installing Homebrew, pip, three versions of Python, some click-and-shoot Docker container setups, and LLM API keys, and had something that worked. As a side note, Anthropic and OpenAI are hugely easier to use than Google’s Gemini. It’s almost like Google doesn’t want developers to quickly get going. For folks intending to get into this seriously, we highly recommend going through the basic documentation of the LangGraph library. Some concepts (like reducers) can be tricky and are very important to understand before starting to work on AI agents, even at a basic level [1], [2].
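As a taste of why reducers matter: by default, when a node returns a value for a state key, it overwrites whatever was there. A reducer tells the framework how to merge updates instead. A minimal sketch, following the pattern in the LangGraph docs (the `EssayState` fields here are our own illustrative names):

```python
import operator
from typing import Annotated, TypedDict

class EssayState(TypedDict):
    draft: str                               # no reducer: each update overwrites
    snippets: Annotated[list, operator.add]  # reducer: each update is appended

# When a node returns {"snippets": [...]}, the framework merges it with the
# reducer rather than overwriting. Conceptually, it does:
merged = operator.add(["first search result"], ["second search result"])
```

Get this wrong and your research node will quietly clobber earlier results instead of accumulating them, which is exactly the kind of bug that is painful to spot in a multi-step flow.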

Further on, we wired up LangSmith (a platform for debugging and monitoring agents) and LangGraph Studio (a visual tool for building LangChain graphs, currently working only on Mac). While powerful, these tools came with learning curves of their own. We think it’s going to be worth it. LangSmith is quite flaky though, and every time it updates we brace for the code to break. But, as with any new project, these issues should settle down with time.

Showtime!

After all the above setup, we finally had a running agent! With anticipation, we hit the "execute" button in LangGraph Studio. The graph sprang to life, nodes lighting up sequentially as our agent diligently progressed through each stage of the essay-writing process.


Seeing our creation come alive was satisfying, but the real work was just beginning.

The first essays produced by our agent, while structurally sound, were far from Pulitzer-worthy. This is where the iterative nature of agent development comes into play. We experimented with different prompts, fiddled with agent state variables and steps, and explored competing LLM providers and configurations. Each tweak aimed to enhance the quality of our agent's output, focusing on three key areas:

  • Improved Planning: How do we generate more creative and insightful essay outlines?
  • Enhanced Drafting: How could we push the LLM to produce more engaging and well-argued paragraphs?
  • Sharper Critique: Can we develop more nuanced and insightful feedback mechanisms within the agent's review process?

Each node in our graph is a playground for experimentation, offering opportunities to inject domain knowledge, refine instructions, and leverage the ever-evolving capabilities of LLMs.

What’s next?

We have in mind a few other aspects of making this agent more useful, like:

  • Style and Voice: What, or who, should the output sound like? Today, the outputs are still disappointingly smelling-like-LLM. What kind of work would allow us to style-transfer from our own writing, or that of someone we admire?
  • The Stopping Problem: Our agent can critique whatever draft it is given. Like an employee out of Dilbert’s office (or yours), we seem to have transferred the human-like capability to “give inputs” on anything. In real-life situations, we stop when we think something is good enough, or when the deadline approaches, or when people are sick of meeting on this topic one more time. What’s the equivalent in agent-speak? How might we formulate a stopping problem for our program?
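One way we might formulate the stopping problem is as a routing function: after each Review, decide whether to loop back to Edit or to finish. Here is a hypothetical sketch (the field names and the score threshold are our own inventions, not anything we've built yet), combining a "deadline" budget with a "good enough" check:

```python
MAX_REVISIONS = 3  # deadline-style budget: stop after this many Edit rounds

def should_continue(state: dict) -> str:
    """After a Review step, route back to 'edit' or finish at 'end'."""
    if state["revisions"] >= MAX_REVISIONS:  # out of budget: ship it
        return "end"
    if state["score"] >= 8:                  # reviewer scored it "good enough" (0-10)
        return "end"
    return "edit"                            # otherwise, another Edit round
```

In a graph framework, this is the kind of function you would hang off a conditional edge after the Review node; the harder open question is how to get the reviewer to produce an honest score in the first place.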

References:

[1] Low-Level LangGraph concepts: https://langchain-ai.github.io/langgraph/concepts/low_level/#reducers

[2] LangGraph: Context Objects in State: https://langchain-ai.github.io/langgraph/how-tos/state-context-key/

That’s all for now! Here’s a picture of Niki and Nora playing cat-in-the-bag, as a thank you for reading this far.



Adam Martin

CTO/VPEng | AI, Data, Cloud, Graphics/VR | C#/Java/Python/Typescript/C++

1 week ago

My first attempt at getting the AI to do all of this (including all your preparation work :)) was too prescriptive; tried again with a simpler prompt and ... it's pretty good. Then I asked it to write the Terraform scripts and figure out LangSmith/Studio - although it bailed a little on the latter, and I think it oversimplified the Python install process ;). But ... Chat here: https://chatgpt.com/share/66f18e5e-46a4-8001-9179-029469e3b690 Diagram it produced (no editing from me): attached.

Adam Martin

1 week ago

Did you consider going via LLM instead of making the initial graph yourself - i.e. focus on the absolute minimum of critical info you need to provide, and aim to have the whole agentic process run with the rest? e.g. initial prompt: "Imagine you're a lecturer in good essay design. Define a structured process of 4-8 sequential steps to create an essay. Provide output in langchain format." And then further down the process (these days I'd rather slap myself in the face than go through the pain of manual python setup (still a poor system/design even after many years)) ... so I'd be prompting it for a DevOps setup for that - something like: "As an expert in DevOps, specifically [insert your preferred provider here], provide a [Terraform or your pref] script and any additional necessary scripts to configure a fully working LangChain setup [etc etc]".
