Plan and Execute Agents with Chain of Reasoning: An Improved Approach to Agentic Systems

#langchain #openai #anthropic #langgraph #agents #agenticsystems

In a previous post, we looked at a simple agentic system. For a quick refresher, check out this LinkedIn post.

Even though this agent works for simple use cases, it has serious limitations:

  1. We're letting the Large Language Model (LLM) do all the reasoning to come up with a plan.
  2. The LLM is also in control of how this plan should be executed.
  3. There's no human intervention during the process; once the agent is fired up, we just wait for the end result.

So, how could we address these limitations? Let's tackle them one step at a time.

The Reasoning Debate

Let's start with point 1 - can LLMs really reason and plan? This is a hotly debated topic with two schools of thought:

  • One camp argues that LLMs aren't really reasoning; they're just regurgitating from memory in a way that looks like a plan.
  • The second camp believes LLMs can indeed reason.

Honestly, I don't know the definitive answer. But for our purposes, we don't need to worry about it as long as we get a solid plan, whether it's through reasoning or not.

Taking Control with LangGraph

The second part is easier to address - we as developers can take the driver's seat and control the execution of the plan to a great extent. This is where LangGraph comes in. LangGraph is a library that allows us to engineer our process flow, which is super handy for creating and managing complex workflows in language models. It gives us fine-grained control over how our agent operates.
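
To make that concrete, here's a minimal sketch of what a LangGraph workflow looks like: we define a shared state, add nodes (plain Python functions), wire them together with edges, and compile the graph. The state fields and node names are illustrative, not the exact ones from my project.

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class AgentState(TypedDict):
    user_request: str
    plan: list[str]


def make_plan(state: AgentState) -> dict:
    # In the real system this node would call an LLM to draft the plan.
    return {"plan": [f"Draft a plan for: {state['user_request']}"]}


builder = StateGraph(AgentState)
builder.add_node("make_plan", make_plan)
builder.add_edge(START, "make_plan")
builder.add_edge("make_plan", END)

graph = builder.compile()
result = graph.invoke({"user_request": "Buy a lens for my camera", "plan": []})
```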

Enabling Human Intervention

Now onto the third point - human intervention. LangGraph helps us intervene before or after the execution of a node (think of a node as a function in your code). This means we can check and adjust the process at various stages, ensuring the agent stays on track.
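
Here's a hedged sketch of how such an intervention point can be set up: we compile the graph with a checkpointer and ask LangGraph to pause before a given node. The node names, state, and plan contents below are illustrative.

```python
from typing import TypedDict

from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import StateGraph, START, END


class State(TypedDict):
    plan: list[str]


def create_plan(state: State) -> dict:
    # Placeholder: a real node would ask an LLM for the plan.
    return {"plan": ["research reviews", "shortlist lenses", "compare prices"]}


def execute_plan(state: State) -> dict:
    # Placeholder: a real node would run each step with tools.
    return {"plan": state["plan"]}


builder = StateGraph(State)
builder.add_node("create_plan", create_plan)
builder.add_node("execute_plan", execute_plan)
builder.add_edge(START, "create_plan")
builder.add_edge("create_plan", "execute_plan")
builder.add_edge("execute_plan", END)

graph = builder.compile(
    checkpointer=MemorySaver(),
    interrupt_before=["execute_plan"],  # pause here so a human can review the plan
)

config = {"configurable": {"thread_id": "session-1"}}
graph.invoke({"plan": []}, config)       # runs create_plan, then pauses
print(graph.get_state(config).values)    # inspect (or update_state to edit) the plan
graph.invoke(None, config)               # resume into execute_plan
```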

The Challenge of Planning

If you've dabbled with agentic systems, you've probably noticed that planning is much more difficult than execution. Let me explain with an example:

Let's say I want an agentic system to come up with a plan to buy a lens for my camera. At a high level, this involves creating a list of steps (let's call it the Plan) and then executing each step until we get the final outcome.

Here's where it gets tricky. The LLM has to factor in a lot of things:

  • The steps need to be in a logical, sequential order (e.g., research reviews before finalizing a lens).
  • It needs to understand the context of what I want to do with my new lens (Am I planning to use this lens handheld or on a tripod?).
  • And many more factors...

It's not as easy as it might sound at first. We can't simply assume that the LLM will magically come up with a plan that's perfectly personalized for me.

The Solution: A Two-Pronged Approach

If you've read this far, two things should be clear:

  1. We need a mechanism for the LLM to come up with a solid plan.
  2. We also need a way to interact with the LLM to give our inputs along the way.

I've put this approach into action, and I'm super impressed with the results. The plans it generates are so comprehensive that, even as a photographer myself, I wouldn't have considered all those aspects if someone had asked me the same question.

How It Works: The Plan Generation Process

The approach to generating the Plan is heavily inspired by Professor Synapse's method. It's a novel way of prompting that you can check out here.

With this approach, we're essentially forcing the LLM to engage in "System 2 thinking" - slow, deliberate, and conscious. We design our prompts to make the LLM do things it might not do otherwise.

In this example, I asked the LLM to generate the following outputs:

  • Main Objective
  • User Preferences
  • Research Focus
  • Potential Outcomes
  • Strategy Adjustments
  • Required Expertise
  • High Level Strategy
  • The Plan

Here's the kicker: I don't actually use any of these outputs except the Plan. But by forcing the LLM to generate all of this additional information, I end up with a much better plan. It's as if the process of considering all these factors leads to a more thorough and well-thought-out plan.
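
One simple way to force these intermediate outputs is to request a structured response whose fields mirror the list above, and then keep only the plan. The sketch below illustrates the idea; the Pydantic model, prompt wording, and model name are my own illustrative choices rather than the exact prompt used.

```python
from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI


class ReasonedPlan(BaseModel):
    main_objective: str
    user_preferences: str
    research_focus: str
    potential_outcomes: str
    strategy_adjustments: str
    required_expertise: str
    high_level_strategy: str
    plan: list[str] = Field(description="Ordered, concrete steps to execute")


planner = ChatOpenAI(model="gpt-4o").with_structured_output(ReasonedPlan)

reasoned = planner.invoke(
    "User requirements:\n<gathered requirements go here>\n\n"
    "Work through every field carefully before writing the final plan."
)

steps = reasoned.plan  # the only output we actually carry forward
```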

We can then iterate over this plan with the user and only proceed to execution if the user approves the final plan.

A Neat Trick: Complement the Plan with an Execution Approach

Here's another clever technique we can use: along with the plan, we ask the LLM to come up with an approach for executing each step.

For example, let's say one step in our plan is "Find whether the shortlisted lenses have weather sealing". The execution approach for this step might be "Do a Google search with keywords 'Weather sealing' or 'extreme weather handling' in reputable photography forums only".

Is this necessary? Not always. But remember, wherever we can intervene and take control from the LLM, it's generally a good thing. These 'approaches' supporting the steps in a plan will be useful for another LLM that's going to execute these steps down the line. This second LLM doesn't have to worry about coming up with search queries - it's already been done.
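
A simple way to carry this pairing around is to model each step together with its approach. The schema below is an illustrative sketch, not the exact structure from my implementation:

```python
from pydantic import BaseModel


class PlanStep(BaseModel):
    step: str      # what to do
    approach: str  # how the executor should do it (search query, sources, etc.)


class ExecutablePlan(BaseModel):
    steps: list[PlanStep]


example = PlanStep(
    step="Find whether the shortlisted lenses have weather sealing",
    approach=(
        "Do a Google search with keywords 'weather sealing' or "
        "'extreme weather handling' in reputable photography forums only"
    ),
)
```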

This setup is particularly useful when we want to use a powerful LLM like GPT-4o or Claude Sonnet to come up with the Plan, and then let a less powerful (and less expensive) model like GPT-4o-mini execute the plan. As a bonus, this approach can also help reduce costs.
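
In code, the split can be as simple as instantiating two models and feeding the executor both the step and its prepared approach. The model names here are current OpenAI identifiers; swap in whatever planner/executor pair you prefer, and note that in the full system the executor is a tool-using agent rather than a bare chat call.

```python
from langchain_openai import ChatOpenAI

planner_llm = ChatOpenAI(model="gpt-4o")        # writes the plan and the approaches
executor_llm = ChatOpenAI(model="gpt-4o-mini")  # cheaper model that follows them

result = executor_llm.invoke(
    "Step: Find whether the shortlisted lenses have weather sealing\n"
    "Approach: Search reputable photography forums for 'weather sealing' "
    "or 'extreme weather handling'\n"
    "Carry out this step and report what you find."
)
```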

User Intervention: Making It Personal

Now, let's talk about user intervention. The session starts with a requirement gathering process that continues in a loop until the user types "quit" or "exit". This is incredibly valuable as it gives the LLM lots of context about what exactly the user is trying to achieve.
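
Stripped of the LLM follow-up questions, the gathering loop is essentially the following. This is a minimal console sketch; in the real system the exchange happens over a websocket, as described later.

```python
requirements: list[str] = []

while True:
    user_msg = input("Tell me more about what you need ('quit' or 'exit' to finish): ")
    if user_msg.strip().lower() in {"quit", "exit"}:
        break
    requirements.append(user_msg)
    # In the full system, the LLM would ask a clarifying follow-up question here.

gathered_context = "\n".join(requirements)  # passed on to the planning step
```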

There's another aspect of human intervention too - during plan execution, there could be steps that need human input. In our lens buying example, a critical input might be "Do you want a third-party tripod collar?" This kind of specific question might not have been covered during the initial requirement gathering process.

Key Technical Elements

To implement this Chain of Reasoning approach with human intervention, we use the following technical components:

LangGraph: We create a 4-node graph for:

  • Requirement gathering
  • Creating the plan
  • Executing the plan
  • Generating the final output

This structure allows us to control the flow of our agentic system, enabling the step-by-step process we described earlier.
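
Here's roughly what that 4-node graph looks like when wired up. The node bodies are placeholders and the state fields are illustrative.

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class WorkflowState(TypedDict):
    requirements: str
    plan: list[str]
    step_results: list[str]
    final_output: str


def gather_requirements(state: WorkflowState) -> dict:
    return {"requirements": "collected via the chat loop"}              # placeholder


def create_plan(state: WorkflowState) -> dict:
    return {"plan": ["step 1", "step 2"]}                               # placeholder


def execute_plan(state: WorkflowState) -> dict:
    return {"step_results": [f"result of {s}" for s in state["plan"]]}  # placeholder


def generate_output(state: WorkflowState) -> dict:
    return {"final_output": "\n".join(state["step_results"])}


builder = StateGraph(WorkflowState)
builder.add_node("gather_requirements", gather_requirements)
builder.add_node("create_plan", create_plan)
builder.add_node("execute_plan", execute_plan)
builder.add_node("generate_output", generate_output)
builder.add_edge(START, "gather_requirements")
builder.add_edge("gather_requirements", "create_plan")
builder.add_edge("create_plan", "execute_plan")
builder.add_edge("execute_plan", "generate_output")
builder.add_edge("generate_output", END)

workflow = builder.compile()
```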

Create React Agent: We use LangGraph's prebuilt create_react_agent to execute each step of our plan. It gives us an autonomous ReAct-style agent, ideal for steps that involve web searches or other tools.
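
A hedged sketch of wiring that up; the search tool, model, and prompt are illustrative choices rather than the exact ones used in the project.

```python
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

executor_agent = create_react_agent(
    ChatOpenAI(model="gpt-4o-mini"),   # the cheaper executor model
    tools=[DuckDuckGoSearchRun()],     # any web-search tool works here
)

response = executor_agent.invoke({
    "messages": [(
        "user",
        "Step: Find whether the shortlisted lenses have weather sealing.\n"
        "Approach: Search reputable photography forums for 'weather sealing'.",
    )]
})
```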

FastAPI WebSocket: This handles communication between the front end and the backend, facilitating real-time human intervention. When the system needs user input, it can send a request through the websocket and receive the user's response in real time, allowing human intelligence to be woven seamlessly into the process.
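
A minimal sketch of that channel is shown below. The endpoint path and message shape are illustrative; in the real app the question comes from the paused LangGraph run rather than being hard-coded.

```python
from fastapi import FastAPI, WebSocket

app = FastAPI()


@app.websocket("/ws")
async def agent_session(websocket: WebSocket):
    await websocket.accept()

    # Example: relay a question raised mid-execution and wait for the answer.
    await websocket.send_json({
        "type": "question",
        "text": "Do you want a third-party tripod collar?",
    })
    answer = await websocket.receive_text()

    # The answer would be written back into the graph state before resuming.
    await websocket.send_json({"type": "ack", "received": answer})
    await websocket.close()
```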

React Front End: Our user interface is built with React, providing a responsive and interactive experience for users to input requirements, review plans, and provide necessary interventions throughout the process.

Wrapping Up

Based on my experiments with different Plan and Execute methods, I've gotten very good results with this Chain of Reasoning approach combined with human intervention. I hope this motivates you to try it out yourself!

Remember, the key takeaways are:

  1. A structured planning process that forces the LLM to consider multiple factors
  2. A powerful LLM for planning, paired with a less powerful (and cheaper) LLM for execution
  3. An execution approach generated for each step to assist the less powerful LLM
  4. Multiple points for human intervention and feedback

By combining these elements, we can create agentic systems that are more robust, more personalized, and ultimately more useful. Happy experimenting!



