Plan and Execute Agents with Chain of Reasoning: An Improved Approach to Agentic Systems
Ajith Aravind
GenAI Solution Architect: React | Python | Langchain | Javascript | Node.js | Blockchain | Generate exponential value using NextGen technologies
#langchain #openai #anthropic #langgraph #agents #agenticsystems
In a previous post, we looked at a simple agentic system. For a quick refresher, check out this LinkedIn post.
Even though this agent works for simple use cases, it has serious limitations:
1. It relies entirely on the LLM's ability to reason and plan.
2. We have very little control over how the plan actually gets executed.
3. There's no easy way for a human to intervene along the way.
So, how could we address these limitations? Let's tackle them one step at a time.
The Reasoning Debate
Let's start with point 1 - can LLMs really reason and plan? This is a hotly debated topic with two schools of thought: one camp argues that LLMs are genuinely capable of reasoning and planning, while the other holds that they merely pattern-match over their training data and only give the appearance of reasoning.
Honestly, I don't know the definitive answer. But for our purposes, we don't need to worry about it as long as we get a solid plan, whether it's through reasoning or not.
Taking Control with LangGraph
The second part is easier to address - we as developers can take the driver's seat and control the execution of the plan to a great extent. This is where LangGraph comes in. LangGraph is a library that lets us engineer the process flow ourselves, which is super handy for creating and managing complex LLM-powered workflows. It gives us fine-grained control over how our agent operates.
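To make that concrete, here's a minimal sketch of a plan-and-execute flow wired up in LangGraph. The state fields and node names ("plan", "execute") are my own illustrative choices, not the exact graph from this project, and the planner is hardcoded so the snippet stays runnable.

```python
from typing import TypedDict, List

from langgraph.graph import StateGraph, END


# Illustrative state for a plan-and-execute flow; field names are assumptions.
class AgentState(TypedDict):
    objective: str      # what the user wants
    plan: List[str]     # steps produced by the planner
    results: List[str]  # output of each executed step


def make_plan(state: AgentState) -> dict:
    # In a real system this would call an LLM; hardcoded to keep the sketch runnable.
    return {"plan": [f"Research options for: {state['objective']}", "Compare and recommend"]}


def execute_next_step(state: AgentState) -> dict:
    step = state["plan"][len(state["results"])]
    return {"results": state["results"] + [f"Executed: {step}"]}


def is_done(state: AgentState) -> str:
    return "done" if len(state["results"]) == len(state["plan"]) else "continue"


builder = StateGraph(AgentState)
builder.add_node("plan", make_plan)
builder.add_node("execute", execute_next_step)
builder.set_entry_point("plan")
builder.add_edge("plan", "execute")
builder.add_conditional_edges("execute", is_done, {"continue": "execute", "done": END})

graph = builder.compile()
print(graph.invoke({"objective": "buy a camera lens", "plan": [], "results": []}))
```

The point is that the loop structure lives in our code, not in the LLM's head - the model only fills in the content of each node.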
Enabling Human Intervention
Now onto the third point - human intervention. LangGraph helps us intervene before or after the execution of a node (think of a node as a function in your code). This means we can check and adjust the process at various stages, ensuring the agent stays on track.
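For example, LangGraph can pause a run right before a given node and hand control back to us. The sketch below reuses the `builder` from the previous snippet; the node name "execute" and the thread id are illustrative assumptions.

```python
from langgraph.checkpoint.memory import MemorySaver

# A checkpointer is required so the run can be paused and resumed later.
checkpointer = MemorySaver()

# Interrupt every time before the "execute" node runs, so a human can inspect the plan.
graph = builder.compile(checkpointer=checkpointer, interrupt_before=["execute"])

config = {"configurable": {"thread_id": "lens-shopping-1"}}

# The first invocation stops right before "execute".
graph.invoke({"objective": "buy a camera lens", "plan": [], "results": []}, config)

# Inspect (and optionally edit) the state, then resume by invoking with None.
print(graph.get_state(config).values["plan"])
graph.invoke(None, config)
```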
The Challenge of Planning
If you've dabbled with agentic systems, you've probably noticed that planning is much more difficult than execution. Let me explain with an example:
Let's say I want an agentic system to come up with a plan to buy a lens for my camera. At a high level, this involves creating a list of steps (let's call it the Plan) and then executing each step until we get the final outcome.
Here's where it gets tricky. The LLM has to factor in a lot of things about me - my needs, my constraints, and my preferences.
It's not as easy as it might sound at first. We can't simply assume that the LLM will magically come up with a plan that's perfectly personalized for me.
The Solution: A Two-Pronged Approach
If you've read this far, two things should be clear: coming up with a good plan is the hard part, and we shouldn't leave the whole process to the LLM - we need to push it toward a genuinely thorough plan and keep control (and the user) in the loop during execution. That's the two-pronged approach.
I've put this approach into action, and I'm super impressed with the results. The plans it generates are so comprehensive that, even as a photographer myself, I wouldn't have considered all those aspects if someone had asked me the same question.
How It Works: The Plan Generation Process
The approach to generating the Plan is heavily inspired by Professor Synapse's method, a novel way of prompting that you can check out here.
With this approach, we're essentially forcing the LLM to engage in "System 2 thinking" - slow, deliberate, and conscious. We design our prompts to make the LLM do things it might not do otherwise.
In this example, I asked the LLM to generate several intermediate outputs alongside the Plan itself.
Here's the kicker: I don't actually use any of these outputs except the Plan. But by forcing the LLM to generate all of this additional information, I end up with a much better plan. It's as if the process of considering all these factors leads to a more thorough and well-thought-out plan.
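As a rough illustration of the idea, here's a sketch that asks the model for several intermediate outputs but keeps only the plan. The field names and prompt wording are my own assumptions, not Professor Synapse's actual prompt or the exact outputs used in this project.

```python
from typing import List

from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI


# Illustrative structure: everything except `plan` is scaffolding we discard.
class ReasonedPlan(BaseModel):
    goal_restatement: str = Field(description="The user's goal in the model's own words")
    assumptions: List[str] = Field(description="What the model is assuming about the user")
    considerations: List[str] = Field(description="Factors that should influence the plan")
    plan: List[str] = Field(description="The final ordered list of steps")


llm = ChatOpenAI(model="gpt-4o", temperature=0)
planner = llm.with_structured_output(ReasonedPlan)

result = planner.invoke(
    "I want to buy a new lens for my camera. "
    "Think through the goal, your assumptions, and the relevant considerations "
    "before writing the step-by-step plan."
)

# Only the plan is used downstream; the other fields just force more deliberate output.
for i, step in enumerate(result.plan, start=1):
    print(f"{i}. {step}")
```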
We can then iterate over this plan with the user and only proceed to execution if the user approves the final plan.
A Neat Trick: Complement the Plan with an Execution Approach
Here's another clever technique we can use: along with the plan, we ask the LLM to come up with an approach for executing each step.
For example, let's say one step in our plan is "Find whether the shortlisted lenses have weather sealing". The execution approach for this step might be "Do a Google search with keywords 'Weather sealing' or 'extreme weather handling' in reputable photography forums only".
Is this necessary? Not always. But remember, wherever we can intervene and take control from the LLM, it's generally a good thing. These 'approaches' supporting the steps in a plan will be useful for another LLM that's going to execute these steps down the line. This second LLM doesn't have to worry about coming up with search queries - it's already been done.
This setup is particularly useful when we want a powerful LLM like GPT-4o or Claude Sonnet to come up with the Plan, and then let a less powerful (and less expensive) model like GPT-4o mini execute it - which also helps reduce costs as a bonus.
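One way to wire this up is to have the planner emit an approach alongside each step, then hand that pair to a cheaper executor model. This is a minimal sketch under my own assumptions - the field names, prompts, and model choices are illustrative.

```python
from typing import List

from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI


class PlanStep(BaseModel):
    step: str = Field(description="What needs to be done")
    approach: str = Field(description="How to do it, e.g. the exact search query to run")


class Plan(BaseModel):
    steps: List[PlanStep]


# A strong (more expensive) model produces the plan plus execution approaches...
planner = ChatOpenAI(model="gpt-4o").with_structured_output(Plan)
plan = planner.invoke(
    "Plan how to choose a weather-sealed telephoto lens for wildlife photography."
)

# ...and a cheaper model executes each step, guided by the approach it was given.
executor = ChatOpenAI(model="gpt-4o-mini")
for item in plan.steps:
    answer = executor.invoke(f"Task: {item.step}\nSuggested approach: {item.approach}")
    print(answer.content[:200])
```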
User Intervention: Making It Personal
Now, let's talk about user intervention. The session starts with a requirement gathering process that continues in a loop until the user types "quit" or "exit". This is incredibly valuable as it gives the LLM lots of context about what exactly the user is trying to achieve.
There's another aspect of human intervention too - during plan execution, there could be steps that need human input. In our lens buying example, a critical input might be "Do you want a third-party tripod collar?" This kind of specific question might not have been covered during the initial requirement gathering process.
Key Technical Elements
To implement this Chain of Reasoning approach with human intervention, we use the following technical components:
LangGraph: We create a 4-node graph covering requirement gathering, plan generation, plan review with the user, and step-by-step execution.
create_react_agent: We use LangGraph's prebuilt create_react_agent to execute each step in our plan. It gives us an autonomous ReAct-style agent, ideal for steps that involve web searches or other tools; a minimal sketch follows.
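Here's a rough sketch of using the prebuilt agent for a single step. The `web_search` tool is a stand-in I've defined for illustration (in practice you'd plug in a real search tool), and the lens and query are just examples.

```python
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent


@tool
def web_search(query: str) -> str:
    """Search the web and return a short summary of the results."""
    # Stand-in implementation; swap in a real search tool (e.g. Tavily) in practice.
    return f"Pretend search results for: {query}"


agent = create_react_agent(ChatOpenAI(model="gpt-4o-mini"), [web_search])

result = agent.invoke(
    {"messages": [("user",
                   "Find whether the Canon RF 100-500mm lens has weather sealing. "
                   "Approach: search reputable photography forums.")]}
)
print(result["messages"][-1].content)
```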
FastAPI WebSocket: This handles communication between the front end and the back end, enabling real-time human intervention. When the system needs user input, it sends a request over the WebSocket and receives the user's response in real time, allowing human input to be woven seamlessly into the process; a minimal sketch follows.
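A minimal sketch of such an endpoint is below. The route, message format, and question text are my own assumptions, not the actual protocol used in this project.

```python
import json

from fastapi import FastAPI, WebSocket

app = FastAPI()


@app.websocket("/ws/agent")
async def agent_session(websocket: WebSocket):
    await websocket.accept()

    # Receive the user's initial requirement from the React front end.
    requirement = await websocket.receive_text()

    # ...planning happens here; when the agent needs human input, ask over the socket...
    await websocket.send_text(json.dumps({
        "type": "question",
        "text": "Do you want a third-party tripod collar?",
    }))
    answer = await websocket.receive_text()

    # ...continue execution with the user's answer, then send the final result.
    await websocket.send_text(json.dumps({
        "type": "result",
        "text": f"Plan for: {requirement} (tripod collar answer: {answer})",
    }))
```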
React Front End: Our user interface is built with React, providing a responsive and interactive experience for users to input requirements, review plans, and provide necessary interventions throughout the process.
Wrapping Up
Based on my experiments with different Plan and Execute methods, I've gotten very good results with this Chain of Reasoning approach combined with human intervention. I hope this motivates you to try it out yourself!
Remember, the key takeaways are: force the LLM into slow, deliberate System 2 thinking when generating the plan; take control of execution flow with LangGraph; and keep a human in the loop during both planning and execution.
By combining these elements, we can create agentic systems that are more robust, more personalized, and ultimately more useful. Happy experimenting!