Anthropic Agentic Systems - #5. Evaluator-Optimizer
Anthropic Agentic Systems: A Five-Part Exploration is sponsored by Agent.ai - Discover, connect with and hire AI agents to do useful things.
The world of GenAI has seen big step changes since ChatGPT was released in November of 2022 (yes, it's just two years old). 2023 was the year of Chatbots, Copilots, and Assistants. 2024 was the year that Agents and Agentic Systems broke onto the scene. 2025 is the year that Reasoning models are becoming the norm, and the term LRM (Large Reasoning Model) is being thrown around.
In AI, "LRM" stands for "Large Reasoning Model," referring to a type of artificial intelligence model that is specifically designed to perform complex reasoning tasks, going beyond simple text generation to analyze situations, deduce logic, and make informed decisions, mimicking human-like thinking abilities more closely than traditional language models.?
Just a short while ago, "simple" text, image, and audio generation, aka Generative AI, was mind-blowing. That it could generate a Valentine's Day haiku in 2023 was cause for wonder; now we take it for granted and call it simple. In fact, LLMs are now "traditional" and not cool anymore!
In fact, OpenAI announced their roadmap recently, and the key takeaway was that Reasoning Models (what they call Simulated Reasoning, or SR, models) will not be separate and exclusive but part of the core product, as they simplify to one offering.
After that, GPT-5 will be a system that brings together features from across OpenAI's current AI model lineup, including conventional AI models, SR models, and specialized models that do tasks like web search and research. "In both ChatGPT and our API, we will release GPT-5 as a system that integrates a lot of our technology, including o3."
That's a long preamble to introduce our 5th pattern in Anthropic's 5 patterns of Agentic Systems, but this pattern gets into what is happening under the covers with these Reasoning models. To recap, the 5 patterns are -
1. Prompt Chaining
2. Routing
3. Parallelization
4. Orchestrator-Workers
5. Evaluator-Optimizer
We are going to talk about the Evaluator-Optimizer pattern in this article, and hopefully it provides some insight into how reasoning models are able to review their output and then iterate on it to make it better. In fact, by combining Orchestrator-Workers and Evaluator-Optimizer, one can build one's own Chain of Thought reasoning.
In the evaluator-optimizer workflow, one LLM call generates a response while another provides evaluation and feedback in a loop. This workflow is particularly effective when we have clear evaluation criteria, and when iterative refinement provides measurable value.
As you can see in the visual above, we have an LLM that does some work and generates an output, which is then passed to an Evaluator. If the output is acceptable, the workflow moves on; if not, it is sent back with feedback, and a new output is generated and evaluated again.
There are 2 important pieces to the evaluator: it needs good evaluation criteria, and it needs to give clear, actionable feedback that the Regenerate step can use to produce a better output the second time around.
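To make that second piece concrete, here is a minimal sketch in Python (not the article's actual implementation; `call_llm` is a hypothetical stand-in for whatever LLM API you use) of how the evaluator's feedback can be folded into the regeneration step:

```python
# Minimal sketch of feedback-driven regeneration. call_llm is a hypothetical
# placeholder for your LLM provider's API; replace it with a real SDK call.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to your LLM provider")

def regenerate(task_prompt: str, previous_output: str, feedback: str) -> str:
    """Fold the evaluator's actionable feedback into the next attempt."""
    revision_prompt = (
        f"{task_prompt}\n\n"
        f"Your previous attempt:\n{previous_output}\n\n"
        f"Reviewer feedback:\n{feedback}\n\n"
        "Rewrite the output, addressing every point in the feedback."
    )
    return call_llm(revision_prompt)
```

The key design choice is that the regeneration prompt carries both the previous attempt and the feedback, so the model revises rather than starting from scratch.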
To showcase this pattern in action, I will keep with the Superhero Bio theme: an agent in Agent.ai that takes a LinkedIn profile and generates a superhero bio. The one difference here is that the generated bio is passed to an evaluator step that rates it and generates feedback; if it is not good enough, it is passed back and iterated on. Here is a visual of this in action.
You can see the loop that forms the evaluator-optimizer. It is implemented as a For loop in Agent.ai with an embedded IF check on the rating that the evaluator generates. We stay in this loop (up to a max of 3 times) until the output is rated highly (>= 4), and once that is achieved, we exit the loop.
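In plain Python terms, that loop looks roughly like this (a sketch of the logic just described, not the Agent.ai builder itself; `generate_bio`, `evaluate_bio`, and `regenerate_bio` are hypothetical helpers that each wrap one LLM call):

```python
# Sketch of the For loop with an embedded IF check on the evaluator's rating.
# The three helpers below are hypothetical; each wraps a single LLM call.

def generate_bio(profile: str) -> str: ...
def evaluate_bio(bio: str) -> dict: ...  # returns {"rating": ..., "feedback": ...}
def regenerate_bio(profile: str, bio: str, feedback: str) -> str: ...

MAX_ATTEMPTS = 3      # stay in the loop up to a max of 3 times
RATING_THRESHOLD = 4  # exit once the evaluator rates the bio >= 4

def run_evaluator_optimizer(profile: str) -> str:
    bio = generate_bio(profile)                    # optimizer: first draft
    for _ in range(MAX_ATTEMPTS):
        review = evaluate_bio(bio)                 # evaluator: rating + feedback
        if int(review["rating"]) >= RATING_THRESHOLD:
            break                                  # acceptable: exit the loop
        bio = regenerate_bio(profile, bio, review["feedback"])
    return bio
```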
Here is how the agent is implemented -
In the above agent, you can see each of these steps laid out.
What does the final output look like? (This is a very corny example, but I wanted something silly that people can relate to.)
Here is the first attempt at generating the bio
Here is the feedback and rating (3)
Here is the prompt I wrote to evaluate the bio. As you can see, this is a very simple evaluator; to make it really good, one would give it much more specific instructions on what to evaluate.
You are a Marvel Comics superhero expert with a deep understanding of what makes a legendary origin story. You have been given a Superhero Biography, generated from a person's LinkedIn profile, and your mission is to critically evaluate it.
Your review should be extremely detailed and honest—judging whether the bio captures the excitement, depth, and grandeur worthy of a true Marvel hero. Rate the bio on a scale of 1-5:
5: A masterpiece—Stan Lee himself would be proud!
1: A disaster—this needs a complete overhaul.
In addition to the rating, provide constructive, specific and actionable feedback that helps refine and elevate the bio, making it more compelling. Do not hesitate to ask for a rewrite if that is needed.
Generate a JSON object called "review" with the following attributes. Return just the JSON object and nothing else:
"rating": The score (1-5) based on quality.
"feedback": Specific and actionable advice for improving the bio.
Example JSON Output:
"review": {
"rating": "the rating",
"feedback": "the feedback"
}
Now, analyze the following Superhero Biography and generate your review:
Superhero Bio:
{{out_bio}}
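If you were wiring this up yourself rather than in Agent.ai, the {{out_bio}} template variable would be substituted with the generated bio before the prompt is sent, and the response parsed as JSON. Here is a rough Python sketch (again with a hypothetical `call_llm` helper, and the prompt abbreviated):

```python
import json

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to your LLM provider")

# Abbreviated version of the evaluator prompt shown above; {out_bio} plays
# the role of Agent.ai's {{out_bio}} template variable.
EVALUATOR_PROMPT = (
    "You are a Marvel Comics superhero expert... Rate the bio on a scale of 1-5. "
    'Generate a JSON object called "review". Return just the JSON object.\n\n'
    "Superhero Bio:\n{out_bio}"
)

def evaluate_bio(bio: str) -> dict:
    prompt = EVALUATOR_PROMPT.format(out_bio=bio)
    raw = call_llm(prompt)
    # The prompt asks for just the JSON object, so the response should parse
    # directly; real code would add error handling for malformed output.
    return json.loads(raw)["review"]
```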
Here is the final bio that was accepted by the evaluator. It is much more detailed, and the story is more built out. It will not win any prizes, but as you all know, the quality of the output is based on the quality of the prompting.
While LRMs like OpenAI's o1 pro and o3, DeepSeek R1, and Gemini Deep Research are all here to stay and will be leveraged widely, they are black boxes when it comes to the reasoning they do and thus the output they generate. They will keep getting better, but I think there will be a world where we have these 'autonomous' LRMs AND custom agents, where we control the data that is accessed, the criteria for evaluation, and the output that is generated. So learning these patterns will be critical for agent builders. At a minimum, it will help you prompt the reasoning models in a better way, as you understand what makes them tick.
So go sign up for Agent.ai and start building some agents and play around with these patterns.