Next-level Gen AI: From ChatGPT to multi-agent workflows

One of the godfathers of AI, Andrew Ng, recently released an open-source implementation of a sample application built around an agentic workflow. It's a "weekend project" not meant to be used as production code; nevertheless, it effectively demonstrates a major shift that's coming for Generative AI applications.

The example here translates text from one language to another, optionally tailored for country-specific usage. The input text is broken up into multiple chunks if it is too long. The system makes it easy to modify the output style, use idiomatic terminology and apply regional dialects.

Unlike typical ChatGPT-style usage, where you simply ask the LLM for a translation and take the generated output, this approach uses a different architecture: three cooperating LLM agents with different roles (and correspondingly different prompts) work together in a multi-step fashion to produce a higher-quality translation, as follows (a minimal code sketch follows the list):

  1. The first agent generates the initial translation as usual.
  2. A second agent, acting as a translation expert, then reflects on the generated translation, taking into account factors like accuracy, fluency, style and terminology, to provide expert suggestions and constructive criticisms.
  3. Finally, a third agent edits the initial translation, taking into account the expert's suggestions and criticisms, to generate a new (and presumably better) version.
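To make that flow concrete, here is a minimal sketch of the three steps, calling the OpenAI Python client directly. The function names and prompt wording are my own placeholders, not the actual code or prompts from Andrew Ng's repository.

```python
# A minimal sketch of the translate -> reflect -> edit loop, assuming the
# OpenAI Python client (openai >= 1.0). Prompts and helpers are illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def complete(system: str, user: str) -> str:
    """One chat completion with a role-specific system prompt."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    )
    return response.choices[0].message.content


def agentic_translate(source_text: str, source_lang: str, target_lang: str) -> str:
    # Step 1: a first "agent" produces the initial translation.
    draft = complete(
        f"You are an expert translator from {source_lang} to {target_lang}.",
        f"Translate the following text:\n\n{source_text}",
    )
    # Step 2: a second "agent" reflects on the draft and lists improvements.
    critique = complete(
        f"You are a senior reviewer of {source_lang}-to-{target_lang} translations.",
        "Give specific, actionable suggestions to improve the accuracy, fluency, "
        f"style and terminology of this translation.\n\nSOURCE:\n{source_text}\n\n"
        f"TRANSLATION:\n{draft}",
    )
    # Step 3: a third "agent" edits the draft, applying the suggestions.
    return complete(
        f"You are an expert editor of {target_lang} text.",
        f"Improve the translation below by applying these suggestions.\n\n"
        f"SUGGESTIONS:\n{critique}\n\nSOURCE:\n{source_text}\n\nTRANSLATION:\n{draft}",
    )
```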

This is a simple example, but it represents a more sophisticated way to use LLMs than simply "calling ChatGPT". By assigning different roles to various agents and getting them to cooperate, you are basically mimicking a team of humans working together to achieve a common goal.

Why is this important? As Andrew Ng says, no one writes an essay or a story by simply putting down one word after another. The way humans work in real life looks more like this:

  • Reflect on the topic and think about what you want to say
  • Generate an outline of main points, adjusting as necessary to create a coherent narrative
  • Develop the individual points into paragraphs and sentences to generate an initial draft
  • Edit the draft to flesh out ideas, clarify confusing language and select the right words
  • Create the polished final document

We work this way because it produces much better output: multiple passes coupled with reflection and editing let us separate the different tasks. First we pay attention(!) to the larger thread of the document and the cadence and flow of the words, and only then do we focus on optimizing the individual phrases, words and terminology.

Early applications of GPT and other LLMs have so far focused on straightforward chat completions: the user provides a question or statement, and the LLM responds. Those responses are already startlingly effective, especially when integrated with buffer memory and RAG recall, and the technology is catching on like wildfire.

But that is only scratching the surface. As we move into more sophisticated usage, building agents that can reason independently, communicate with one another and cooperate to achieve a common goal, these applications will increasingly act autonomously on behalf of the user. Rather than simply asking the LLM a question, users will be able to assign a complex task to an agent; this agent in turn will work with other agents acting in various roles to fulfill that task and return a completed solution to the user.

The example code base above is a concrete implementation of this more complex approach.

As always in technology, this is just the beginning ...


For the Gen AI software geeks, some observations about the code:

  • I found the reflection inference prompt really interesting. The system prompt is not very different from the one used for the initial generation, but the user prompt gives very specific and actionable instructions on how to act like a critic, changing the LLM's role from creation to evaluation. I suspect that this kind of focus on role and perspective in prompting is going to become increasingly important.
  • It's interesting to see the original and translated text carefully delimited by <xml tags> inside the prompts. The best choice of data delimiter depends on the LLM provider; here XML was chosen because it's one of the options OpenAI supports, and it's also a particularly good choice for Anthropic's Claude models.
  • The code uses JSON mode for the response format. Consistently obtaining valid JSON output used to be a challenge with the GPT-3 models, but OpenAI has since introduced an explicit parameter for this purpose (see the sketch after this list).
  • I don't fully understand the calculate_chunk_size method (it seems unwieldy?), although I've seen it done this way before by others. I need to play with the code to get better insight.
  • As a personal choice, I tend to avoid LangChain because it pulls in a lot of code that is very hard to inspect and adjust. Once your Gen AI application works 80-85% of the time, each additional increment in quality becomes more challenging, and I've found that understanding every detail of prompt generation and semantic content extraction is critical to that process.
  • I was not familiar with the Python icecream package; it's a better version of print() for debugging (basic usage shown below). Software development is a constant learning process.
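As a rough illustration of the delimiter and JSON-mode points above, a reflection-style call might look like the sketch below. The tag names, prompt text and JSON keys are hypothetical; the response_format parameter is the actual OpenAI API option.

```python
# Sketch only: XML-style tags delimit the data inside the prompt, and
# response_format={"type": "json_object"} asks the model for valid JSON.
# JSON mode requires that the prompt itself mention JSON.
import json

from openai import OpenAI

client = OpenAI()

user_prompt = (
    "Critique the translation of the source text. Reply in JSON with the keys "
    '"issues" (a list of strings) and "overall_quality" (low, medium or high).\n\n'
    "<SOURCE_TEXT>\n...original text here...\n</SOURCE_TEXT>\n\n"
    "<TRANSLATION>\n...draft translation here...\n</TRANSLATION>"
)

response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},  # JSON mode
    messages=[
        {"role": "system", "content": "You are an expert translation reviewer."},
        {"role": "user", "content": user_prompt},
    ],
)
feedback = json.loads(response.choices[0].message.content)
print(feedback["issues"])
```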
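And for anyone who, like me, hadn't come across icecream before, the basic usage is simply:

```python
# icecream in a nutshell: ic() prints the expression along with its value,
# so you don't have to write the label yourself.
from icecream import ic

chunk_size = 1200
ic(chunk_size)          # -> ic| chunk_size: 1200
ic(chunk_size * 2 + 1)  # -> ic| chunk_size * 2 + 1: 2401
```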
