A look inside an AI multi-agent

If you want to catch up on the previous episodes of the AI-Agent series:

Now, in this episode, it’s time to cut into the guts of a fairly general AI multi-agent design. (By the way, there is no better introduction to the subject than the classic book “Artificial Intelligence: A Modern Approach”; I recommend any serious AI designer read it cover to cover.)

So what is a Multi-Agent?

Let’s say you want to create an analytics report on the US economy. You can’t just ask ChatGPT to do that — sure, it will give you some basic figures based on the data it was trained on, but even those are not guaranteed to be correct, and they will not reflect any actual research.

A useful AI assistant would need to do the following:

  • search the web for various sources on the US economy
  • ask the user to provide additional information to use in the report
  • “read” and “understand” all that information
  • write an actual report section by section
  • insert important charts and tables
  • format the final report

All of this is possible, but only with a multi-agent architecture, where each subtask is executed by a specific agent based on an LLM or another neural network, with the added ability to call external APIs (e.g., for web search, or for converting HTML into something more readable, etc.).
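To make this concrete, here is a minimal sketch of that architecture in Python. Everything in it is hypothetical: `call_llm` stands in for whatever LLM API you use, and the agent roles simply mirror the report example above.

```python
from dataclasses import dataclass

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to any LLM provider's API."""
    raise NotImplementedError("wire this up to your LLM of choice")

@dataclass
class Agent:
    name: str
    instructions: str  # the role prompt for this agent's underlying LLM

    def run(self, task: str) -> str:
        # Each agent is essentially an LLM call with its own role prompt;
        # a real agent would also loop over external tool/API calls here.
        return call_llm(f"{self.instructions}\n\nTask: {task}")

# One specialist agent per subtask of the report example above
pipeline = [
    Agent("researcher", "Search the web and collect sources on the US economy."),
    Agent("writer", "Write the report section by section from the notes provided."),
    Agent("formatter", "Insert charts and tables, then format the final report."),
]

def build_report(request: str) -> str:
    result = request
    for agent in pipeline:
        result = agent.run(result)  # each agent's output feeds the next one
    return result
```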

That is exactly what we do at Beyond the Cloud. Next time, we’ll dive deep into specific AI multi-agent designs and how easy it is to build them using some of the free platforms out there.


In this part, split into three small subchapters, we will not only look at the anatomy of a good, useful AI multi-agent, but in subchapter 3 we will also discuss which components you can use for its specific parts, making things quite practical.

1. Humans vs Agents

Since we are trying to design something intelligent, we have no better example to model this something after than a human being. Here is a scheme of a typical human:

We have eyes (and other senses) to, uhm, sense the outside world. We have a mouth to communicate intelligently with each other. We have memory and (some) knowledge in our brain, and we also have some sort of reasoning engine, which makes decisions for us based on the processed senses and our knowledge. Last but not least, we have hands (and other body parts) to enact some sort of change in the external world.

But that’s exactly what we want in an AI agent as well! Can ChatGPT do all of the above? Of course not; it can only pretend to “talk” to you. However, it can serve as a key part of our multi-agent modeled after the “typical human” described above. Here it is:

This is a very general scheme, but it captures the high-level design of pretty much any AI agent we can imagine — self-driving car, autonomous robot, agent on the web, etc.

2. Agent Anatomy

Just as humans have a mouth, the agent needs either a chat or a voice interface to interact with humans (not so much with other AI agents — agent-to-agent communication can be done much more efficiently without natural language, let alone the slower, more error-prone voice channel).

It has to have sensors to understand the environment it operates in — these may be cameras and lidars for self-driving cars or a combination of LLMs (large language models) with image-to-text models for web-based agents when they need to “understand” the websites they surf, etc.

It has to have a “processing module” that combines both the sensor and human input to “understand” what needs to be done.

It has to have a memory / knowledge base to consult depending on the inbound context — what people have started calling by yet another hype term, “RAG” (retrieval-augmented generation), which unfortunately narrows and oversimplifies this extremely important function.

It has to have a “brain” that plans the solution, critiques it, refines it, and formulates the final execution plan — in the scheme it’s just one box, but in reality this part is a fairly complex multi-agent itself, since we are trying to mimic a human brain with something much less complicated.

Finally, it has to have “hands” — an ability to interact with external software and systems to act on behalf of the human in the external world.

These AI multi-agents and their design have been studied for ages, e.g. in the book I recommended at the beginning of this article. The invention of LLMs and other generative AI models makes it possible to build upon this research, implement it at a new level of technology, and finally start building AI agents that are useful in a general sense, as opposed to only for extremely narrow, specialized tasks.

3. Specific AI Agent “Anatomy Parts” Design

Let us move from our lousy human analogy to discussing which parts we can use today to build a versatile AI multi-agent like the one in the scheme. Let’s limit ourselves to an agent that operates on the internet, as opposed to the “real world”, since the latter task is quite a bit more complex.

Sensors. To sense the environment our agent operates in, it needs to be able to:

  • Read websites, preferably the way humans do (since websites are built for humans, not for robots)
  • Discover and be able to use various APIs available on the internet and designed for computers

To build such sensors, we need a combination of LLMs (large language models) and models that can understand images (or convert them into textual descriptions), plus a little bit of software that can crawl and download data from different URLs. Then, with the right prompt engineering, our LLM-based sensors will convert what they “read” into formats suitable for further processing.
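As a sketch of such a sensor, the snippet below (standard library only, plus the hypothetical `call_llm` helper from the earlier sketch) downloads a page, strips it to its visible text, and asks an LLM to summarize it. A production crawler would need far more care with encodings, JavaScript-rendered pages, and robots.txt.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

class TextExtractor(HTMLParser):
    """Collects the visible text of a page, skipping script/style blocks."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

def read_page(url: str) -> str:
    # Download and strip the page down to human-readable text
    html = urlopen(url, timeout=10).read().decode("utf-8", errors="ignore")
    parser = TextExtractor()
    parser.feed(html)
    text = "\n".join(parser.chunks)
    # Truncate to fit a context window, then let the LLM "read" the page;
    # call_llm is the same hypothetical helper as in the earlier sketch
    return call_llm(f"Summarize the key facts on this page:\n\n{text[:8000]}")
```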

Human Interaction. This is the part everyone is familiar with by now thanks to ChatGPT — you can type what you want, or say it out loud, and it will be processed by another LLM-based module for further reasoning.

Process Input. This module, also LLM-based, takes whatever the Sensors give it together with the current request from a human, and tries to formulate a clear Task Request for our Plan Solution module — arguably the main part of the “brain”. It is also absolutely critical for this module to consult the Knowledge Base via RAG and make the retrieved knowledge part of the context when formulating the Task Request.
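A minimal sketch of that step might look like the following, assuming a knowledge base object `kb` that exposes a `search` method (like the tiny vector store sketched in the next section) and reusing the hypothetical `call_llm` helper:

```python
def formulate_task_request(human_request: str, sensor_summary: str, kb) -> str:
    # Pull the most relevant stored knowledge into the context (the RAG step);
    # kb is assumed to expose a search() method like the store sketched below
    retrieved = kb.search(human_request, top_k=3)
    prompt = (
        "Turn the raw inputs below into a single, unambiguous Task Request.\n"
        f"Human request: {human_request}\n"
        f"Sensor observations: {sensor_summary}\n"
        f"Relevant knowledge: {retrieved}\n"
        "Task Request:"
    )
    return call_llm(prompt)
```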

Knowledge Base / RAG. This module stores the data and knowledge that may be relevant to our agent’s operations. This can be all kinds of publicly available data accessed via “regular” internet search, as well as so-called Vector Databases, which represent unstructured text as numerical vector embeddings. These enable search “by meaning” as opposed to simply “by keywords” and are a crucial part of our agent.
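Here is a toy version of such a vector store, using numpy and a hypothetical `embed` function as a stand-in for any embedding model. Real vector databases add indexing so lookups stay fast at scale, but the core idea is exactly this cosine-similarity search:

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical stand-in for any text-embedding model."""
    raise NotImplementedError("wire this up to an embedding model")

class TinyVectorStore:
    """A toy vector database: search 'by meaning' via cosine similarity."""
    def __init__(self):
        self.texts = []
        self.vectors = []

    def add(self, text: str) -> None:
        self.texts.append(text)
        self.vectors.append(embed(text))

    def search(self, query: str, top_k: int = 3) -> list[str]:
        q = embed(query)
        # Cosine similarity between the query and every stored embedding
        sims = [float(v @ q / (np.linalg.norm(v) * np.linalg.norm(q)))
                for v in self.vectors]
        best = np.argsort(sims)[::-1][:top_k]  # indices of the closest matches
        return [self.texts[i] for i in best]
```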

Plan Solution. This is normally a bunch of different LLMs working together. It has to take the Task Request, analyze what resources (and “hands”) are available to the agent, iteratively plan the execution using various critique and step-by-step planning approaches, design sub-agents that are missing but needed for the task, and finally orchestrate the execution using the “hands” or sub-agents available. This is an extremely interesting and fast-developing area of AI research. In some other, more specialized agents (e.g., in games), approaches such as Reinforcement Learning are quite useful as well.
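One common way to approximate this is a plan / critique / refine loop, sketched below with the same hypothetical `call_llm` helper. The APPROVED convention is just an illustrative stopping signal, not a standard:

```python
def plan_solution(task_request: str, max_rounds: int = 3) -> str:
    # Draft, critique, refine: a crude approximation of the planning "brain"
    plan = call_llm(f"Draft a step-by-step plan for: {task_request}")
    for _ in range(max_rounds):
        critique = call_llm(
            f"Critique this plan. Reply APPROVED if it is sound:\n{plan}"
        )
        if "APPROVED" in critique:
            break  # the critic is satisfied, stop refining
        plan = call_llm(
            f"Revise the plan to address the critique.\n"
            f"Plan: {plan}\nCritique: {critique}"
        )
    return plan
```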

“Hands” or Execute Actions. All of the above would be completely useless if our agent didn’t have “hands” to do something that a human asked of it. These hands are pieces of code that can call external APIs, press buttons on web pages, or interact in some other way with existing software infrastructure.
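A simple way to wire up such “hands” is a registry of named functions the planner can invoke. The snippet below is a toy illustration; the search endpoint is a placeholder URL, not a real API:

```python
import requests  # well-known third-party HTTP client

# Each "hand" is just a named piece of code the planner can invoke
TOOLS = {
    # Placeholder endpoint for illustration, not a real search API
    "web_search": lambda query: requests.get(
        "https://example.com/search", params={"q": query}, timeout=10
    ).json(),
    "send_email": lambda to, body: print(f"(pretend) emailing {to}: {body}"),
}

def execute_action(tool_name: str, **kwargs):
    if tool_name not in TOOLS:
        raise ValueError(f"agent has no hand called {tool_name!r}")
    return TOOLS[tool_name](**kwargs)
```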

All of the above can be designed and built today, and the times could not be more exciting for this work. We at Beyond the Cloud are building not just such agents but also a platform to design and build them very easily without any programming knowledge.


Think a friend would enjoy this too? Share the newsletter and let them join the conversation.


Well, that's it for now. If you liked this article, subscribe to my newsletter or connect with me. LinkedIn rewards your likes by showing my articles to more readers.

Signing off - Marco

