What are AI Multi-Agents and how to use them?

I am really passionate about AI. I have been for a long time - ever since I read the 3rd edition of the excellent classic "Artificial Intelligence: A Modern Approach" about 20 years ago, long before the latest AI hype driven by ChatGPT. That's when I first learned about AI multi-agents. And that's why, 20 years and countless research papers and experiments later, I am convinced the time to build them is now. (That's what we are doing at Integrail, but that's not the point of today's article.)

ChatGPT - or any other chatbot, for that matter - is not AI. We cannot throw away over 50 years of research just because some new architecture (transformers, in our case - do read the classic "Attention Is All You Need" paper if you are into the technical details) captures the public imagination. But using this new architecture together with time-tested approaches -- that's where the real power is unleashed.

So what are AI multi-agents? Look at the scheme of a typical human being below.

Typical Human Being

We have eyes (and other senses) to, uhm, sense the outside world. We have a mouth to communicate intelligently with each other. We have memory / knowledge in our brain, along with some sort of reasoning engine that makes decisions for us based on the processed senses and our knowledge. Last but not least, we have hands (and other parts) to enact some sort of change in the external world.

But that's exactly what we want in an AI agent as well! Can ChatGPT do all of the above? Of course not - it can only pretend to "talk" to you. However, it can serve as a key part of a multi-agent modeled after the "typical human" described above. Here it is:

Generic AI Multi-Agent

This is a very general scheme, but it captures the high-level design of pretty much any AI agent we can imagine - a self-driving car, an autonomous robot, an agent on the web, etc. Just as humans have a mouth, the agent needs a chat or voice interface to interact with humans (not so much with other AI agents - agent-to-agent communication can be done much more efficiently without natural language, let alone slower and more error-prone voice).
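To make that concrete, here is a toy sketch - all the names in it, like AgentMessage and intent, are mine and purely illustrative - of how agent-to-agent traffic can skip natural language entirely:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class AgentMessage:
    """A structured message between agents: nothing to transcribe,
    nothing to misparse, trivially cheap to validate."""
    sender: str
    intent: str                                  # e.g. "fetch_page"
    payload: dict[str, Any] = field(default_factory=dict)

# A human gets the chat interface; a peer agent gets a typed message:
msg = AgentMessage(sender="planner", intent="fetch_page",
                   payload={"url": "https://example.com"})
print(msg)
```

Natural language is only worth its cost at the human boundary; between agents, structure wins.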

It has to have sensors to understand the environment it operates in - cameras and lidars for a self-driving car, or a combination of LLMs (large language models) with image-to-text models for web-based agents that need to "understand" the websites they surf.
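For the web-agent case, a minimal sketch of such a perception step might look like the following - image_to_text and call_llm are hypothetical stubs standing in for whatever vision model and LLM you actually plug in:

```python
def image_to_text(screenshot: bytes) -> str:
    """Stub for a real image-to-text / vision model."""
    return "a login form with two fields and a 'Sign in' button"

def call_llm(prompt: str) -> str:
    """Stub for any LLM completion API."""
    return "(model output for: " + prompt[:40] + "...)"

def perceive_page(screenshot: bytes) -> str:
    # The "eyes": turn raw pixels into a textual scene description,
    # then have the LLM interpret it in terms the agent can act on.
    description = image_to_text(screenshot)
    return call_llm("List the actionable elements on this page:\n" + description)

print(perceive_page(b"fake-png-bytes"))
```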

It has to have a "processing module" that combines both the sensor and human input to "understand" what needs to be done.

It has to have a memory / knowledge base to consult depending on the inbound context - what people have started calling by another hype term, RAG (retrieval-augmented generation), a name that actually narrows and oversimplifies this extremely important function.
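Stripped to its bones, the retrieve-then-generate idea fits in a few lines. This toy version ranks memory entries by word overlap - a real system would use embeddings and a vector index - but the shape of the function is the same:

```python
def call_llm(prompt: str) -> str:
    """Stub for any LLM completion API."""
    return "(answer grounded in the retrieved context)"

MEMORY = [
    "The user prefers morning flights.",
    "Company policy: book refundable tickets only.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    """Toy retrieval: rank memory entries by word overlap with the query."""
    q = set(query.lower().split())
    return sorted(MEMORY,
                  key=lambda doc: len(q & set(doc.lower().split())),
                  reverse=True)[:k]

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    return call_llm("Context:\n" + context + "\n\nQuestion: " + query)
```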

It has to have a "brain" that plans the solution, critics it, refines it, and formulates the final execution plan - in the scheme, it's just one box, but in reality, this part is a pretty complex multi-agent itself, since we are trying to mimic a human brain by something much less complicated.

Finally, it has to have "hands" - the ability to interact with external software and systems to act on behalf of the human in the external world.
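One simple and safe way to wire up such "hands" is an explicit whitelist of tools the agent is allowed to call - the two toy tools below are, of course, made up:

```python
from typing import Callable

# The "hands": an explicit registry of actions the agent may take.
TOOLS: dict[str, Callable[..., str]] = {
    "open_url":   lambda url: f"(pretend) fetched {url}",
    "send_email": lambda to, body: f"(pretend) emailed {to}",
}

def act(tool_name: str, **kwargs) -> str:
    if tool_name not in TOOLS:
        return "refused: unknown tool " + tool_name   # fail safely
    return TOOLS[tool_name](**kwargs)

print(act("open_url", url="https://example.com"))
```

Keeping the registry explicit means the agent refuses unknown actions instead of improvising - which matters a great deal once it is acting in the real world on your behalf.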

Multi-agents and their design have been studied for decades. The invention of LLMs and other generative AI models makes it possible to reuse this research and implement it with a new level of technology - and finally start building AI agents that are useful in a fairly general sense, as opposed to only for extremely narrow, specialized tasks.

This is exactly what we are building at Integrail - do subscribe and join us on this exciting journey! If you are interested in AI and the changes it will bring, there is a lot we can learn from each other!

