Why AutoGen is a Game-Changer for AI Developers and Architects: My Personal View

I started playing with AutoGen, an open-source project from Microsoft Research that introduces a novel framework for simplifying app development ([2308.08155] AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation (arxiv.org); see also [2303.04673] Cost-Effective Hyperparameter Optimization for Large Language Model Generation Inference (arxiv.org)).

The focus is on intricate multi-agent conversations in which highly customizable bots interact, collaborating to address and solve various tasks effectively. The concept of imparting AI systems with the capability to collaborate and communicate through natural dialogue is fascinating and resonates with human-like interaction. AutoGen is a groundbreaking tool that significantly simplifies developing applications with GPT-4: it manages the intricate details so that users don't have to wrestle with the underlying complexities. I've been deeply impressed by the range and caliber of applications we can build using AutoGen. From assisting with intricate mathematical problems to facilitating automated code generation, these applications are a testament to the adaptability and potential of this methodology. With AutoGen, developers can craft high-caliber AI applications catering to various sectors and domains.

Enabling Teams of AI Agents That Communicate and Cooperate

One of the standout features of AutoGen is its profound capability to grant developers an extensive degree of flexibility in dictating the behavior, conversation patterns, and collaboration methods of agents. This adaptability isn't just a luxury; it's an essential aspect to ensure that a wide variety of applications, each with its unique requirements, can be adequately catered to. What truly excites me about AutoGen is its innovative approach to programming conversations. It provides a dual-mode system where developers can leverage the technical precision of Python code for specific, detailed instructions, while also utilizing the more intuitive, human-like clarity of natural language for broader, overarching directives. This synergistic blend offers developers an unparalleled advantage, especially when designing intricate workflows. There might be scenarios where the intricacies of agent coordination logic necessitate a procedural expression, best conveyed using code.

Conversely, there are situations where the abstracted, high-level behaviors and objectives are more aptly captured using natural language, tapping into its nuance and inherent expressiveness. AutoGen, with its advanced features, effortlessly bridges these two paradigms, enabling a fluid and dynamic programming experience.

Developers can leverage the technical precision of Python code and the more intuitive, human-like clarity of natural language for broader, overarching directives.
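
To make this dual-mode idea concrete, here is a minimal sketch (the agent name "planner" and its system message are my own illustrations, not from the AutoGen docs, and config_list is assumed to be defined as in the Hello World example later in this article): the agent's high-level behavior is stated in natural language, while the coordination logic is expressed procedurally in Python.

import autogen

# High-level behavior expressed in natural language (system_message);
# coordination logic expressed in Python (the termination condition
# and the cap on automatic replies).
planner = autogen.AssistantAgent(
    name="planner",  # illustrative name
    system_message="You are a planner. Break the task into small, "
                   "verifiable steps and reply TERMINATE when done.",
    llm_config={"config_list": config_list},
)

user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=5,
    is_termination_msg=lambda m: m.get("content", "").rstrip().endswith("TERMINATE"),
)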

How AutoGen Works

At the heart of AutoGen lie two revolutionary ideas that are reshaping the landscape of collaborative application development: the introduction of 'conversable agents' and the pioneering concept of 'conversation programming'. What this means is that AutoGen doesn't just provide a platform for creating bots; it offers a holistic environment where these bots can be intricately tailored with specialized skills and functionalities. Whether it's writing code, executing commands, seeking real-time feedback from humans, or any other bespoke capability, AutoGen ensures each bot is crafted to perfection. But what truly sets it apart is the way these bots can seamlessly communicate with one another. Instead of relying on rigid, predefined protocols, they can engage in natural language dialogues, allowing for dynamic and organic problem-solving collaborations.

To illustrate this, consider the following scenario:

  • A "Coder Bot" that generates code snippets
  • An "Executor Bot" that runs the code
  • A "Human Feedback Bot" that gets inputs from a person

The Coder Bot may suggest some code to solve a problem. This code is passed to the Executor Bot, which tries running it. The Executor Bot can let the Coder Bot know if there are errors. Based on that feedback, the Coder Bot can fix the code and send back new code. This cycle continues until the code works. At any point, the Human Feedback Bot can also be pulled in to provide inputs if needed. So AutoGen coordinates the conversations between different bots that play complementary roles, making it much easier to build apps by breaking down complex tasks. The bots have simple interfaces for sending, receiving, and responding to messages, while the framework handles orchestrating the bot conversations smoothly in the background.
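
A minimal sketch of this scenario might look like the following (the agent names, system message, and task are illustrative, and config_list is assumed to be defined as in the Hello World example below); AutoGen's group-chat facility lets the three bots take turns within a single conversation:

import autogen

# The "Coder Bot": suggests code for the task.
coder = autogen.AssistantAgent(
    name="coder",
    system_message="You write Python code to solve the given task.",
    llm_config={"config_list": config_list},
)

# The "Executor Bot": runs the code it receives and reports errors.
executor = autogen.UserProxyAgent(
    name="executor",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "scenario", "use_docker": False},
)

# The "Human Feedback Bot": always asks a person for input.
human = autogen.UserProxyAgent(
    name="human",
    human_input_mode="ALWAYS",
    code_execution_config=False,
)

# A group chat lets the bots take turns; the manager orchestrates them.
groupchat = autogen.GroupChat(agents=[coder, executor, human], messages=[], max_round=12)
manager = autogen.GroupChatManager(groupchat=groupchat, llm_config={"config_list": config_list})

human.initiate_chat(manager, message="Write and test a function that reverses a string.")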

AutoGen doesn't just provide a platform for creating bots; it offers a holistic environment where these bots can be intricately tailored with specialized skills and functionalities.

AutoGen Hello World

When embarking on a new technological endeavor, it is highly recommended to commence with a basic Hello World program that concentrates on the fundamental principles. To achieve this, five straightforward steps can be followed.

  1. Set up the environment
  2. Define the settings
  3. Create the first agent named "Assistant"
  4. Create the second agent named "User_Proxy"
  5. Let's get the ball rolling

In this Hello World project, two agents interact with each other to solve the assigned problem: draw a graph of the YTD evolution of two stocks (META and TESLA).

1 Set up the environment

To set up your environment, install Python version 3.8 or higher and run the following command:

pip install pyautogen        

2 Define the settings

In this Hello World simple program, I'm using Azure OpenAI Service:

YOUR_AZURE_OPENAI_KEY: your API key, found under Keys and Endpoint in the Azure portal.

NAMEOFYOURENDPOINT: the name of your Azure OpenAI resource, also found under Keys and Endpoint; it appears in the endpoint URL.

YOUR_DEPLOYMENT_NAME: the name of your model deployment, found in Azure OpenAI Studio.

import autogen

config_list = [
    {
        'model': 'YOUR_DEPLOYMENT_NAME',
        'api_key': 'YOUR_AZURE_OPENAI_KEY',
        'api_base': 'https://NAMEOFYOURENDPOINT.openai.azure.com/',
        'api_type': 'azure',
        'api_version': '2023-08-01-preview',
    } 
]        

If you prefer, you can use a JSON file containing the configuration:

import autogen

config_list = autogen.config_list_from_json(
    "OAI_CONFIG_LIST",
    file_location=".",
    filter_dict={
        "model": {
            "gpt-4",
            "gpt-3.5-turbo",
        }
    }
)        

It first looks for an environment variable named "OAI_CONFIG_LIST", which needs to contain a valid JSON string. If that variable is not found, it then looks for a JSON file named "OAI_CONFIG_LIST". Below is an example:

[
    {
        "model": "gpt-4",
        "api_key": "<your OpenAI API key here>"
    },
    {
        "model": "gpt-4",
        "api_key": "<your Azure OpenAI API key here>",
        "api_base": "<your Azure OpenAI API base here>",
        "api_type": "azure",
        "api_version": "2023-07-01-preview"
    },
    {
        "model": "gpt-3.5-turbo",
        "api_key": "<your Azure OpenAI API key here>",
        "api_base": "<your Azure OpenAI API base here>",
        "api_type": "azure",
        "api_version": "2023-07-01-preview"
    }
]        
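
For completeness, here is a sketch of the environment-variable alternative mentioned above (the single-entry configuration is just a placeholder): the same JSON can be supplied as a string in OAI_CONFIG_LIST instead of living in a file.

import json
import os

import autogen

# Supply the configuration as a JSON string in the environment variable
# (placeholder values); config_list_from_json checks this variable first.
os.environ["OAI_CONFIG_LIST"] = json.dumps([
    {"model": "gpt-4", "api_key": "<your OpenAI API key here>"}
])

config_list = autogen.config_list_from_json("OAI_CONFIG_LIST")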

3 Create the first agent named "Assistant"

The AssistantAgent, named assistant, is the agent that will carry out the task; it is configured using llm_config.

assistant = autogen.AssistantAgent(
    name="assistant",
    llm_config={
        "seed": 42,  # seed for caching and reproducibility
        "config_list": config_list,  
        "temperature": 0,  
    },  
)        

4 Create the second agent named "User_Proxy"

This agent represents me, the human! By default, the agent prompts for human input every time a message is received. In this case, I configured human_input_mode to "NEVER", so the agent will never prompt for human input. Under this mode, the conversation stops when the number of consecutive auto-replies reaches max_consecutive_auto_reply or when is_termination_msg returns True.

The parameter max_consecutive_auto_reply=10 is set to ensure that we limit the process to a maximum of 10 iterations.

user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",          # never prompt the human for input
    max_consecutive_auto_reply=10,     # stop after 10 consecutive auto-replies
    is_termination_msg=lambda x: x.get("content", "").rstrip().endswith("TERMINATE"),
    code_execution_config={
        "work_dir": "coding",          # directory where generated code is executed
        "use_docker": False,           # run code locally instead of inside Docker
    },
)

5 Let's get the ball rolling

Once the participating agents are constructed properly, the magic truly begins. The conversation, now set in motion, can evolve and flow organically.

user_proxy.initiate_chat(
    assistant,
    message="""What date is today? Compare the year-to-date gain for META and TESLA and lot a chart.""",
)        

The agents, powered by their underlying algorithms and capabilities, can engage in dynamic exchanges, allowing the conversation to proceed seamlessly and, in many cases, without further human intervention.

Output

Now the only thing left to do is to watch, fascinated, as the two bots reason and solve the assigned task together. Nice, isn't it?


You can start playing with AutoGen from the official GitHub repo: https://github.com/microsoft/autogen

First Feedback and Personal Considerations for Solution Architects and Developers

I must say, AutoGen left me absolutely mind-blown with its remarkable capabilities! That said, most of the questions that arose for me during testing concern aspects that I consider very important for the developers and architects who will use AutoGen:

  • There are instances where the two entities take divergent routes to address a problem. While this diversity in approach can be advantageous, it's crucial for us, as developers, to establish a method for evaluating outcomes, particularly when they don't yield the desired results.
  • We need to fine-tune the maximum number of iterations. Challenging tasks or inadequate prompts might cause the system to hit this limit prematurely, halting the process.
  • We should be vigilant about the length of iterations. If they become excessively long, we risk exceeding the token size capacity, which could impede the system's functionality.
  • I suggest implementing a feedback loop within the system. Allow users to report issues, provide suggestions, or even share their successes. This feedback can be invaluable for refining and improving the framework.
  • Incorporate monitoring solutions to closely track both the performance and expenses associated with AI agents. As the number of interacting agents increases, expenses can escalate rapidly.
  • How will the future of AI interactions change when we can personalize outputs using our own data, metrics, and budgets? AutoGen provides a method called EcoOptiGen that efficiently tunes the settings (hyperparameters) of Large Language Models to make them perform better. A research study has shown that adjusting these hyperparameters can greatly enhance the usefulness and performance of these models (see the sketch after this list). Documentation here.
  • Last but not least, AutoGen can seamlessly integrate with both cloud-based and local LLMs. This adaptability allows you to have greater control and flexibility in your application's design and performance. Read this blog post about this scenario.
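
As promised above, here is a hedged sketch of what EcoOptiGen-style tuning looks like in early pyautogen releases (it requires the optional extra, pip install "pyautogen[blendsearch]"); the tiny dataset, prompt template, and eval_func below are placeholders of my own, not from the AutoGen docs.

import autogen

# Placeholder tuning data: each record maps prompt fields to the
# expected answer (a real run would use a larger dataset).
tune_data = [
    {"problem": "What is 2 + 2?", "solution": "4"},
    {"problem": "What is 3 * 5?", "solution": "15"},
]

def eval_func(responses, **data):
    # Count a response as successful if it contains the expected answer.
    success = any(data["solution"] in (r or "") for r in responses)
    return {"success": success}

# Search for a cost-effective configuration under explicit budgets.
config, analysis = autogen.ChatCompletion.tune(
    data=tune_data,
    metric="success",
    mode="max",
    eval_func=eval_func,
    inference_budget=0.05,      # max average price per instance at inference
    optimization_budget=1,      # total price budget for the tuning run
    num_samples=-1,             # try as many configurations as the budget allows
    prompt="{problem}",         # template filled in from each data record
    config_list=config_list,    # endpoints defined earlier in the article
)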

Conclusion

I think that AutoGen is a really useful framework for developers. It allows them to use LLMs effectively, and the multi-agent conversation concept is great for teamwork and communication, as it feels like real human interaction. AutoGen's coordination features are advanced, so developers can focus on creating imaginative experiences without worrying about the complex background processes. It was a smart decision to make AutoGen an open framework, as it allows the community to create new and innovative applications. I'm excited to see all the different experiences that developers will create using this amazing technology.


Mario


Stuart Richardson

Founder at U-V

11 months

Could AutoGen be used to create complex full-stack apps based on any stack (for instance, the LAMP stack (JavaScript + Linux, Apache, MySQL, and PHP))? Perhaps one could embed a large wiki PDF (one for each language) into the pinecone vector database of the AI coding agent (to increase search speed by focusing on singular documents for code reference), and of course then breaking up different tasks for many different AI agents (front-end, back-end, coding, debugging, task management, optimization, etc.). Any thoughts?
