Mastering Generative AI Agents: Balancing Autonomy and Control (2/3)
Generative AI Agents are powerful tools that seamlessly blend conversational interaction with automation. However, generative autonomy comes with risks – AI Agents can lose focus and potentially go off track.
In part one of this series, we explored conversational flow and the settings that balance immediate responses against preprocessing for optimal interactions.
Now, we will dive into leveraging tools and parameters – supported by most LLMs from OpenAI, Microsoft Azure, Anthropic, or Google, to name a few – to tailor responses, restrict tools for focused outputs, or enable dynamic decision-making, striking a balance between generative autonomy and guardrails to enhance performance and prevent deviations.
Controlling Textual Responses with Tools
Controlling the appearance of an AI Agent’s result as described here is useful, but wouldn’t it be even better to have detailed control over every aspect of the AI Agent’s content creation? That’s where tools and parameters come into play. They allow you to tailor responses, restrict the AI Agent to focused outputs, and enable dynamic decision-making while keeping guardrails in place.
Tip: Sometimes less is more. While it’s tempting to use the most powerful Large Language Model to power an AI Agent, this can occasionally be counterproductive. Highly advanced models with strong reasoning capabilities might prioritize reasoning over making tool calls – doesn’t that sound human? To address this, you can use prompt engineering, opt for a "dumber" model, or combine both approaches.
Generated Parameters
When using Tool Actions with an LLM-based AI Agent, you can enhance functionality by combining them with parameters. These parameters can be sourced either from user inputs or from content generated by the AI Agent. For example, a general help tool with a description like “This tool provides assistance with user inquiries.” can cover a wide range of requests with a single tool.
In detail, each parameter is defined by a name, a type, and a description.
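The concrete parameter set depends on your use case. As a hedged illustration (the name "topic" is invented for this sketch, while final_answer and its description reappear later in this article), this is roughly how such a name/type/description list translates into the JSON Schema object that the underlying LLM receives for the tool:

```typescript
// Illustrative only: maps (parameter name, type, description) entries onto the
// JSON Schema "parameters" object of a tool definition. "topic" is a made-up
// example; "final_answer" matches the parameter referenced below.
const helpToolParameters = {
  type: "object",
  properties: {
    topic: {
      type: "string",
      description: "The topic the user needs assistance with.",
    },
    final_answer: {
      type: "string",
      description: "Generated assistant's response.",
    },
  },
  required: ["topic"],
} as const;

console.log(JSON.stringify(helpToolParameters, null, 2));
```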
Once the tool branch has been executed, these parameters are located in the input object and can be accessed using CognigyScript, for example: {{input.aiAgent.toolArgs.final_answer}}.
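If you would rather read the same value in a Code node than directly in CognigyScript, a minimal sketch could look like this (the mock input object and the fallback text are assumptions for illustration; in a real Code node, Cognigy injects the input object at runtime):

```typescript
// Sketch only: in a real Cognigy Code node, `input` is injected by the runtime.
// The mock object below just makes this snippet self-contained and runnable.
const input = {
  aiAgent: { toolArgs: { final_answer: "Here is how you can reset your password." } },
};

// Code-node equivalent of {{input.aiAgent.toolArgs.final_answer}}
const finalAnswer = input.aiAgent?.toolArgs?.final_answer;

// Fall back to a generic reply in case the model did not fill the parameter
console.log(finalAnswer ?? "Sorry, I could not generate an answer for that.");
```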
Default and Required
With tools, the AI Agent autonomously decides which tool to use and uses the default branch if uncertain. You can refine this decision-making by providing clear guidance through the AI Agent's instructions, the AI Agent Node's configuration, or detailed descriptions within the tools themselves.
Tip: Large Language Models vary in their behavior and capabilities. While their general usage is quite similar, some models excel in reasoning while others have unique feature sets. Cognigy harmonizes these differences to simplify integration, so in most cases, you don’t need to worry about the specifics. However, there are limitations – for instance, the tool selection varies, and the "required" feature is not available for all models or through all providers.
To ensure the AI Agent always uses one of the tools, you can set the Tool Choice in the AI Agent Node's Tool Settings to "Required". However, this approach carries some risks, as the AI Agent will always select the most likely tool, even when it might not be appropriate. To mitigate this, it can be beneficial to include a general tool that, unlike the default behavior, can utilize parameters and provide additional control – one of the key advantages of replacing the default branch with a combination of tools and parameters. This tool could be named “provide_default_response” with a description like “This tool generates responses to user inputs, addressing small talk and general inquiries.” and the following parameter (parameter name, type, description): final_answer, String, "Generated assistant's response."
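To make this concrete, here is a hedged sketch of what such a fallback tool could look like as an OpenAI-style tool definition. The name, description, and parameter are taken from the text above; in Cognigy itself this is configured in the Tool node and the AI Agent Node's Tool Settings rather than in code.

```typescript
// Sketch of the fallback tool as an OpenAI-style tool definition. Passed along
// with your other tools while Tool Choice is set to "Required", the model must
// always call some tool, and this one acts as the controllable default path.
const provideDefaultResponseTool = {
  type: "function" as const,
  function: {
    name: "provide_default_response",
    description:
      "This tool generates responses to user inputs, addressing small talk and general inquiries.",
    parameters: {
      type: "object",
      properties: {
        final_answer: {
          type: "string",
          description: "Generated assistant's response.",
        },
      },
      required: ["final_answer"],
    },
  },
};

console.log(JSON.stringify(provideDefaultResponseTool, null, 2));
```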
Tool Chaining
Another approach to leveraging generative conversation instead of deterministic dialogs is to define conversational elements as tools and instructions, allowing the AI Agent to decide what to use and when. This approach mirrors how you might guide a human agent, trusting them to determine the best next step.
In this example, we have extended the help tool so that it now requires the product as a parameter. To enhance the user experience, we also added a tool called "display_available_products" with a description like, "This tool shows products for the user to choose from." Additionally, we included an instruction in the AI Agent Node stating, "Always use the display_available_products tool to assist in selecting a product." This setup gives the AI Agent greater autonomy. For instance, if the user requests help without specifying a product, the "display_available_products" tool is called first, and once the user selects a product, the help tool is triggered. Unfortunately, while this is a very helpful pattern, it has the downside that its behavior varies depending on the chosen language model and incorporates an element of randomness.
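As a hedged sketch of this setup (the tool name provide_help and the empty parameter list of display_available_products are assumptions for illustration; the instruction and the descriptions are quoted from the example above):

```typescript
// Instruction given to the AI Agent Node, quoted from the example above
const agentInstruction =
  "Always use the display_available_products tool to assist in selecting a product.";

// OpenAI-style tool definitions for the two tools described in the example
const tools = [
  {
    type: "function" as const,
    function: {
      name: "display_available_products",
      description: "This tool shows products for the user to choose from.",
      // No parameters assumed here: the tool simply renders a product selection
      parameters: { type: "object", properties: {} },
    },
  },
  {
    type: "function" as const,
    function: {
      name: "provide_help", // illustrative name for the extended help tool
      description: "This tool provides assistance with user inquiries.",
      parameters: {
        type: "object",
        properties: {
          product: {
            type: "string",
            description: "The product the user needs help with.",
          },
        },
        required: ["product"],
      },
    },
  },
];

console.log(agentInstruction, JSON.stringify(tools, null, 2));
```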
In the last example, there is a user turn between the two tool calls, but tools can also be chained directly. For instance, if you have a multiply tool and a divide tool, and you ask a question like, “What is 4 divided by 2 and then multiplied by 3?”, the AI Agent will ensure the tools are called in the correct sequence, one after the other. This approach can be applied to various scenarios beyond simple math, such as input validation, invoking dependent business services, and more.
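To show what such chaining looks like under the hood, here is a minimal, self-contained sketch using the OpenAI Node SDK. Cognigy runs this loop for you; the model name, the in-code tool implementations, and the loop itself are illustrative assumptions, not Cognigy internals.

```typescript
import OpenAI from "openai";

// Tool definitions for the multiply/divide example from the text
const tools = [
  {
    type: "function" as const,
    function: {
      name: "divide",
      description: "Divides a by b.",
      parameters: {
        type: "object",
        properties: { a: { type: "number" }, b: { type: "number" } },
        required: ["a", "b"],
      },
    },
  },
  {
    type: "function" as const,
    function: {
      name: "multiply",
      description: "Multiplies a by b.",
      parameters: {
        type: "object",
        properties: { a: { type: "number" }, b: { type: "number" } },
        required: ["a", "b"],
      },
    },
  },
];

// Local implementations the tool calls are routed to
const impl: Record<string, (args: { a: number; b: number }) => number> = {
  divide: ({ a, b }) => a / b,
  multiply: ({ a, b }) => a * b,
};

async function run() {
  const openai = new OpenAI(); // expects OPENAI_API_KEY in the environment
  const messages: OpenAI.Chat.Completions.ChatCompletionMessageParam[] = [
    { role: "user", content: "What is 4 divided by 2 and then multiplied by 3?" },
  ];

  // Keep calling the model until it answers in text instead of requesting a tool
  for (;;) {
    const response = await openai.chat.completions.create({
      model: "gpt-4o", // any tool-capable model works here
      messages,
      tools,
    });
    const message = response.choices[0].message;
    messages.push(message);

    if (!message.tool_calls?.length) {
      console.log(message.content); // final answer, e.g. "4 / 2 * 3 = 6"
      return;
    }

    // Execute each requested tool and feed its result back to the model,
    // which lets it chain divide -> multiply in the correct order
    for (const call of message.tool_calls) {
      if (call.type !== "function") continue; // only function tools in this sketch
      const args = JSON.parse(call.function.arguments);
      const result = impl[call.function.name](args);
      messages.push({ role: "tool", tool_call_id: call.id, content: String(result) });
    }
  }
}

run().catch(console.error);
```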
In part one of this series, we delved into controlling the LLM's responses, and in this part, we explored adding control to generative conversations through the use of tools. Stay tuned for the final article, where we’ll bring it all together by demonstrating how these techniques can enrich generative conversations with multimodal elements.
You'll find part 1 here: https://www.dhirubhai.net/pulse/mastering-generative-ai-agents-balancing-autonomy-control-wolter-xuq9 And if you don't want to miss the last article in this series and other tutorials, you can simply follow me.