LLMOps-Monitoring for Agent AI Platforms
Debmalya Biswas
AI/Analytics @ Wipro | x- Nokia, SAP, Oracle | 50+ patents | PhD - INRIA
Introduction to AI Agents
The discussion around ChatGPT has now evolved into AI Agents.
Bill Gates recently envisioned a future where we would have an AI Agent able to process and respond to natural language and accomplish a number of different tasks. Gates used planning a trip as an example. Ordinarily, this would involve booking your hotel, flights, restaurants, etc. on your own. But an AI Agent would be able to use its knowledge of your preferences to book and purchase those things on your behalf.
However, designing and deploying AI Agents remains challenging in practice. In a recent work, we focused on a reference architecture for an Agent AI Platform.
Given a user task, the goal of an Agent AI Platform is to identify an agent (or compose a group of agents) capable of executing the given task.
AI Agents follow a long history of research around autonomous agents, especially goal-oriented agents. A high-level approach to solving such complex tasks involves: (a) decomposition of the given complex task into (a hierarchy or workflow of) simple tasks, followed by (b) composition of agents able to execute the simple(r) tasks. This can be achieved in a dynamic or static manner.
In the dynamic approach, given a complex user task, the system comes up with a plan to fulfill the request depending on the capabilities of available agents at run-time. In the static approach, given a set of agents, composite agents are defined manually at design-time combining their capabilities.
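The dynamic approach can be sketched as a run-time matching step between decomposed subtasks and the advertised capabilities of available agents. The agent names, capability strings, and `plan` helper below are illustrative assumptions, not part of any specific platform API:

```python
# Minimal sketch of dynamic composition: given a complex task already
# decomposed into simple subtasks, assign each subtask to an agent
# advertising the matching capability at run-time.

def plan(subtasks, agents):
    """Map each subtask to the first agent advertising that capability."""
    assignment = {}
    for task in subtasks:
        capable = [name for name, caps in agents.items() if task in caps]
        if not capable:
            raise ValueError(f"no agent can execute: {task}")
        assignment[task] = capable[0]
    return assignment

# Hypothetical registry of agents and their capabilities (trip-planning example)
agents = {
    "FlightAgent": {"book_flight"},
    "HotelAgent": {"book_hotel"},
    "DiningAgent": {"book_restaurant"},
}
subtasks = ["book_flight", "book_hotel", "book_restaurant"]
print(plan(subtasks, agents))
# {'book_flight': 'FlightAgent', 'book_hotel': 'HotelAgent', 'book_restaurant': 'DiningAgent'}
```

In the static approach, the `assignment` mapping would instead be fixed at design-time as part of the composition schema.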
Andrew Ng recently talked about this aspect from a performance perspective (Source article: https://www.deeplearning.ai/the-batch/issue-246/ ):
Today, a lot of LLM output is for human consumption. But in an agentic workflow, an LLM might be prompted repeatedly to reflect on and improve its output, use tools, plan and execute multiple steps, or implement multiple agents that collaborate. So, we might generate hundreds of thousands of tokens or more before showing any output to a user. This makes fast token generation very desirable and makes slower generation a bottleneck to taking better advantage of existing models.
LLMOps-Monitoring
Monitoring is an inherent aspect of any distributed systems platform, and can be considered a critical requirement to materialize the Orchestration Layer of an Agent AI Platform.
LLMOps-Monitoring (together with failure recovery) will become more critical as Agent AI platforms become enterprise-ready and start supporting productionized deployments of AI Agents.
The need for a monitoring mechanism is even more critical for AI Agent compositions because of their complexity and long-running nature. We define it at a high level as
the ability to find out where in the process the execution is, and whether any unanticipated glitches have appeared.
Monitoring AI Agent compositions, like monitoring distributed systems, is difficult primarily because execution is spread across autonomous agents, and the compositions are complex and long-running.
Execution Status Related Queries
In this work, we only consider the first part, i.e., providing information about the current state of the execution. We discuss the capabilities and limitations of acquiring execution snapshots with respect to answering such execution status queries.
AI Agent Monitoring Infrastructure & Execution Lifecycle
We assume the existence of a coordinator and log manager corresponding to each agent, as shown in the figure below. We also assume that each agent is responsible for executing a single task (operation).
The coordinator is responsible for all non-functional aspects related to the execution of the agent, such as monitoring, transactions, etc. The log manager logs information about any state transitions, as well as any messages sent/received by the agent. The state transitions and messages considered are as outlined in the figure below:
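The per-agent log manager described above can be sketched as a simple append-only record store. The record layout (timestamped tuples for transitions and messages) is an assumption for illustration, not a prescribed format:

```python
# Sketch of the per-agent logging infrastructure: each agent's log manager
# records its state transitions and the messages it sends/receives,
# each entry timestamped for later snapshot reconstruction.
import time
from dataclasses import dataclass, field

@dataclass
class LogManager:
    agent_id: str
    records: list = field(default_factory=list)

    def log_transition(self, old_state, new_state):
        # e.g., "idle" -> "executing" on invocation
        self.records.append((time.time(), "transition", old_state, new_state))

    def log_message(self, direction, peer, payload):
        # direction is "sent" or "received"; peer is the parent or a component agent
        self.records.append((time.time(), f"msg_{direction}", peer, payload))

log = LogManager("HotelAgent")
log.log_transition("idle", "executing")
log.log_message("sent", "parent", "booking confirmed")
```

The coordinator would consult these records (rather than the agent itself) to answer monitoring queries, keeping the non-functional concerns separate from task execution.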
We assume that the composition schema (static composition) specifies a partial order for agent operations. We define the happened-before relation between agent operations as follows:
An operation a happened-before operation b (a --> b) if and only if one of the following holds: (1) there exists a control/data dependency between operations a and b such that a needs to terminate before b can start executing; (2) there exists an operation c such that a --> c and c --> b.
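The relation above is the transitive closure of the direct control/data dependencies in the composition schema. A minimal sketch, assuming dependencies are stored as an adjacency map:

```python
# happened-before as transitive closure: rule (1) checks a direct
# dependency; rule (2) recurses through intermediate operations.

def happened_before(deps, a, b, seen=None):
    """True iff a --> b, per rules (1) direct dependency and (2) transitivity."""
    seen = seen or set()
    if b in deps.get(a, ()):            # rule (1): direct control/data dependency
        return True
    for c in deps.get(a, ()):           # rule (2): a --> c and c --> b
        if c not in seen:
            seen.add(c)
            if happened_before(deps, c, b, seen):
                return True
    return False

deps = {"a": {"c"}, "c": {"b"}}         # a --> c, c --> b
print(happened_before(deps, "a", "b"))  # True (transitively, via c)
print(happened_before(deps, "b", "a"))  # False
```

Because the schema specifies only a partial order, two operations may be unrelated in either direction, i.e., they may execute concurrently.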
An operation, on failure, is retried with the same or different agents until it completes successfully (terminates). Note that each retry attempt is considered a new invocation and would be logged accordingly. Finally, to accommodate asynchronous communication, we assume the presence of Input/Output (I/O) queues. Basically, each agent has an I/O queue with respect to its parent and component agents, as shown in Fig. 2.
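The retry semantics can be sketched as follows: every attempt is a fresh invocation with its own log entries, and a failed operation may be retried with a different candidate agent. The `invoke` callable and the attempt log format here are illustrative assumptions:

```python
# Sketch of retry-until-termination: cycle through candidate agents,
# logging each attempt as a separate invocation, until one terminates
# successfully.
import itertools

def execute_with_retries(operation, candidate_agents, invoke, log):
    for attempt, agent in enumerate(itertools.cycle(candidate_agents), start=1):
        log.append((operation, agent, attempt, "invoked"))     # new invocation per attempt
        try:
            result = invoke(agent, operation)
            log.append((operation, agent, attempt, "terminated"))
            return result
        except RuntimeError:
            log.append((operation, agent, attempt, "failed"))  # retry with next candidate
```

Logging each attempt separately is what later lets a snapshot distinguish "currently retrying" from "terminated", rather than collapsing all attempts into one opaque invocation.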
Given synchronized clocks and logging (as discussed above), a snapshot of the hierarchical composition at time t would consist of the logs of all the “relevant” agents until time t.
The relevant agents can be determined in a recursive manner (starting from the root agent) by considering the agents of the invoked operations recorded in the parent agent's log until time t. If message timestamps are used, then we need to consider the skew while recording the logs, i.e., if a parent agent's log was recorded until time t, then its component agents' logs need to be recorded until (t + skew). The states of the I/O queues can be determined from the state transition model.
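The recursive snapshot procedure can be sketched as below. The log record shape (timestamp, kind, detail) and the `children_of` map are assumptions for illustration; the key points from the text are the recursion from the root over invoked agents and the (t + skew) extension at each level:

```python
# Sketch of snapshot collection: starting from the root agent, gather each
# relevant agent's log entries up to time t, recursing into component agents
# that were actually invoked and widening the window by the clock skew.

def snapshot(agent, logs, children_of, t, skew=0.0):
    """Return {agent_id: log entries} for all relevant agents up to time t."""
    entries = [rec for rec in logs[agent] if rec[0] <= t]
    result = {agent: entries}
    # component agents are relevant only if an invocation was recorded by t
    invoked = {rec[2] for rec in entries if rec[1] == "invoke"}
    for child in children_of.get(agent, []):
        if child in invoked:
            result.update(snapshot(child, logs, children_of, t + skew, skew))
    return result

logs = {
    "root":  [(1.0, "invoke", "child"), (2.0, "state", "executing")],
    "child": [(1.5, "state", "executing"), (3.0, "state", "done")],
}
children_of = {"root": ["child"]}
snap = snapshot("root", logs, children_of, t=2.0, skew=0.5)
# child's window extends to 2.5, so only its first entry is included
```

A component agent whose invocation was not yet logged by time t is simply absent from the snapshot, which is consistent with the parent-driven recursion described above.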
References
Detailed article published in DataDrivenInvestor: https://medium.datadriveninvestor.com/llmops-monitoring-for-agent-ai-platforms-dee474b2877f