LLMOps-Monitoring for Agent AI Platforms
AI Agent Platform Monitoring Infrastructure & Agent Lifecycle

Introduction to AI Agents

The discussion around ChatGPT has now evolved into AI Agents.

Bill Gates recently envisioned a future where we would have an AI Agent that is able to process and respond to Natural Language and accomplish a number of different Tasks. Gates used planning a trip as an example. Ordinarily, this would involve booking your hotel, flights, restaurants, etc. on your own. But an AI Agent would be able to use its knowledge of your preferences to book and purchase those things on your behalf.

However, designing and deploying AI Agents remains challenging in practice. In a recent work, we focused on a reference architecture for an Agent AI Platform.

Fig 1: AI Agent Platform Reference Architecture

Given a user task, the goal of an AI Agent Platform is to identify (compose) an agent (or group of agents) capable of executing the given task.

  • Orchestration Layer: Decomposition of the task into an orchestration that is executed by the Orchestration Engine.

AI Agents follow a long history of research around Autonomous Agents, especially, Goal oriented Agents. A high-level approach to solving such complex tasks involves: (a) decomposition of the given complex task into (a hierarchy or workflow of) simple tasks, followed by (b) composition of agents able to execute the simple(r) tasks. This can be achieved in a dynamic or static manner.

In the dynamic approach, given a complex user task, the system comes up with a plan to fulfill the request depending on the capabilities of available agents at run-time. In the static approach, given a set of agents, composite agents are defined manually at design-time combining their capabilities.
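The difference between the two approaches can be illustrated with a minimal sketch. Everything named here (the `Agent` class, the `REGISTRY` list, the capability strings and `compose_dynamically`) is hypothetical and only serves the illustration; a real platform would use a richer capability and constraint model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str
    capabilities: frozenset  # names of the simple tasks this agent can execute


# Hypothetical registry of agents that are available at run-time.
REGISTRY = [
    Agent("FlightAgent", frozenset({"book_flight"})),
    Agent("HotelAgent", frozenset({"book_hotel"})),
    Agent("TravelAgent", frozenset({"book_flight", "book_hotel"})),
]

# Static approach: the composite agent is wired manually at design-time.
STATIC_TRIP_PLANNER = {"book_flight": "FlightAgent", "book_hotel": "HotelAgent"}


def compose_dynamically(subtasks, available):
    """Dynamic approach: at run-time, pick the first available agent per subtask."""
    plan = {}
    for task in subtasks:
        candidates = [a for a in available if task in a.capabilities]
        if not candidates:
            raise LookupError(f"no available agent can execute {task!r}")
        plan[task] = candidates[0].name
    return plan
```

With the full registry, `compose_dynamically(["book_flight", "book_hotel"], REGISTRY)` yields the same plan as the static wiring; if only `TravelAgent` is available, the dynamic plan adapts while the static one would break.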

  • Agent Marketplace: This implies that there exists a marketplace / registry of agents, with a well-defined description of the agent capabilities and constraints. We have studied the discovery aspect of Agents in detail in [1].
  • Integration layer supporting different Agent Interaction Patterns, such as, Agent-to-Agent API, Agent API providing Output for Human consumption, Human triggering an AI Agent, AI Agent-to-Agent with Human in the Loop. The integration patterns need to be supported by the underlying LLMOps [2] platform.

Andrew Ng recently talked about this aspect from a performance perspective (Source article: https://www.deeplearning.ai/the-batch/issue-246/ ):

Today, a lot of LLM output is for human consumption. But in an agentic workflow, an LLM might be prompted repeatedly to reflect on and improve its output, use tools, plan and execute multiple steps, or implement multiple agents that collaborate. So, we might generate hundreds of thousands of tokens or more before showing any output to a user. This makes fast token generation very desirable and makes slower generation a bottleneck to taking better advantage of existing models.

  • Shared memory layer enabling data transfer between Agents, storing interaction data such that it can be used to personalize future interactions.
  • Privacy & Security: Ensure that data shared by the user specific to this task, or user profile data that cuts across tasks, is only shared with the relevant Agents (authentication & access control). Refer to [3, 4] for a detailed discussion of AI Agents / Conversational Agents from a Privacy perspective.

LLMOps-Monitoring

Monitoring is an inherent aspect of any distributed systems platform, and is a critical requirement for realizing the Orchestration Layer of an Agent AI Platform.

LLMOps-Monitoring (together with failure recovery) will become more critical as Agent AI platforms become enterprise ready and start supporting productionized deployments of AI Agents.

The need for a monitoring mechanism is even more critical for AI Agent compositions because of their complexity and long running nature. At a high level, we define monitoring as:

the ability to find out where in the process the execution is and whether any unanticipated glitches have appeared.

Monitoring AI Agent compositions, similar to distributed systems, is difficult because of the following reasons:

  • No global observer: Due to their distributed nature, we cannot assume the existence of an entity having visibility over the entire execution. In fact, due to their privacy and autonomy requirements, even the composite agent may not have visibility over the internal processing of its component agents.
  • Non-determinism: AI Agents allow parallel composition of processes. Also, AI Agents usually depend on external factors for their execution. As such, it may not be possible to predict their behavior before the actual execution. For example, whether a flight booking will succeed or not depends on the number of available seats (at the time of booking) and cannot be predicted in advance.
  • Communication delays: Communication delays make it impossible to record the states of all the involved agents instantaneously. For example, let us assume that agent A initiates an attempt to record the state of the composition. Then, by the time the request (to record its state) reaches agent B and B records its state, agent A’s state might have changed.
  • Dynamic configuration: The agents are selected incrementally as the execution progresses (dynamic binding). Thus, the “components” of the distributed system may not be known in advance.

Execution Status related Queries

In this work, we only consider the first part, i.e., providing information about the current state of the execution. We discuss the capabilities and limitations of acquiring execution snapshots with respect to answering the following types of queries:

  • Local queries: Queries which can be answered based on the local state information of an agent. For example, queries such as “What is the current state of Agent A’s execution?” or “Has A reached a specific state?”. Local queries can be answered by directly querying the concerned agent provider.
  • Composite queries: Queries expressed over the states of several agents. We assume that any query related to the status of a composition is expressed as a conjunction of the states of individual agent executions. Example of a status query: “Have agents A, B and C reached states x, y and z respectively?” Such queries have been referred to as stable predicates in literature. Stable predicates are defined as predicates which do not become false once they have become true.
  • Historical queries: Queries related to the execution history of the composition. For example, “How many times have agents A and B been suspended?”. If the query is answered using an execution snapshot algorithm, then it needs to be mentioned that the results are with respect to a time t_p in the past.
  • Relationship queries: Queries based on the relationship between states. For example, “What was the state of agent A when agent B was in state y?” Unfortunately, execution snapshot based algorithms do not guarantee answers for such queries. For example, we would not be able to answer the query unless we have a snapshot which captures the state of agent B when it was in state y. Such predicates have been referred to as unstable predicates in literature. Unstable predicates keep alternating their values between true and false, so they are difficult to answer based on snapshot algorithms.
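The first three query types can be sketched against a recorded snapshot. The snapshot layout below (a dictionary mapping each agent to a list of `(timestamp, state)` log entries recorded up to time t_p) and all function names are assumptions made for illustration only; the state codes S and IS stand for the two suspended states of the agent lifecycle.

```python
# Hypothetical snapshot: agent name -> list of (timestamp, state) entries,
# recorded up to some past time t_p.
SNAPSHOT = {
    "A": [(1, "NE"), (2, "E"), (4, "S"), (6, "E"), (9, "T")],
    "B": [(1, "NE"), (3, "E"), (5, "IS"), (7, "E")],
    "C": [(1, "NE")],
}


def current_state(snapshot, agent):
    """Local query: the last recorded state of one agent (as of t_p)."""
    return snapshot[agent][-1][1]


def has_reached(snapshot, agent, state):
    """Local query: has the agent ever entered the given state?"""
    return any(s == state for _, s in snapshot[agent])


def composite_query(snapshot, targets):
    """Composite query: conjunction of 'has reached' over several agents.

    'Has reached' never turns false once true, so it is a stable predicate.
    """
    return all(has_reached(snapshot, a, s) for a, s in targets.items())


def times_suspended(snapshot, agents):
    """Historical query: number of recorded entries into S or IS."""
    return sum(1 for a in agents for _, s in snapshot[a] if s in ("S", "IS"))
```

Any answer derived this way holds only with respect to t_p; the agents may have moved on since the snapshot was taken.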

AI Agent Monitoring Infrastructure & Execution Lifecycle

We assume the existence of a coordinator and log manager corresponding to each agent as shown in the below figure. We also assume that each agent is responsible for executing a single task / operation.

Fig 2: Monitoring Infrastructure

The coordinator is responsible for all non-functional aspects related to the execution of the agent such as monitoring, transactions, etc. The log manager logs information about any state transitions as well as any messages sent/received by the agent. The state transitions and messages considered are as outlined in the below figure:

Fig 3: Agent execution lifecycle

  • Not-Executing (NE): The agent is waiting for an invocation.
  • Executing (E): On receiving an Invocation message (IM), the agent changes its state from NE to E.
  • Suspended (S) and Suspended by Invoker (IS): An agent, in state E, may change its state to S due to an internal event (Suspend) or to IS on the receipt of a Suspend message (SM). Conversely, the transition from S to E occurs due to an internal event (Resume) and from IS to E on receiving a Resume message (RM).
  • Canceling (CI), Canceling due to invoker (ICI) and Canceled (C): An agent, in state E/S/IS, may change its state to CI due to an internal event (Cancel) or ICI on the receipt of a Cancel message (CM). Once it finishes cancellation, it changes its state to C and sends a Canceled message (CedM) to its parent. Please note that cancellation may require canceling the effects of some of its component agents.
  • Terminated (T) and Compensating (CP): The agent changes its state to T once it has finished executing the operation. On termination, the agent sends a Terminated message (TM) to its parent. An agent may be required to cancel an operation even after it has finished executing the operation (compensation). An agent, in state T, changes its state to CP on receiving the CM. Once it finishes compensation, it moves to C and sends a CedM to its parent agent.
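The lifecycle described above can be written down as a small transition table. This is a sketch under assumptions: the state and message abbreviations come from the text, but the generic `Done` event (used for "finished executing", "finished cancellation" and "finished compensation") and the `AgentLifecycle` class are hypothetical names introduced here.

```python
# (current state, event) -> (next state, message sent to parent or None)
TRANSITIONS = {
    ("NE", "IM"): ("E", None),        # invocation received
    ("E", "Suspend"): ("S", None),    # internal suspend
    ("E", "SM"): ("IS", None),        # suspended by invoker
    ("S", "Resume"): ("E", None),     # internal resume
    ("IS", "RM"): ("E", None),        # resumed by invoker
    ("E", "Done"): ("T", "TM"),       # finished executing the operation
    ("T", "CM"): ("CP", None),        # cancel after termination: compensate
    ("CP", "Done"): ("C", "CedM"),    # finished compensation
    ("CI", "Done"): ("C", "CedM"),    # finished cancellation
    ("ICI", "Done"): ("C", "CedM"),   # finished invoker-requested cancellation
}
for _state in ("E", "S", "IS"):
    TRANSITIONS[(_state, "Cancel")] = ("CI", None)  # internal cancel
    TRANSITIONS[(_state, "CM")] = ("ICI", None)     # cancel message from invoker


class AgentLifecycle:
    """Tracks one agent's state and logs every transition (assumed log format)."""

    def __init__(self):
        self.state = "NE"
        self.log = []     # (from_state, event, to_state)
        self.outbox = []  # messages sent to the parent agent

    def handle(self, event):
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in state {self.state!r}")
        new_state, message = TRANSITIONS[key]
        self.log.append((self.state, event, new_state))
        self.state = new_state
        if message is not None:
            self.outbox.append(message)
        return new_state
```

For example, feeding the events `IM`, `SM`, `RM`, `Done`, `CM`, `Done` in order walks an agent through NE, E, IS, E, T, CP and finally C, with `TM` and `CedM` sent to the parent along the way.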

We assume that the composition schema (static composition) specifies a partial order for agent operations. We define the happened-before relation between agent operations as follows:

An operation a happened-before operation b (a --> b) if and only if one of the following holds: (1) There exists a control/data dependency between operations a and b such that a needs to terminate before b can start executing. (2) There exists an operation c such that a --> c and c --> b.
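Since the definition is the transitive closure of the direct dependencies, it can be checked with a simple graph reachability search. The sketch below assumes the composition schema is available as a list of `(a, b)` pairs meaning "a must terminate before b can start"; the function name and the trip-planning operation names are illustrative only.

```python
def happened_before(dependencies, a, b):
    """Return True if a --> b, i.e. b is reachable from a via dependency edges."""
    successors = {}
    for x, y in dependencies:
        successors.setdefault(x, set()).add(y)

    stack = list(successors.get(a, ()))
    seen = set()
    while stack:
        op = stack.pop()
        if op == b:
            return True  # rule (1) for a direct edge, rule (2) otherwise
        if op not in seen:
            seen.add(op)
            stack.extend(successors.get(op, ()))
    return False


# Hypothetical schema: flight before hotel before restaurant; flight before car.
DEPS = [
    ("book_flight", "book_hotel"),
    ("book_hotel", "book_restaurant"),
    ("book_flight", "rent_car"),
]
```

Two operations for which neither direction holds, such as `rent_car` and `book_hotel` in this schema, are unordered by the partial order and may execute in parallel.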

An operation, on failure, is retried with the same or different agents until it completes successfully (terminates). Note that each (retrial) attempt is considered as a new invocation and would be logged accordingly. Finally, to accommodate asynchronous communication, we assume the presence of Input/Output (I/O) queues. Basically, each agent has an I/O queue with respect to its parent and component agents - as shown in Fig. 2.
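The retry rule, where each attempt is logged as a new invocation, might look as follows. This is a sketch under assumptions: agents are modelled as `(name, callable)` pairs whose callable returns True on success, the log entry layout is invented for illustration, and the `max_attempts` bound is an addition here (the text retries until the operation terminates).

```python
import itertools


def execute_with_retries(operation, agents, log, max_attempts=5):
    """Try the operation on the given agents in turn, logging every attempt
    as a separate invocation. Returns the name of the agent that succeeded."""
    attempts = itertools.islice(itertools.cycle(agents), max_attempts)
    for invocation_id, (name, run) in enumerate(attempts, start=1):
        log.append((invocation_id, name, operation, "IM"))  # new invocation
        if run(operation):
            log.append((invocation_id, name, operation, "TM"))  # terminated
            return name
        log.append((invocation_id, name, operation, "failed"))
    raise RuntimeError(f"{operation!r} did not terminate in {max_attempts} attempts")
```

Because every attempt gets its own invocation id, a later snapshot can tell a retried operation apart from one that succeeded at the first attempt.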

Given synchronized clocks and logging (as discussed above), a snapshot of the hierarchical composition at time t would consist of the logs of all the “relevant” agents until time t.

The relevant agents can be determined in a recursive manner (starting from the root agent) by considering the agents of the invoked operations recorded in the parent agent's log until time t. If message timestamps are used then we need to consider the skew while recording the logs, i.e., if a parent agent's log was recorded until time t then its component agents’ logs need to be recorded until (t + skew). The states of the I/O queues can be determined from the state transition model.
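A minimal sketch of this recursive snapshot procedure is given below. The log layout (a dictionary from agent name to `(timestamp, kind, detail)` entries, where an `"invoke"` entry names the invoked component agent) is an assumption for illustration. The sketch applies the (t + skew) rule at every parent-component level, which is one possible reading of the rule above.

```python
def snapshot(logs, root, t, skew=0.0):
    """Collect the logs of all relevant agents, starting from the root agent.

    logs: agent name -> list of (timestamp, kind, detail) entries.
    The root's log is cut at t; each component's log at its parent's cutoff + skew.
    """
    result = {}

    def visit(agent, cutoff):
        entries = [e for e in logs.get(agent, []) if e[0] <= cutoff]
        result[agent] = entries
        for _, kind, detail in entries:
            # Only agents invoked before the cutoff are relevant.
            if kind == "invoke" and detail not in result:
                visit(detail, cutoff + skew)

    visit(root, t)
    return result


# Hypothetical logs of a trip-planning composition.
LOGS = {
    "Trip": [(1, "state", "E"), (2, "invoke", "Flight"), (8, "invoke", "Hotel")],
    "Flight": [(2.5, "state", "E"), (5.2, "state", "T"), (9, "state", "CP")],
    "Hotel": [(8.5, "state", "E")],
}
```

Taking a snapshot of `LOGS` at t = 5 with a skew of 0.5 includes `Trip` and `Flight` but not `Hotel`, since `Hotel` was only invoked at time 8; the `Flight` entry at 5.2 is included only because of the skew allowance.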

References

  1. D. Biswas. Constraints Enabled Autonomous Agent Marketplace: Discovery and Matchmaking. In proc. of the 16th International Conference on Agents and Artificial Intelligence (ICAART), 2024.
  2. D. Biswas. Gen AI Architecture Patterns, 2023 (Link to the full article on LinkedIn: https://lnkd.in/e2M6AS5S ).
  3. D. Biswas. Privacy preserving Chatbot Conversations. 3rd IEEE International Conference on Artificial Intelligence and Knowledge Engineering (AIKE), 2020.
  4. D. Biswas. Privacy Considerations of AI Agents, 2024 (Link to the full article on LinkedIn: https://www.dhirubhai.net/pulse/privacy-challenges-ai-agents-debmalya-biswas-meaqf/?trackingId=VwSnQolaRKSdWDvIwPVbDw%3D%3D )


Debmalya Biswas

AI/Analytics @ Wipro | x- Nokia, SAP, Oracle | 50+ patents | PhD - INRIA
