
LLMOps-Monitoring for Agent AI Platforms

Debmalya Biswas

AI/Analytics @ Wipro | x- Nokia, SAP, Oracle | 50+ patents | PhD - INRIA

Published: June 8, 2024


Introduction to AI Agents

The discussion around ChatGPT has now evolved into AI Agents.

Bill Gates recently envisioned a future where we would have an AI Agent that is able to process and respond to Natural Language and accomplish a number of different Tasks. Gates used planning a trip as an example. Ordinarily, this would involve booking your hotel, flights, restaurants, etc. on your own. But an AI Agent would be able to use its knowledge of your preferences to book and purchase those things on your behalf.

However, designing and deploying AI Agents remains challenging in practice. In a recent work, we focused on a reference architecture for an Agent AI Platform.

Fig 1: AI Agent Platform Reference Architecture

Given a user task, the goal of an AI Agent Platform is to identify (compose) an agent (or group of agents) capable of executing the given task.

Orchestration Layer (Task decomposition into an orchestration schema executed by the Orchestration Engine)

AI Agents follow a long history of research around Autonomous Agents, especially, Goal oriented Agents. A high-level approach to solving such complex tasks involves: (a) decomposition of the given complex task into (a hierarchy or workflow of) simple tasks, followed by (b) composition of agents able to execute the simple(r) tasks. This can be achieved in a dynamic or static manner.

In the dynamic approach, given a complex user task, the system comes up with a plan to fulfill the request depending on the capabilities of available agents at run-time. In the static approach, given a set of agents, composite agents are defined manually at design-time combining their capabilities.
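The difference between the two approaches can be sketched in a few lines. This is a minimal illustration, not the platform's actual design: the `Agent` class, the `REGISTRY` contents, the task names and the `compose_dynamically` helper are all hypothetical, and a real planner would rank candidate agents rather than take the first match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str
    capabilities: frozenset  # simple tasks this agent can execute


# Hypothetical registry of available agents (names are illustrative only).
REGISTRY = [
    Agent("FlightAgent", frozenset({"book_flight"})),
    Agent("HotelAgent", frozenset({"book_hotel"})),
    Agent("DiningAgent", frozenset({"book_restaurant"})),
]

# Static approach: the composite agent is wired manually at design-time.
STATIC_TRIP_PLANNER = {
    "book_flight": "FlightAgent",
    "book_hotel": "HotelAgent",
}


def compose_dynamically(simple_tasks, registry):
    """Dynamic approach: bind each simple task to an agent available at run-time."""
    plan = {}
    for task in simple_tasks:
        candidates = [a for a in registry if task in a.capabilities]
        if not candidates:
            raise LookupError(f"no agent can execute {task!r}")
        plan[task] = candidates[0].name  # first match; assumed selection policy
    return plan


plan = compose_dynamically(
    ["book_flight", "book_hotel", "book_restaurant"], REGISTRY
)
```

In the static case the mapping is fixed before any request arrives; in the dynamic case the same mapping is computed per request, so it can change whenever the registry does.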

Agent Marketplace: This implies that there exists a marketplace / registry of agents, with a well-defined description of the agent capabilities and constraints. We have studied the discovery aspect of Agents in detail in [1].
Integration layer supporting different Agent Interaction Patterns, such as, Agent-to-Agent API, Agent API providing Output for Human consumption, Human triggering an AI Agent, AI Agent-to-Agent with Human in the Loop. The integration patterns need to be supported by the underlying LLMOps [2] platform.

Andrew Ng recently talked about this aspect from a performance perspective (Source article: https://www.deeplearning.ai/the-batch/issue-246/ ):

Today, a lot of LLM output is for human consumption. But in an agentic workflow, an LLM might be prompted repeatedly to reflect on and improve its output, use tools, plan and execute multiple steps, or implement multiple agents that collaborate. So, we might generate hundreds of thousands of tokens or more before showing any output to a user. This makes fast token generation very desirable and makes slower generation a bottleneck to taking better advantage of existing models.

Shared memory layer enabling data transfer between Agents, storing interaction data such that it can be used to personalize future interactions.
Privacy & Security: Ensure that data shared by the user specific to this task, or user profile data that cuts across tasks, is only shared with the relevant Agents (authentication & access control). Refer to [3, 4] for a detailed discussion of AI Agents / Conversational Agents from a Privacy perspective.

LLMOps-Monitoring

Monitoring is an inherent aspect of any distributed systems platform, and can be considered a critical requirement for realizing the Orchestration Layer of an Agent AI Platform.

LLMOps-Monitoring (together with failure recovery) will become more critical as Agent AI platforms become enterprise ready and start supporting productionized deployments of AI Agents.

The need for a monitoring mechanism is even more critical for AI Agent compositions because of their complexity and long running nature. We define it at a high level as

the ability to find out where in the process the execution is and whether any unanticipated glitches have appeared

Monitoring AI Agent compositions, similar to distributed systems, is difficult because of the following reasons:

No global observer: Due to their distributed nature, we cannot assume the existence of an entity having visibility over the entire execution. In fact, due to their privacy and autonomy requirements, even the composite agent may not have visibility over the internal processing of its component agents.
Non-determinism: AI Agents allow parallel composition of processes. Also, AI Agents usually depend on external factors for their execution. As such, it may not be possible to predict their behavior before the actual execution. For example, whether a flight booking will succeed or not depends on the number of available seats (at the time of booking) and cannot be predicted in advance.
Communication delays: Communication delays make it impossible to record the states of all the involved agents instantaneously. For example, let us assume that agent A initiates an attempt to record the state of the composition. Then, by the time the request (to record its state) reaches agent B and B records its state, agent A’s state might have changed.
Dynamic configuration: The agents are selected incrementally as the execution progresses (dynamic binding). Thus, the “components” of the distributed system may not be known in advance.

Execution Status related Queries

In this work, we only consider the first part, i.e., providing information about the current state of the execution. We discuss the capabilities and limitations of acquiring execution snapshots with respect to answering the following types of queries:

Local queries: Queries which can be answered based on the local state information of an agent. For example, queries such as “What is the current state of Agent A’s execution?” or “Has A reached a specific state?”. Local queries can be answered by directly querying the concerned agent provider.
Composite queries: Queries expressed over the states of several agents. We assume that any query related to the status of a composition is expressed as a conjunction of the states of individual agent executions. Examples of status queries: “Have agents A, B and C reached states x, y and z respectively?” Such queries have been referred to as stable predicates in literature. Stable predicates are defined as predicates which do not become false once they have become true.
Historical queries: Queries related to the execution history of the composition. For example, “How many times have agents A and B been suspended?”. If the query is answered using an execution snapshot algorithm, then it needs to be mentioned that the results are with respect to a time t_p in the past.
Relationship queries: Queries based on the relationship between states. For example, “What was the state of agent A when agent B was in state y?” Unfortunately, execution snapshot based algorithms do not guarantee answers for such queries. For example, we would not be able to answer the query unless we have a snapshot which captures the state of agent B when it was in state y. Such predicates have been referred to as unstable predicates in literature. Unstable predicates keep alternating their values between true and false, so they are difficult to answer based on snapshot algorithms.
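The first three query types can be sketched against a log of state transitions. This is an illustrative sketch only: the `Transition` record, the sample `LOG` and the helper names are assumptions, and it presumes that all transitions up to the queried time have already been collected in one place, which is exactly what a snapshot has to provide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    time: float
    agent: str
    state: str  # state entered at `time`, e.g. "E", "S", "T"


# Hypothetical merged log of state transitions for agents A and B.
LOG = [
    Transition(1.0, "A", "E"),
    Transition(2.0, "B", "E"),
    Transition(3.0, "A", "S"),
    Transition(4.0, "A", "E"),
    Transition(5.0, "B", "S"),
    Transition(6.0, "A", "T"),
]


def state_at(log, agent, t):
    """Local query: last state of `agent` recorded at or before time t."""
    state = "NE"  # an agent waits for an invocation until a transition is logged
    for tr in sorted(log, key=lambda tr: tr.time):
        if tr.agent == agent and tr.time <= t:
            state = tr.state
    return state


def composite_holds(log, expected, t):
    """Composite query: conjunction over the states of several agents at time t."""
    return all(state_at(log, agent, t) == s for agent, s in expected.items())


def count_entries(log, agents, state, t_p):
    """Historical query: how often `agents` entered `state` up to time t_p."""
    return sum(
        1
        for tr in log
        if tr.agent in agents and tr.state == state and tr.time <= t_p
    )
```

For instance, `composite_holds(LOG, {"A": "T", "B": "S"}, 6.0)` asks whether A has terminated while B is suspended, and `count_entries(LOG, {"A", "B"}, "S", 6.0)` counts suspensions of A and B up to time 6.0. A relationship query would additionally need the log to contain the exact moment B entered state y, which a snapshot taken at an arbitrary time does not guarantee.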

AI Agent Monitoring Infrastructure & Execution Lifecycle

We assume the existence of a coordinator and log manager corresponding to each agent as shown in the below figure. We also assume that each agent is responsible for executing a single task / operation.

The coordinator is responsible for all non-functional aspects related to the execution of the agent such as monitoring, transactions, etc. The log manager logs information about any state transitions as well as any messages sent/received by the agent. The state transitions and messages considered are as outlined in the below figure:

Not-Executing (NE): The agent is waiting for an invocation.
Executing (E): On receiving an Invocation message (IM), the agent changes its state from NE to E.
Suspended (S) and Suspended by Invoker (IS): An agent, in state E, may change its state to S due to an internal event (Suspend) or to IS on the receipt of a Suspend message (SM). Conversely, the transition from S to E occurs due to an internal event (Resume) and from IS to E on receiving a Resume message (RM).
Canceling (CI), Canceling due to invoker (ICI) and Canceled (C): An agent, in state E/S/IS, may change its state to CI due to an internal event (Cancel) or ICI on the receipt of a Cancel message (CM). Once it finishes cancellation, it changes its state to C and sends a Canceled message (CedM) to its parent. Please note that cancellation may require canceling the effects of some of its component agents.
Terminated (T) and Compensating (CP): The agent changes its state to T once it has finished executing the operation. On termination, the agent sends a Terminated message (TM) to its parent. An agent may be required to cancel an operation even after it has finished executing the operation (compensation). An agent, in state T, changes its state to CP on receiving the CM. Once it finishes compensation, it moves to C and sends a CedM to its parent agent.
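The lifecycle above can be written down as a transition table. The sketch below is one possible encoding, not a reference implementation: the state and message abbreviations come from the text, while the `Terminate` and `Done` event names (for finishing the operation, and for finishing cancellation or compensation) and the `AgentLifecycle` class are assumptions introduced for illustration.

```python
# (current state, event) -> next state
TRANSITIONS = {
    ("NE", "IM"): "E",         # invocation message received
    ("E", "Suspend"): "S",     # internal suspend
    ("E", "SM"): "IS",         # suspend message from the invoker
    ("S", "Resume"): "E",      # internal resume
    ("IS", "RM"): "E",         # resume message from the invoker
    ("E", "Terminate"): "T",   # operation finished (assumed event name)
    ("T", "CM"): "CP",         # cancel after termination means compensation
    ("CI", "Done"): "C",       # cancellation finished (assumed event name)
    ("ICI", "Done"): "C",
    ("CP", "Done"): "C",       # compensation finished
}
for _state in ("E", "S", "IS"):
    TRANSITIONS[(_state, "Cancel")] = "CI"  # internal cancel
    TRANSITIONS[(_state, "CM")] = "ICI"     # cancel message from the invoker

# Message sent to the parent agent on entering a state.
NOTIFY_PARENT = {"T": "TM", "C": "CedM"}


class AgentLifecycle:
    def __init__(self, name):
        self.name = name
        self.state = "NE"
        self.log = []     # (event, old state, new state), as kept by the log manager
        self.outbox = []  # messages queued for the parent agent

    def handle(self, event):
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{event!r} is not allowed in state {self.state!r}")
        old, self.state = self.state, TRANSITIONS[key]
        self.log.append((event, old, self.state))
        if self.state in NOTIFY_PARENT:
            self.outbox.append(NOTIFY_PARENT[self.state])
        return self.state


agent = AgentLifecycle("A")
for event in ["IM", "SM", "RM", "Terminate", "CM", "Done"]:
    agent.handle(event)
```

The example run invokes the agent, suspends and resumes it on the invoker's request, lets it terminate, and then compensates it, so it ends in state C after sending TM and CedM to its parent.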

We assume that the composition schema (static composition) specifies a partial order for agent operations. We define the happened-before relation between agent operations as follows:

An operation a happened-before operation b (a --> b) if and only if one of the following holds: (1) There exists a control/data dependency between operations a and b such that a needs to terminate before b can start executing. (2) There exists an operation c such that a --> c and c --> b.
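Rule (1) gives the direct dependencies and rule (2) is their transitive closure, so the relation can be computed from the composition schema. The following is a small sketch under that reading; the function name and the trip-planning operation names are illustrative assumptions.

```python
def happened_before(direct_deps):
    """Return all pairs (a, b) with a --> b, given the direct dependencies of rule (1)."""
    closure = set(direct_deps)
    changed = True
    while changed:  # repeat rule (2) until no new pair is derived
        changed = False
        for a, c in list(closure):
            for c2, b in list(closure):
                if c == c2 and (a, b) not in closure:
                    closure.add((a, b))
                    changed = True
    return closure


# Hypothetical schema: the hotel depends on the flight, the restaurant on the hotel,
# and the car rental only on the flight.
DEPS = {
    ("book_flight", "book_hotel"),
    ("book_hotel", "book_restaurant"),
    ("book_flight", "rent_car"),
}
ORDER = happened_before(DEPS)
```

Two operations that are unrelated in either direction, such as `book_hotel` and `rent_car` here, are unordered by the schema and may execute in parallel.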

An operation, on failure, is retried with the same or different agents until it completes successfully (terminates). Note that each (retrial) attempt is considered as a new invocation and would be logged accordingly. Finally, to accommodate asynchronous communication, we assume the presence of Input/Output (I/O) queues. Basically, each agent has an I/O queue with respect to its parent and component agents - as shown in Fig. 2.

Given synchronized clocks and logging (as discussed above), a snapshot of the hierarchical composition at time t would consist of the logs of all the “relevant” agents until time t.

The relevant agents can be determined in a recursive manner (starting from the root agent) by considering the agents of the invoked operations recorded in the parent agent's log until time t. If message timestamps are used then we need to consider the skew while recording the logs, i.e., if a parent agent's log was recorded until time t then its component agents’ logs need to be recorded until (t + skew). The states of the I/O queues can be determined from the state transition model.
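The recursive collection of relevant logs can be sketched as follows. This is a simplified illustration: the `LogEntry` layout, the `"invoke"` event label and the sample logs are assumptions, and the sketch reads the skew rule as applying at each parent-to-component step, so the cutoff grows by `skew` per level of the hierarchy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogEntry:
    time: float
    event: str   # "invoke" or "state" (assumed labels)
    detail: str  # invoked component agent, or the state entered


def snapshot(logs, root, t, skew=0.0):
    """Collect the logs of all relevant agents, starting from the root agent."""
    result = {}

    def visit(agent, cutoff):
        entries = [e for e in logs.get(agent, []) if e.time <= cutoff]
        result[agent] = entries
        for e in entries:
            if e.event == "invoke":
                # A component's log is recorded until (parent cutoff + skew).
                visit(e.detail, cutoff + skew)

    visit(root, t)
    return result


# Hypothetical logs: Trip invokes Flight at 1.0 and Hotel at 5.0.
LOGS = {
    "Trip": [LogEntry(1.0, "invoke", "Flight"), LogEntry(5.0, "invoke", "Hotel")],
    "Flight": [LogEntry(1.2, "state", "E"), LogEntry(3.1, "state", "T")],
    "Hotel": [LogEntry(5.3, "state", "E")],
}

snap = snapshot(LOGS, "Trip", t=3.0, skew=0.2)
```

With `t=3.0`, the Hotel agent is not relevant because its invocation is recorded after the cutoff. The skew matters for Flight: its termination at 3.1 is captured with `skew=0.2` but would be missed with `skew=0.0`.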

References

[1] D. Biswas. Constraints Enabled Autonomous Agent Marketplace: Discovery and Matchmaking. In Proc. of the 16th International Conference on Agents and Artificial Intelligence (ICAART), 2024.
[2] D. Biswas. Gen AI Architecture Patterns, 2023 (Link to the full article on LinkedIn: https://lnkd.in/e2M6AS5S ).
[3] D. Biswas. Privacy preserving Chatbot Conversations. In Proc. of the 3rd IEEE International Conference on Artificial Intelligence and Knowledge Engineering (AIKE), 2020.
[4] D. Biswas. Privacy Considerations of AI Agents, 2024 (Link to the full article on LinkedIn: https://www.dhirubhai.net/pulse/privacy-challenges-ai-agents-debmalya-biswas-meaqf/?trackingId=VwSnQolaRKSdWDvIwPVbDw%3D%3D ).


Detailed article published in DataDrivenInvestor https://medium.datadriveninvestor.com/llmops-monitoring-for-agent-ai-platforms-dee474b2877f
