Generative AI Architectural Patterns

A short primer on the 5 most prevalent Generative AI architectural patterns today:

  1. Black-box LLM APIs
  2. Enterprise Apps in LLM App Store
  3. LLMOps: LLM fine-tuning to domain-specific SLMs
  4. Retrieval Augmented Generation (RAG)
  5. AI Agents: Multi-agent LLM Orchestration

Fig: Generative AI Architecture Patterns

1. Black-box LLM APIs

This is your classic ChatGPT [1] example, where we have black-box access to an LLM API/UI. Similar LLM APIs can be considered for other core Natural Language Processing (NLP) tasks, e.g., Knowledge Retrieval, Summarization, Auto-Correct, Translation, Natural Language Generation (NLG).

Prompts are the primary interaction mechanism here, and we are all still trying to perfect our Prompt Engineering skills :-)

Prompting refers to adapting the user input, providing the right context and guidance to the LLM API, to maximize the chances of getting the 'right' response. It has led to the rise of Prompt Engineering as a professional discipline, where prompt engineers systematically perform trials, recording their findings, to arrive at the 'right' prompt that elicits the 'best' response.
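To make this concrete, below is a minimal sketch of prompt construction for a black-box LLM API. The actual API call is omitted; the function names and the guidance/context fields are illustrative, not any provider's real API.

```python
# Minimal sketch: assemble a prompt from guidance, context and the raw
# user input before sending it to a black-box LLM API (call omitted).

def build_prompt(user_input: str, context: str = "", guidance: str = "") -> str:
    """Combine guidance, context and user input into a single prompt."""
    parts = []
    if guidance:
        parts.append(f"Instructions: {guidance}")
    if context:
        parts.append(f"Context: {context}")
    parts.append(f"User: {user_input}")
    return "\n\n".join(parts)

prompt = build_prompt(
    user_input="Summarize our Q3 results.",
    context="Q3 revenue grew 12% year-on-year; churn fell to 3%.",
    guidance="Answer in two sentences, using only the supplied context.",
)
```

In practice, a prompt engineer would iterate on the guidance and context strings, comparing the responses they elicit.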

2. Enterprise Apps in LLM App Store

OpenAI's recent announcement of a GPT App Store is interesting (link). It is to be expected that other major players, e.g., Google, AWS, Hugging Face, will follow suit. The motive is clear: to become the preferred platform for Generative AI (GenAI) / Large Language Model (LLM) adoption. However, there is also a risk that enterprise apps published on the platform will overshadow the underlying platform.

It remains to be seen whether the GenAI App Store will turn out to be as much of a game changer as the Apple App Store was for the iPhone / mobile devices. Interesting times ahead!

While Enterprise GPT Apps have the potential to become a multi-billion dollar marketplace and accelerate LLM adoption by providing enterprise-ready solutions, the same caution needs to be exercised as with any 3rd-party ML model: validate LLM/training data ownership, IP, and liability clauses [2].

Data ownership: Data is critical for supervised AI/ML systems, especially so for LLMs, which are often trained on public datasets whose data usage rights for AI/ML training are not well defined and can evolve in the future. For example, Reddit recently announced (link) that it will start charging for enterprise AI/ML models learning from its extremely human archives.

Given this, negotiating ownership issues around not only training data, but also input data, output data, and other generated data is critical.

On the other hand, it is also important to understand / assess how the Enterprise App Provider will be using the data received / generated as a result of its interactions with the users.

3. LLMOps: LLM fine-tuning to domain-specific SLMs

LLMs are generic in nature, as they are trained on public datasets, e.g., Wikipedia. To realize the full potential of LLMs for Enterprises, they need to be contextualized with enterprise knowledge captured in terms of documents, wikis, business processes, etc.

This is achieved by fine-tuning an LLM with enterprise knowledge / embeddings to develop a context-specific LLM [3].
Fig: Enterprise LLM contextualization strategy

Fine-tuning entails taking a pre-trained Large Language Model (LLM), and retraining it with (smaller) enterprise data. Technically, this implies updating the weights of the last layer(s) of the trained neural network to reflect the enterprise data and task.

Given this, access to the base model weights is needed to perform fine-tuning, which is not possible for closed models, e.g., ChatGPT.

This is where open-source pre-trained LLMs come to the rescue, e.g., Meta AI recently open-sourced their LLM, LLaMA. The Stanford Alpaca project showed that it is possible to fine-tune LLaMA for $600, to a model performance comparable with ChatGPT.

So fine-tuning a LLM does not necessarily need to be very complex or expensive.
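The "update only the last layer(s)" idea can be sketched conceptually as follows. This is a deliberately framework-free illustration, assuming a toy model made of named layers with a trainable flag; a real fine-tuning job would do the equivalent with a deep-learning framework's layer-freezing API.

```python
# Conceptual sketch of last-layer fine-tuning: freeze all layers of a
# pre-trained model except the final one(s), so only those weights are
# updated on the (smaller) enterprise dataset. The Layer class and
# layer names are illustrative, not a real framework API.

class Layer:
    def __init__(self, name: str):
        self.name = name
        self.trainable = True  # all layers start as trainable

def freeze_for_fine_tuning(layers, num_trainable: int = 1):
    """Mark all but the last `num_trainable` layers as frozen."""
    for layer in layers[:-num_trainable]:
        layer.trainable = False
    for layer in layers[-num_trainable:]:
        layer.trainable = True
    return layers

# A toy "pre-trained" model: 12 transformer blocks plus a task head.
pretrained = [Layer(f"block_{i}") for i in range(12)] + [Layer("head")]
freeze_for_fine_tuning(pretrained, num_trainable=1)

# Only the head is updated during fine-tuning; the 12 frozen blocks
# retain their pre-trained weights.
trainable = [layer.name for layer in pretrained if layer.trainable]
```

The payoff is that the trainable parameter count, and hence the compute cost, is a small fraction of full retraining.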

Given that the enterprise is responsible for the ML (fine-tuning) pipeline in this case, LLMOps (MLOps [4] for LLMs) is needed to deliver this in a scalable fashion.

LLMOps can be considered more complex than usual MLOps pipelines, especially in enabling the continuous improvement feedback loop via Reinforcement Learning from Human Feedback (RLHF) [5].
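The core of that feedback loop can be sketched in a few lines. This is a purely conceptual illustration, assuming pairwise human preferences between candidate responses; a real RLHF pipeline would train a reward model from such preferences and use it to further tune the LLM.

```python
# Conceptual sketch of the RLHF feedback loop: collect human preferences
# between candidate responses and keep a running score per response.
# A real pipeline would fit a reward model on these preferences.

from collections import defaultdict

class FeedbackLoop:
    def __init__(self):
        self.scores = defaultdict(int)  # response -> preference score

    def record_preference(self, chosen: str, rejected: str):
        """A human preferred `chosen` over `rejected`."""
        self.scores[chosen] += 1
        self.scores[rejected] -= 1

    def best_response(self, candidates):
        """Return the candidate with the highest accumulated score."""
        return max(candidates, key=lambda r: self.scores[r])

loop = FeedbackLoop()
loop.record_preference("answer A", "answer B")
loop.record_preference("answer A", "answer C")
```

Continuously folding such signals back into training is precisely what makes the LLMOps pipeline harder than a standard train-deploy-monitor MLOps loop.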

LMFlow (link) is a good example of an emerging MLOps framework for LLMs.

4. Retrieval Augmented Generation (RAG)

Fine-tuning is a computationally intensive process. RAG provides a viable alternative by providing additional context with the prompt, grounding the retrieval / responses to the given context.

The prompts can be relatively long, so it is possible to embed enterprise context within the prompt. For example, referring to the below solution architecture on Azure, the Cognitive Search results are provided as additional context with the prompt, to limit the responses.

Fig: Integrating Azure Cognitive Search Results to contextualize Prompts (Source: [6])
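The retrieve-then-prompt flow can be sketched as follows. Note the retrieval step here is a naive keyword-overlap ranking standing in for a real search or vector index (such as Azure Cognitive Search); the documents and query are made up for illustration.

```python
# Minimal RAG sketch: retrieve the most relevant documents for a query
# (naive keyword overlap, standing in for a real search/vector index),
# then embed them in the prompt as grounding context.

def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by word overlap with the query; return top-k."""
    q_words = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_words & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_rag_prompt(query: str, documents: list[str]) -> str:
    """Ground the prompt in the retrieved context."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    return (
        "Answer using ONLY the context below.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

docs = [
    "The travel policy allows business class for flights over 6 hours.",
    "Expense reports must be filed within 30 days.",
    "The cafeteria is open from 8am to 3pm.",
]
prompt = build_rag_prompt("What does the travel policy say about flights?", docs)
```

Because no model weights are touched, this pattern works even with closed black-box LLM APIs.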

Below is how the same RAG reference architecture looks on Databricks:

Fig: Reference architecture to implement RAGs on Databricks (Source: [7])

5. AI Agents: Multi-agent LLM Orchestration

This is the future where enterprises will be able to develop new Enterprise AI Apps by orchestrating / composing multiple existing AI Apps.

The discussion around ChatGPT has now evolved into AutoGPT. While ChatGPT is primarily a chatbot that can generate text responses, AutoGPT is a more powerful AI Agent that can execute complex tasks, e.g., make a sale, plan a trip, book a flight, book a contractor for a house job, order a pizza. LangChain (link) is probably the most mature framework today to compose LLMs.
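The basic shape of such an agent is a plan-act loop. Below is a minimal sketch where the LLM planner is mocked by a fixed rule-based workflow; the tool names, the flight-booking scenario, and the planner itself are all hypothetical stand-ins.

```python
# Sketch of an AutoGPT-style agent loop: a planner (here a rule-based
# stand-in for an LLM) repeatedly picks the next tool to invoke until
# the task is complete. Tools and workflow are illustrative only.

def plan_next_step(task: str, done_steps: list[str]) -> str:
    """Stand-in for an LLM planner: return the next tool to invoke."""
    workflow = ["search_flights", "select_cheapest", "book_flight"]
    for step in workflow:
        if step not in done_steps:
            return step
    return "finish"

TOOLS = {
    "search_flights": lambda: "found 3 flights",
    "select_cheapest": lambda: "selected flight #2",
    "book_flight": lambda: "booking confirmed",
}

def run_agent(task: str) -> list[str]:
    """Plan-act loop: execute tools until the planner says finish."""
    done, log = [], []
    while (step := plan_next_step(task, done)) != "finish":
        log.append(f"{step}: {TOOLS[step]()}")
        done.append(step)
    return log

log = run_agent("book the cheapest flight to Paris")
```

Frameworks like LangChain generalize exactly this loop, with the LLM itself deciding which tool to call next based on the task and the observations so far.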

However, designing and deploying AI Agents remains challenging in practice. Below are some initial thoughts around the essential components / frameworks needed to materialize such an AI Agent Platform:

Fig: AI Agent Reference Architecture

Given a user task, the goal of an AI Agent Platform is to identify (or compose) an agent (or group of agents) capable of executing the given task.

  • Orchestration Layer: decomposition of the given task into a workflow of simpler tasks, executed by the Orchestration Engine.

AI Agents follow a long history of research on autonomous agents, especially goal-oriented agents. A high-level approach to solving such complex tasks involves: (a) decomposition of the given complex task into (a hierarchy or workflow of) simpler tasks, followed by (b) composition of agents able to execute the simpler tasks. This can be achieved in a dynamic or static manner.

In the dynamic approach, given a complex user task, the system comes up with a plan to fulfill the request depending on the capabilities of available agents at run-time. In the static approach, given a set of agents, composite agents are defined manually at design-time combining their capabilities.

  • Agent Marketplace: this implies that there exists a marketplace / registry of agents, with a well-defined description of each agent's capabilities and constraints. We have studied the discovery aspect of agents in detail in [8].
  • Integration Layer supporting different agent interaction patterns, such as: Agent-to-Agent API, Agent API providing output for human consumption, human triggering an AI Agent, and AI Agent-to-Agent with a human in the loop. These integration patterns need to be supported by the underlying LLMOps platform.
  • Shared memory layer enabling data transfer between Agents, storing interaction data such that it can be used to personalize future interactions.
  • Privacy & Security: ensure that data shared by the user specific to a task, or user profile data that cuts across tasks, is only shared with the relevant agents (authentication & access control).
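The dynamic composition step described above can be sketched as follows. The agent registry, capability names, and the pre-decomposed sub-tasks are all hypothetical; in a real platform the decomposition would itself be produced by the Orchestration Engine at run-time.

```python
# Sketch of dynamic agent composition: given a registry of agents with
# declared capabilities, match each sub-task of a decomposed complex
# task to a capable agent at run-time. Registry contents are made up.

AGENT_REGISTRY = {
    "travel_agent": {"search_flights", "book_flight"},
    "payment_agent": {"process_payment"},
    "calendar_agent": {"add_calendar_event"},
}

def compose(sub_tasks: list[str]) -> dict[str, str]:
    """Map each sub-task to the first registered agent that can do it."""
    plan = {}
    for task in sub_tasks:
        for agent, capabilities in AGENT_REGISTRY.items():
            if task in capabilities:
                plan[task] = agent
                break
        else:
            raise LookupError(f"no agent can handle: {task}")
    return plan

# "Book a trip" decomposed into simple sub-tasks by the orchestrator:
plan = compose(["search_flights", "book_flight",
                "process_payment", "add_calendar_event"])
```

In the static approach, by contrast, such a plan would be authored manually at design-time and merely executed at run-time.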

References

  1. D. Biswas. ChatGPT Internals, and its Implications for Enterprise AI. (link)
  2. D. Biswas. Generative AI Design Patterns. (link)
  3. D. Biswas. Contextualizing Large Language Models (LLMs) with Enterprise Data. (link)
  4. D. Biswas. MLOps for Compositional AI. NeurIPS 2022 Workshop on Challenges in Deploying and Monitoring Machine Learning Systems (DMML). (link)
  5. E. Ricciardelli, D. Biswas. Self-improving Chatbots based on Reinforcement Learning. In: 4th Multidisciplinary Conference on Reinforcement Learning and Decision Making, 2019. (link)
  6. P. Castro. Revolutionize your Enterprise Data with ChatGPT: Next-gen Apps w/ Azure OpenAI and Cognitive Search. Azure AI Blog. (link)
  7. Databricks blog. What is Retrieval Augmented Generation or RAG? (link)
  8. D. Biswas. Constraints Enabled Autonomous Agent Marketplace: Discovery and Matchmaking. In proc. of the 16th International Conference on Agents and Artificial Intelligence (ICAART), 2024. (scitepress.org)
