Patterns with GenerativeAI
Cover photo: https://unsplash.com/@santesson89


Introduction

One of the most exciting things we can do as digital explorers is to find new and emerging patterns as technology advances. In this short article I’d like to share a few patterns I’ve discovered along my journey.

Before putting these patterns into practice, it's important to note that they do not require any special knowledge of the inner workings of Generative AI, but you should understand the following fundamentals:

  1. Agent - Software that sits in front of an LLM and enables human interaction. Because LLMs are broad and deep, agents are constrained by instructions and augmented by tools (see below).
  2. Instructions - Used to constrain the scope of the agent, reducing risk and improving the quality of the user's interaction with the agent.
  3. Tools - External functions that provide additional functionality to the LLM. Tools are often used for Retrieval-Augmented Generation (RAG) to provide access to specialized and/or more current information (a minimal sketch follows this list). NOTE: remember, every LLM model is a snapshot in time.
  4. Agent-of-Agents - A pattern in its own right, but important to note: it is synonymous with a router entry point that hands specialized workloads off to the most appropriate handler.
  5. Prompt - The textual entry point of an agent, and the one most people are familiar with.
  6. Multimodal - The combination of voice, video, text, and images.
  7. Vectorizing - The practice of turning any of these multimodal representations into numeric representations understood by the LLM.
  8. Tuning parameters - Used to help shape how an agent will respond.
  9. Safety settings - The parameters used to tune what an agent may block or allow depending on the audience.
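
To make items 1-3 concrete, here is a minimal sketch of an agent constrained by an instruction and augmented by a single tool. I'm assuming the google-generativeai Python SDK; the model name, the catalog data, and the lookup function are purely illustrative, not part of any real system.

import google.generativeai as genai

# Hypothetical RAG tool: look up current product facts from our own catalog.
def lookup_product_facts(product_name: str) -> dict:
    """Return the latest catalog entry for a product (placeholder implementation)."""
    return {"product": product_name, "warranty": "2 years", "weight_kg": 1.2}

# The agent: an LLM constrained by an instruction and augmented by the tool above.
agent = genai.GenerativeModel(
    model_name="gemini-1.5-flash",  # assumed model name
    system_instruction="Only answer questions about our product catalog.",
    tools=[lookup_product_facts])

# Automatic function calling lets the model invoke the tool when it needs facts.
chat = agent.start_chat(enable_automatic_function_calling=True)
print(chat.send_message("What is the warranty on the AcmePhone?").text)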

Now that that’s out of the way, let’s focus on the patterns:

Executor Patterns

The very first step in making agent architectures useful to a business is to accept that one agent is not enough. The same is true of the way we (as humans) think: our ability to switch context between critical and creative thinking matters, yet agents are created with fixed, default constraints. This is especially true in programming when we apply singletons to the initialization of the agent. Accept that having multiple agents is inevitable, especially as you introduce tools.

Example: As a business, I would like AI to write an FAQ for a product, and I want the FAQ to be fun and readable. So I write a list of questions and want factual answers delivered in a friendly tone. In this scenario I will use a chain of agents and execute each agent in sequence for every question.

Agent 1 is used to extract the contextual and factual information from the question and is tuned as (temperature: 0.0, top-p: 0.92, top-k: 32), meaning it is very factual and will stick to the domain of the question.

Agent 2 is used to take those facts and present them in a user-friendly way, and is tuned as (temperature: 0.9, top-p: 0.92, top-k: 40).
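
For reference, a minimal sketch of how those two agents might be initialized, again assuming the google-generativeai SDK; the model name and system instructions are illustrative assumptions:

import google.generativeai as genai

# Agent 1: the fact finder, tuned to be deterministic and on-topic.
agent1 = genai.GenerativeModel(
    model_name="gemini-1.5-flash",  # assumed model name
    system_instruction="Extract only verifiable product facts.",
    generation_config={"temperature": 0.0, "top_p": 0.92, "top_k": 32})

# Agent 2: the creative writer, tuned to produce friendly prose.
agent2 = genai.GenerativeModel(
    model_name="gemini-1.5-flash",
    system_instruction="Rewrite the provided facts in a warm, friendly FAQ tone.",
    generation_config={"temperature": 0.9, "top_p": 0.92, "top_k": 40})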

And as pseudocode, the per-question loop looks like:

for question in questions:
    # Agent 1: extract the product facts for this question
    facts = agent1.generate_content(
        f'Find all of the product facts from the following question: {question} '
        f'for the product: {product} in this format: {format}')

    # Read the response into a list
    list_of_facts = ...

    # Format the facts for the next agent
    text_formatted_facts = format_facts(list_of_facts)

    # Agent 2: write the friendly answer from the extracted facts
    answer = agent2.generate_content(
        f'Write a user friendly response for this question: {question} '
        f'using these product_facts: {text_formatted_facts}')

    # Assemble the question and answer and save to a database

Pros and Cons

This approach is extremely useful: it gives each agent a narrow workstream and produces high-quality results within the agent's limitations (meaning shorter answers won't be truncated by the 8K-token response limit). The downside is that each iteration uses two calls, which effectively cuts your throughput in half; e.g., if your quota is 100 calls per minute, you will only be able to process 50 questions per minute.

This example demonstrates three patterns:

Chain of agents - Similar to the chain-of-responsibility pattern, a chain of agents allows each agent to operate on a context and provide the right answer for that context.

The detective agent - The fact finder, and a crucial step in making RAG models stackable. You may have several detective agents in your chain to assemble a meaningful context. Those agents may interact with tools and with each other.

The creative agent - Responsible for making the output usable by people or other systems. The creative agent is crucial in interactive systems and can be used to augment information systems (expanded on below).

Expanding on the Chain of Agents

The following patterns expand on the chain of agents as each agent specializes.

The Builder Agent - The builder pattern is used to create programmatic assets. Unlike the detective, which is the fact finder, the builder is responsible for creating usable data structures. For example, if I wanted a list of information to use in a web application, it is best served as JSON. If I want to create a dataset I can reference later, I would extract JSON, convert it to SQL, and update my record set. The builder is unique in that it can run in batch or as a stream and can augment data. The builder is used in tandem with a vectorizing engine to ensure fast and meaningful retrieval of data for future queries.
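
A minimal sketch of a builder step, assuming the same SDK; the JSON-mode configuration, the prompt, and the product table are illustrative assumptions:

import json
import google.generativeai as genai

# Builder agent: asks for structured JSON rather than prose.
builder = genai.GenerativeModel(
    model_name="gemini-1.5-flash",
    system_instruction="Return product attributes as a JSON object only.",
    generation_config={"response_mime_type": "application/json"})

response = builder.generate_content(
    f"Extract title, description, and attributes for: {product}")
record = json.loads(response.text)

# Convert the JSON into SQL and update the record set (hypothetical table).
sql = ("INSERT INTO product (title, description, attributes) "
       "VALUES (%s, %s, %s)")
params = (record["title"], record["description"], json.dumps(record["attributes"]))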

The Critic Agent - The critic is a useful pattern for fact-checking output and often requires a grounding tool. Consider Google's "Ground with Google Search" feature of Gemini: you can run text queries configured with this tool and it will help create content and citations. However, where you place that agent can change your output quality. Instead of running it up front, you can run it as the last agent in your chain to validate, clean, and score the output. The critic pattern is used consistently to challenge possible hallucinations of the model.
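
A minimal sketch of a critic running as the last link in the chain; the scoring prompt and threshold are illustrative, it reuses answer and text_formatted_facts from the loop above, and I've left out the grounding-tool configuration since that varies by SDK version:

import json
import google.generativeai as genai

# Critic agent: validates and scores the creative agent's answer against the facts.
critic = genai.GenerativeModel(
    model_name="gemini-1.5-flash",
    system_instruction=("You are a strict fact checker. Compare the answer to the "
                        "facts and return JSON with a score from 0 to 1 and a list "
                        "of issues."),
    generation_config={"temperature": 0.0, "response_mime_type": "application/json"})

review = json.loads(critic.generate_content(
    f"Facts: {text_formatted_facts}\nAnswer: {answer.text}").text)

if review["score"] < 0.8:  # illustrative threshold
    pass  # e.g., re-run the creative agent or flag for human review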

Storage Patterns

This topic is all about versioning and time, and about ensuring that you, as a developer or architect, are future-proofing your systems.

Assumptions: You are using an LLM to create content, or you are augmenting existing content with embeddings so you can run k-nearest-neighbor search (cosine similarity, descending, or Euclidean distance, ascending) in a vector database like BigQuery or Postgres with pgvector.

Rules

  1. Never store your vectors in the same table as the source records; they have a one-to-many mapping.
  2. Always store the model used per embedding, and always store the embedding length for client hinting.
  3. Understand the embedding life-cycle.

LLM life-cycle: Every time a new model is released, you will most likely create new embeddings and will certainly want to experiment with multiple models and dimensional spaces.

Every time you query in vector space, your query MUST be vectorized using the same model that produced the stored embeddings, and that model will change over time.

Example:

Product table

  • id: uuid
  • title: string
  • description: string
  • attributes: { key_value_pairs }

Embedding versions table

  • id: uuid
  • created: date
  • deprecated: date
  • terminated: date
  • preferred: bool
  • model_provider: string
  • model_name: string
  • embedding_length: int

Product embedding table

  • product_id: uuid
  • embedding_version_id: uuid
  • embeddings: float[]

Now a client can look up how embeddings were created and how to search them over time, which lets data scientists experiment while developers and production systems simply keep running.
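
As a minimal sketch of this schema in Postgres with pgvector (the table and column names follow the example above; the 768-dimension vector and the cosine-distance query are illustrative assumptions):

# Schema sketch: SQL held as Python strings, to be executed with your Postgres
# client of choice. Requires the pgvector extension.
DDL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE product (
    id          uuid PRIMARY KEY,
    title       text,
    description text,
    attributes  jsonb
);

CREATE TABLE embedding_version (
    id               uuid PRIMARY KEY,
    created          date,
    deprecated       date,
    terminated       date,
    preferred        boolean,
    model_provider   text,
    model_name       text,
    embedding_length int
);

CREATE TABLE product_embedding (
    product_id           uuid REFERENCES product(id),
    embedding_version_id uuid REFERENCES embedding_version(id),
    embeddings           vector(768)  -- dimension is an assumption
);
"""

# KNN query against the preferred embedding version. pgvector's <=> operator
# returns cosine distance, so ascending order surfaces the most similar rows.
KNN_QUERY = """
SELECT p.id, p.title, pe.embeddings <=> %s::vector AS distance
FROM product_embedding pe
JOIN embedding_version ev ON ev.id = pe.embedding_version_id AND ev.preferred
JOIN product p ON p.id = pe.product_id
ORDER BY distance
LIMIT 10;
"""

Because the query vector must come from the same model as the stored embeddings (rule 3 above), the application would first read the preferred row from embedding_version, vectorize the query with that model, and then pass the resulting vector as the query parameter.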

Conclusion

I sincerely hope these learnings help others on their journey into LLMs and shortcut the development curve for building high-quality generative agents. More importantly, I hope this improves how we think about productizing generative AI and provides a vernacular we can use to discuss these patterns.

