Patterns with Generative AI
Introduction
One of the most exciting things we can do as digital explorers is to find new and emerging patterns as technology advances. In this short article I’d like to share a few patterns I’ve discovered along my journey.
Before putting these patterns into practice, it’s important to note that they do not require any special knowledge of the inner workings of Generative AI, but you should understand the fundamentals of the following:
Now that that’s out of the way, let’s focus on the patterns:
Executor Patterns
The very first pattern in making Agent architectures useful to a business is to accept that one agent is not enough. This mirrors the way we, as humans, think: our ability to switch context between critical and creative modes matters, while an Agent is created with a fixed set of default constraints. This is especially true in programming when the Agent is initialized as a singleton. So accept that having multiple agents is inevitable, especially as you introduce tools.
Example: As a business, I would like AI to write an FAQ for a product, and I want the FAQ to be fun and readable. So I write a list of questions and want factual answers delivered in a friendly tone. In this scenario I use a chain of agents and execute each agent in sequence for every question.
Agent 1 extracts the contextual and factual information from the question and is tuned as (temperature: 0.0, top-p: 0.92, top-k: 32), meaning it is very factual and will stick to the domain of the question.
Agent 2 takes those facts and presents them in a user-friendly way and is tuned as (temperature: 0.9, top-p: 0.92, top-k: 40).
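Here is a minimal sketch of how the two agents might be initialized, assuming the Gemini Python SDK (google.generativeai); the model name and API-key handling are placeholders, not something prescribed by the pattern:

    import google.generativeai as genai

    genai.configure(api_key="YOUR_API_KEY")  # placeholder; load from a secret store in practice

    # Agent 1: factual extractor with deterministic settings
    agent1 = genai.GenerativeModel(
        "gemini-1.5-pro",  # assumed model name
        generation_config={"temperature": 0.0, "top_p": 0.92, "top_k": 32},
    )

    # Agent 2: creative writer with a higher temperature for a friendlier tone
    agent2 = genai.GenerativeModel(
        "gemini-1.5-pro",
        generation_config={"temperature": 0.9, "top_p": 0.92, "top_k": 40},
    )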
As pseudocode, the per-question loop looks like this:
    for question in questions:
        facts = agent1.generate_content(
            f"Find all of the product facts from the following question: {question} "
            f"for the product: {product} in this format: {format}")
        # read the response into a list
        list_of_facts = ...
        # format the response
        text_formatted_facts = formatFacts(list_of_facts)
        answer = agent2.generate_content(
            f"Write a user friendly response for this question: {question} "
            f"using these product facts: {text_formatted_facts}")
        # assemble the question and answer and save to a database
Pros and Cons
This approach is extremely useful because it gives each agent a narrow workstream and produces high-quality results within the limitations of the Agent (meaning shorter answers won’t be truncated by the 8K response limit). The downside is that each iteration uses two calls, which effectively cuts your calls per minute in half: if your quota is 100 calls per minute, you will only be able to process 50 questions per minute.
This example demonstrates three patterns:
Chain of agents - similar to the chain of responsibility pattern, a chain of agents allows each agent to operate on a context and provide the right answer for that context.
The detective agent - This is a fact finder and is a crucial step in making RAG models stackable. You may have several detective agents in your chain to assemble a meaningful context. Those agents may interact with tools and each other.
The creative agent - This agent is responsible for making the output usable by people or other systems. The creative agent is crucial in interactive systems and can be used to augment information systems (explained below).
Expanding on the Chain of Agents
The following patterns expand on the chain of agents as each agent specializes.
The Builder Agent - The builder pattern is used to create programmatic assets. Unlike the detective, which is the fact finder, the builder is responsible for creating usable data structures. For example, if I want a list of information I can use in a web application, it is best served as JSON. If I want to create a dataset I can reference later, I would extract JSON, convert it to SQL, and update my record set. The builder is unique in that it can run in batch or as a stream and can augment data. The builder is used in tandem with a vectorizing engine to ensure fast and meaningful retrieval of the data for future queries.
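As a rough illustration of the builder step, here is a minimal sketch; the agent, prompt, table name, and JSON keys are hypothetical, and it assumes the model returns plain, valid JSON:

    import json
    import sqlite3  # stand-in for whatever record set you maintain

    def build_product_records(builder_agent, raw_text, db_path="products.db"):
        """Ask the builder agent for structured JSON, then persist it as rows."""
        response = builder_agent.generate_content(
            "Extract every product mentioned below as a JSON array of objects "
            'with the keys "name", "price", and "category".\n\n' + raw_text)
        products = json.loads(response.text)  # assumes well-formed JSON output

        with sqlite3.connect(db_path) as conn:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS product (name TEXT, price REAL, category TEXT)")
            conn.executemany(
                "INSERT INTO product (name, price, category) VALUES (?, ?, ?)",
                [(p["name"], p["price"], p["category"]) for p in products])
        return products

In production the same builder could also hand each record to an embedding model so the vectorizing engine stays in sync with the record set.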
The Critic Agent - The critic is a useful pattern for fact checking output and often requires a grounding tool. Consider Google’s “Ground with Google Search” feature of Gemini. You can run text queries configured with this tool and it will help create content and citations. However, where you apply that agent can change your output quality. Instead of running this agent up front, you can run it as the last agent in your chain to validate, clean, and score your output quality. The critic pattern is used consistently to challenge the possible hallucinations of the model.
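A minimal sketch of running a critic as the last agent in the chain follows; the critic agent, the scoring format, and the acceptance threshold are all assumptions, and any grounding-tool configuration is omitted here:

    import json

    def review_answer(critic_agent, question, facts, draft_answer):
        """Have a critic agent validate, clean, and score the creative agent's draft."""
        response = critic_agent.generate_content(
            "You are a fact checker. Compare the draft answer against the facts.\n"
            f"Question: {question}\nFacts: {facts}\nDraft answer: {draft_answer}\n"
            'Return JSON with the keys "score" (0-1), "issues", and "revised_answer".')
        review = json.loads(response.text)  # assumes the critic returns valid JSON
        # Keep the revision only when the critic is confident enough; otherwise flag for review.
        if review["score"] >= 0.8:  # arbitrary threshold for illustration
            return review["revised_answer"]
        return None

Storing the score and issues alongside the answer also makes low-quality items easy to audit later.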
Storage Patterns
This topic is all about versioning and time, and about ensuring that you, as a developer or architect, are future-proofing your systems.
Assumptions: You are using an LLM to create content, or you are augmenting existing content with embeddings so you can run KNN search, ordered by cosine similarity (descending) or Euclidean distance (ascending), in a vector database such as BigQuery or Postgres with pgvector.
Rules
LLM life-cycle: Every time a new model is released, you will most likely create new embeddings and will certainly want to experiment with multiple models and dimensional spaces.
Every time you query in vector space, your query MUST be vectorized with the same model that produced the stored embeddings, and that model will change over time.
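A minimal sketch of that rule, assuming a hypothetical registry that maps each embedding version to the model that produced it (genai.embed_content is the Gemini SDK’s embedding call; the version labels and model names are illustrative):

    import google.generativeai as genai

    # Hypothetical registry: which model produced each embedding version in storage.
    EMBEDDING_VERSIONS = {
        "v1": "models/embedding-001",
        "v2": "models/text-embedding-004",  # assumed model names
    }

    def vectorize_query(query_text, embedding_version):
        """Embed a query with the same model that produced the stored embeddings."""
        model_name = EMBEDDING_VERSIONS[embedding_version]
        result = genai.embed_content(model=model_name, content=query_text)
        return result["embedding"]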
Example: a product table holding the source records, a record of the embedding versions (which model produced which vectors), and a separate product embedding table that joins back to the products.
Now, a client can see from this layout how to create embeddings and how to search them over time; it lets data scientists experiment with new models while developers and production systems can simply keep running.
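A minimal sketch of what such a layout might look like, assuming Postgres with pgvector and the psycopg2 driver; the table names, columns, and 768-dimension vector size are hypothetical:

    import psycopg2

    # Requires the pgvector extension: CREATE EXTENSION IF NOT EXISTS vector;
    DDL = """
    CREATE TABLE IF NOT EXISTS product (
        product_id  SERIAL PRIMARY KEY,
        name        TEXT,
        description TEXT
    );
    CREATE TABLE IF NOT EXISTS product_embedding (
        product_id        INT REFERENCES product(product_id),
        embedding_version TEXT,         -- recorded so queries can use the same model
        embedding         VECTOR(768),  -- dimension depends on the embedding model
        PRIMARY KEY (product_id, embedding_version)
    );
    """

    # Nearest neighbours for one embedding version; ascending cosine distance gives
    # the same ordering as descending cosine similarity.
    SEARCH_SQL = """
    SELECT p.name, pe.embedding <=> %s::vector AS cosine_distance
    FROM product_embedding AS pe
    JOIN product AS p USING (product_id)
    WHERE pe.embedding_version = %s
    ORDER BY cosine_distance
    LIMIT 10;
    """

    def search_products(conn, query_vector, embedding_version):
        """Join the versioned embeddings back to the product table and rank by similarity."""
        # conn = psycopg2.connect("dbname=products")  # connection setup is assumed
        vector_literal = "[" + ",".join(str(x) for x in query_vector) + "]"
        with conn.cursor() as cur:
            cur.execute(SEARCH_SQL, (vector_literal, embedding_version))
            return cur.fetchall()

Because the version lives in its own table, re-embedding with a newer model is just a matter of inserting new rows and changing the version string passed to the query, which is exactly the kind of experimentation the rules above are meant to protect.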
Conclusion
I sincerely hope these learnings help others on the journey into LLMs and shortcut the development curve for building high-quality generative agents. More importantly, I hope this improves how we think about productizing generative AI and provides a vernacular we can use to discuss these patterns.