Product Patterns for Large Language Models: The Archetypes

The advent of widely available Large Language Models (LLMs) has created a tremendous amount of excitement, and a decent amount of chaos.

Let's face it: LLMs are a solution looking for a problem, but given how powerful they are, we are all trying to discover which problems they can solve in our specific product contexts. What, then, are the patterns we should be on the lookout for?

At Cisco, we have dozens of product and service groups working to harness the power of Large Language Models and Generative AI. We have a strong learning community with a large number of AI enthusiasts, many of whom already have years of experience shipping AI capabilities using machine learning and data science. I am grateful to have the opportunity to connect with this community and learn what people are doing and how they are using Large Language Models in their products.

From these discussions, I see some broad product patterns emerging, which I have captured in the form of the following archetypes:

  1. The Vizier, “How may I help?”
  2. The Judge, “Here’s what I think…”
  3. The General, “Tell me what to do!”

I like to think of these archetypes as forming a royal court, a council that gathers to help you rule the realm (your product) and take on some of the difficult work of serving the people (your users). Each member of your council plays a different role. While they are all devoted to you, they think and operate in different ways. Let's dig in.

The Vizier

The first pattern I call the Vizier. They are trained on a body of knowledge that we want to give people access to. “How may I help?”, asks the Vizier, a Socratic partner of sorts who is here to make sense of challenging problems. They will answer questions to explain complex material or help turn an intention into an appropriate course of action. The Vizier may be an advisor to you, or they may go directly to the people and help them with their problems. The Vizier is always ready and waiting to serve when called upon.

In this pattern, the user interacts directly with the LLM to get access to information available in the model. It is an inbound workflow, where users are deliberately and knowingly interacting with an AI. The pattern is usually implemented as either a conversational agent or co-pilot. The distinction between the two is important — a chatbot-style agent has an open, natural language interface, while the co-pilot is constrained to making suggestions that guide the user through a task or workflow.

This pattern is obvious and prevalent, typified of course by ChatGPT. Most companies will already be looking at possibilities to simplify their product experiences or enhance self-service workflows using The Vizier. For example, at Cisco, we have a vast body of training materials that customers need to understand their products. Simplifying access to that information is a real, tangible, and immediate benefit to our customers, and it is one of the first and earliest Gen AI capabilities we have launched. In fact, we now have a Vizier-style agent that interacts directly with users in our forums (we make sure to let users know they are interacting with a bot, of course).
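
To make the shape of the pattern concrete, here is a minimal sketch of how a Vizier-style agent might be wired up: retrieve the most relevant passages from a body of documentation, then ask the model to answer strictly from that context. The `call_llm` helper, the tiny document store, and the keyword-overlap retrieval are illustrative placeholders, not any specific product's implementation; a production system would use a real model endpoint and embedding-based retrieval.

```python
# Minimal sketch of a Vizier-style agent: retrieve relevant passages from a
# body of documentation, then ask the model to answer using only that context.
# `call_llm` is a placeholder for whatever model endpoint you actually use.

def call_llm(prompt: str) -> str:
    """Placeholder: send the prompt to your LLM provider and return its reply."""
    raise NotImplementedError("wire this to your model endpoint")

# Hypothetical documentation snippets standing in for a real knowledge base.
DOCS = {
    "vpn-setup": "To configure the VPN client, open Settings > Network and import the profile...",
    "password-reset": "Passwords can be reset from the admin console under Users > Credentials...",
}

def retrieve(question: str, k: int = 2) -> list[str]:
    """Naive keyword-overlap retrieval; a real system would use embeddings."""
    q_words = set(question.lower().split())
    scored = sorted(DOCS.values(),
                    key=lambda d: len(q_words & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def answer(question: str) -> str:
    """Build a grounded prompt from the retrieved passages and ask the model."""
    context = "\n\n".join(retrieve(question))
    prompt = (
        "You are a support assistant. Answer the question using ONLY the context below.\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
    return call_llm(prompt)
```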

The Judge

The next pattern is The Judge. Here we have an expert who can look at a situation and use their training to provide opinions, summaries, and judgments to our users. “Here’s what I think”, says the Judge, who is adept at boiling down a complex set of facts into something that is understandable and useful. The Judge can sometimes get things wrong, though, so we may need to review their opinion before actually presenting it to the user, perhaps by a jury of their peers (i.e. other LLMs), or even our users’ peers (other humans). Sometimes we will not agree with their decision, so we need the ability to appeal and ask for a new judgment.

In this pattern, we take a user’s situation and interpret it through an LLM. This is an outbound workflow, where the user generally does not interact directly with the model; instead, we provide them with the model’s output to meet their need in a specific context. We do need to disclose to the user that the content is AI-generated, with appropriate warnings, so they can inspect and assess it accordingly. We may allow the user to regenerate the content in a different way if the results are poor, and we need to collect feedback so that we can continuously improve the quality of the output and use it in reinforcement loops.

This pattern shows up anywhere we want to help our users by generating the content they need to complete their task. The main activities are summarization, completion, and categorization. Examples include summarizing the recording of a meeting into a set of notes, boiling down a case into a summary, or analyzing a dataset for patterns. I would also put the coding use cases under this category, like GitHub Copilot (which, despite the name, I would not call a ‘co-pilot’, as it is really a completion pattern in the style of The Judge).
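
As an illustration of the outbound shape of the Judge, here is a rough sketch of a summarization flow with the review affordances described above: generate a draft, let the user "appeal" and ask for a new judgment, and record accept/reject feedback for later improvement. The `call_llm` helper and the file-based feedback log are assumptions made for the sketch, not a specific product's design.

```python
import dataclasses
import json
import time

def call_llm(prompt: str) -> str:
    """Placeholder for the actual model call."""
    raise NotImplementedError("wire this to your model endpoint")

@dataclasses.dataclass
class JudgeOutput:
    summary: str
    prompt: str
    created_at: float

def summarize(transcript: str, style: str = "concise bullet points") -> JudgeOutput:
    """The Judge renders an opinion: boil the transcript down in the requested style."""
    prompt = f"Summarize the following meeting transcript as {style}:\n\n{transcript}"
    return JudgeOutput(summary=call_llm(prompt), prompt=prompt, created_at=time.time())

def regenerate(transcript: str, previous: JudgeOutput, user_hint: str) -> JudgeOutput:
    """Let the user appeal: ask for a new judgment that avoids the rejected draft."""
    style = f"{user_hint}; avoid repeating this rejected draft: {previous.summary[:200]}"
    return summarize(transcript, style=style)

def record_feedback(output: JudgeOutput, accepted: bool, log_path: str = "feedback.jsonl") -> None:
    """Capture accept/reject signals for later evaluation or reinforcement loops."""
    with open(log_path, "a") as f:
        f.write(json.dumps({"prompt": output.prompt, "summary": output.summary,
                            "accepted": accepted, "ts": output.created_at}) + "\n")
```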

Sidebar: it is interesting to note that both the above patterns also call upon new design patterns in our products. Normally products are supposed to get things right — if we cannot get it right, why would anyone use it? But here, we are generating original content, and the user’s opinion of it is subjective, so we can give the user controls to generate content differently to meet their expectations. For example, Midjourney offers interesting methods to recreate your picture without changing the prompt by adding more or less variance, expanding or changing the perspective, and so on.

The General

The final pattern is the General. Here we have an agent who can help you execute your plans. “Tell me what to do!”, says the General. Give them orders, and they will carry them out to the best of their ability, returning dutifully with the results. The job must be done correctly, and there is a right way and a wrong way to do it. We have to make sure the General is extremely well trained on the tasks being given, because the stakes are high: coming back with the job done wrong means the plan will fail. We also need a robust system in place to ensure they complete their task correctly.

In this pattern, we are placing the LLM in a workflow and using it to accomplish some or all of the steps in that workflow. It is a backend workflow because the user does not know that the LLM is involved in what they are doing. The LLM is being used as a kind of engine that can take an input and produce a desired output needed for a sequence of operations. Andrej Karpathy has started calling this the "LLM OS", which is an interesting way to think about it. The goal is to improve a workflow by generating output that can be used in downstream operations.

In this pattern, we are incorporating operations like classification, categorization, labeling, and translation into a larger workflow. For example, in security products, we may want to categorize a document so that we can decide whether it is content that needs to be blocked. The user creates a policy to block malicious or unwanted content, and we use the LLM to classify it as such. Clearly, if we start blocking content that should be allowed, we will cause problems for end users; worse still is allowing content that should be blocked.
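
As a sketch of how the General might sit inside such a workflow, the snippet below treats the LLM as a classifier with a strict output contract: the label set is fixed, and anything outside it falls back to a "review" verdict rather than a guess. The label names, the verdicts, and the `call_llm` helper are illustrative assumptions, not the actual policy engine.

```python
from enum import Enum

def call_llm(prompt: str) -> str:
    """Placeholder for the actual model call."""
    raise NotImplementedError("wire this to your model endpoint")

class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"
    REVIEW = "review"   # safe fallback when the model's answer can't be trusted

# The only labels the workflow is prepared to act on (hypothetical label set).
ALLOWED_LABELS = {"malicious", "unwanted", "benign"}

def classify(document: str) -> str:
    """Ask the model for exactly one label from the fixed set."""
    prompt = (
        "Classify the document as exactly one of: malicious, unwanted, benign.\n"
        f"Document:\n{document}\n\nAnswer with a single word."
    )
    return call_llm(prompt).strip().lower()

def enforce_policy(document: str) -> Verdict:
    """Embed the LLM as one step in the enforcement pipeline, with a strict contract."""
    label = classify(document)
    if label not in ALLOWED_LABELS:
        # The model returned something outside the contract; don't guess.
        return Verdict.REVIEW
    return Verdict.BLOCK if label in {"malicious", "unwanted"} else Verdict.ALLOW
```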

The General holds a huge amount of promise if we can get it to work. The opportunities are truly incredible. However, right now it is also the pattern with the most difficulty. There are a few problems:

  1. The probability problem: we need to remind ourselves that an LLM is a probability engine, not a logic engine, and there is a non-zero probability of generating the “wrong” answer. We are building workflows that need a guarantee of accuracy. What level of correctness are we okay with? What happens if the model is incorrect? In what ways can we check and monitor “correctness”? (One simple approach is sketched after this list.)
  2. The explainability problem: it is difficult to ascertain why an LLM has arrived at a given decision. This is problematic because when we have a failure in reasoning, we don’t know why it happened or what we need to do to correct it. Is there a problem with the data? With the model? With the embeddings? With the prompt? When you change these things, how do you know if you are helping or hurting the model? Again, how will we test and monitor correctness?
  3. The data problem: we’ve been collecting data for years, so there must be a lot of it to feed the LLM, right? Wrong! A lot of the training data needed for these tasks just isn’t there. Most datasets do not contain all the contextual details that Generative AI needs: before LLMs came along, we would usually not store the whole documents and lengthy descriptions an LLM needs for training, just the metadata.
  4. The drift problem: over time, our world may no longer match the world that was captured by the LLM. At some point the meaning of words will change, categories will expand or contract, and new concepts and ideas will enter the milieu. When this will happen is never certain, nor is it clear how we would introduce these changes into the LLM. This means that the training we give the General today will drift from reality over time.
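
One simple, partial answer to the correctness question in point 1 is to convene a "jury": sample the model several times (or ask several different models) and only accept an answer when a strong majority agrees, escalating to a human or a safe fallback otherwise. The sketch below assumes a non-deterministic `call_llm` placeholder; it raises cost and latency and is no guarantee of correctness, only a way to reduce single-sample error.

```python
from collections import Counter

def call_llm(prompt: str) -> str:
    """Placeholder for the actual model call; assumed non-deterministic (temperature > 0)."""
    raise NotImplementedError("wire this to your model endpoint")

def jury_verdict(prompt: str, votes: int = 5, quorum: int = 4) -> str | None:
    """Sample the model several times and accept an answer only on a strong majority.

    Returns None when no quorum is reached, signalling escalation to a human
    reviewer or a conservative default in the surrounding workflow.
    """
    answers = [call_llm(prompt).strip().lower() for _ in range(votes)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count >= quorum else None
```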

While Generative AI has proven incredible at (surprise!) generating content, using LLMs in logical workflows is still an area of exploration. With the Vizier and the Judge, the output is presented for interpretation by the user, and acceptable answers fall within a range of responses. We can let users play with the output to get what they are looking for. With The General, the constraints on “correctness” are strict — right or wrong. It appears that LLMs will take some time to mature in this area.

LLM-powered product development is coalescing into a set of standard implementations. The patterns identified here are a first cut at understanding the broad strokes of how LLMs fit into our applications. What nuances do you see in these archetypes? What would you add or change? What other archetypes do you see emerging? Let me know what you think here on LinkedIn or on X @jrause.

Naomi Lurie

Product Marketing Leader

10 months ago

Great description, John. I loved it! I’ll be thinking about those archetypes now

Naveen Venugopal

Operations Leader - Cloud Security Compliance and Infrastructure

10 months ago

Quite interesting categorization, John. Good read too!
