Reasoning with Large Language Models Instead of Asking Questions: How Agents Will Solve the Problems of In-Context Prompting
Andrew Amann
Nuclear Sub Engineer | CEO of NineTwoThree Studio | Built 150+ AI, ML, Mobile & Web Apps | Top AI Agency in America | Launched 14 Startups | 2 Acquisitions | 2 US Patents | 4 Years on the Inc. 5000
Getting accurate answers from a Large Language Model takes skill and patience. While technology companies accelerate the release of features for humans interfacing with machines, prompting has become a skill humans are quickly required to learn. Levels of prompting are beginning to take shape as novices and experts alike try to use generative AI in their companies. As time progresses, we are learning together how to guide the unpredictable LLM toward more accurate and creative answers.
While it's fun to use ChatGPT and other AI chatbots built on an LLM, the harm of a hallucination there is low. That changes drastically when an enterprise or brand decides to put the technology in front of its employees or customers. The expectation is human-like responses with emotion, understanding, and factual assistance about the company itself. Hallucinations are simply unacceptable.
Throughout this article, we will discuss levels of prompting from simple to complex, with the intention of demonstrating how to steer the LLM toward intended results. The goal is to get LLMs to return 95%+ accurate results, to respond with emotional understanding when someone is in distress, and never, under any circumstance, to harm the brand or company.
Any agency developing products that integrate with LLMs is always balancing three factors when deciding how to manage its workflow: cost, accuracy, and latency. While most of the topics discussed in this article are about prompting to improve accuracy, please note that these solutions can increase cost and latency as a side effect.
In-Context Learning Prompting
Immediately following the release of ChatGPT 3.5, there was a shift in perception: it seemed these LLMs could reason with a human and provide human-like responses to any question imaginable. The responses were fun, artsy, cruel, and mean, but so are humans. Without knowing it, many people started performing "in-context learning" to improve the answers the LLM provides from a prompt.
Simple Prompt
In Context Learning
By providing the LLM with a few samples, the prompter is instructing the LLM not just to predict the next word but to provide an answer in the specific format being requested. In the simple prompt, the word "last" was merely predicted as the "next word" and therefore became the answer.
When in-context learning was added to the prompt, however, the machine understood its mission. It learned from the context and determined how to mimic the examples it was given. In-context learning is similar to how we teach children language.
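To make the pattern concrete, here is a minimal sketch of a few-shot, in-context prompt. The task and examples are invented for illustration (they are not the screenshots above), and call_llm is a hypothetical placeholder for whatever chat-completion client you use.

```python
# A minimal few-shot prompt: the examples teach the model the desired
# output pattern, so it completes the final line in the same style.
# call_llm is a hypothetical placeholder for your chat-completion client.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your LLM provider of choice")

FEW_SHOT_PROMPT = """\
Rewrite each sentence so the final word comes first.

Sentence: The meeting starts at noon.
Answer: Noon at starts meeting the.

Sentence: Our invoices ship on Friday.
Answer: Friday on ship invoices our.

Sentence: The demo impressed every client.
Answer:"""

# Usage (once call_llm is wired up): the model should continue the pattern.
# print(call_llm(FEW_SHOT_PROMPT))
```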
While in-context learning became popular among novices on ChatGPT, it was not enough for companies. Hallucinations would occur because the LLM simply did not know internal company language well enough to give the prompter a correct response. It did not know the answer.
Plus, it is tedious to have a human constantly collect information and add it to the prompt manually. Companies simply have too much internal data and knowledge to parse by hand. A machine needs to know what is important to recall at the time of the prompt.
Introduction to RAG
We go in-depth on Retrieval Augmented Generation (RAG) in our case study. For a quick recap, think of a giant knowledge base of information about a company stored in a database that can be "retrieved" and inserted into a prompt.
For instance, which items on a restaurant menu contain dairy? Rather than typing each one into a Google search, a prompt can be crafted to include the menu items, and a conversation can then be held around that confidential information. The model can make inferences about the items even though it has never seen them before.
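As a rough illustration of the retrieve-then-prompt pattern, here is a minimal sketch. The menu data, the keyword "retriever" standing in for a real vector database, and call_llm are all assumptions made for the example.

```python
# Minimal RAG sketch: retrieve relevant menu items, inject them into the
# prompt, and let the model reason over knowledge it was never trained on.
# The toy retriever below stands in for a real vector database.

MENU = {
    "Margherita pizza": "tomato, mozzarella, basil",
    "Penne arrabbiata": "tomato, garlic, chili, olive oil",
    "Caesar salad": "romaine, parmesan, anchovy dressing, croutons",
}

def retrieve(question: str, k: int = 3) -> list[str]:
    # Toy retrieval: in production this would be an embedding similarity search.
    return [f"{name}: {ingredients}" for name, ingredients in list(MENU.items())[:k]]

def build_prompt(question: str) -> str:
    context = "\n".join(retrieve(question))
    return (
        "Answer using only the menu below.\n\n"
        f"MENU:\n{context}\n\n"
        f"QUESTION: {question}"
    )

def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your LLM provider")

if __name__ == "__main__":
    print(build_prompt("Which items contain dairy?"))
```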
Limitations of RAG
This alone is a revolutionary process that has allowed companies to use LLMs on top of their own knowledge bases. It has catapulted LLMs into enterprise apps and forced new problems to be solved around privacy and data security. But RAG also has major limitations.
The model must answer in one prompt
All that is happening behind the scenes of RAG is that an engineer has intercepted the question, found relevant information that could help the model respond, and injected it into the prompt, re-prompting the original question in hopes of a better answer.
This becomes a very costly solution for the enterprise, because the cost of the prompt increases dramatically as more information is added. As stated in the introduction of this article, in the hunt for accuracy, something must give. In this case, cost and context window size are what give.
Imagine a company wanted to know the total amount of all invoices sent to the state of Texas in 2024. First, the model would have to understand what an invoice is. Next, it would have to distinguish "sent" from "received," since the company's own office could also be in Texas. And lastly, the model would have to aggregate the total amount from each invoice.
Herein lies the crux of the problem: in the hunt for the total value of all invoices, which step should come first? If the model aggregated the total of all invoices first and then filtered down to only the ones in Texas, it COULD find the right answer, but at a massive cost!
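A small sketch makes the ordering point concrete: filtering to Texas 2024 invoices before summing touches far fewer records (and, in an LLM pipeline, far fewer tokens) than totaling everything first. The invoice fields are invented for illustration.

```python
# Illustrative only: the order of operations changes how much data the
# pipeline has to touch. Field names are assumed for the example.
from dataclasses import dataclass

@dataclass
class Invoice:
    amount: float
    destination_state: str
    year: int

invoices = [
    Invoice(1200.0, "TX", 2024),
    Invoice(800.0, "CA", 2024),
    Invoice(450.0, "TX", 2023),
]

# Cheap: filter first, then aggregate only what survives the filter.
texas_2024_total = sum(
    inv.amount for inv in invoices
    if inv.destination_state == "TX" and inv.year == 2024
)
print(texas_2024_total)  # 1200.0
```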
Assigning a human to this task and watching how they might solve it provides the insight behind Google's research on "Chain of Thought Prompting," first proposed here.
Chain of Thought Prompting to Rescue RAG
Starting with basic prompting techniques, most of us throw questions at the bot like hotcakes, hoping the incredible AI will return a quick win so we can go back to being lazy and watching Friends reruns...
The answer is 4 hours - yet the confidence of ChatGPT is inspiring.
It is incredibly important to understand that once you get the model to hallucinate, everything after the initial prompt becomes an attempt to justify that first answer. The model will double down, if you will, to keep its first response intact.
If the human is not clever enough to catch the fallacy, then the subsequent prompts will continue to spiral, like a third-grade spelling bee champion starting to spell "pneumonia" with a capital N.
Let's ask ChatGPT to "show its work..."
Confidently Inadequate.
Chain of Thought Prompting
Chain-of-thought prompting is a technique that guides large language models to follow a reasoning process to derive the desired outcome. It is a prompting strategy that simply shows the machine "how" a human would solve a problem and then lets the machine infer the same reasoning for another problem.
Using the same lawn mower example from above, let's provide a "chain of thought" for how a human would solve the problem and ask the machine to replicate the process.
This is in-context prompting combined with chain-of-thought reasoning, and as you can see, the model "shows its work." It is very easy to understand and (better than me, I might add) explains the math it used to derive the answer.
Asking follow-up questions still yields the chain of thought from the context provided previously.
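To show the shape of such a prompt, here is a hedged sketch of a chain-of-thought prompt: one worked example followed by a new question in the same style. The exact lawn mower wording from the screenshots above is not reproduced; the numbers below are illustrative.

```python
# A sketch of a chain-of-thought prompt: one worked example shows the model
# *how* to reason, then a new problem asks it to follow the same steps.
# (Numbers are illustrative, not the original screenshot.)

COT_PROMPT = """\
Q: One mower cuts a lawn in 2 hours. How long do 2 identical mowers take
   to cut 4 such lawns, working in parallel?
A: Let's think step by step.
   One mower does 1 lawn in 2 hours, so 4 lawns alone would take 8 hours.
   Two mowers split the work evenly, so 8 / 2 = 4 hours.
   The answer is 4 hours.

Q: One mower cuts a lawn in 3 hours. How long do 3 identical mowers take
   to cut 6 such lawns, working in parallel?
A: Let's think step by step."""

# Passing COT_PROMPT to your LLM client should elicit step-by-step
# reasoning before the final answer, mirroring the worked example.
```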
When building platforms that integrate with large language models, a common term for chain of thought is "priming the model." At NineTwoThree we constantly work on improving our model priming for our clients, and each client's prompts differ vastly depending on the use case. Over time, however, we should be able to collect reasoning techniques that are domain agnostic.

For example, determining someone's age will forever be an LLM limitation. Why? Because the foundation model is always trained before the moment the prompt occurs. So if it's June 4th, 2023 and the foundation model was last updated in 2022, asking for a person's age using simple subtraction might be off by a year. A simple solution is to prime the model to first look up the individual's birth year, then ask an agent for today's date, and calculate the difference. But we are getting ahead of ourselves. Entering stage left: agents...
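Here is a minimal sketch of that priming idea, with the date arithmetic kept outside the model. lookup_birth_year is a hypothetical retrieval step (RAG or web search), and the result can still be off by one if the birthday hasn't happened yet.

```python
# Sketch: keep date arithmetic out of the model. A retrieval step supplies
# the birth year, a tool supplies today's date, and plain code computes the
# age. lookup_birth_year is a hypothetical helper.
from datetime import date

def lookup_birth_year(person: str) -> int:
    # In practice: a RAG lookup or web search. Hard-coded here for illustration.
    return 1980

def age_of(person: str) -> int:
    today = date.today()  # the "agent" step: ground truth for 'now'
    # Still off by one if the birthday hasn't occurred yet this year;
    # a fuller version would compare month and day too.
    return today.year - lookup_birth_year(person)

print(age_of("Jane Example"))
```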
Web Agents and Action Agents
We have defined RAG as adding knowledge to the prompt and chain of thought as reasoning with the machine to enforce desired outcomes, but we are still plagued by the cost of running these prompts at scale for enterprise-size problems. We can't keep stuffing information into the prompt to solve all our problems. Worse yet, some information is not available in the LLM foundation model OR the company's knowledge base.
Introducing Action Agents.
Language models are limited to textual responses based on knowledge they already have. Asking the model for the "weather in Phoenix" or "the stock price of Tesla" will yield empty results. Large language models are also hilariously bad at cause-and-effect scenarios. None of these limitations can be solved through the prompt the way our previous solutions attempted.
SayCan Approach to Reasoning
One approach is called "SayCan," which breaks the problem down into two probabilities: the probability that a candidate response to the question makes sense ("Say") and the probability that it can actually be executed as the next step ("Can"). It helps to walk through the example provided in the well-known Google paper, 2204.01691 (arxiv.org).
In the main example of the paper, the instruction to the LLM was "How would you put an apple on the table?" The machine has no idea where the apple currently is, what tools it has at its disposal, or even how far the table is from the apple. It also doesn't know what other objects are in the room or which objects are linked together. Is the apple on the counter or the floor?
In the SayCan architecture, each "action" is paired with a probability. Knowing that the room contains certain objects and items, an affordance can be applied to each one dynamically based on the prompt. If the user asks for a Coke, "pick up the coke" will score high. As you can see in the apple-and-table example, "pick up the coke" scores a -30 for that question because the Coke is not referenced in the instruction.
The second part of the value function is "what is possible" in the current real-world scenario. Where are we in the process of putting the apple on the table? Currently, the apple has not yet been located. Placing the apple, or even picking it up, is not yet possible, because we have not "found" the apple. Therefore, finding the apple or walking to the counter scores high for the question "How would you put an apple on the table?"
Taking this story about the apple further, you can imagine using the affordances of each step to instruct the machine to "step through" the process and decide which instruction comes in which order: first find the apple, then find the table, then go to the counter, then pick up the apple, then go to the table, and finally place the apple. At each step, the skill affordances change based on the previously completed steps, while the original prompt question remains intact.
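As a rough sketch of the idea (with made-up numbers, not the paper's actual values), each candidate skill gets a "say" score from the language model and a "can" affordance score from the current world state, and the agent picks the skill with the highest combined value.

```python
# Sketch of SayCan-style skill selection with made-up scores.
# say: how useful the language model thinks the skill is for the instruction.
# can: how feasible the skill is in the current world state.
candidate_skills = {
    "find the apple":    {"say": 0.60, "can": 0.90},
    "go to the counter": {"say": 0.55, "can": 0.85},
    "pick up the apple": {"say": 0.70, "can": 0.05},  # not feasible: apple not located yet
    "pick up the coke":  {"say": 0.02, "can": 0.80},  # feasible but irrelevant to the instruction
}

def combined(skill: str) -> float:
    s = candidate_skills[skill]
    return s["say"] * s["can"]  # SayCan combines the two signals

best = max(candidate_skills, key=combined)
print(best)  # -> "find the apple"
```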
Web Agents Approach to Reasoning
The SayCan approach is great when everything is known: RAG or AI vision determines, probabilistically, what the "Can" side can do. But what if the prompter and the machine BOTH don't know the answer? How can you reason with unavailable information?
A simple thought experiment: ask the LLM to determine the largest city in the United States that is NOT a state capital. There are two obvious unknowns in this prompt. The first is a list of all cities in the United States along with their populations. The second is a list of state capitals. Finally, the machine would have to remove every city that is a capital and return the largest of what remains. But how does the machine obtain that information?
WebGPT.
WebGPT is an agent trained on internet browsing demonstrations from humans. The training is simple: given a prompt like "What is the largest city in the United States," where does a human go on the internet to find the answer?
Step one, search Google.
Step two, skip the ads.
Step three, click on Wikipedia.
Step four, scroll to the table that lists the cities.
Step five, highlight the city names in the table.
Step six, return the answer...
By simply mimicking humans, the machine can learn the patterns needed to use the internet effectively as a tool. After only about 6,000 demonstrations of questions and answers, WebGPT can handle a remarkably high percentage of questions by retrieving the answer online.
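Here is a toy sketch in the spirit of that loop, not the actual WebGPT system: search, open a promising result, and return a quotable snippet as evidence. web_search and fetch_page are hypothetical helpers you would back with a real search API and an HTTP client.

```python
# A toy web-agent loop in the spirit of WebGPT (not the actual system):
# search, open a promising result, extract a snippet, and hand it back as
# evidence for the final answer.

def web_search(query: str) -> list[dict]:
    raise NotImplementedError("back this with a real search API")

def fetch_page(url: str) -> str:
    raise NotImplementedError("back this with an HTTP client")

def answer_from_web(question: str) -> str:
    results = web_search(question)
    for result in results[:3]:                    # skip ads / low-quality hits in practice
        page = fetch_page(result["url"])
        if "population" in page.lower():          # crude relevance check for this example
            return page[:500]                     # return a quotable snippet as evidence
    return "no answer found"
```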
It's a great agent to have in the tool belt, as it can "hunt" for information. Let's start putting the pieces back together.
Combining WebGPT, RAG, SayCan, and Other Strategies into ReAct Agents: Reasoning and Acting
Throughout this article we have discussed prompting, chain of thought, SayCan, and RAG, which brings us to ReAct. As the name states, it is a combination of reasoning and acting to provide more accurate results and empower agents to complete more complex tasks.
The best way to explain ReAct is to imagine trying to teach a third grader multiplication. There is no "one way" to complete the task; rather, depending on what the child has memorized (RAG), the process they have been taught (chain of thought), and the tools they have at their disposal (agents), the student can "learn" how to multiply numbers.
In order, the goal is to empower agents to take action to find information, then use in-context learning to figure out which chain-of-thought reasoning the model should deploy, and finally observe the response to check the probability of accuracy at the current step.
The basis of ReAct is that the system loops through Thought, Action, and Observation until it arrives at the answer.
Going back to our apple on the table, we would start the prompt by stating:
You are in the middle of a room. Looking around you see an apple on a counter, 5 chairs around a table with a Coca-Cola can, and cabinets.
From here, the ReAct methodology would step through the process by using the affordances assigned from the SayCan paper.
The process would continue.
And on and on until the last observation is "To put the apple on the table I need to grab the apple on the counter, walk 2 meters to the table, and place the apple down on the table."
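A minimal sketch of that Thought → Action → Observation loop might look like the following; call_llm and the tool set are hypothetical placeholders, and a production agent would parse the model's output far more defensively.

```python
# Minimal ReAct-style loop: the model proposes a Thought and an Action,
# a tool executes the Action, and the Observation is appended to the
# transcript before the next step.

def call_llm(transcript: str) -> str:
    raise NotImplementedError("returns text like 'Thought: ...\\nAction: find_apple'")

TOOLS = {
    "find_apple": lambda: "The apple is on the counter, 2 meters away.",
    "go_to_counter": lambda: "You are now at the counter.",
    "pick_up_apple": lambda: "You are holding the apple.",
    "go_to_table": lambda: "You are at the table.",
    "place_apple": lambda: "The apple is on the table. Task complete.",
}

def react_loop(task: str, max_steps: int = 8) -> str:
    transcript = f"Task: {task}\n"
    for _ in range(max_steps):
        step = call_llm(transcript)                       # "Thought: ...\nAction: <tool>"
        transcript += step + "\n"
        action = step.rsplit("Action:", 1)[-1].strip()
        if action == "finish":
            break
        observation = TOOLS.get(action, lambda: "Unknown action.")()
        transcript += f"Observation: {observation}\n"
    return transcript
```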
How To Implement Agents in an AI Workflow?
Agents can be thought of as having access to tools. Tools can be WebGPT, writing Python code, searching SQL databases, annnnnd prompting an LLM. Notice that the latter is a piece of the puzzle - not the answer. Are there questions that never need an LLM to solve? Of course!
Providing an agent as the first line of defense for the question is a powerful iteration of AI models. It addresses many of the hallucination, cost, and latency problems that in-context prompting carries. Each method of prompting above is an improvement on a previous methodology, but at the end of the day, it's still prompting.
If a question was asked, "What is the weather in Phoenix?", no tokens need to be spent: the weather.com API can retrieve "78 degrees and sunny." Using an LLM is overkill.
In fact, it's overkill for many situations. "How many invoices do I have in May?" There is no need to search the vector database. The SQL table that was used to build the vector database can retrieve this answer MUCH faster and with more accuracy. Legacy engineering also enforces QA on the response, ensuring accuracy.
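As a sketch of that "first line of defense" routing, the snippet below sends the question to the cheapest tool that can answer it and only falls back to the LLM when nothing else fits. get_weather, run_sql, and call_llm are hypothetical stand-ins for real integrations.

```python
# Sketch of tool routing: cheap, deterministic tools first, LLM last.
import re

def get_weather(city: str) -> str:
    raise NotImplementedError("call a weather API, e.g. weather.com")

def run_sql(query: str) -> str:
    raise NotImplementedError("run against the invoices database")

def call_llm(prompt: str) -> str:
    raise NotImplementedError("only reached when no cheaper tool applies")

def route(question: str) -> str:
    q = question.lower()
    if "weather" in q:
        city = q.rsplit("in", 1)[-1].strip(" ?")
        return get_weather(city)                       # no LLM tokens spent
    if match := re.search(r"how many invoices .* in (\w+)", q):
        month = match.group(1)
        # Parameterize the query in real code; string-building is for illustration.
        return run_sql(f"SELECT COUNT(*) FROM invoices WHERE month = '{month}'")
    return call_llm(question)                          # fall back to the model
```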
An Example to Bring It All Together: Understanding the Future of LLMs by Using Agents
How many cities in America have a mayor of French ancestry?
Well, this question would take a bit of human research to resolve, and legacy engineering cannot get the answer without asking the user for more information. But with the LLM as one of its tools, a LangChain agent can orchestrate a response with the tools at its disposal, as the sketch after the tool list below suggests.
Tool 1: WebGPT: "Go find me all the cities in the United States."
Tool 2: TimeCheck: return now().
Tool 3: WebGPT "List out all the active mayors in the United States."
Tool 4: SQL coder: Create a table. Column A has a list of cities. Column B has a list of active mayors in the United States.
Tool 5: Calculator: VLookup the mayors with the cities. Return only the cities that have mayors.
Tool 6: RAG: "Here is a list of mayors in the United States. I am going to ask you which mayors are French. You can determine which ones are French by reviewing their names. Here is how you would know a name is French:"
Sound: French last names tend to have a distinct sound when pronounced. While there are exceptions, many French surnames have a melodious flow and may end in vowels or certain consonants, like "t," "x," "n," or "s."
Historical Origin: Understanding the historical origins of a last name can provide clues. Many French surnames have roots in occupations (e.g., Leblanc for "the white," referring to a fair-haired person) or geographical features (e.g., Dupont for "of the bridge").
Genealogical Records: Genealogical databases or records can be helpful in tracing the origin of a last name. French genealogical records may provide insights into the surname's history, including its geographical distribution and any notable individuals bearing the name.
Cultural Context: Considering the cultural context can also aid in identifying a French last name. If the surname is associated with French-speaking regions or communities, it's more likely to be French.
Language and Etymology: Analyzing the linguistic roots and etymology of the surname can reveal its French origins. Many French surnames have evolved from Old French, Latin, or other Romance languages.
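A sketch of how that tool chain might be orchestrated in code is below; every function is a hypothetical placeholder (not a real LangChain API), and only the surname-classification step touches an LLM.

```python
# Sketch of chaining the tools listed above. All helpers are hypothetical
# placeholders; only classify_surnames would call an LLM.

def web_search_cities() -> list[str]:
    raise NotImplementedError("Tool 1: WebGPT-style search for US cities")

def web_search_mayors() -> dict[str, str]:
    raise NotImplementedError("Tool 3: WebGPT-style search, city -> active mayor")

def classify_surnames(instructions: str, names: list[str]) -> list[str]:
    raise NotImplementedError("Tool 6: the only step that calls an LLM")

FRENCH_SURNAME_PROMPT = (
    "Here is a list of mayors. Return only the ones whose last names appear "
    "to be of French origin, judging by sound, etymology, and historical usage."
)

def french_mayor_cities() -> list[str]:
    cities = set(web_search_cities())                           # Tool 1
    mayors = web_search_mayors()                                # Tool 3
    paired = {c: m for c, m in mayors.items() if c in cities}   # Tools 4-5: join cities to mayors
    french = set(classify_surnames(FRENCH_SURNAME_PROMPT,       # Tool 6: LLM judgment call
                                   list(paired.values())))
    return [city for city, mayor in paired.items() if mayor in french]
```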
Only Tool 6 used an LLM, and the LLM was used to reason about something that is not deterministic. Humans can typically judge the origins of French last names, so this is exactly where the LLM earns its keep.
In Conclusion
The evolution of Large Language Models (LLMs) and their integration into business workflows presents both challenges and opportunities. As we navigate the complexities of prompting techniques, from simple queries to advanced strategies like Retrieval Augmented Generation (RAG) and Chain of Thought Prompting, we must balance accuracy, cost, and latency. The development of agents, such as WebGPT and SayCan, further enhances our ability to harness the power of LLMs, enabling them to reason and act upon a vast array of information. By combining these approaches into a cohesive ReAct strategy, we can guide LLMs to deliver high-quality, contextually relevant responses that protect brand integrity and provide valuable insights. As we continue to refine these methods, the potential for LLMs to revolutionize enterprise applications and customer interactions is immense, promising a future where human-like AI interactions are not only possible but also practical and reliable.