Navigating AI Complexity with a custom ChainOfThought Class
Navigating the maze of GPT and LLM applications, this article presents a cost-effective approach to tackling complex problems with higher accuracy. Drawing inspiration from 'Chain of Thought Prompting' and harnessing the popular LLM app framework Langchain, we venture beyond the known. The spotlight shines on an extension of Langchain's LLMChain class, our custom ChainOfThought class, promising more accurate problem-solving and streamlined debugging. Stay tuned as we first unpack the concept of 'Chain of Thought Prompting', delve into Langchain itself, and explore Langchain Chains – crucial stepping stones to understanding and using the ChainOfThought class.
Chain Of Thought Prompting
In January 2022, the Google Brain team released a groundbreaking research paper on "Chain of Thought Prompting" (Link to the paper). The technique introduced in this paper is a novel approach to enhance the reasoning capabilities of large language models (LLMs), especially in multi-step reasoning tasks.
In contrast to the standard prompting, where models are asked to directly produce the final answer, 'Chain of Thought Prompting' encourages LLMs to generate intermediate reasoning steps before providing the final answer to a problem. The advantage of this technique lies in its ability to break down complex problems into manageable, intermediate steps. By doing this, the model-generated 'chain of thought' can mimic an intuitive human thought process when working through multi-step problems.
An example of one-shot prompting:
Or zero-shot chain of thought prompting:
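To make the two styles concrete, here is a small illustration (the wording below is my own, not taken from the paper; only the "Let's think step by step" cue is from the zero-shot CoT literature):

```python
# Standard prompting: ask directly for the answer
standard_prompt = (
    "Q: A juggler has 16 balls. Half are golf balls, and half of the "
    "golf balls are blue. How many blue golf balls are there?\n"
    "A:"
)

# Zero-shot chain of thought: append a cue that elicits intermediate reasoning
cot_prompt = standard_prompt + " Let's think step by step."
```

With the standard prompt the model jumps straight to an answer; with the CoT cue it first writes out the halving steps and then concludes.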
The 'Chain of Thought' technique has emerged as a significant breakthrough. It possesses the potential to enhance the accuracy of numerous prompts considerably. Nevertheless, it presents us developers with a fresh challenge. The inclination might be to leverage the complete output of the model's prediction within our applications. However, it wouldn't be prudent, as the applications are essentially concerned with the final result, not the steps the LLM model undertook to deduce that final conclusion.
# Includes reasoning alongside the final result!
result = llm.predict(prompt)
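A naive workaround is to split the raw prediction on a known delimiter ourselves. It works for a single call, but it's brittle and must be repeated everywhere (a hypothetical sketch; the "Conclusion:" delimiter and the sample generation are assumptions):

```python
# A hypothetical raw generation containing reasoning plus a final answer
raw = (
    "The problem asks for 15% of 80.\n"
    "10% of 80 is 8, and 5% is 4, so 15% is 12.\n"
    "Conclusion: 12"
)

# Manually strip the reasoning: keep only what follows the delimiter
final_answer = raw.split("Conclusion:", 1)[-1].strip()
print(final_answer)  # -> 12
```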
Can we solve this in a neat way? What if we wish to create a whole pipeline of complex chain of thought problems, passing each step's result to the next?
Langchain
LangChain (link), a library initiated by Harrison Chase in late 2022, serves as a framework built around Large Language Models (LLMs) with numerous applications in chatbots, generative question-answering, summarization, and beyond. Supported in both Python and JS, LangChain's fundamental idea is to allow the chaining of various components to create advanced use cases for LLMs. These components may include prompt templates, LLMs such as GPT-3 or BLOOM, agents that decide on actions using an LLM, and memory elements.
Langchain Chains
At the core of LangChain is the 'Chain' interface. A Chain is defined as a sequence of calls to components, which could include other chains.
Consider a simple example where we use an LLM (large language model) like GPT-3 to generate a company name given a certain product type. For this, we can use LangChain's LLMChain class, which is designed to chain an LLM with a prompt template.
Here's a basic example:
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain
llm = OpenAI(temperature=0.9)
prompt = PromptTemplate(
    input_variables=["product"],
    template="What is a good name for a company that makes {product}?",
)
chain = LLMChain(llm=llm, prompt=prompt)
# Run the chain, specifying only the input variable.
print(chain.run("colorful socks"))  # Output: Colorful Toes Co.
Langchain's SimpleSequentialChain is designed for those scenarios where you need to run multiple chains in a sequence. The aim is not just to execute these chains one after the other, but more importantly, to feed the output of one chain as the input into the next. This makes the SimpleSequentialChain a powerful tool for multi-step problem-solving tasks.
Imagine a relay race, where each runner (chain) runs a stretch (computes a task), then hands the baton (the output) to the next runner. That's what happens in a SimpleSequentialChain. The race starts with an initial input. This input gets processed by the first chain, which creates an output. This output now serves as the input for the next chain in line. This sequence continues until all chains have had their turn, leading to a final output.
For example:
from langchain.chains import SimpleSequentialChain

prompt1 = PromptTemplate(
input_variables=["product"],
template="What is a good name for a company that makes {product}?",
)
prompt2 = PromptTemplate(
input_variables=["company_name"],
template="What is a catchy slogan for {company_name}?",
)
# Define two chains
chain1 = LLMChain(llm=llm, prompt=prompt1)
chain2 = LLMChain(llm=llm, prompt=prompt2)
# Combine the two chains into a SimpleSequentialChain
seq_chain = SimpleSequentialChain(chains=[chain1, chain2])
# Run the chain only specifying the input variable for the first chain.
print(seq_chain.run("colorful socks")) # Output: Colorful Toes Co. - Making Your Feet Happy!
Output Parsing in LangChain: A Double-Edged Sword
LangChain's output parser (link) is a handy tool for converting well-structured string results, such as those ending with a JSON-like output, into a Python dictionary. This provides an easy way to handle and manipulate the LLM's output. However, this convenience can turn into a hurdle when we venture into more complex territory, particularly with the Chain of Thought Prompting technique.
Despite the flexibility to create a custom regex-based OutputParser, the process can become messy, particularly when parsing a part of the result. Furthermore, integrating this with a SimpleSequentialChain introduces additional complexity. The sequential processing of tasks can lead to situations where the OutputParser cannot accurately parse each chain's output, leading to difficulties in obtaining the desired structured output.
In such intricate scenarios, we need a more flexible and robust solution. This brings us to the proposal of a new class, the ChainOfThought class, designed to seamlessly handle sequential processing and chain of thought prompting. Let's dive into how this class can help mitigate these challenges.
Custom ChainOfThought chain class
In response to the challenges of incorporating the 'Chain of Thought Prompting' technique within sequential language model operations, allow me to introduce the custom Chain of Thought (CoT) class. This class aims to ease the parsing of intermediate reasoning steps and the final result from the model's output, thus producing a structured output better suited to specific application requirements. The CoT class simplifies the handling of complex multi-step reasoning tasks, providing a valuable addition to the toolkit for advanced Large Language Model applications.
import re
from typing import Any, Dict, List, Optional

from langchain.callbacks.manager import CallbackManagerForChainRun
from langchain.chains import LLMChain
from langchain.schema import LLMResult
from pydantic import Extra


class CoTChain(LLMChain):
    def __init__(self, **kwargs):
        # 1. Custom delimiter for your specific format
        result_delimiter = kwargs.pop("result_delimiter", None)
        if not result_delimiter:
            raise ValueError(
                "Missing result_delimiter as the final answer delimiter of the chain of thought"
            )
        super().__init__(**kwargs)
        self.result_delimiter = result_delimiter

    class Config:
        # Allow extra fields like 'result_delimiter'
        extra = Extra.allow

    def _call(
        self,
        inputs: Dict[str, Any],
        run_manager: Optional[CallbackManagerForChainRun] = None,
    ) -> Dict[str, str]:
        response = self.generate([inputs], run_manager=run_manager)
        return self.create_outputs(response, run_manager)[0]

    def create_outputs(
        self,
        llm_result: LLMResult,
        run_manager: Optional[CallbackManagerForChainRun] = None,
    ) -> List[Dict[str, Any]]:
        text = llm_result.generations[0][0].text
        # 2. Auto parsing of the chain of thought and the actual output, by delimiter
        output, chain_of_thought = self.extract_chain_of_thought_and_result(text)
        result = [{self.output_key: output, "full_generation": text}]
        if run_manager and self.verbose:  # Log chain of thought when verbose=True
            run_manager.on_text(
                f"\n\n\033[1m> Chain of Thought:\n{chain_of_thought}\033[0m", end="\n"
            )
        return result

    def extract_chain_of_thought_and_result(self, text):
        # 3. Extension possibilities - a more complex regex parser for your own use case
        cot_pattern = f"(.*){self.result_delimiter}"
        result_pattern = f"{self.result_delimiter}(.*)"
        steps_match = re.search(cot_pattern, text, re.DOTALL)
        result_match = re.search(result_pattern, text, re.DOTALL)
        if steps_match and result_match:
            chain_of_thought = steps_match.group(1).strip()
            output = result_match.group(1).strip()
        else:
            chain_of_thought = ""
            output = text
        return output, chain_of_thought
This class offers three major benefits:
1. A custom result delimiter that marks where the final answer begins in the model's output.
2. Automatic parsing of the chain of thought and the actual result, so downstream chains receive only the final answer.
3. Easy extension, for instance by swapping in a more complex regex parser for your own use case.
After all, nothing beats a good example. So, let's imagine we're developing an AI bot designed to assist with a police investigation, where multi-step reasoning is crucial. Our bot will need to play two roles: a 'forensic detective' and an 'investigation coordinator'. Each has their own distinct chain of thought.
Let's take a look at both bot roles, each of which uses a one-shot chain of thought prompt (silly, I know):
Forensic Detective
You are a forensic detective. Given the forensic evidence, it is your job to infer the possible features of the suspect
Here are some examples of how I infer features from evidence:
Evidence: The suspect left a size 9 shoe print, a rare foreign coin and a receipt from a vegan restaurant.
Inference:
- From the size 9 shoe print, we can infer that the suspect likely has relatively large feet as size 9 is above average.
- The presence of a rare foreign coin might suggest the suspect has recently travelled abroad or has a keen interest in foreign currencies.
- A receipt from a vegan restaurant could indicate that the suspect follows a vegan diet.
Conclusion:
1. The suspect has large feet.
2. The suspect may have recently travelled abroad.
3. The suspect is likely vegan.
Now let's analyze a new piece of evidence:
Forensic Evidence: {evidence}
Inference:
->>>>>>>>>> Model's next word prediction starts here
Investigation Coordinator
You are an investigation coordinator. Given the attributes of a suspect, it is your job to suggest possible investigation paths
Here are some examples of how I suggest investigation paths from forensic leads:
Leads:
The suspect has large feet, may have recently travelled abroad, and is likely vegan.
Inference:
- From the large feet, we can contact shoe stores that sell larger sizes and inquire about any recent purchases.
- If the suspect has recently travelled abroad, we can coordinate with border agencies to get travel records.
- Being vegan, we could visit local vegan restaurants or grocery stores and inquire if they've noticed any individual who might fit the description.
Conclusion:
1. Contact shoe stores selling larger sizes.
2. Request travel records from border agencies.
3. Visit local vegan restaurants and grocery stores for potential leads.
Now let's analyze new leads:
Leads:
{forensic_leads}
Inference:
->>>>>>>> Model's next word prediction starts here
Note that in both templates, the keyword "Inference:" is used to initiate the chain of thought that the model needs to generate, and the actual result we are interested in appears after the "Conclusion:" keyword.
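To see exactly what that delimiter-based split produces, here is a minimal standalone sketch of the same extraction logic in plain Python (no LangChain required; the sample generation text is invented for illustration):

```python
import re

def extract_cot_and_result(text, result_delimiter="Conclusion:"):
    """Split a model generation into (final answer, chain of thought)."""
    steps_match = re.search(f"(.*){result_delimiter}", text, re.DOTALL)
    result_match = re.search(f"{result_delimiter}(.*)", text, re.DOTALL)
    if steps_match and result_match:
        return result_match.group(1).strip(), steps_match.group(1).strip()
    # No delimiter found: treat the whole generation as the result
    return text, ""

generation = (
    "- Size 11 implies large feet.\n"
    "- A festival ticket suggests an interest in French cinema.\n"
    "Conclusion:\n"
    "1. The suspect has large feet.\n"
    "2. The suspect likes French cinema."
)
answer, reasoning = extract_cot_and_result(generation)
print(answer)     # Only the numbered conclusions
print(reasoning)  # Only the bullet-point reasoning
```

Only `answer` flows to the next chain; `reasoning` is kept around for logging and debugging.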
In action:
forensic_chain = CoTChain(
llm=llm,
prompt=forensic_detective_template,
output_key="forensic_leads",
verbose=True,
result_delimiter="Conclusion:"
)
coordinator_chain = CoTChain(
llm=llm,
prompt=coordinator_prompt,
output_key="follow_ups",
verbose=True,
result_delimiter="Conclusion:"
)
investigation_chain = SequentialChain(
chains=[forensic_chain, coordinator_chain],
input_variables=["evidence"],
output_variables=["forensic_leads", "follow_ups"],
verbose=True
)
new_evidence = "The suspect left a size 11 shoe print, a ticket stub from a French film festival, and a membership card to a local astronomy club."
result = investigation_chain(new_evidence)
Results:
***** Evidence *****
The suspect left a size 11 shoe print, a ticket stub from a French film festival, and a membership card to a local astronomy club.
***** Forensic Leads *****
1. The suspect has large feet.
2. The suspect has an interest in French culture or cinema.
3. The suspect has an interest in astronomy or science.
***** Follow Ups *****
1. Contact shoe stores selling larger sizes.
2. Investigate any recent purchases related to French culture or cinema.
3. Obtain any recent purchases related to astronomy or science.
Meanwhile, the chain of thought was printed out with verbose=True for debugging purposes:
> Chain of Thought (Forensic Detective)
- From the size 11 shoe print, we can infer that the suspect likely has relatively large feet as size 11 is above average.
- The ticket stub from a French film festival might suggest that the suspect has an interest in French culture or cinema.
- A membership card to a local astronomy club could indicate that the suspect has an interest in astronomy or science.
> Chain of Thought (Investigation Coordinator)
- From the large feet, we can contact shoe stores that sell larger sizes and inquire about any recent purchases.
- We can also try to investigate any recent purchases related to French culture or cinema such as books, movies, or theatre tickets.
- If the suspect has an interest in astronomy or science, we can try to obtain any recent purchases related to these fields such as telescopes or books on astrophysics
Final words
Indeed, the power of the Chain of Thought approach extends far beyond the playful example of a police investigation bot. It's a technique with substantial practical applications, particularly in the realm of complex Natural Language Processing tasks, such as Text-to-SQL translation.
Consider this, for instance: Text-to-SQL tasks involve converting natural language queries into their SQL counterparts, a task that can be complex and nuanced. In recent research (link), it was shown that by breaking down this task into smaller, more manageable sub-problems, the performance of Large Language Models (LLMs) on challenging Text-to-SQL datasets, such as Spider, can be significantly enhanced.
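As a rough sketch of that decomposition idea (the sub-steps and template wording here are my own illustrative assumptions, not the paper's exact prompts), a Text-to-SQL task could be split into a schema-linking step followed by a generation step, each with its own template:

```python
# Step 1: identify the relevant tables/columns for the question
schema_linking_template = (
    "Given the schema:\n{schema}\n"
    "List the tables and columns needed to answer: {question}"
)

# Step 2: generate SQL using only the linked schema elements
sql_generation_template = (
    "Using only these schema elements:\n{linked_schema}\n"
    "Write a SQL query that answers: {question}"
)

# Each step's output feeds the next, mirroring the sequential chain pattern above
prompt = schema_linking_template.format(
    schema="singer(id, name, country)",
    question="How many singers are from France?",
)
```

Wrapped in CoTChain instances with a suitable result delimiter, each sub-step could also expose its intermediate reasoning for debugging, just as in the investigation example.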
Cost-effectiveness is another aspect worth highlighting. While the superior performance of GPT-4 is widely acknowledged, it comes with higher costs which might not be feasible for all users or projects. In such cases, the Chain of Thought technique offers a compelling alternative. By enhancing the reasoning capabilities of the more affordable GPT-3.5 model, we can approach complex tasks with a level of effectiveness that might rival that of GPT-4.
The potential for real-world applications is exciting. When combined with the Chain of Thought technique multiple times, we can push the boundaries of what LLMs can accomplish, enabling them to handle complex reasoning tasks with more finesse. Thus, I look forward to further explorations and innovative uses of this technique in various AI applications.
Feel free to reach out if you have any thoughts!
Thanks,
Shai