Maximizing Effectiveness of Large Language Models (LLMs): Advanced Prompt Engineering Techniques
In the previous articles, we got to know LLMs, explored the playground, and looked at some cost-saving strategies. Today, let's look at how to use these LLMs effectively to get optimized results for our use cases.
1. RAG (Retrieval-Augmented Generation)
Retrieval-augmented generation (RAG) is a technique that enhances the accuracy and reliability of generative AI models with facts fetched from external sources.
Fetching knowledge from an external source and feeding it to the LLM as context in the prompt helps the model produce better results and reduces hallucination.
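To make the idea concrete, here is a minimal, self-contained sketch of the retrieve-then-generate loop. The keyword-overlap retriever and the `call_llm` placeholder are simplifications of my own; a real system would use embeddings and a vector store:

```python
# Minimal RAG sketch: retrieve relevant snippets, then augment the prompt.
DOCUMENTS = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support is available Monday to Friday, 9am-5pm EST.",
    "Premium subscribers get priority email support.",
]

def retrieve(query, docs, k=2):
    """Rank documents by how many words they share with the query."""
    q_words = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return scored[:k]

def build_rag_prompt(query):
    context = "\n".join(retrieve(query, DOCUMENTS))
    return (
        "Answer the question using ONLY the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

# The resulting prompt is then sent to the LLM, e.g. call_llm(build_rag_prompt(q)),
# where call_llm stands in for whatever completion API you use.
print(build_rag_prompt("What is the refund policy?"))
```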
Here is one of my projects to get started with RAG using Langchain: https://github.com/mahima5598/Cost-Saving-RAG
Paper on RAG: https://arxiv.org/pdf/2005.11401.pdf
RAG itself can be improved through fine-tuning, via a process called Retrieval-Augmented Fine-Tuning (RAFT). We will delve into the concept of fine-tuning in subsequent articles before discussing RAFT in detail.
2. Sequence of Prompts
We earlier discussed "How to generate effective prompts" and learned that adding examples can help the LLM understand the task better, leading to desired outputs.
Chain-of-Thought (CoT) is a technique aimed at improving few-shot learning by requesting the step-by-step reasoning that leads to a solution, i.e., the chain of thought behind the answer. This can be accomplished in a couple of ways: by providing an example of the thought process, or by simply asking the model to think step by step.
This method is typically used for complex tasks that require reasoning to solve.
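For illustration, here is a minimal few-shot CoT prompt in the style of the examples from the CoT paper, plus the zero-shot "think step by step" variant:

```python
# Few-shot CoT: the worked example demonstrates the reasoning format
# we want the model to imitate before it answers the real question.
cot_prompt = """Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.

Q: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?
A:"""

# Zero-shot CoT: append a trigger phrase instead of a worked example.
zero_shot_cot_suffix = "\nA: Let's think step by step."
```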
Paper on CoT: https://arxiv.org/pdf/2201.11903v6.pdf
Now that we have grasped the concept of Chain-of-Thought (CoT), let's explore some other techniques that draw inspiration from it; a minimal sketch of self-consistency follows the paper links below.
Paper on CoT-SC (Self-Consistency): https://arxiv.org/pdf/2203.11171.pdf
Paper on ToT (Tree of Thoughts): https://arxiv.org/pdf/2305.10601.pdf
Paper on GoT (Graph of Thoughts): https://arxiv.org/pdf/2308.09687.pdf
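To give a feel for one of these, here is a minimal sketch of self-consistency (CoT-SC): sample several reasoning chains at a non-zero temperature and majority-vote over the final answers. `sample_cot_answer` is a hypothetical stand-in for an LLM call that returns just the extracted final answer:

```python
from collections import Counter

def self_consistency(question, sample_cot_answer, n=5):
    """Sample n chain-of-thought answers (temperature > 0) and majority-vote."""
    answers = [sample_cot_answer(question) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Example with a stubbed sampler that would normally call the LLM:
# self_consistency("What is 17 * 3?", lambda q: "51")
```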
3. ReAct (Reason + Act)
One drawback of Chain-of-Thought (CoT) is its susceptibility to hallucination. This limitation can be mitigated by structuring the prompt to interleave reasoning and actions (ReAct), as in the format sketched below.
This technique is primarily employed for multi-hop QA, which involves answering a question after multiple steps of reasoning.
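Below is a sketch of what a ReAct trace looks like, using a made-up question; `Search[...]` and `Finish[...]` are the tool actions used in the ReAct paper's QA setup:

```python
react_trace = """Question: What is the elevation of the city that hosted the 1988 Winter Olympics?

Thought 1: I need to find which city hosted the 1988 Winter Olympics.
Action 1: Search[1988 Winter Olympics]
Observation 1: The 1988 Winter Olympics were held in Calgary, Canada.

Thought 2: Now I need Calgary's elevation.
Action 2: Search[Calgary]
Observation 2: Calgary sits at an elevation of approximately 1,045 m.

Thought 3: The answer is approximately 1,045 m.
Action 3: Finish[approximately 1,045 m]"""
```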
Paper on ReAct: https://arxiv.org/pdf/2210.03629.pdf
ReAct heavily relies on the quality of the information it retrieves; uninformative search results can disrupt the model's reasoning process and make it challenging to recover and reformulate thoughts.
4. DSP (Directional Stimulus Prompting)
While ReAct helps answer multi-hop QA using reasoning and actions, DSP gives the LLM a direction by adding a hint to the prompt. The hint is generated by a small, tunable policy model optimized for the task.
This method harnesses reinforcement learning, a machine learning technique in which a model learns from a reward signal, to train the policy model that produces the hints while the LLM itself remains frozen.
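A minimal sketch of the idea for summarization, with a hard-coded hint standing in for the policy model's output (the article and keywords here are invented for illustration):

```python
# DSP sketch: a hint from a small tuned policy model is appended to the
# prompt to steer the frozen LLM. In DSP the hint comes from a policy
# model trained with reinforcement learning; here it is hard-coded.
article = ("City officials announced on Monday that the downtown library "
           "will close for a year-long renovation starting in June.")
hint = "Keywords: downtown library; year-long renovation; June"

dsp_prompt = (
    f"Article: {article}\n\n"
    f"{hint}\n\n"
    "Summarize the article briefly in 2-3 sentences based on the keywords:"
)
print(dsp_prompt)
```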
Paper on DSP: https://arxiv.org/pdf/2302.11520.pdf
5. Prompt Tuning
Prompt tuning, also referred to as soft prompt tuning, employs soft prompts to guide the LLM towards performing a specific task more effectively. Unlike hard prompts, which are manually written by humans, soft prompts are tunable embeddings that are learned during training and prepended to the query, while the LLM's own weights stay frozen.
This technique is particularly useful when utilizing the model for multiple tasks or when a recyclable universal prompt is needed.
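Here is a minimal PyTorch sketch of the core idea, assuming a frozen LLM whose input embeddings we can intercept; only the `SoftPrompt` parameters would be trained:

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """n_tokens trainable embedding vectors prepended to the frozen model's
    input embeddings; only these vectors are updated during training."""

    def __init__(self, n_tokens=20, embed_dim=768):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(n_tokens, embed_dim) * 0.02)

    def forward(self, input_embeds):
        # input_embeds: (batch, seq_len, embed_dim)
        batch = input_embeds.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, input_embeds], dim=1)

# Usage: freeze the LLM, pass its token embeddings through SoftPrompt,
# and optimize only SoftPrompt's parameters for the target task.
soft = SoftPrompt()
dummy = torch.zeros(2, 10, 768)   # batch of 2, seq_len 10
print(soft(dummy).shape)          # torch.Size([2, 30, 768])
```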
Paper on Prompt Tuning: https://arxiv.org/pdf/2104.08691.pdf
In addition to the above method, there are two similar methods that demonstrate effective performance:
Paper on Prefix Tuning: https://arxiv.org/pdf/2101.00190.pdf
Paper on PSP: https://arxiv.org/pdf/2204.04413.pdf
Conclusion
We explored advanced techniques like RAG, CoT, ReAct, DSP, and Prompt Tuning to optimize large language models (LLMs). Each method offers unique strategies for improving LLM performance, from integrating external knowledge sources to refining reasoning processes and providing directional hints. These approaches signify the evolving landscape of AI-driven text generation, offering promising avenues for more accurate and reliable results across various tasks.
Despite the existence of these techniques, LoRA (Low-Rank Adaptation) stands out as the method offering the best performance for obtaining task-specific output from LLMs. We will delve into this further in the next article.
Happy reading! For more such articles, subscribe to my newsletter: https://lnkd.in/guERC6Qw
I would love to connect with you on Twitter: @MahimaChhagani. Feel free to contact me via email at [email protected] for any inquiries or collaboration opportunities.