Maximizing Effectiveness of Large Language Models (LLMs): Advanced Prompt Engineering Techniques

In the previous articles, we covered what LLMs are, explored the playground, and looked at some cost-saving strategies. Today, let us look at how to use these LLMs effectively to get optimized results for our use cases.


1. RAG (Retrieval-Augmented Generation)

Retrieval-augmented generation (RAG) is a technique that enhances the accuracy and reliability of generative AI models with facts fetched from external sources.

Fetching knowledge from an external source and feeding it to the LLM as context in the prompt helps the LLM provide better results and reduces hallucination.
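
As a minimal sketch of this retrieve-then-generate flow (the search_documents retriever is a placeholder for a real vector-store lookup, and the model name is just an example), the idea looks roughly like this:

    from openai import OpenAI

    client = OpenAI()

    def search_documents(query, k=3):
        # Placeholder retriever: return the k chunks most relevant to the query.
        # A real setup would search a vector store built from our documents.
        corpus = [
            "RAG grounds LLM answers in documents fetched at query time.",
            "RAFT combines retrieval-augmented generation with fine-tuning.",
        ]
        return corpus[:k]

    def answer_with_rag(question):
        # 1. Retrieve facts from the external knowledge source.
        context = "\n\n".join(search_documents(question))
        # 2. Augment the prompt with the retrieved context.
        prompt = (
            "Answer the question using ONLY the context below. "
            "If the answer is not in the context, say you don't know.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}"
        )
        # 3. Generate an answer grounded in the supplied facts.
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content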

Here is one of my projects to get started with RAG using Langchain: https://github.com/mahima5598/Cost-Saving-RAG

Paper on RAG: https://arxiv.org/pdf/2005.11401.pdf

Improving RAG is possible through fine-tuning, achieved via a process called Retrieval Augmented Fine Tuning (RAFT). We will delve into the concept of fine-tuning in subsequent articles before discussing RAFT in detail.


2. Sequence of Prompts

We earlier discussed "How to generate effective prompts" and learned that adding examples can help the LLM understand the task better, leading to desired outputs.

Chain-of-Thought (CoT) is a technique aimed at improving on few-shot prompting by requesting the step-by-step reasoning that leads to a solution, i.e. the chain of thoughts behind the answer. This can be accomplished in several ways, either by providing a worked example of the thought process (few-shot CoT) or by simply asking the model to think step by step (zero-shot CoT), as illustrated in the sketch below.

This method is typically used for complex tasks that require reasoning to solve.
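
For illustration, here is a minimal sketch of zero-shot CoT, where the only change is appending a "think step by step" instruction to the prompt, plus a few-shot CoT prompt with a worked-out reasoning example (the client setup is the same as in the RAG sketch above, and the model name is just an example):

    def ask(client, prompt, chain_of_thought=False):
        # Zero-shot CoT: nudge the model to write out its reasoning first.
        if chain_of_thought:
            prompt += "\n\nLet's think step by step, then state the final answer."
        response = client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    # Few-shot CoT: provide a worked example of the reasoning instead.
    few_shot_cot = (
        "Q: Roger has 5 tennis balls. He buys 2 cans of 3 balls each. "
        "How many tennis balls does he have now?\n"
        "A: Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
        "5 + 6 = 11. The answer is 11.\n"
        "Q: The cafeteria had 23 apples. They used 20 and bought 6 more. "
        "How many apples do they have?\n"
        "A:"
    )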

Paper on CoT: https://arxiv.org/pdf/2201.11903v6.pdf

Now that we have grasped the concept of Chain-of-Thought (CoT), let's explore some other techniques that draw inspiration from it.

  • Multiple CoTs - Also referred to as self-consistency with CoT (CoT-SC), this method samples several reasoning paths and selects the most consistent answer among them (a minimal sketch follows after the paper links below).

Paper on CoT-SC: https://arxiv.org/pdf/2203.11171.pdf

  • Tree-of-Thought (ToT) - Helps us solve reasoning problems by exploring a tree of intermediate thoughts and self-evaluating the choices at each step.
  • Graph-of-Thought (GoT) - Helps us solve reasoning problems by modelling thoughts as a graph, which allows thoughts to be combined and refined with feedback loops.

Paper on ToT: https://arxiv.org/pdf/2305.10601.pdf

Paper on GoT: https://arxiv.org/pdf/2308.09687.pdf
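
Here is a minimal sketch of self-consistency (CoT-SC): sample several chains of thought at a non-zero temperature, pull the final answer out of each, and keep the majority vote. The llm callable is a placeholder for any chat-completion function, and the answer extraction is deliberately naive.

    from collections import Counter

    def self_consistent_answer(question, llm, n_paths=5):
        prompt = (
            question
            + "\n\nLet's think step by step, then write 'Answer:' followed by the result."
        )
        answers = []
        for _ in range(n_paths):
            # temperature > 0 so each sampled reasoning path can differ
            completion = llm(prompt, temperature=0.8)
            if "Answer:" in completion:
                answers.append(completion.split("Answer:")[-1].strip())
        # The answer reached by the most reasoning paths wins.
        return Counter(answers).most_common(1)[0][0] if answers else None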


3. ReAct (Reason + Act)

One drawback of Chain-of-Thought (CoT) is its susceptibility to hallucination. This limitation can be mitigated by structuring the prompt to interleave reasoning and actions (ReAct), as in the format sketched below.
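
For example, a ReAct-style trace interleaves Thought, Action, and Observation steps like this (the question, the Search/Finish tool names, and the wording are illustrative):

    Question: In which country was the author of "The Little Prince" born?
    Thought 1: I need to find the author of "The Little Prince".
    Action 1: Search["The Little Prince" author]
    Observation 1: "The Little Prince" was written by Antoine de Saint-Exupéry.
    Thought 2: Now I need to find where Antoine de Saint-Exupéry was born.
    Action 2: Search[Antoine de Saint-Exupéry birthplace]
    Observation 2: Antoine de Saint-Exupéry was born in Lyon, France.
    Thought 3: He was born in Lyon, which is in France, so the answer is France.
    Action 3: Finish[France]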

This technique is primarily employed for Multi-Hop QA, which involves answering a question after conducting multiple steps of reasoning.

Paper on ReAct: https://arxiv.org/pdf/2210.03629.pdf

ReAct heavily relies on the quality of the information it retrieves; uninformative search results can disrupt the model's reasoning process and make it challenging to recover and reformulate thoughts.


4. DSP (Directional Stimulus Prompting)

While ReAct helps answer multi-hop QA using reasoning and actions, DSP gives the LLM a direction by providing a hint. The hint is generated by a tuneable policy model optimized for this task.

This method harnesses reinforcement learning, a machine learning technique in which a model learns from a reward signal, to optimize the small policy model that generates the hints while the LLM itself remains frozen.
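
As a minimal sketch of what this looks like at inference time: the policy model produces a short hint (for example, keywords a summary should cover), and the hint is simply appended to the task prompt given to the frozen LLM. Both callables here are placeholders.

    def summarize_with_hint(article, llm, policy_model):
        # The small, tuneable policy model generates the directional stimulus,
        # e.g. a few keywords the summary should mention (it is trained with RL
        # against a reward such as summary quality).
        hint_keywords = policy_model(article)

        # The frozen LLM receives the original task plus the hint as guidance.
        prompt = (
            f"Article:\n{article}\n\n"
            f"Hint: the summary should mention: {hint_keywords}\n\n"
            "Write a one-sentence summary of the article."
        )
        return llm(prompt)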

Paper on DSP: https://arxiv.org/pdf/2302.11520.pdf


5. Prompt Tuning

Prompt tuning, also referred to as soft prompt tuning, employs soft prompts to guide the LLM towards performing a specific task more effectively. Unlike hard prompts, which are manually written by humans, these soft prompts are tuneable embeddings that are learned during training and prepended to the query's input embeddings, while the model's own weights stay frozen.

This technique is particularly useful when the same model is used for multiple tasks or when a reusable, task-specific prompt is needed without retraining the whole model.
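
As a minimal sketch using Hugging Face's peft library (assuming a recent version; the base model and soft-prompt length are arbitrary choices), prompt tuning trains only a handful of virtual-token embeddings while the base model stays frozen:

    from transformers import AutoModelForCausalLM
    from peft import PromptTuningConfig, PromptTuningInit, TaskType, get_peft_model

    base_model = AutoModelForCausalLM.from_pretrained("gpt2")

    config = PromptTuningConfig(
        task_type=TaskType.CAUSAL_LM,
        num_virtual_tokens=20,                     # length of the soft prompt
        prompt_tuning_init=PromptTuningInit.TEXT,  # initialise from a hard prompt
        prompt_tuning_init_text="Classify the sentiment of this review:",
        tokenizer_name_or_path="gpt2",
    )

    model = get_peft_model(base_model, config)
    model.print_trainable_parameters()  # only the soft-prompt embeddings are trainable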

Paper on Prompt Tuning: https://arxiv.org/pdf/2104.08691.pdf

In addition to the above method, there are two similar methods that demonstrate effective performance:

  • Prefix Tuning: Instead of adding soft prompts only at the input, this method adds trainable prefix vectors to every layer of the pre-trained model, including the decoder, to provide stronger guidance for output generation (see the sketch after the paper links below).
  • Pre-trained Soft Prompts (PSP): Developed to enhance summary generation, this technique combines prompt tuning and prefix tuning approaches while incorporating inner prompts such as interval, sequential, and fixed-length. Its objective is to capture the structure within the source document, facilitating a deeper understanding of its semantics.

Paper on Prefix Tuning: https://arxiv.org/pdf/2101.00190.pdf

Paper on PSP: https://arxiv.org/pdf/2204.04413.pdf
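
Prefix tuning can be sketched with the same peft library; conceptually the only change from the prompt-tuning snippet above is that the trainable vectors are injected into every layer rather than only at the input (the model choice and prefix length are again arbitrary):

    from transformers import AutoModelForSeq2SeqLM
    from peft import PrefixTuningConfig, TaskType, get_peft_model

    base_model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

    config = PrefixTuningConfig(
        task_type=TaskType.SEQ_2_SEQ_LM,
        num_virtual_tokens=20,  # prefix length prepended at every attention layer
    )

    model = get_peft_model(base_model, config)
    model.print_trainable_parameters()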


Conclusion

We explored advanced techniques like RAG, CoT, ReAct, DSP, and Prompt Tuning to optimize large language models (LLMs). Each method offers unique strategies for improving LLM performance, from integrating external knowledge sources to refining reasoning processes and providing directional hints. These approaches reflect the evolving landscape of AI-driven text generation, offering promising avenues for more accurate and reliable results across tasks.


Beyond these techniques, LoRA stands out as the method offering the best performance for obtaining task-specific output from LLMs. We will delve into this further in the next article.


Happy reading! For more such articles, subscribe to my newsletter: https://lnkd.in/guERC6Qw

I would love to connect with you on Twitter: @MahimaChhagani. Feel free to contact me via email at [email protected] for any inquiries or collaboration opportunities.
