Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models
Today's paper introduces a new zero-shot prompting method called Plan-and-Solve (PS) Prompting to improve the reasoning capabilities of large language models. The method guides models to first devise a plan to break down complex tasks into subtasks, then carry out those subtasks step-by-step. This approach aims to address common pitfalls in existing zero-shot methods like calculation errors and missing reasoning steps.
Method Overview
The Plan-and-Solve (PS) Prompting method works by providing more detailed instructions to large language models in a zero-shot setting. The key idea is to prompt the model to first devise a plan for solving the problem by breaking it down into subtasks, and then carry out those subtasks systematically.
The method has two main components: first, devising a plan that divides the task into smaller subtasks, and second, carrying out those subtasks according to the plan.
A typical prompt is: “Q: [X]. A: Let’s first understand the problem and devise a plan to solve the problem. Then, let’s carry out the plan and solve the problem step by step.”, where [X] is replaced by the input problem.
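As a rough illustration, the zero-shot pipeline can be sketched as two model calls, following the two-stage setup used by Zero-shot-CoT: one call to generate the plan and reasoning, and one to extract the final answer. The `generate` function below is a hypothetical stand-in for whatever LLM completion API is used.

```python
# Minimal sketch of zero-shot Plan-and-Solve prompting, assuming a generic
# generate(prompt: str) -> str completion function (hypothetical stand-in
# for any LLM API).

PS_TRIGGER = (
    "Let's first understand the problem and devise a plan to solve the problem. "
    "Then, let's carry out the plan and solve the problem step by step."
)

def plan_and_solve(question: str, generate) -> str:
    # Stage 1: prompt the model to plan and reason through the problem.
    reasoning_prompt = f"Q: {question}\nA: {PS_TRIGGER}"
    reasoning = generate(reasoning_prompt)

    # Stage 2: append an answer-extraction trigger (as in Zero-shot-CoT)
    # so the final answer can be read off the second completion.
    extraction_prompt = (
        f"{reasoning_prompt}\n{reasoning}\n"
        "Therefore, the answer (arabic numerals) is"
    )
    return generate(extraction_prompt)
```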
To further improve performance, the authors introduce an enhanced version called PS+ prompting, which adds more detailed instructions such as "extract relevant variables and their corresponding numerals" and "calculate intermediate results". These additional instructions aim to reduce calculation errors and ensure that important reasoning steps are not missed.
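The PS+ variant only swaps in a more detailed trigger sentence; the wording below is a paraphrase of the paper's prompt and can be dropped in place of `PS_TRIGGER` in the sketch above.

```python
# Approximate PS+ trigger sentence (paraphrased from the paper);
# substitute it for PS_TRIGGER in the plan_and_solve sketch above.
PS_PLUS_TRIGGER = (
    "Let's first understand the problem, extract relevant variables and their "
    "corresponding numerals, and devise a plan. Then, let's carry out the plan, "
    "calculate intermediate results (pay attention to correct calculation and "
    "commonsense), solve the problem step by step, and show the answer."
)
```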
The prompts are designed to be general enough to work across different types of reasoning tasks without requiring task-specific examples or fine-tuning. This allows the method to leverage the embedded knowledge and reasoning capabilities of large language models in a zero-shot manner.
Results
The PS and PS+ prompting methods consistently outperformed existing zero-shot baselines across 10 datasets covering arithmetic, commonsense, and symbolic reasoning tasks. On arithmetic reasoning, PS+ prompting achieved an average accuracy of 76.7%, compared to 70.4% for the standard zero-shot chain-of-thought method.
Notably, PS+ prompting performed comparably to or even exceeded some few-shot methods that use manually crafted examples, despite being fully zero-shot. It also reduced calculation errors and missing-step errors compared to existing zero-shot methods.
Conclusion
This paper introduces an effective zero-shot prompting strategy that improves the reasoning capabilities of large language models across diverse tasks. By guiding models to plan and systematically solve problems, it addresses key limitations of existing methods and achieves strong performance without requiring examples or fine-tuning. For more information, please consult the full paper.
Congrats to the authors for their work!
Wang, Lei, et al. "Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models." arXiv preprint arXiv:2305.04091 (2023).