From prompt magic to prompt engineering?

Welcome to Continual Learnings

A weekly newsletter for practitioners building ML-powered products.

What we're reading this week

Email course on conformal prediction: Conformal prediction is a field of ML I’m currently learning about. It promises to provide robust uncertainty estimation for all machine learning models, without needing to change how they are trained. If it works, it could help address some of the issues that make out-of-distribution detection and active learning challenging today. This course looks like a promising starting point to learn about the field.

How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources: This is less relevant to production ML today, but if you’re wondering how we got from 2020’s GPT-3 to ChatGPT, this article does a great job of breaking down the sources of improvement.

TemporalWiki: A Lifelong Benchmark for Training and Evaluating Ever-Evolving Language Models: Most LLMs are trained once and updated infrequently, leading to models that quickly become outdated in their knowledge of the world. This paper makes two contributions: first, a benchmark to measure LLM quality over time, and second, a demonstration of the feasibility of continually training LLMs on new data as it arrives.

Production ML papers to know

In this series, we cover important papers to know if you build ML-powered products.

When Less is More: Least to Most Prompting

Large language models (LLMs) like GPT-3 and ChatGPT have captured the heart of techno-twitter and were one of the catalysts of the current AI hype cycle.

However, it took a while after the release of GPT-3 for the promise of LLMs to capture the public attention it has today. The reason why is that there’s an art — these days generously referred to as “prompt engineering” — to crafting inputs for these models that produce compelling responses.

This week, we’re covering one of the foundational approaches in prompt engineering: least-to-most prompting.

Background: chain-of-thought prompting

This paper builds on an earlier approach to prompt engineering: chain-of-thought prompting.

[Figure: standard prompting vs. chain-of-thought prompting on a math word problem]

In a standard approach to prompting, you might provide some example input / output pairs as part of the prompt, with the hope that the model will be able to generalize to the new input you care about. However, for complicated input / output mappings, standard prompting often leads to poor generalization: with only a few examples, the model can’t figure out the pattern that relates inputs and outputs.

Chain-of-thought prompting refers to providing a sequence of logical steps to get from input to output alongside each example. For example, in the figure above, rather than just providing a word problem alongside the expected answer, the authors also show the LLM how to break the word problem into a sequence of smaller problems and solve those sequentially.

The surprising finding from the paper, which has proven to be consistently true in practice as well, is that this style of prompting leads to better performance and generalization. LLMs respond well to seeing examples of the reasoning behind the answer, not just the answer itself.
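
To make the contrast concrete, here’s a minimal sketch of what the two prompt styles might look like for a grade-school word problem. The word problem wording and the call_llm helper are illustrative stand-ins, not taken verbatim from the paper.

```python
# A minimal sketch of standard few-shot prompting vs. chain-of-thought prompting.
# The word problem is illustrative and call_llm is a stand-in for whatever
# completion API you use; neither is taken verbatim from the paper.

STANDARD_PROMPT = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls each.
How many tennis balls does he have now?
A: The answer is 11.

Q: The cafeteria had 23 apples. They used 20 to make lunch and bought 6 more.
How many apples do they have?
A:"""

# Chain-of-thought: the same example, but the reasoning steps are spelled out
# before the answer, so the model imitates the reasoning, not just the format.
COT_PROMPT = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls each.
How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls each is 6 more balls. 5 + 6 = 11.
The answer is 11.

Q: The cafeteria had 23 apples. They used 20 to make lunch and bought 6 more.
How many apples do they have?
A:"""

def call_llm(prompt: str) -> str:
    """Stand-in for a completion call to your LLM of choice."""
    raise NotImplementedError

# call_llm(COT_PROMPT) should produce the intermediate steps before
# concluding with "The answer is 9."
```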

From chain-of-thought to least-to-most prompting

Chain-of-thought prompting is an improvement over standard prompting, but it struggles with easy-to-hard generalization, where the model is asked to solve problems harder than the examples provided in the prompt.

That’s where least-to-most prompting comes in. The approach works in two stages:

  • Stage 1: Problem reduction. Split the complex problem into a set of easier subproblems.
  • Stage 2: Sequential subquestion solving. Solve each subproblem in turn, using earlier answers to help solve the next one.

Each stage requires examples. In problem reduction, the prompt passed to the model contains examples of breaking a problem into pieces. In sequential subquestion solving, the prompts show how to answer subproblems, linking each one to the subproblems solved before and after it.
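
Here’s a rough sketch of how the two stages could be wired together in code. The prompt templates, example strings, and call_llm helper are hypothetical stand-ins rather than the paper’s actual prompts; the two-stage structure is the point.

```python
# A rough sketch of the two-stage least-to-most loop. The prompt templates,
# example strings, and call_llm helper are hypothetical stand-ins, not the
# paper's actual prompts.

def call_llm(prompt: str) -> str:
    """Stand-in for a completion call to whatever LLM API you use."""
    raise NotImplementedError

REDUCTION_EXAMPLES = "..."  # few-shot examples of breaking problems into subquestions
SOLVING_EXAMPLES = "..."    # few-shot examples of answering individual subquestions

def least_to_most(problem: str) -> str:
    # Stage 1: problem reduction. Ask the model to list easier subquestions,
    # one per line, following the few-shot reduction examples.
    reduction_prompt = (
        f"{REDUCTION_EXAMPLES}\n\nQ: {problem}\nTo answer this, we need to know:"
    )
    subquestions = [q.strip() for q in call_llm(reduction_prompt).split("\n") if q.strip()]

    # Stage 2: sequential solving. Answer each subquestion in turn, appending
    # the growing Q/A transcript so later answers can build on earlier ones,
    # and finish by asking the original question.
    context = f"{SOLVING_EXAMPLES}\n\n{problem}"
    answer = ""
    for sub_q in subquestions + [problem]:
        context += f"\nQ: {sub_q}\nA:"
        answer = call_llm(context)
        context += f" {answer}"
    return answer
```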

The figure below, from the paper, illustrates an example.

[Figure: the two stages of least-to-most prompting, from the paper]

Complex reasoning across a range of tasks

The approach was tested on symbolic manipulation, compositional generalization, and math reasoning challenges - and the results “show that least-to-most prompting can indeed generalize to problems harder than those demonstrated.”

For symbolic manipulation, the paper used the last-letter-concatenation task, where the input is a list of words and the output is the concatenation of the last letters of the words in the list. This is a task that is trivial for humans but hard for traditional LLMs.

[Figure: least-to-most prompting on the last-letter-concatenation task]

Here, the subproblems are incrementally longer sublists: each one adds the next list item to the previous sublist. The lists used in the demonstrations contain at most three words, while the test lists contain four or more words.
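
For concreteness, the ground-truth mapping for this task is trivial to write in code, which makes it a clean test of prompting rather than of task difficulty. The sketch below (our own naming, not code from the paper) also shows how the subproblems grow one word at a time.

```python
# The ground-truth mapping for last-letter concatenation, plus a comment showing
# how the least-to-most subproblems grow one word at a time. Names are our own;
# this is not code from the paper.

def last_letter_concat(words: list[str]) -> str:
    """Concatenate the last letter of each word in the list."""
    return "".join(word[-1] for word in words)

assert last_letter_concat(["think", "machine"]) == "ke"

# In the least-to-most decomposition, each subproblem extends the previous list
# by one word, so the model only ever has to append one more letter:
#   ["think"]                        -> "k"
#   ["think", "machine"]             -> "ke"
#   ["think", "machine", "learning"] -> "keg"
```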

The table below shows that least-to-most prompting significantly outperforms the baseline approaches of standard prompting and chain-of-thought prompting, especially as the list length increases.

[Table: accuracy on last-letter concatenation as list length increases, from the paper]

For compositional generalization, the paper uses SCAN, a popular benchmark that requires mapping natural language commands to action sequences - for example, the command ‘look thrice after jump’ should return JUMP LOOK LOOK LOOK.

Least-to-most prompt examples are used to demonstrate how to reduce a long command to a list of short commands, and then how to map these commands to action sequences. The approach is again baselined against chain-of-thought prompting and standard prompting.
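
To make the target mapping concrete, here’s a toy interpreter covering just enough of a SCAN-style grammar to handle the ‘look thrice after jump’ example. It illustrates the task itself, not the paper’s prompting setup.

```python
# A toy interpreter for a tiny subset of SCAN-style commands, just enough for
# the 'look thrice after jump' example. It illustrates the target mapping the
# LLM must produce; it is not the paper's prompting setup or the full grammar.

PRIMITIVES = {"jump": ["JUMP"], "look": ["LOOK"], "walk": ["WALK"], "run": ["RUN"]}

def interpret(command: str) -> list[str]:
    if " after " in command:
        # 'X after Y' means: perform Y first, then X.
        first, second = command.split(" after ", 1)
        return interpret(second) + interpret(first)
    if command.endswith(" thrice"):
        return interpret(command[: -len(" thrice")]) * 3
    return PRIMITIVES[command]

assert interpret("look thrice after jump") == ["JUMP", "LOOK", "LOOK", "LOOK"]
```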

The results are shown in the table below, taken from the paper, and we can see that least-to-most prompting far outperforms the baseline methods.

[Table: accuracy on the SCAN benchmark, from the paper]

For math reasoning, the paper uses the numerical reasoning subset of DROP (which contains 5,850 problems) and the GSM8K dataset (containing linguistically diverse grade school math word problems).

With an additional baseline method, zero-shot prompting, included in the comparison, least-to-most prompting still outperforms the other approaches, as we can see from the table below.

[Table: accuracy on DROP and GSM8K, from the paper]

And the paper’s authors believe their approach would perform even better on math reasoning tasks that require more steps to solve.

Least-to-most prompting: from prompt magic to prompt engineering?

Least to most prompting does have some limits: for example, accuracy tails away as the symbolic manipulation task increases in difficulty.

Furthermore, not all problems can be solved by least-to-most prompting as they may not be reducible or easy to reduce.

But being able to demonstrate easy-to-hard generalization has been an inspirational result in the emerging field of prompt engineering.

Check out the paper here.

Thanks for reading!

Feel free to get in touch if you have any questions: you can message us on socials or simply reply to this email.

You can also find previous issues on our blog and on twitter.

The Gantry team
