From prompt magic to prompt engineering?

Welcome to Continual Learnings

A weekly newsletter for practitioners building ML-powered products.

What we're reading this week

Email course on conformal prediction: Conformal prediction is a field of ML I’m currently learning about. It promises to provide robust uncertainty estimation for all machine learning models, without needing to change how they are trained. If it works, it could help address some of the issues that make out-of-distribution detection and active learning challenging today. This course looks like a promising starting point to learn about the field.

How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources: This is less relevant to production ML today, but if you’re wondering how we got from 2020’s GPT-3 to ChatGPT, this article does a great job of breaking down the sources of improvement.

TemporalWiki: A Lifelong Benchmark for Training and Evaluating Ever-Evolving Language Models: Most LLMs are trained once and updated infrequently, leading to models that quickly become outdated in their knowledge of the world. This paper makes two contributions: first, a benchmark to measure LLM quality over time, and second, a demonstration of the feasibility of continually training LLMs on new data as it arrives.

Production ML papers to know

In this series, we cover important papers to know if you build ML-powered products.

When Less is More: Least to Most Prompting

Large language models (LLMs) like GPT-3 and ChatGPT have captured the heart of techno-twitter and were one of the catalysts of the current AI hype cycle.

However, it took a while after the release of GPT-3 for the promise of LLMs to capture the public attention it has today. The reason why is that there’s an art — these days generously referred to as “prompt engineering” — to crafting inputs for these models that produce compelling responses.

This week, we’re covering one of the foundational approaches in prompt engineering: least-to-most prompting.

Background: chain-of-thought prompting

This paper builds on an earlier approach to prompt engineering: chain-of-thought prompting.

[Figure: standard prompting vs. chain-of-thought prompting on a math word problem]

In a standard approach to prompting, you might provide some example input / output pairs as part of the prompt, with the hope that the model will be able to generalize to the new input you care about. However, for complicated input / output mappings, standard prompting often leads to poor generalization: with only a few examples, the model can’t figure out the pattern that relates inputs and outputs.

Chain-of-thought prompting refers to providing a sequence of logical steps to get from input to output alongside each example. For example, in the figure above, rather than just providing a word problem alongside the expected answer, the authors also show the LLM how to break the word problem into a sequence of smaller problems and solve those sequentially.

The surprising finding from the paper, which has proven to be consistently true in practice as well, is that this style of prompting leads to better performance and generalization. LLMs respond well to seeing examples of the reasoning behind the answer, not just the answer itself.
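
To make the contrast concrete, here’s a minimal sketch of what the two prompt styles might look like for a grade-school word problem. The word problem wording and the call_llm helper are illustrative stand-ins, not taken verbatim from the paper.

```python
# A minimal sketch of standard few-shot prompting vs. chain-of-thought prompting.
# The word problem is illustrative and call_llm is a stand-in for whatever
# completion API you use; neither is taken verbatim from the paper.

STANDARD_PROMPT = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls each.
How many tennis balls does he have now?
A: The answer is 11.

Q: The cafeteria had 23 apples. They used 20 to make lunch and bought 6 more.
How many apples do they have?
A:"""

# Chain-of-thought: the same example, but the reasoning steps are spelled out
# before the answer, so the model imitates the reasoning, not just the format.
COT_PROMPT = """\
Q: Roger has 5 tennis balls. He buys 2 more cans of 3 tennis balls each.
How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls each is 6 more balls. 5 + 6 = 11.
The answer is 11.

Q: The cafeteria had 23 apples. They used 20 to make lunch and bought 6 more.
How many apples do they have?
A:"""

def call_llm(prompt: str) -> str:
    """Stand-in for a completion call to your LLM of choice."""
    raise NotImplementedError

# call_llm(COT_PROMPT) should produce the intermediate steps before
# concluding with "The answer is 9."
```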

From chain-of-thought to least-to-most prompting

Chain-of-thought prompting is an improvement over standard prompting, but it struggles with easy-to-hard generalization, where the model is asked to solve problems harder than the examples provided in the prompt.

That’s where least-to-most prompting comes in. The approach works in two stages:

  • Stage 1: Problem reduction. Split the complex problem into a set of easier subproblems.
  • Stage 2: Sequential subquestion solving. Solve each subproblem in turn, using earlier answers to help solve the next one.

Each stage requires examples. In problem reduction, the prompt passed to the model contains examples of breaking a problem into pieces. In sequential subquestion solving, the prompts show how to answer subproblems, linking each one to the subproblems solved before and after it.
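
Here’s a rough sketch of how the two stages could be wired together in code. The prompt templates, example strings, and call_llm helper are hypothetical stand-ins rather than the paper’s actual prompts; the two-stage structure is the point.

```python
# A rough sketch of the two-stage least-to-most loop. The prompt templates,
# example strings, and call_llm helper are hypothetical stand-ins, not the
# paper's actual prompts.

def call_llm(prompt: str) -> str:
    """Stand-in for a completion call to whatever LLM API you use."""
    raise NotImplementedError

REDUCTION_EXAMPLES = "..."  # few-shot examples of breaking problems into subquestions
SOLVING_EXAMPLES = "..."    # few-shot examples of answering individual subquestions

def least_to_most(problem: str) -> str:
    # Stage 1: problem reduction. Ask the model to list easier subquestions,
    # one per line, following the few-shot reduction examples.
    reduction_prompt = (
        f"{REDUCTION_EXAMPLES}\n\nQ: {problem}\nTo answer this, we need to know:"
    )
    subquestions = [q.strip() for q in call_llm(reduction_prompt).split("\n") if q.strip()]

    # Stage 2: sequential solving. Answer each subquestion in turn, appending
    # the growing Q/A transcript so later answers can build on earlier ones,
    # and finish by asking the original question.
    context = f"{SOLVING_EXAMPLES}\n\n{problem}"
    answer = ""
    for sub_q in subquestions + [problem]:
        context += f"\nQ: {sub_q}\nA:"
        answer = call_llm(context)
        context += f" {answer}"
    return answer
```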

The figure below, from the paper, illustrates an example.

[Figure: the two stages of least-to-most prompting, from the paper]

Complex reasoning across a range of tasks

The approach was tested on symbolic manipulation, compositional generalization, and math reasoning challenges - and the results “show that least-to-most prompting can indeed generalize to problems harder than those demonstrated.”

For symbolic manipulation, the paper used the last-letter-concatenation task, where the input is a list of words and the output is the concatenation of the last letters of the words in the list. This is a task that is trivial for humans but hard for traditional LLMs.

[Figure: least-to-most prompting on the last-letter-concatenation task]

Here, the subproblems are incrementally longer sublists: each one adds the next list item to the previous sublist. The lists used in the demonstrations contain at most three words, while the test lists contain four or more words.
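
For concreteness, the ground-truth mapping for this task is trivial to write in code, which makes it a clean test of prompting rather than of task difficulty. The sketch below (our own naming, not code from the paper) also shows how the subproblems grow one word at a time.

```python
# The ground-truth mapping for last-letter concatenation, plus a comment showing
# how the least-to-most subproblems grow one word at a time. Names are our own;
# this is not code from the paper.

def last_letter_concat(words: list[str]) -> str:
    """Concatenate the last letter of each word in the list."""
    return "".join(word[-1] for word in words)

assert last_letter_concat(["think", "machine"]) == "ke"

# In the least-to-most decomposition, each subproblem extends the previous list
# by one word, so the model only ever has to append one more letter:
#   ["think"]                        -> "k"
#   ["think", "machine"]             -> "ke"
#   ["think", "machine", "learning"] -> "keg"
```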

The table below shows that least-to-most prompting significantly outperforms the baseline approaches of standard prompting and chain-of-thought prompting, especially as the list length increases.

[Table: accuracy on last-letter concatenation as list length increases, from the paper]

For compositional generalization, the paper uses SCAN, a popular benchmark that requires mapping natural language commands to action sequences - for example, the command ‘look thrice after jump’ should return JUMP LOOK LOOK LOOK.

Least-to-most prompt examples are used to demonstrate how to reduce a long command to a list of short commands, and then how to map these commands to action sequences. The approach is again baselined against chain-of-thought prompting and standard prompting.
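
To make the target mapping concrete, here’s a toy interpreter covering just enough of a SCAN-style grammar to handle the ‘look thrice after jump’ example. It illustrates the task itself, not the paper’s prompting setup.

```python
# A toy interpreter for a tiny subset of SCAN-style commands, just enough for
# the 'look thrice after jump' example. It illustrates the target mapping the
# LLM must produce; it is not the paper's prompting setup or the full grammar.

PRIMITIVES = {"jump": ["JUMP"], "look": ["LOOK"], "walk": ["WALK"], "run": ["RUN"]}

def interpret(command: str) -> list[str]:
    if " after " in command:
        # 'X after Y' means: perform Y first, then X.
        first, second = command.split(" after ", 1)
        return interpret(second) + interpret(first)
    if command.endswith(" thrice"):
        return interpret(command[: -len(" thrice")]) * 3
    return PRIMITIVES[command]

assert interpret("look thrice after jump") == ["JUMP", "LOOK", "LOOK", "LOOK"]
```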

The results are shown in the table below, taken from the paper, and we can see that least-to-most prompting far outperforms the baseline methods.

[Table: accuracy on the SCAN benchmark, from the paper]

For math reasoning, the paper uses the numerical reasoning subset of DROP (which contains 5,850 problems) and the GSM8K dataset (containing linguistically diverse grade school math word problems).

With an additional baseline method, zero-shot prompting, included in the comparison, least-to-most prompting still outperforms the other approaches, as we can see from the table below.

[Table: accuracy on DROP and GSM8K, from the paper]

And the paper’s authors believe their approach would perform even better on math reasoning tasks that require more steps to solve.

Least-to-most prompting: from prompt magic to prompt engineering?

Least to most prompting does have some limits: for example, accuracy tails away as the symbolic manipulation task increases in difficulty.

Furthermore, not all problems can be solved by least-to-most prompting as they may not be reducible or easy to reduce.

But being able to demonstrate easy-to-hard generalization has been an inspirational result in the emerging field of prompt engineering.

Check out the paper here.

Thanks for reading!

Feel free to get in touch if you have any questions: you can message us on socials or simply reply to this email.

You can also find previous issues on our blog and on twitter.

The Gantry team
