"Take a deep breath" applies to LLMs as well
I recently reviewed the academic paper 'Large Language Models as Optimizers', published by Google DeepMind.
The paper discusses "Optimization by PROmpting (OPRO)", a proposed method for utilizing large language models (LLMs) to overcome challenges posed by the absence of gradients in various optimization problems. In OPRO, optimization tasks are described in natural language and LLMs generate new solutions during each optimization step based on a prompt containing previously generated solutions and their values. The new solutions are then evaluated and added to the prompt for subsequent steps. The authors demonstrated the effectiveness of OPRO in linear regression and traveling salesman problems, and also in optimizing prompts to maximize task accuracy, with results showing OPRO-optimized prompts outperforming those designed by humans.
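To make that loop concrete, here is a minimal sketch of an OPRO-style iteration in Python. The optimizer_llm, evaluate, and build_meta_prompt helpers are hypothetical placeholders standing in for real LLM calls and scoring, not the authors' actual implementation:

```python
# Minimal sketch of the OPRO loop described above.
# optimizer_llm, evaluate, and build_meta_prompt are hypothetical
# placeholders for real LLM calls, scoring, and prompt assembly.

def opro(train_examples, num_steps=50, candidates_per_step=8):
    trajectory = []  # (instruction, score) pairs generated so far

    for step in range(num_steps):
        # 1. Describe the task and the best solutions so far in natural language.
        meta_prompt = build_meta_prompt(trajectory, train_examples)

        # 2. Ask the optimizer LLM for new candidate instructions.
        candidates = [optimizer_llm(meta_prompt, temperature=1.0)
                      for _ in range(candidates_per_step)]

        # 3. Evaluate each candidate and add it to the trajectory
        #    so it appears in the next step's meta-prompt.
        for instruction in candidates:
            score = evaluate(instruction, train_examples)
            trajectory.append((instruction, score))

    # Return the highest-scoring instruction found.
    return max(trajectory, key=lambda pair: pair[1])[0]
```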
Large Language Models (LLMs) are beneficial for optimization tasks due to their capacity to understand natural language. This allows individuals to describe their optimization tasks informally, without needing formal specifications. An example is prompt optimization, where the objective is to find a prompt that maximizes task accuracy; this can be done by providing a high-level text summary accompanied by input-output examples. This natural language capability makes LLMs user-friendly and accessible for optimization tasks.
A few excerpts from the paper:
Benchmarks: The primary evaluation benchmarks are GSM8K (Cobbe et al., 2021) and Big-Bench Hard (BBH) (Suzgun et al., 2022). GSM8K is a benchmark of grade school math word problems with 7,473 training samples and 1,319 test samples, where chain-of-thought prompting (Wei et al., 2022) and the zero-shot instruction “Let’s think step by step.” (Kojima et al., 2022) have drastically improved the performance over the standard prompting. BBH is a suite of 23 challenging BIG-Bench tasks (Srivastava et al., 2022) that covers a wide range of topics beyond arithmetic reasoning, including symbolic manipulation and commonsense reasoning. Each task contains up to 250 examples in total.
Implementation details: We set the temperature to be 0 when evaluating the performance of generated instructions, in which case the scorer LLM greedily decodes. Unless otherwise specified, we set the default temperature to be 1.0 for optimizer LLMs to generate diverse and creative instructions. At each optimization step, we prompt the optimizer LLM with the meta-prompt 8 times to generate 8 instructions, then we add these instructions with their training scores to the optimization trajectory in the meta-prompt. Our meta-prompt at each step contains the best 20 instructions so far and 3 randomly picked exemplars from the training set. We study the effect of different hyperparameters in ablation studies (Section 5.3). Appendix C.2 presents the full meta-prompts for different optimizer LLMs.
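Putting those defaults into code, the build_meta_prompt placeholder from the earlier sketch could be filled in roughly like this (best 20 instructions so far plus 3 random training exemplars). The prompt wording here is illustrative only; the real templates are in Appendix C.2 of the paper:

```python
import random

def build_meta_prompt(trajectory, train_examples, top_k=20, num_exemplars=3):
    """Assemble a meta-prompt from the optimization trajectory.

    Follows the paper's defaults: keep the best 20 instructions so far
    and 3 randomly picked training exemplars. The surrounding wording is
    illustrative, not the exact template from Appendix C.2.
    """
    # Best instructions so far, sorted ascending so the best appears last.
    top = sorted(trajectory, key=lambda pair: pair[1])[-top_k:]
    history = "\n".join(f"text: {ins}\nscore: {score}" for ins, score in top)

    # A few concrete input-output examples of the task.
    exemplars = random.sample(train_examples, k=min(num_exemplars, len(train_examples)))
    examples = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in exemplars)

    return (
        "Below are previous instructions with their training scores.\n\n"
        f"{history}\n\n"
        "Here are example problems from the task:\n\n"
        f"{examples}\n\n"
        "Write a new instruction that is different from the ones above "
        "and achieves a higher score."
    )
```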
Key findings: "Take a deep breath and work on this problem step-by-step" is the most impactful top instruction for the PaLM 2 model. Use it carefully; there is no guarantee it will work on every LLM. In some cases, I have found it to improve accuracy on OpenAI GPT models.
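If you want to check whether it helps on your own model, a quick A/B comparison on a small evaluation set is enough. This is a minimal sketch; call_llm, is_correct, and eval_set are hypothetical stand-ins for your model call, answer checker, and (question, answer) pairs:

```python
# Hypothetical sketch: measure whether the "take a deep breath" instruction
# helps on your own eval set. call_llm, is_correct, and eval_set are
# stand-ins for your model call, answer checker, and data.

INSTRUCTION = "Take a deep breath and work on this problem step-by-step."

def accuracy(questions_and_answers, instruction=""):
    correct = 0
    for question, gold_answer in questions_and_answers:
        prompt = f"{instruction}\n\n{question}" if instruction else question
        reply = call_llm(prompt, temperature=0)  # greedy decoding when scoring
        correct += is_correct(reply, gold_answer)
    return correct / len(questions_and_answers)

baseline = accuracy(eval_set)
with_instruction = accuracy(eval_set, INSTRUCTION)
print(f"baseline={baseline:.3f}  with instruction={with_instruction:.3f}")
```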
From academia to the corporate world:
The key takeaway is that LLMs are complex and their inner workings can be hard to conceptualize at times. They are auto-regressive, meaning they generate sequences of text one token at a time by conditioning each new token on the previous ones. Therefore, learning prompting strategies, iterating, and experimenting is absolutely key. Empower your employees by providing the right tools and environment for them to iterate and experiment, create their own learnings and best practices, and share them.
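As a rough mental model of "auto-regressive", the sketch below generates text one token at a time, each choice conditioned on everything produced so far; model and tokenizer are hypothetical placeholders, not any particular library:

```python
# Rough mental model of auto-regressive generation: each new token is
# chosen conditioned on everything generated so far.
# `model` and `tokenizer` are hypothetical placeholders.

def generate(prompt, max_new_tokens=50):
    tokens = tokenizer.encode(prompt)
    for _ in range(max_new_tokens):
        next_token_probs = model(tokens)        # distribution over the vocabulary
        next_token = next_token_probs.argmax()  # greedy choice (temperature 0)
        tokens.append(next_token)
        if next_token == tokenizer.eos_token_id:
            break
    return tokenizer.decode(tokens)
```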
Reach out if you have questions.
#GPT #AI #GenerativeAI