Prompt Engineering | Directional Stimulus Prompting...
I missed my flight back home today and had to stay an extra night in Austin. It turned out to be a blessing in disguise: it gave me some additional time to research a prompting technique. So, tonight I read the paper https://arxiv.org/pdf/2302.11520. In this article I will summarize my learnings from the paper and illustrate where this technique can be utilized.
In the rapidly advancing field of natural language processing (NLP), large language models (LLMs) like GPT-3, InstructGPT, and ChatGPT have made remarkable strides. These models exhibit impressive capabilities, such as few-shot prompting, in-context learning, and the ability to perform a wide variety of tasks. However, despite their vast potential, LLMs often fall short of consistently generating the desired output on more specific or nuanced tasks. This gap in fine-grained control over LLM outputs has been a major challenge for ensuring consistent behavior from these models.
A novel solution has emerged from a team at the University of California, Santa Barbara, and Microsoft Research. The team introduces Directional Stimulus Prompting (DSP), a framework designed to guide LLMs more effectively toward desired outputs by using a small, tunable policy model.
The Challenge with Black-Box LLMs
LLMs like GPT-4 and PaLM are often referred to as "black-box" models because their internal parameters are not accessible for direct tuning. Users interact with these models through API calls, where they provide text-based prompts and receive responses. Although these models are incredibly powerful, their ability to generate task-specific outputs often hinges on the quality of the prompt.
This is where prompt engineering—the process of crafting specific inputs to elicit desired responses—comes into play. While manual and automated prompt engineering have seen success, they have limitations, especially when dealing with tasks that require more fine-grained or instance-specific responses. For example, in tasks like summarization or dialogue generation, models may not fully align their outputs with specific target behaviors, such as including key details or following a particular reasoning pattern.
The DSP framework addresses this issue by introducing a small policy model that generates instance-specific directional stimulus prompts, effectively guiding the LLM's response toward a more desirable outcome.
What Is Directional Stimulus Prompting?
The core concept behind DSP is the introduction of a directional stimulus—a discrete token or set of tokens that act as nuanced hints or clues for the LLM to follow. These stimuli guide the model's generation process by emphasizing key elements or aspects of the desired output.
For instance, in a summarization task, the directional stimulus might consist of important keywords that the summary should cover. In a dialogue response generation task, it could be a specific set of dialogue acts that capture the underlying intention of the response.
Unlike other methods that require additional external knowledge or complex retrieval mechanisms, DSP generates the directional stimulus solely based on the input query. This approach allows for fine-grained control without the computational inefficiencies of directly tuning the LLM's parameters.
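To make this concrete, here is a minimal sketch of what a stimulus-augmented summarization prompt could look like. The function name, keyword format, and example article are my own illustrations, not taken from the paper.
Code sketch (Python):
def build_dsp_prompt(article: str, stimulus_keywords: list[str]) -> str:
    # The "hint" line carries the directional stimulus: keywords the summary should cover.
    hint = "; ".join(stimulus_keywords)
    return (
        f"Article: {article}\n"
        f"Keywords the summary should cover: {hint}\n"
        "Write a concise summary of the article that covers the keywords above.\n"
        "Summary:"
    )

prompt = build_dsp_prompt(
    article="The city council approved the new transit budget on Monday ...",
    stimulus_keywords=["city council", "transit budget", "Monday vote"],
)
print(prompt)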
How DSP Works
The DSP framework operates by training a small, tunable policy model to generate the directional stimulus for each input instance. This policy model can be trained through supervised fine-tuning (SFT) and/or reinforcement learning (RL).
By training the policy model instead of the LLM itself, DSP sidesteps the challenges of directly tuning black-box models, making the approach more efficient and scalable.
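To illustrate the idea (a rough sketch under my own assumptions, not the authors' code), the SFT stage can be thought of as standard sequence-to-sequence fine-tuning of a small model such as T5: the input is the original query (e.g., an article), and the target is the stimulus (e.g., keywords drawn from the reference summary). The trained policy model then produces a hint for each new input at inference time.
Code sketch (Python):
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Small, tunable policy model; the black-box LLM itself is never touched.
tokenizer = AutoTokenizer.from_pretrained("t5-small")
policy_model = T5ForConditionalGeneration.from_pretrained("t5-small")
optimizer = torch.optim.AdamW(policy_model.parameters(), lr=3e-5)

# Hypothetical (article, keywords) training pair; in the paper the target
# keywords are extracted from the reference summary.
article = "summarize: The city council approved the new transit budget on Monday ..."
keywords = "city council; transit budget; Monday"

inputs = tokenizer(article, return_tensors="pt", truncation=True)
labels = tokenizer(keywords, return_tensors="pt", truncation=True).input_ids

# SFT step: ordinary cross-entropy loss on the stimulus tokens.
policy_model.train()
optimizer.zero_grad()
loss = policy_model(**inputs, labels=labels).loss
loss.backward()
optimizer.step()

# Inference: generate a stimulus for a new input, to be appended to the LLM prompt as a hint.
policy_model.eval()
stimulus_ids = policy_model.generate(**inputs, max_new_tokens=32)
stimulus = tokenizer.decode(stimulus_ids[0], skip_special_tokens=True)
In the paper, this SFT stage is followed by reinforcement learning, where the reward is derived from how well the LLM performs when given the generated stimulus.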
Applications and Results
The authors of the DSP framework evaluated it across several NLP tasks, including summarization, dialogue response generation, and chain-of-thought reasoning. The results were promising: the directional stimuli consistently improved the LLM's performance over standard prompting, even though the policy model was trained on only a small amount of labeled data.
How have I used it?
I wanted to summarize conversations and extract key topics, while also having the extraction follow certain desired behaviors. Taking inspiration from this approach, I passed the desired behaviors as hints in the prompt. Let me give an example below.
Let us assume that a meeting happened between a travel assistant and a traveler. The travel assistant now wants to send out the minutes of the meeting.
Sample Chat Transcript:
Traveler: "I need to book a flight to Paris for next Tuesday and a hotel near the city center."
Travel Assistant: "I’ll look into flights for next Tuesday and get back to you. Do you have a hotel preference?"
Traveler: "No specific preference, just something comfortable. I’ll also need transportation from the airport to the hotel."
Travel Assistant: "Got it. I’ll also arrange transportation for you."
I can now write a prompt with an injected directional stimulus, or hint, as shown below.
Prompt Example:
Extract the action items from the following conversation between a traveler and a travel assistant. Focus on identifying tasks for both the traveler and the travel assistant, such as ‘Travel assistant will book the flight,’ ‘Traveler will confirm the dates,’ and similar tasks.
Transcript:
Traveler: "I need to book a flight to Paris for next Tuesday and a hotel near the city center."
Travel Assistant: "I’ll look into flights for next Tuesday and get back to you. Do you have a hotel preference?"
Traveler: "No specific preference, just something comfortable. I’ll also need transportation from the airport to the hotel."
Travel Assistant: "Got it. I’ll also arrange transportation for you."
Based on the extracted key topics and action items, draft a professional email summarizing the conversation between the traveler and the travel assistant. Use phrases like "Summary of our discussion," "Next steps," and "Looking forward to your response" to maintain a professional tone. Please specify the action items separately for the traveler and the travel assistant.
Example of output:
Subject: Summary of Travel Arrangements – Trip to Paris
Dear [Traveler's Name],
I hope this message finds you well. Below is a summary of our recent conversation regarding your upcoming trip to Paris:
Summary of our discussion:
- You would like to book a flight to Paris for next Tuesday.
- You need a hotel near the city center, with no specific preference other than comfort.
- You will also need transportation from the airport to the hotel.
Next steps:
For me (Travel Assistant):
- Look into flight options for next Tuesday and share them with you.
- Arrange a comfortable hotel near the city center.
- Organize transportation from the airport to your hotel.
For you (Traveler):
- Review and confirm the proposed flight options and dates.
Looking forward to your confirmation, and I'll send you the details of the arrangements shortly thereafter.
Best regards,
[Your Name]
Travel Assistant
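For completeness, here is a hedged sketch of how the hint-augmented prompt above could be sent to a chat-completion style API. The client setup and model name are my own assumptions, not part of the original workflow; any chat-capable model would do.
Code sketch (Python):
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

transcript = (
    'Traveler: "I need to book a flight to Paris for next Tuesday and a hotel near the city center."\n'
    'Travel Assistant: "I\'ll look into flights for next Tuesday and get back to you. Do you have a hotel preference?"\n'
    'Traveler: "No specific preference, just something comfortable. I\'ll also need transportation from the airport to the hotel."\n'
    'Travel Assistant: "Got it. I\'ll also arrange transportation for you."'
)

# The directional hints (sample action-item phrasing and email phrases) are embedded in the instructions.
prompt = (
    "Extract the action items from the following conversation between a traveler and a travel assistant. "
    "Focus on identifying tasks for both parties, such as 'Travel assistant will book the flight' or "
    "'Traveler will confirm the dates'.\n\n"
    f"Transcript:\n{transcript}\n\n"
    "Based on the extracted key topics and action items, draft a professional email summarizing the conversation. "
    "Use phrases like 'Summary of our discussion', 'Next steps', and 'Looking forward to your response'. "
    "List the action items separately for the traveler and the travel assistant."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumption: any chat-capable model works here
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)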
Additional points based on questions about this article:
Question (from Mainak Sarkar): What does the policy model do?
Answer:
The policy model plays a key role in guiding large language models (LLMs) toward specific desired outputs through Directional Stimulus Prompting (DSP). The policy model (e.g., T5) generates auxiliary prompts known as "directional stimulus prompts" that act as instance-specific hints to guide LLMs to align with target outputs. This helps improve performance on tasks such as summarization, dialogue generation, and reasoning.
The policy model is optimized in two stages:
1. Supervised fine-tuning (SFT) on a small set of labeled examples, where the target stimuli (e.g., keywords) are derived from the reference outputs.
2. Reinforcement learning (RL), where the reward measures how well the LLM's output, generated with the policy model's stimulus in the prompt, aligns with the desired target (for example, a ROUGE score against a reference summary).
In summary, the policy model provides fine-grained control and guidance for black-box LLMs by producing input-specific hints that help steer them toward the desired behavior.
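As a rough illustration of the RL stage (my own sketch of the general setup, not the paper's code), the reward for the policy model can be computed by scoring the LLM's stimulus-conditioned output against the reference, for example with ROUGE in a summarization task.
Code sketch (Python):
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)

def stimulus_reward(llm_output: str, reference_summary: str) -> float:
    # Reward for the policy model: how closely the LLM's output (generated with the
    # policy model's stimulus in the prompt) matches the reference summary.
    scores = scorer.score(reference_summary, llm_output)
    return 0.5 * scores["rouge1"].fmeasure + 0.5 * scores["rougeL"].fmeasure

# The policy model's parameters are then updated with a policy-gradient method
# (e.g., PPO) to maximize this reward.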