Precision in Prompting: Key to Effective LLM Interactions

Introduction

My previous article explored various methods for loading and performing local inference with Large Language Models (LLMs). This time, our attention turns to the art of presenting prompts to LLMs. Fine-tuned models bring a focus on the critical importance of prompt formatting: unlike their general-purpose counterparts, they may depend heavily on specific keywords and carefully structured prompts to elicit the right responses. This article delves into why getting the prompt syntax just right matters so much, highlighting how even minor deviations can result in less-than-ideal outcomes. We'll also explore the role of templating languages, such as Jinja, in achieving consistent prompt formatting and show how these concepts come to life through practical examples using llama-cpp-python. This is different from prompt engineering, where the focus is on the content of the text itself; prompt formatting, the subject of this article, is about its presentation to the model.

Pretrained vs. Instruction-Tuned Models

In machine learning, particularly for language models like GPT, we distinguish between "pretrained models" and "fine-tuned models" during their development and deployment.

Pretrained Models refer to the initial versions trained on diverse datasets to grasp language fundamentals, including grammar, syntax, and some world knowledge. This phase is mostly unsupervised, with the model learning from vast amounts of text to build a broad understanding. Pretrained models are also known as foundational or base models.

Fine-Tuned Models are developed by further training the pretrained model on a more focused dataset. This second phase, often supervised, tailors the model to specific tasks like emotion analysis or medical inquiries, refining its capabilities to perform particular functions with greater accuracy. Fine-tuning transforms a generalist model into a specialist, leveraging its foundational knowledge for precise applications. This shift is key to developing AI solutions that meet specific needs with enhanced precision and efficiency. Not all fine-tuned LLMs require specific prompt syntax, but it's essential to confirm before using them whether they expect prompt inputs in a particular format or whether plain, regular text will be sufficient.

Why Presenting the Right Prompt Matters

When working with AI language models, the approach to framing questions or tasks is crucial and comes down to two key aspects: the content of the prompt and its presentation. Prompt engineering focuses on crafting the text with precise wording to elicit accurate responses. Equally important, but less frequently discussed, is the method of presenting these prompts, involving the use of special symbols or formats to distinguish between the task instructions and contextual information. This structured approach is essential for fine-tuning the models, enabling them to understand and respond more effectively.

Training Large Language Models (LLMs) to recognize specific formats or keywords significantly boosts their capacity for handling specialized tasks and understanding context that might elude broader models. Maintaining the correct prompt format is crucial because deviations can impact output quality. For example, using markers like [INST] in prompts for models such as Llama 2 clarifies our requests by distinguishing different prompt parts, thereby enhancing the model's response accuracy and relevance. This approach is akin to using a map to provide clear directions; the text serves as the directions, while the format acts as the map guiding the interpretation. Understanding this is vital for developers, as it not only improves model performance but also expands the potential for innovative AI applications. Experimentation with prompt presentation continues to unveil new ways to refine our interactions with AI, marking an exciting frontier for developers aiming to unlock further capabilities of these advanced technologies.

Prompt Generation

Consider the Llama 2 chat model, which has been fine-tuned to interact in a chat format involving a user and an assistant. In contrast, the pretrained Llama 2 model can only predict or generate the next piece of text based on the provided input and does not have the capability to understand or respond to user queries. However, the chat version of this model, built upon the foundational model, undergoes supervised learning. It is specially trained with a certain prompt structure to grasp questions within their context and give answers. This model uses a unique prompt format, incorporating specific keywords to mark the beginning and end of the conversation segments from both the user and the assistant.

Typically, the prompt includes a history of the conversation, containing the user's messages and the assistant's previous responses, capped off with the user's latest question or comment. This specially formatted prompt is then fed to the model, which, having been trained to recognize this structure, starts generating the assistant's response.

The Hugging Face tokenizer class offers a handy method for assembling prompts in the correct format for a given model. Hugging Face models often ship with a prompt template written in the Jinja templating language, which can be found in the model's tokenizer_config.json file. The apply_chat_template method of the tokenizer class is given the messages along with their roles, applies the Jinja template, and produces the proper prompt format for the model. This method of generating prompts is versatile and can be adapted to any model, as long as its Jinja template is included in its tokenizer_config.json. The apply_chat_template method takes care of formatting and generating the final prompt output, which is compatible with the model and can be fed to it to generate output.

Instead of using the tokenizer class, users can also choose manual prompt creation. In that case, it's crucial to pay attention to the placement of keywords and the punctuation around them, like spaces and newline characters, to ensure the model produces consistent outputs. Compliance with the precise prompt syntax is key for reliable results from the model.

Here is a code snippet to generate a prompt for the Llama 2 chat fine-tuned model using the apply_chat_template method.

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-chat-hf')

chat = [
   {"role": "system", "content": "You are a helpful, respectful and honest assistant."},
   {"role": "user", "content": "Hi there, write me 3 random quotes"},
]

# Apply the Jinja chat template from tokenizer_config.json; tokenize=False
# returns the formatted prompt string instead of token ids.
prompt = tokenizer.apply_chat_template(
        chat, tokenize=False,
        chat_template=tokenizer.chat_template)

print(prompt)

In the script above, tokenizer.chat_template is set up using the Jinja template from the tokenizer_config.json file. If the Jinja template is not present in this file, then chat_template will be initialized as None. In that case, users may supply a Jinja template string for that prompt syntax obtained through alternative methods. If no template is available, users can opt for creating prompts manually, a process that is detailed later in the article.
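For example, if tokenizer.chat_template comes back as None, a Jinja template string taken from the model card can be passed in explicitly. Below is a minimal sketch that reuses the tokenizer and chat objects from the previous snippet; the Zephyr-style template string is illustrative, so substitute the template published for your model.

# Hypothetical fallback: supply a Jinja template obtained from the model card
# when tokenizer_config.json does not ship one.
zephyr_style_template = (
    "{% for message in messages %}"
    "<|{{ message['role'] }}|>\n{{ message['content'] }}</s>\n"
    "{% endfor %}"
)

prompt = tokenizer.apply_chat_template(
        chat, tokenize=False,
        chat_template=zephyr_style_template)

print(prompt)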

Llama 2 compatible prompt.
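Running the snippet above should produce a prompt along these lines (a reconstruction based on the Llama 2 chat template; verify against your tokenizer's actual output):

<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant.
<</SYS>>

Hi there, write me 3 random quotes [/INST]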

In the prompt above, the system content, also known as context, is "You are a helpful, respectful and honest assistant.", while the task or user query is "Hi there, write me 3 random quotes". All other characters belong to the prompt syntax and should remain precisely as they are. Notice the use of '\n' and the space just after the [INST] and before the [/INST] keyword. The <s> token signals the beginning of content, while [INST] and [/INST] denote the start and end of instructions, respectively. The <<SYS>> and <</SYS>> tokens wrap the conversation's context or the desired behavior of the model, like emulating a mathematician or musician, or setting a specific scenario that might better guide the response.

In the previous code snippet, replacing "meta-llama/Llama-2-7b-chat-hf" with "HuggingFaceH4/zephyr-7b-beta" will result in a prompt compatible with the Zephyr fine-tuned model.

Zephyr compatible prompt
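With the Zephyr tokenizer, the generated prompt should look roughly like this (a reconstruction based on the Zephyr chat template shipped with the model):

<|system|>
You are a helpful, respectful and honest assistant.</s>
<|user|>
Hi there, write me 3 random quotes</s>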

Some other prompt formats

ChatML compatible prompt: [ Template Source ]
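A ChatML-style prompt wraps each message between <|im_start|> and <|im_end|> markers, for example (an illustrative sketch; check the model's own template for exact spacing and newlines):

<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hi there, write me 3 random quotes<|im_end|>
<|im_start|>assistant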

Mistral compatible prompt [ Template Source ]

<s>[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today?</s> [INST] I'd like to show off how chat templating works! [/INST]

Mixtral compatible prompt [ Template Source ]

"<s>[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today?</s>[INST] I'd like to show off how chat templating works! [/INST]"

The last two prompts, for Mistral and Mixtral, are almost the same except for a space right after </s>, which is not present in the Mixtral version. Ignoring that level of detail could affect the end result generated by Mistral or Mixtral.

The Hugging Face tokenizer offers support for generating precise prompts. While it's possible to create the same syntax manually without tools, it's crucial to ensure that keywords and punctuation exactly match the required syntax for consistent results. The apply_chat_template method depends on the template shipped in the tokenizer_config.json file; however, the Jinja template is typically released with the model. If it's not accessible via tokenizer.chat_template, one can supply the prompt template manually. Understanding the correct prompt syntax format is essential, especially for fine-tuned models, to achieve the desired output.

Manual prompt generation

Creating prompts manually without Jinja is straightforward but is confined to simple question-answer scenarios. Managing an extended conversation history without Jinja becomes overly complicated, which is why manual prompt creation is better suited to single-query formats. For managing longer conversation histories more efficiently, the apply_chat_template method, used in conjunction with a Jinja template within the tokenizer class, presents a superior approach.

# Llama 2 chat prompt skeleton; {context} and {question} are placeholders
# filled in via str.format below.
template = '<s>[INST] <<SYS>>\n{context}\n<</SYS>>\n\n{question} [/INST]'
prompt = template.format(
         context="You are a helpful assistant",
         question="Hi there, write me 3 random quotes"
      )
# llm is a llama_cpp.Llama instance, as created in the examples below.
output = llm(prompt)
        

The code mentioned above employs the Python string format method to compile the final prompt, which is subsequently input into the Large Language Model (LLM) for inference.
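For contrast, a longer conversation history is much easier to assemble with apply_chat_template than by hand. Here is a minimal sketch, reusing the Llama 2 tokenizer and llm objects from the other snippets together with a hypothetical two-turn history:

# Hypothetical multi-turn history; apply_chat_template inserts the
# <s>/[INST]/[/INST] markers for every turn automatically.
history = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi there, write me 3 random quotes"},
    {"role": "assistant", "content": "1. ... 2. ... 3. ..."},
    {"role": "user", "content": "Now explain the first quote."},
]

prompt = tokenizer.apply_chat_template(history, tokenize=False)
output = llm(prompt)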

Let's do some experiments

This section explores how slight modifications in prompt syntax can impact the output of the model. It aims to provide insight and evidence on the significance of following the correct prompt syntax closely. Please be aware that all inference results presented in this article were performed using 4-bit quantized models. It's possible that using unquantized models might yield better and more consistent results.

Zephyr 7B fine tuned model

The analysis uses the Zephyr 7B fine tuned model with 4 bit quantization in conjunction with llama-cpp-python. By invoking the model's prediction call multiple times, the script examines any variations or inconsistencies in the results which may not be visible in a single run.

from llama_cpp import Llama

model = './zephyr-7b-beta.Q4_K_M.gguf'

# Load the 4-bit quantized Zephyr model; n_gpu_layers=-1 offloads all layers to the GPU.
llm = Llama(model_path=model, verbose=True, n_gpu_layers=-1)

# Zephyr chat format: <|system|> and <|user|> blocks, each terminated with </s>.
prompt = '<|system|>\nYou are a helpful and creative assistant.</s>\n<|user|>\nHi there, write me 3 random quotes</s>\n'

# Run the same prompt several times; temperature=0.9 introduces variation in content.
for i in range(10):
    stream = llm(prompt, max_tokens=200, echo=False, temperature=0.9)
    print(stream['choices'][0]['text'])

Output from one of the iterations:

Throughout all the iterations, the formatting and presentation of the results remained consistent. The content, however, varied because of the temperature setting. The output from all of the iterations can be found at link.

Let's try removing the trailing keyword (i.e., the final </s>\n) and see what happens.

prompt = '<|system|>\nYou are a helpful and creative assistant.</s>\n<|user|>\nHi there, write me 3 random quotes'        

Output from one of the iterations:

The removal of the final </s> keyword turned off the chat capability and forced the model to behave like plain text generation, i.e., like a foundational model. Output of all iterations can be seen at link.

Now let's remove all newline characters from the original complete prompt and see what happens.

prompt = '<|system|>You are a helpful and creative assistant.</s><|user|>Hi there, write me 3 random quotes</s>'        

Output from one of the iterations (see link for all the outputs):

The output varied, displaying a doubled "<" instead of a single '<' character, and the first quote didn't start on a new line. While the generated quotes were somewhat aligned with our expectations, the presentation inaccuracies could potentially cause issues in the content produced for other prompts. This inconsistency was noted in a few instances across numerous trials, underscoring the overall unpredictable nature of the output and emphasizing the crucial importance of adhering strictly to the original prompt syntax.

Removing <|user|> and placing the question directly after the <|system|> block also resulted in the model delivering the correct output in the anticipated format. This suggests that using just the system keyword to present a question is effective for this model. However, this approach might not apply to other models, as each may respond differently to changes in prompt syntax.
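The exact string used in that experiment isn't reproduced here, but the variation was along these lines (an assumed reconstruction):

# Assumed variant: <|user|> removed, question placed right after the system block.
prompt = '<|system|>\nYou are a helpful and creative assistant.\nHi there, write me 3 random quotes</s>\n'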

Finally let's change the prompt to plain query without any keywords and see what we get.

prompt = 'Hi there, write me 3 random quotes'        

The absence of prompt keywords results in the model's output resembling plain text generation (like a foundational model) that picks up directly from where the prompt text concludes. Although a few iterations did output the desired answer, the output overall lacked consistency. Output from all iterations can be seen at link.

Llama2 7B Chat tuned model

This time, we'll experiment with a different query using the Llama 2 7B chat fine-tuned model (4-bit quantized). Our focus will be on examining how the model's reasoning abilities are influenced by changes in prompt formatting. We'll compare the effects on output quality between using a properly formatted prompt and raw input, sticking exclusively to these two situations. The system context is omitted this round by setting it to an empty string.

With proper formatted prompt:

from llama_cpp import Llama

model = './llama-2-7b-chat.Q4_K_M.gguf'

# Load the 4-bit quantized Llama 2 chat model; n_gpu_layers=-1 offloads all layers to the GPU.
llm = Llama(model_path=model, verbose=True, n_gpu_layers=-1)

# Llama 2 chat format with an empty <<SYS>> block (no system context this time).
prompt = "<s>[INST] <<SYS>>\n\n<</SYS>>\n\nAlex has a collection of 50 books. He decides to donate 15 books to the local library. If Alex's friend, Jamie, donates twice as many books as Alex did to the library, how many books does Jamie donate? [/INST]"

# Repeat the same prompt to check the consistency of the reasoning.
for i in range(10):
    stream = llm(prompt, max_tokens=200, echo=False, temperature=0.9)
    print(stream['choices'][0]['text'])

Output from one of the iterations:

Output from all 10 iterations can be found at link.

With raw prompt:

Switching the prompt from keyword-enhanced text to raw text results in a varied mix of responses, including several that lack the correct answer.

prompt = "Alex has a collection of 50 books. He decides to donate 15 books to the local library. If Alex's friend, Jamie, donates twice as many books as Alex did to the library, how many books does Jamie donate?"        

Output from one of the iterations:

The Llama 2 model struggled to produce accurate results with raw text inputs, often yielding incorrect outputs. The outcomes from all 10 iterations are documented at link. However, compared to Zephyr, the Llama 2 model attempted to understand and respond with an answer even when presented with unformatted raw prompts.

Final thoughts

Not all fine-tuned Large Language Models (LLMs) require or follow a custom prompt syntax. The need for custom prompt syntax largely depends on the specific application, the design of the fine-tuning process, and how the model is intended to be interacted with post-fine-tuning.

When Custom Prompt Syntax Is Used

Custom prompt syntax is often employed in scenarios where:

  • Specific Input/Output Structure is Needed: Certain applications may require a well-defined structure for inputs and outputs. For example, if a model is fine-tuned for a chatbot, custom prompts might be designed to differentiate between user commands and bot responses.
  • Enhancing Model Understanding: Custom syntax can help the model better understand the context or the specific task it needs to perform, especially if it was fine-tuned on data that included such syntax.
  • Compatibility with Existing Systems: Integrating a fine-tuned model into an existing system might necessitate adapting the prompt format to align with the system's requirements.

When Custom Prompt Syntax Is Not Necessary

However, custom prompt syntax is not always necessary:

  • General Fine-Tuning: If the fine-tuning does not target a highly specialized task that requires distinct input/output formats, the model may respond adequately to natural language prompts without needing a custom syntax.
  • Adaptive Models: Some models are designed to be highly adaptive to the input format. They can understand and respond appropriately to prompts that are structured in plain language, making custom syntax less critical.
  • User-Friendly Applications: For applications targeting end-users without technical expertise, it might be preferable to design interactions that use natural language prompts to avoid confusion or the need for specialized knowledge.

The choice to use custom prompt syntax with a fine-tuned LLM model is influenced by the goals of the fine-tuning, the nature of the application, and the target users. While custom syntax can enhance precision and clarity in model interactions for certain tasks, many applications benefit from the intuitive and flexible nature of natural language interactions, requiring no special prompt syntax.

Finding prompt syntax

Gathering a detailed list of prompt syntax for open-source Large Language Models (LLMs) requires some research due to the lack of a centralized source. Key strategies include:

  • Model Documentation: Starting with the official documentation of LLMs like GPT, BERT, or LLaMA can be very informative, offering prompt examples and usage guidelines. Hugging Face model cards are an excellent source of information.
  • GitHub Repositories: Exploring the code, examples, and discussions on GitHub provides insights into prompt structures.
  • Research Papers: Papers on LLMs may include prompt examples, offering a deep dive into their application and design.
  • Community Input: Forums like Stack Overflow and Reddit, along with AI-specific discussions, share firsthand experiences with various prompts.
  • Educational Content: Blogs, tutorials, and courses on AI often present practical prompt examples and crafting tips.

Identifying the LLM of interest and focusing on related resources can streamline finding applicable prompt syntax.
