Understanding Prompt Engineering Hyperparameters for Enhanced Performance of LLMs
Last week, we explored how to create effective prompts to ensure the desired result from Large Language Models (LLMs). This week, as promised, we will dig deeper into understanding hyperparameters and how to adjust them to suit different scenarios.
In our previous discussion, I outlined several parameters: temperature, top p, top k, max tokens, stop sequence, frequency penalty, and presence penalty. We learned that temperature controls the degree of randomness in the language model's responses. Specifically, temperature scales the logits fed into the LLM's softmax function: a higher temperature (more entropy) leads to a flatter, more uniform output distribution, while a lower temperature (less entropy) results in a sharper one, as shown in the GIF below.
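To make this concrete, here is a minimal Python sketch (using made-up logits rather than real model outputs) of how temperature reshapes the softmax distribution:

import numpy as np

def softmax_with_temperature(logits, temperature=1.0):
    # Scale the logits by 1/temperature, then normalize into probabilities
    scaled = np.array(logits) / temperature
    exp = np.exp(scaled - np.max(scaled))  # subtract the max for numerical stability
    return exp / exp.sum()

logits = [2.0, 1.0, 0.5, -1.0]  # hypothetical next-token logits

print(softmax_with_temperature(logits, temperature=0.2))  # sharper: mass concentrates on the top token
print(softmax_with_temperature(logits, temperature=1.5))  # flatter: probabilities spread more evenly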
Now that we have understood the concept of temperature, let's have a look at the other parameters.
1. Context Window - Managing the extent of input text for review:
The context window is like the working memory of an LLM, i.e. the amount of input text it can take into account while generating a response. For example, if you provide an article to the LLM and want each paragraph (containing roughly 60-70 tokens) summarized into a single line, limiting the context window to around 60 tokens lets the model work through the article paragraph by paragraph, without carrying over context from the previous paragraph. Below are the context window capacities of two popular LLMs:
GPT-4 Turbo: 128,000 tokens
Gemini 1.5 Pro: 1,000,000 tokens
Widening the context window comes with both advantages and disadvantages. On one hand, it improves the model's understanding and its ability to produce relevant text. On the other hand, it imposes a significant computational burden, requiring more processing power and time.
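As a rough sketch, you can check how many tokens each paragraph of your article consumes with OpenAI's tiktoken tokenizer library (the article text below is just a placeholder):

import tiktoken  # OpenAI's tokenizer library (pip install tiktoken)

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

article = "First paragraph of the article...\n\nSecond paragraph of the article..."

# Split the article paragraph-wise and count the tokens each paragraph consumes
for i, paragraph in enumerate(article.split("\n\n"), start=1):
    n_tokens = len(enc.encode(paragraph))
    print(f"Paragraph {i}: {n_tokens} tokens")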
2. Max Token - Limiting the amount of output text generated:
The max tokens (or token limit) parameter regulates the number of output tokens the LLM produces. OpenAI estimates that one token corresponds to approximately 4 characters (token calculator: https://platform.openai.com/tokenizer).
For example, setting the max token limit to 27-30 will prompt the LLM to generate concise, one-liner answers, which could be useful in some cases.
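For illustration, here is a minimal sketch of such a call with the OpenAI Python SDK (the prompt is made up, and an OPENAI_API_KEY environment variable is assumed to be set):

from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Summarize the theory of relativity."}],
    max_tokens=30,  # caps the reply at roughly one line
)
print(response.choices[0].message.content)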
3. Top p, Top k - Variation in the generation of the tokens:
Temperature, top p, and top k all help control the randomness of the output, and it's crucial to adjust them to achieve the desired result. While temperature rescales the softmax distribution itself, top p and top k use different strategies to narrow down the pool of candidate tokens from which the next token is sampled.
With top k, the LLM samples only from the k tokens with the highest probability. With top p (nucleus sampling), the LLM samples from the smallest set of top tokens whose cumulative probability is at least p.
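Here is a rough NumPy sketch of both filtering strategies applied to a made-up softmax output (a real sampler would then draw the next token from the filtered distribution):

import numpy as np

def top_k_filter(probs, k):
    # Keep only the k most probable tokens, then renormalize
    top_idx = np.argsort(probs)[-k:]
    filtered = np.zeros_like(probs)
    filtered[top_idx] = probs[top_idx]
    return filtered / filtered.sum()

def top_p_filter(probs, p):
    # Keep the smallest set of tokens whose cumulative probability is at least p
    order = np.argsort(probs)[::-1]              # indices sorted by descending probability
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1  # number of tokens needed to reach p
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

probs = np.array([0.5, 0.25, 0.15, 0.07, 0.03])  # hypothetical softmax output

print(top_k_filter(probs, k=2))    # only the two best tokens remain candidates
print(top_p_filter(probs, p=0.9))  # smallest set covering at least 90% of the probability mass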
4. Stop sequence - Indicating when to stop:
The characters specified in the stop sequence dictate where the LLM will stop generating text. For example, if you set the stop sequence to ".", the LLM will generate text only until it encounters the "." character.
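A minimal sketch with the OpenAI Python SDK (prompt and model are placeholders; note that the stop sequence itself is not included in the returned text):

from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Explain what a context window is."}],
    stop=["."],  # generation halts as soon as the model is about to emit a "."
)
print(response.choices[0].message.content)  # only the first sentence comes back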
5. Frequency and Presence penalty - Controlling token repetition:
The frequency penalty parameter helps the model avoid generating repetitive tokens, such as words or phrases. The more often a token has already appeared, the larger the penalty applied to it, which limits its further generation.
Unlike the frequency penalty, the presence penalty applies the same penalty to a repeated token regardless of how often it has appeared. This encourages the model to use tokens other than those subjected to the presence penalty.
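Conceptually, both penalties are subtracted from a token's logit before the next token is sampled. The exact formula can differ between providers, but a simplified sketch looks like this:

import numpy as np

def apply_penalties(logits, token_counts, frequency_penalty=0.0, presence_penalty=0.0):
    # Penalize the logits of tokens that already appeared in the generated text:
    # the frequency penalty grows with every repetition, the presence penalty is a
    # flat, one-time deduction once a token has appeared at all.
    adjusted = np.array(logits, dtype=float)
    for token_id, count in token_counts.items():
        adjusted[token_id] -= count * frequency_penalty
        adjusted[token_id] -= (1.0 if count > 0 else 0.0) * presence_penalty
    return adjusted

logits = [3.2, 2.9, 1.1]        # hypothetical logits for a 3-token vocabulary
token_counts = {0: 4, 1: 1}     # token 0 appeared 4 times, token 1 once, token 2 never
print(apply_penalties(logits, token_counts, frequency_penalty=0.5, presence_penalty=0.3))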
Personal Experience:
In my experience, if the goal is to generate responses in a scripted manner, similar to a classifier, setting the temperature to a lower value and constraining max tokens, top p, and top k can help mitigate the generation of irrelevant or fabricated data and reduce the cost associated with generating output tokens.
However, if the objective is to enhance the LLM's creativity and enable it to engage with customers more dynamically or with longer texts, increasing the temperature value, along with tuning the frequency and presence penalties, becomes equally valuable.
Examples of use cases with the gpt-3.5-turbo model:
For a customer service chatbot for a tennis club, I used the following parameters:
temperature=0.27,
max_tokens=70,
top_p=0.56,
frequency_penalty=0,
presence_penalty=0
For an AI that helps people through difficult times, I used the following parameters:
temperature=1,
max_tokens=105,
top_p=1,
frequency_penalty=0,
presence_penalty=0.15
You can adjust these values based on how the LLM responds to your inputs.
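As a sketch, the tennis club parameters above would plug into an OpenAI chat completion call like this (the system prompt and user question are made up for illustration):

from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant for a tennis club."},  # hypothetical system prompt
        {"role": "user", "content": "What time do the courts open on weekends?"},
    ],
    temperature=0.27,
    max_tokens=70,
    top_p=0.56,
    frequency_penalty=0,
    presence_penalty=0,
)
print(response.choices[0].message.content)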
Conclusion:
In conclusion, understanding and adjusting these parameters is key to optimizing the performance of Large Language Models (LLMs) and unlocking their full potential. These parameters play critical roles in controlling the randomness of responses, managing input and output text lengths, and controlling token variation for specific objectives. Whether aiming for scripted responses or fostering creativity and dynamic engagement, selecting appropriate parameter values based on the model's responses is essential.
Try tweaking these parameters yourself:
ChatGPT playground: https://platform.openai.com/playground/
Gemini playground: https://huggingface.co/spaces/Roboflow/Gemini
Mistral AI playground: https://mistral-playground.azurewebsites.net/
In my upcoming article, we will discuss how to reduce the cost of using these LLMs in the best possible way. I hope you enjoyed this article. Feel free to leave a comment if it was useful to you or if you have anything to discuss.
Happy reading! For more such articles, subscribe to my newsletter: https://lnkd.in/guERC6Qw
I would love to connect with you on Twitter: @MahimaChhagani. Feel free to contact me via email at [email protected] for any inquiries or collaboration opportunities.