Top K, Top P, Temperature
Anuradha Mohapatra
Generative AI Consultant at HCL | Data Science | NLP | Machine Learning | Six Sigma Black Belt certified (ASQ)
Do you remember how, during the 90s in Indian households, we used to tune the TV antenna to get a better picture or sound? Keeping that nostalgic thought in mind, let's discuss a few parameters that influence LLM output.
While working with LLMs in playgrounds such as Gemini, Azure, and Hugging Face, you may have noticed controls/sliders for adjusting the output, labelled with parameters like max_tokens, top_k, or top_p. An LLM produces its answer by generating one next token at a time, so these parameters shape every word of the completion. Let's see how they influence the output the model generates.
Please don't confuse these with the model's training parameters (its weights), which are learned during training. These configuration parameters are set at inference time and give you control over things like how creative the output can be and the maximum number of tokens in the completion.
Max_new_tokens is probably the simplest of these parameters, and you can use it to limit the number of tokens that the model will generate.
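As a minimal sketch of how this looks in practice, here is an illustrative example using the Hugging Face transformers library; the model name and prompt are my own assumptions, not taken from any specific playground:

```python
# Illustrative sketch: limiting completion length with max_new_tokens.
# Model ("gpt2") and prompt are assumed purely for demonstration.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

inputs = tokenizer("The antenna on our old TV", return_tensors="pt")

# max_new_tokens caps how many tokens the model adds beyond the prompt.
output_ids = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```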
Let's discuss Top_k and Top_p.
At each step, the model produces a probability distribution over its entire vocabulary. (This is the output of the transformer's softmax layer; to understand how that output is produced, please read the "Attention Is All You Need" paper.)
Say, for example, a few of the candidate next words and their probabilities are:
Apple: 0.2, Orange: 0.1, Blueberry: 0.02, Banana: 0.03
The straightforward approach is greedy decoding: the model always chooses the word with the highest probability. However, that tends to produce repetitive sequences of words. Random sampling can be used instead to introduce some randomness, but it has to be balanced so the model still produces sensible output. Two settings, top_k and top_p, are sampling techniques we can use to limit the random sampling and increase the chance that the output stays sensible.
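Here is a rough sketch of the difference, using the toy probabilities above (they are an illustrative subset of the vocabulary, so they are renormalized before sampling):

```python
import numpy as np

# Toy next-token probabilities from the example above.
words = ["Apple", "Orange", "Blueberry", "Banana"]
probs = np.array([0.2, 0.1, 0.02, 0.03])
probs = probs / probs.sum()  # renormalize the illustrative subset

# Greedy decoding: always pick the most probable word -> can become repetitive.
greedy_choice = words[int(np.argmax(probs))]

# Random sampling: draw a word according to its probability -> adds variety.
sampled_choice = np.random.choice(words, p=probs)

print(greedy_choice, sampled_choice)
```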
You can specify a top_k value, which instructs the model to choose only from the k tokens with the highest probability. If you set k to 3, you are telling the model to pick from the 3 most probable tokens. In our example, those are Apple: 0.2, Orange: 0.1, and Banana: 0.03.
This approach introduces some randomness while still emphasizing the most probable completion words, which makes the generated text more likely to sound reasonable.
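A minimal sketch of top_k sampling with the same toy distribution (k = 3, as in the example above):

```python
import numpy as np

words = ["Apple", "Orange", "Blueberry", "Banana"]
probs = np.array([0.2, 0.1, 0.02, 0.03])

k = 3
# Keep only the k most probable tokens, renormalize, and sample among them.
top_k_idx = np.argsort(probs)[::-1][:k]
top_k_probs = probs[top_k_idx] / probs[top_k_idx].sum()

next_word = np.random.choice(np.array(words)[top_k_idx], p=top_k_probs)
print(next_word)  # one of: Apple, Orange, Banana
```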
We can use a top_p setting as well. Here, to restrict the random sampling, the model chooses from the smallest set of highest-probability tokens whose combined probability reaches p. If you set p to 0.3, you are instructing the model to sample only from tokens whose cumulative probability adds up to 0.3. So, as per our example, the options are Apple: 0.2 and Orange: 0.1 (0.2 + 0.1 = 0.3).
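A similar sketch for top_p sampling, with p = 0.3 as in the example (again using the assumed toy probabilities):

```python
import numpy as np

words = ["Apple", "Orange", "Blueberry", "Banana"]
probs = np.array([0.2, 0.1, 0.02, 0.03])

p = 0.3
# Sort tokens by probability, keep the smallest prefix whose cumulative
# probability reaches p, then renormalize and sample within that set.
order = np.argsort(probs)[::-1]
cumulative = np.cumsum(probs[order])
cutoff = int(np.searchsorted(cumulative, p)) + 1  # first index reaching p

nucleus_idx = order[:cutoff]
nucleus_probs = probs[nucleus_idx] / probs[nucleus_idx].sum()

next_word = np.random.choice(np.array(words)[nucleus_idx], p=nucleus_probs)
print(next_word)  # Apple or Orange for p = 0.3
```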
One more parameter you can use to control the randomness of the model output is temperature. It influences the shape of the probability distribution the model calculates for the next token: a higher temperature flattens the distribution and increases randomness, while a lower temperature sharpens it and decreases randomness. The right setting depends on the objective of your task. If you are a writer and want to load your content with creativity, you will use a higher temperature setting. But if you are working on scientific content, a lower temperature, with less randomness and a more direct answer, is better for you.
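Here is a minimal sketch of how temperature reshapes the distribution before sampling; the raw logit values are made up purely for illustration:

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature before applying softmax.

    A low temperature sharpens the distribution (closer to greedy decoding);
    a high temperature flattens it (more random, more "creative").
    """
    scaled = np.array(logits) / temperature
    exp = np.exp(scaled - np.max(scaled))  # subtract max for numerical stability
    return exp / exp.sum()

# Assumed raw scores for Apple, Orange, Blueberry, Banana (illustrative only).
logits = [2.0, 1.0, 0.1, 0.2]

print(softmax_with_temperature(logits, temperature=0.5))  # peaked: nearly greedy
print(softmax_with_temperature(logits, temperature=1.0))  # the model's own distribution
print(softmax_with_temperature(logits, temperature=2.0))  # flatter: more varied output
```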
I hope this briefing helps you to understand these concepts better. Please share your input and thoughts.