WTF is Pre-training, Fine-tuning, and Prompt Engineering?

(From a discussion with Bing Liu, LLM expert and leader at Meta)

In the world of Natural Language Processing, the terms "Pre-training," "Fine-tuning," and "Prompt engineering" often crop up, leaving many scratching their heads. In this post, we'll break down these concepts to help you navigate the fascinating realm of Large Language Models.

Understanding Key Definitions and Concepts:

1. Pre-training:

  • At its core, a Large Language Model (LLM) aims to predict the next word in a sequence based on the words that came before it. Pre-training is the initial phase, where we train the LLM on a massive amount of text to predict this next word accurately. It's like teaching the model the nuances of language before fine-tuning it for specific tasks.
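
To make this concrete, here is a minimal sketch of that next-word-prediction objective. It uses the Hugging Face transformers library and the small GPT-2 checkpoint purely as illustrative choices; the post itself does not prescribe any particular model or toolkit.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model choice: the small, publicly available GPT-2 checkpoint.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

text = "The quick brown fox jumps over the"
inputs = tokenizer(text, return_tensors="pt")

# The model assigns a score to every word in its vocabulary at each position;
# the highest-scoring word at the last position is its guess for the next word.
with torch.no_grad():
    logits = model(**inputs).logits
next_token_id = logits[0, -1].argmax().item()
print(tokenizer.decode(next_token_id))  # e.g. " lazy"

# Pre-training updates the model's parameters so that the word that actually
# follows in the training text gets the highest score (cross-entropy loss).
loss = model(**inputs, labels=inputs["input_ids"]).loss
print(f"next-word prediction loss: {loss.item():.3f}")
```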

2. Fine-tuning:

  • Once we have a pre-trained LLM, it's time to adapt it for real-world applications beyond plain text completion. Consider a chatbot: we want it to understand and respond to user queries. Fine-tuning involves giving the model thousands of prompt-response pairs so it learns how to generate relevant responses. It tailors the LLM to produce output in the desired form for different use cases.
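
Below is a hedged sketch of what supervised fine-tuning on prompt-response pairs can look like. The chatbot-style data, the "User:/Assistant:" formatting, the GPT-2 checkpoint, and the hyperparameters are all assumptions made for illustration, not details from the discussion.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")   # illustrative small model
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.train()

# In practice you would use thousands of prompt-response pairs; two are shown.
pairs = [
    {"prompt": "How do I reset my password?",
     "response": "Go to Settings > Account > Reset Password and follow the emailed link."},
    {"prompt": "What are your support hours?",
     "response": "Our support team is available 9am-5pm, Monday through Friday."},
]

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

for pair in pairs:
    # Concatenate prompt and response so the model learns to continue a user
    # prompt with an appropriate answer (a simplified but common format).
    text = f"User: {pair['prompt']}\nAssistant: {pair['response']}{tokenizer.eos_token}"
    batch = tokenizer(text, return_tensors="pt")
    loss = model(**batch, labels=batch["input_ids"]).loss  # same next-word loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```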

3. Prompt Engineering:

  • Prompt engineering is another way to shape the LLM's output to suit real-world needs. It's a cost-effective alternative to fine-tuning because it doesn't alter the model's parameters; instead, it produces the desired response by tweaking the input to the model. Common prompt engineering techniques (illustrated in the sketch after this list) include:
    ◦ Zero-shot: providing a detailed instruction to the model without providing any examples.
    ◦ Few-shot: providing a few examples in addition to the instruction.
    ◦ Chain-of-thought: guiding the model to follow a logical chain of reasoning or step-by-step thinking when solving a problem.
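
The sketch below shows what prompts for these three techniques might look like. The sentiment-classification and word-problem tasks are made-up examples, not ones referenced in the post.

```python
# Zero-shot: a detailed instruction, no examples.
zero_shot = (
    "Classify the sentiment of the following review as Positive or Negative.\n"
    "Review: The battery died after two days.\n"
    "Sentiment:"
)

# Few-shot: the same instruction plus a few worked examples.
few_shot = (
    "Classify the sentiment of each review as Positive or Negative.\n"
    "Review: I love how light this laptop is. Sentiment: Positive\n"
    "Review: The screen cracked within a week. Sentiment: Negative\n"
    "Review: The battery died after two days. Sentiment:"
)

# Chain-of-thought: a worked example that spells out step-by-step reasoning,
# nudging the model to reason the same way on the new question.
chain_of_thought = (
    "Q: A store had 23 apples, sold 9, and received a delivery of 15. "
    "How many apples does it have now?\n"
    "A: Let's think step by step. The store started with 23 apples and sold 9, "
    "leaving 23 - 9 = 14. It then received 15 more, so 14 + 15 = 29. "
    "The answer is 29.\n"
    "Q: A library had 120 books, lent out 45, and received 30 returns. "
    "How many books are on the shelves now?\n"
    "A: Let's think step by step."
)
```

None of these techniques change the model's weights; they only change the text sent to the model at inference time.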

When to Use Fine-tuning vs. Prompt Engineering:

  • Prompt engineering is cost-effective and its effect applies per request rather than being baked into the model, making it ideal for quick adjustments. However, you pay a variable cost at inference time, because the added prompt content significantly increases the number of tokens processed with every request.
  • Fine-tuning requires a substantial number of labeled examples, and its impact on the model's parameters is permanent, but it comes at a higher upfront cost. This is a fixed cost you pay once: gathering labeled examples and hiring ML specialists to train the model (see the back-of-envelope comparison after this list).
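
The comparison below illustrates this fixed-cost-versus-variable-cost trade-off. Every number in it (token price, extra prompt length, request volume, and the one-time fine-tuning cost) is an invented assumption for illustration, not a quoted figure.

```python
# Every number below (token price, prompt sizes, request volume, fine-tuning
# cost) is an invented assumption for illustration, not a quoted figure.

price_per_1k_input_tokens = 0.002        # assumed API price, USD
requests_per_month = 1_000_000           # assumed traffic

# Prompt engineering: suppose few-shot examples add ~800 tokens per request.
extra_prompt_tokens = 800
variable_cost_per_month = (
    requests_per_month * extra_prompt_tokens / 1000 * price_per_1k_input_tokens
)
print(f"Prompt engineering, extra inference cost per month: ${variable_cost_per_month:,.0f}")

# Fine-tuning: a one-time fixed cost (labeling, ML effort, training run),
# after which the short prompt no longer needs the extra examples.
fixed_cost = 30_000                      # assumed one-time cost, USD
print(f"Fine-tuning, one-time fixed cost: ${fixed_cost:,.0f}")

# Months of traffic needed before the fixed cost is paid back by saved tokens.
print(f"Break-even after ~{fixed_cost / variable_cost_per_month:.1f} months")
```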

Given the lower initial cost and ease of execution, prompt engineering is typically the first choice before considering fine-tuning.

Curious about how to gauge the quality of the AI product after training? Stay tuned for our next episode: Evaluating the Quality of Generative AI.

Other useful readings: Prompt Engineering Guide

Bib Shukla

Global AI & Data Leader, AWS Financial Services | STEM/AI Advocate | Speaker

11 months

Very well written. I have referenced this blog in my own blog. Thank you.

Jeff Kahn, Ph.D.

Senior Applied Researcher – Machine Learning, Personalization

1 year

Yes, thank you. Excited to dig in as a way to explain these concepts to others.
