Benefits of Training LLMs with RLHF
What are the benefits of training LLMs with RLHF?
The best way to show this is with examples. Let’s have a look!
At a high level, reinforcement learning from human feedback (RLHF) is a technique for training large language models that has been critical to OpenAI's ChatGPT and InstructGPT models, Anthropic's Claude, and more.
RLHF enables training LLMs to understand instructions and generate helpful responses.
Next-token Prediction and RLHF
Standard LLMs are trained to predict the next token. If you give the model a simple prompt, such as a translation request, it will simply continue the text rather than follow the instruction. (We're using OpenAI's davinci model here.)
Ideally, the model would respond with the translation. However, davinci wasn't trained that way: it doesn't understand the request or the desired output. RLHF helps with this by tuning LLMs to follow instructions.
RLHF LLM
The expected response is a direct translation, and that is exactly what we get if we use one of OpenAI's RLHF LLMs (text-davinci-003). This model has been tuned to follow instructions, building on the impressive next-token prediction capabilities of language models.
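To make the comparison concrete, here is a minimal sketch using the legacy openai Python package (the pre-1.0 Completion API that served davinci and text-davinci-003). The API key placeholder and the translation request below are illustrative stand-ins, not the exact prompt from the original screenshots.

```python
import openai

openai.api_key = "YOUR_API_KEY"  # assumes an OpenAI API key is configured

# Illustrative translation request (not the exact prompt from the original example).
prompt = "Translate the following sentence to Spanish:\nThe weather is nice today."

for model in ["davinci", "text-davinci-003"]:
    response = openai.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=60,
        temperature=0,
    )
    # davinci tends to continue the text; text-davinci-003 returns the translation.
    print(f"{model}: {response['choices'][0]['text'].strip()}")
```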
Let’s continue looking at more advanced examples…
Code Generation with LLMs
Let’s say we want to use LLMs to generate code snippets. If we ask a non-RLHF model like davinci for a code snippet, many things are wrong with the output, and we can start to see why good next-word prediction alone is not enough.
How about the RLHF model? Let’s try it!
We didn’t even need to structure the prompt to instruct the model; we just used plain English. @karpathy puts it best: “The hottest new programming language is English.” This is all possible thanks to the latest developments in LLMs + RLHF.
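As a rough sketch of what this looks like in practice (again with the legacy Completion API, and with a placeholder request rather than the one from the original screenshot):

```python
import openai

openai.api_key = "YOUR_API_KEY"  # assumes an OpenAI API key is configured

# A plain-English request, with no special prompt structure.
prompt = "Write a Python function that checks whether a string is a palindrome."

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=150,
    temperature=0,
)
print(response["choices"][0]["text"].strip())
```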
But there’s more…
LLMs for Reasoning
Besides basic text generation and code generation, another important capability for LLMs is reasoning. Let’s take arithmetic reasoning as an example. LLMs generally struggle with this, but they are improving.
The non-RLHF output seems correct, but it comes with a lot of unnecessary text.
The RLHF model’s output is a direct, clean answer with no additional text; the model has been tuned to address math problems like this. This is a basic example, but as the problems get more complex, more careful alignment is needed, and annotating datasets for RLHF LLMs requires deep expertise. Our team can assist if you need help with this.
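If you want to try this comparison yourself, here is a minimal sketch along the same lines as before; the word problem below is a common illustrative example, not the one from the original screenshots.

```python
import openai

openai.api_key = "YOUR_API_KEY"  # assumes an OpenAI API key is configured

# Illustrative arithmetic word problem (not the exact one from the original example).
prompt = (
    "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?"
)

for model in ["davinci", "text-davinci-003"]:
    response = openai.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=64,
        temperature=0,
    )
    print(f"{model}: {response['choices'][0]['text'].strip()}")
```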
AI Safety
Beyond the tasks above, another important concern with LLMs is ensuring that they are safe and don't generate harmful responses. RLHF is also a key component of training safe LLMs, and red teaming is essential for building safer LLM systems.
Example of non-RLHF output:
Now let's look at the RLHF LLM's output. We can see that this particular model has been tuned to avoid returning unsafe explanations of activities that are considered illegal. Safety is a key area in AI, and RLHF is helping to make a lot of progress here.
If you found this content useful, you can find more examples in one of our recent blog posts.