Benefits of Training LLMs with RLHF

What are the benefits of training LLMs with RLHF?

The best way to show this is with examples. Let’s have a look!

At a high level, reinforcement learning from human feedback (RLHF) is a technique for training large language models that has been critical to OpenAI's ChatGPT and InstructGPT models, Anthropic's Claude, and more.

RLHF enables training LLMs to understand instructions and generate helpful responses.

Next-token Prediction and RLHF

Standard LLMs are trained to predict the next token. If you use a simple prompt like the one shown in the image below, the model will simply continue the prompt. (We're using OpenAI's davinci model here.)

Ideally, the model should have responded with the translation. However, davinci wasn't trained that way. The LLM doesn't understand the request or the desired output. RLHF helps with this by tuning LLMs to understand instructions.

[Image: the base davinci model continues the prompt instead of translating it]

RLHF LLM

The expected response is a direct translation. If we instead use one of OpenAI's RLHF LLMs (text-davinci-003), we get exactly that. This model has been tuned to follow instructions while leveraging the impressive next-token prediction capabilities of language models.

[Image: text-davinci-003 returns the translation directly]
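To make the comparison concrete, here is a minimal sketch of how you could reproduce it with the legacy openai Python package (pre-1.0 API). The prompt wording is our own assumption, and both models have since been deprecated by OpenAI, so treat this as illustrative rather than exact.

```python
# Illustrative sketch (legacy openai < 1.0 API); the prompt wording is assumed,
# and the davinci / text-davinci-003 models have since been deprecated.
import openai

openai.api_key = "YOUR_API_KEY"  # set your own key

prompt = "Translate the following sentence to Spanish:\n\nThe weather is nice today."

for model in ["davinci", "text-davinci-003"]:
    response = openai.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=64,
        temperature=0,
    )
    # The base model tends to continue the prompt with more text;
    # the RLHF-tuned model tends to return the translation directly.
    print(f"--- {model} ---")
    print(response["choices"][0]["text"].strip())
```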

Let’s continue looking at more advanced examples…

Code Generation with LLMs

Let’s say we want to use LLMs to generate code snippets. A non-RLHF model like davinci outputs the following:

Many things are wrong with the output, but we can start to see why good next-token prediction alone is not enough.

[Image: non-RLHF davinci output for the code generation prompt]

How about the RLHF model? Let’s try it!

We didn’t even need to structure the prompt to instruct the model. We just used plain English. @karpathy puts it best: “The hottest new programming language is English”. This is all possible with the latest developments in LLMs + RLHF.

[Image: text-davinci-003 output for the plain-English code generation request]
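As a rough sketch of what this looks like in code (again using the legacy openai package; the plain-English prompt below is our own illustrative example, not the one from the screenshot):

```python
# Illustrative sketch (legacy openai < 1.0 API); the prompt is an assumed
# example of a plain-English instruction, not the one from the screenshot.
import openai

openai.api_key = "YOUR_API_KEY"  # set your own key

prompt = (
    "Write a Python function that takes a list of numbers "
    "and returns a new list sorted in descending order."
)

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=150,
    temperature=0,
)
print(response["choices"][0]["text"].strip())
```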

But there’s more…

LLMs for Reasoning

Besides basic text generation and code generation, another important area for LLMs is reasoning. Let’s take arithmetic reasoning as an example. LLMs generally struggle with this, but they are improving.

The non-RLHF output seems correct, but it comes with a lot of text that’s not needed.

[Image: non-RLHF davinci output for the arithmetic problem, padded with extra text]

The RLHF model output is a direct, clean answer. No additional text. The model has been tuned to address math problems like this. This is a basic example; more complex problems become more challenging and require more careful alignment. Deep expertise is needed to annotate datasets for RLHF LLMs, and our team can assist if you need help with this.

[Image: text-davinci-003 gives a direct answer to the arithmetic problem]
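For reference, here is a minimal sketch of this comparison (legacy openai package; the word problem is an assumed example, not the one from the screenshots):

```python
# Illustrative sketch (legacy openai < 1.0 API); the word problem is an
# assumed example, not the one shown in the screenshots.
import openai

openai.api_key = "YOUR_API_KEY"  # set your own key

prompt = (
    "Roger has 5 tennis balls. He buys 2 more cans of tennis balls. "
    "Each can has 3 tennis balls. How many tennis balls does he have now?"
)

for model in ["davinci", "text-davinci-003"]:
    response = openai.Completion.create(
        model=model,
        prompt=prompt,
        max_tokens=64,
        temperature=0,
    )
    # The base model often pads the answer with extra text; the RLHF-tuned
    # model tends to answer directly.
    print(f"--- {model} ---")
    print(response["choices"][0]["text"].strip())
```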

AI Safety

Beyond the tasks above, another important requirement for LLMs is that they are safe and don't generate harmful responses. RLHF is also a key component of training safe LLMs, and red teaming is essential for building safer LLM systems.

Example of non-RLHF output:

[Image: non-RLHF davinci output responding to an unsafe request]

Now let's look at the RLHF LLM output. We can see that this particular model is tuned to avoid returning unsafe explanations of activity that’s considered illegal. Safety is a key area in AI, and RLHF is helping to drive a lot of progress here.

[Image: text-davinci-003 declines the unsafe request]

If you found this content useful, you can find more examples in one of our recent blog posts:

https://www.surgehq.ai/blog/introduction-to-reinforcement-learning-with-human-feedback-rlhf-series-part-1
