Understanding how OpenAI gets ChatGPT to follow human instruction
DALL-E 2's rendering of "3D render of the structure of a Large Language Machine Learning Model, digital art".


TLDR: Large Language Models (LLMs) that are trained on vast amounts of unprocessed text may not be precisely aligned with the user's intended purpose. This misalignment can be reduced by supplementing the training process with labeled examples that incorporate human feedback. Additionally, model performance can be further enhanced by employing human-guided reinforcement learning.

A few years ago, when generative AI for text was not yet very advanced, I experimented with various AI tools to assist with text generation and editing. However, I found that none of them were truly helpful until I discovered ChatGPT. I am now a bigger fan of ChatGPT than I expected to be (it even helped edit this article).

The reason for this is that ChatGPT is remarkably useful, providing helpful answers that are responsive to the framing of the input it receives. This is a deliberate design feature on the part of ChatGPT's creator, OpenAI. Today, I will delve into the theoretical underpinnings of their design by exploring the paper published on InstructGPT.

What are the authors trying to accomplish?

Modern large language models (LLMs) have billions of parameters and require vast amounts of text data to acquire a comprehensive understanding of language concepts and ideas. Because human labeling does not scale to that volume, labeled datasets of sufficient size cannot be created; instead, training data is usually sourced from the internet and used for generic self-supervised language objectives such as predicting the next word in a sequence or filling in a masked word in a sentence.
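To make that pretraining objective concrete, here is a tiny, self-contained sketch of the next-word-prediction loss. The token IDs, vocabulary size, and the random tensor standing in for "the model" are all made up purely for illustration:

```python
# Illustrative sketch of the self-supervised "predict the next word" objective
# that LLMs are pretrained on. Token IDs and vocabulary size are hypothetical.
import torch
import torch.nn.functional as F

vocab_size = 50_000
seq = torch.tensor([[464, 3290, 318, 257, 922]])  # made-up token IDs for one sentence

# A real LLM would be a deep Transformer; a random tensor stands in here
# purely to show how the loss is formed.
logits = torch.randn(1, seq.size(1), vocab_size)

# Shift so that position t predicts token t+1, then apply cross-entropy.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),  # predictions for positions 0..n-2
    seq[:, 1:].reshape(-1),                  # targets are the "next" tokens
)
print(loss)
```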

However, this can cause issues because the model's training objective (predicting the next word) may not align with its ultimate goal: generating text that is useful to humans, which includes following instructions and remaining truthful. The authors aim to address this misalignment by modifying the training incentives while working within the constraints of scaling a supervised dataset.

What are the key elements?

The InstructGPT approach starts with a pretrained GPT-3 model and then fine-tunes it on labeled, curated responses. This is known as supervised fine-tuning (SFT) and can be thought of as a transfer learning approach in which the core model concepts are learned from raw text and transferred to an objective with a smaller labeled dataset. Sampled outputs are then taken from this SFT model and ranked by human labelers, creating a dataset that is used to train another model, called a Reward Model (RM), which reflects users' preferences. Finally, this RM is used to further optimize the SFT model with a reinforcement learning technique called Proximal Policy Optimization.
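To make the data flow of those three steps concrete, here is a deliberately toy, runnable sketch. None of the function names or data come from the paper; the real pipeline uses GPT-3-scale models, human labelers, and the actual PPO algorithm. The point is only the sequence: demonstrations feed SFT, SFT samples get ranked, the rankings train a reward model, and the reward model drives reinforcement learning.

```python
# Toy sketch of the three-step recipe: SFT -> reward model -> PPO.
# Every function and value here is a hypothetical stand-in for illustration only.
import random

def supervised_fine_tune(base_model, demonstrations):
    # Step 1: fit the pretrained model to human-written responses (SFT).
    return {"name": base_model["name"] + "+SFT", "demos_seen": len(demonstrations)}

def collect_ranked_samples(sft_model, prompts, n_samples=4):
    # Step 2a: sample several completions per prompt; a human labeler ranks them.
    ranked = []
    for p in prompts:
        completions = [f"{sft_model['name']} answer {i} to '{p}'" for i in range(n_samples)]
        ranked.append(sorted(completions, key=lambda _: random.random()))  # fake ranking
    return ranked

def train_reward_model(ranked_samples):
    # Step 2b: learn a scalar score that agrees with the human rankings (toy version).
    return lambda completion: float(len(completion) % 7)

def ppo_fine_tune(sft_model, reward_model, prompts):
    # Step 3: optimize the SFT policy against the reward model with PPO
    # (reduced here to picking the highest-reward candidate, purely for illustration).
    best = max((f"{sft_model['name']} candidate for '{p}'" for p in prompts), key=reward_model)
    return {"name": sft_model["name"] + "+PPO", "example_output": best}

base = {"name": "GPT-3"}
demos = [("Explain RLHF", "A human-written explanation...")]
prompts = ["Explain RLHF simply", "Summarize this paper"]

sft = supervised_fine_tune(base, demos)
rm = train_reward_model(collect_ranked_samples(sft, prompts))
instruct_model = ppo_fine_tune(sft, rm, prompts)
print(instruct_model["name"])  # GPT-3+SFT+PPO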

[Figure: the three InstructGPT training steps: supervised fine-tuning, reward model training, and reinforcement learning with PPO]

It's worth noting that only the first and third steps in the process involve modifying the actual model, while the second step serves as preparation for reinforcement learning at scale. Interestingly, when looking at the model preference scores (which indicate usefulness by measuring the likelihood of a user preferring one model to another), it's possible to observe the specific impacts of each of the two steps that affect model development.
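As a rough illustration of how such a preference score can be read (the labeler choices below are invented), it is essentially a win rate over head-to-head comparisons:

```python
# Hypothetical labeler choices: which model's output was preferred for each prompt.
comparisons = ["A", "A", "B", "A", "B", "A", "A"]

win_rate_a = comparisons.count("A") / len(comparisons)
print(f"Labelers preferred model A in {win_rate_a:.0%} of comparisons")
```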

[Figure: labeler preference scores for models of different sizes and training setups]

OpenAI discovered that this training methodology resulted in a significant improvement in labelers' perception of the models' usefulness (see above), as well as in the truthfulness of the model outputs. Additionally, they noted that the fine-tuning and reinforcement learning process had minimal impact on performance on traditional NLP datasets. They also observed that concepts such as "follow instructions" generalized to topics far beyond those included in the training set. As a result, OpenAI now uses InstructGPT models for all of their APIs.

What can you use yourself?

For those conducting research in information understanding, such as multimodal machine learning, this case can serve as a template for integrating human preferences into training processes that heavily depend on vast amounts of unstructured inputs. Additionally, there is a substantial potential for enhancing the presented methods by developing more effective ways of translating preferences into machine instruction. For example, the paper itself highlights that the ranking system used to train the reward model may be less than optimal.
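For context on that last point, the reward model in InstructGPT is trained on pairwise comparisons with a log-sigmoid ranking loss, pushing the preferred response's score above the rejected one's. The sketch below shows that loss on placeholder reward values rather than real model outputs:

```python
# Minimal sketch of the pairwise ranking loss used to train a reward model from
# human comparisons. The reward values are placeholders standing in for
# r(prompt, response) from a real reward model.
import torch
import torch.nn.functional as F

reward_winner = torch.tensor([1.2, 0.3, 0.8])   # scores for the preferred responses
reward_loser  = torch.tensor([0.9, 0.5, -0.1])  # scores for the rejected responses

# -log(sigmoid(r_w - r_l)): small when the winner is scored clearly higher.
loss = -F.logsigmoid(reward_winner - reward_loser).mean()
print(loss)
```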

For those working on more practical machine learning problems, the paper serves as a reminder that problem framing is crucial. While language models are usually more useful when they are larger, the researchers found that with this approach, labelers preferred a 1.3-billion-parameter network with fine-tuning and reinforcement learning to 175-billion-parameter networks that had either no additional training or only fine-tuning. The correct framing of the problem made a difference more significant than a 100x increase in parameters. As we think about creating target labels and selecting loss functions for any machine learning application, this is an example worth keeping in mind.

Finally, if you just like to use ChatGPT to make your editing faster and answer quick questions, this helps give some intuition as to how this product became so eminently useful as compared to previous LLMs.


About this post

This article is part 2 of a 5-part series summarizing and analyzing academic literature in machine learning, specifically focusing on Generative AI. This edition is on the method OpenAI uses to train its Large Language Models (LLMs) to follow human instruction. Other topics include SHAP for model explainability, Multimodal Generative AI, the Transformer Architecture, and the ADAM optimizer.

Kevin Ortiz (He/Him)

Talent Specialist and Future Web Developer @ Scalable Path

7 months

Like InstructGPT, ChatGPT’s training process involves a machine learning technique called fine-tuning, which aims to improve the performance of a pre-trained model on a specific task. Pre-trained models have been trained on a large amount of data, typically for a different task than the one they are being fine-tuned for. The pre-trained model used for ChatGPT was trained to predict the next word in a sentence based on the context of the previous words. The training dataset included a vast amount of text data from books, websites, and other sources. While this training was successful, it needed further refinement for the model to provide personalized and accurate outputs. I highly recommend this article from my colleague, Calin Cretu, a Machine Learning Engineer, that describes how ChatGPT works: https://www.scalablepath.com/machine-learning/chatgpt-architecture-explained
