Large Language Models - How are the OpenAI GPT models trained?
Ananya Ghosh Chowdhury
Data and AI Architect at Microsoft | Public Speaker | Startup Advisor | Career Mentor | Harvard Business Review Advisory Council Member | Marquis Who's Who Listee | Founder @AIBoardroom
Large Language Models (LLMs) are based on the principles of neural networks, which are networks of artificial neurons connected together in layers. Each neuron receives inputs from other neurons and produces an output determined by its weights, which are adjusted as the model is trained. LLMs take in large amounts of text data and use it to learn the relationships between words and phrases.
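To make the idea of "an output determined by its weights, which are adjusted as the model is trained" concrete, here is a minimal sketch of a single artificial neuron in plain Python. It computes a weighted sum of its inputs plus a bias, passes the result through a sigmoid activation, and nudges its weights with repeated gradient-descent steps on a squared-error loss. The inputs, target, and learning rate are made-up illustrations; a real LLM has billions of weights arranged in transformer layers rather than a lone sigmoid unit.

```python
import math

def sigmoid(z: float) -> float:
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

def train_step(inputs, weights, bias, target, lr=0.5):
    """One gradient-descent step on the loss 0.5 * (output - target) ** 2."""
    out = neuron_output(inputs, weights, bias)
    # d(loss)/dz = (out - target) * sigmoid'(z), where sigmoid'(z) = out * (1 - out)
    delta = (out - target) * out * (1.0 - out)
    new_weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * delta
    return new_weights, new_bias

# Hypothetical toy data: two inputs and a desired output of 1.0
inputs, weights, bias, target = [1.0, 0.5], [0.1, -0.2], 0.0, 1.0
before = neuron_output(inputs, weights, bias)
for _ in range(100):
    weights, bias = train_step(inputs, weights, bias, target)
after = neuron_output(inputs, weights, bias)
print(f"output before training: {before:.3f}, after training: {after:.3f}")
```

Training nothing more than these few numbers already moves the output toward the target; an LLM applies the same principle, at vastly larger scale, to the task of predicting text.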
There are two types of language models:
GPT stands for Generative Pre-trained Transformer. GPT-3, GPT-4, and ChatGPT are all large language models that use deep learning to perform a wide range of tasks; the instruction-following models among them, such as InstructGPT, ChatGPT, and GPT-4, are aligned with user intent by fine-tuning with human feedback.
How are the OpenAI language models trained to follow instructions?
Starting with a set of labeler-written prompts and prompts submitted through the OpenAI API, OpenAI collects a dataset of labeler demonstrations of the desired model behavior and uses it to fine-tune GPT-3 with supervised learning. Next, it collects a dataset of rankings of model outputs and uses it to further fine-tune this supervised model with reinforcement learning from human feedback (RLHF). The resulting models are called InstructGPT. In human evaluations, outputs from the 1.3B-parameter InstructGPT model are preferred to outputs from the 175B-parameter GPT-3, even though the InstructGPT model has over 100 times fewer parameters. InstructGPT models also show improvements in truthfulness and reductions in toxic output generation, with minimal performance regressions on public NLP datasets. These results show that fine-tuning with human feedback is a promising direction for aligning language models with human intent.
The method starts with a pretrained language model, a distribution of prompts on which the model should produce aligned outputs, and a team of trained human labelers. It then follows three steps:
Step 1: Supervised fine-tuning (SFT): collect demonstration data and train a supervised policy.
The labelers provide demonstrations of the desired behavior on the input prompt distribution, and a pretrained GPT-3 model is fine-tuned on this data using supervised learning.
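Supervised fine-tuning on demonstrations usually means minimizing the next-token cross-entropy, that is, the average negative log-likelihood of the labeler-written tokens. The sketch below computes that loss from per-position probability distributions. The three-token vocabulary and the probabilities are made up, and masking out the prompt tokens so that only the demonstration is scored is a common convention assumed here, not a detail stated in the steps above.

```python
import math

def sft_loss(token_probs, target_ids, loss_mask):
    """Average negative log-likelihood of the demonstration tokens.

    token_probs: per-position probability distributions (each sums to 1)
                 produced by the model being fine-tuned.
    target_ids:  the token the labeler's demonstration has at each position.
    loss_mask:   1 for demonstration (response) tokens, 0 for prompt tokens.
    """
    total, count = 0.0, 0
    for probs, target, keep in zip(token_probs, target_ids, loss_mask):
        if keep:
            total += -math.log(probs[target])
            count += 1
    if count == 0:
        raise ValueError("no demonstration tokens to score")
    return total / count

# Hypothetical 3-token vocabulary and a 3-position sequence:
# position 0 belongs to the prompt, positions 1 and 2 to the demonstration.
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.25, 0.25, 0.5],
]
targets = [0, 1, 2]
mask = [0, 1, 1]
print(f"SFT loss: {sft_loss(probs, targets, mask):.4f}")
```

Gradient descent on this loss raises the probability the model assigns to each token the labelers wrote, which is what "fine-tuned on this data using supervised learning" amounts to.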
Step 2: Reward model (RM) training: collect comparison data and train a reward model.
A dataset of comparisons between model outputs is collected, in which labelers indicate which output they prefer for a given input; a reward model is then trained to predict the human-preferred output.
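In the InstructGPT paper, the reward model is trained with a pairwise loss of the form -log σ(r(x, y_w) - r(x, y_l)), where y_w is the output the labeler preferred and y_l the one they rejected, averaged over all pairs taken from a ranking of K outputs for the same prompt. The sketch below implements that loss in plain Python on scalar scores; the scores are hypothetical, and the neural network that would produce them is omitted.

```python
import math
from itertools import combinations

def pairwise_loss(reward_preferred, reward_rejected):
    """-log(sigmoid(r_w - r_l)); small when the preferred output scores higher."""
    margin = reward_preferred - reward_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); the two branches avoid overflow in exp
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

def ranking_loss(rewards_in_rank_order):
    """Average pairwise loss over every pair drawn from one labeler ranking.

    rewards_in_rank_order: RM scores for K >= 2 outputs to the same prompt,
    listed from most preferred to least preferred by the labeler.
    """
    # combinations keeps list order, so each pair is (better, worse)
    pairs = list(combinations(rewards_in_rank_order, 2))
    return sum(pairwise_loss(w, l) for w, l in pairs) / len(pairs)

# Hypothetical RM scores for K = 3 outputs, listed best to worst by a labeler
agreeing = ranking_loss([2.0, 0.5, -1.0])
disagreeing = ranking_loss([-1.0, 0.5, 2.0])
print(f"loss when the RM agrees with the labeler: {agreeing:.3f}")
print(f"loss when the RM disagrees: {disagreeing:.3f}")
```

Dividing by the number of pairs mirrors the paper's 1/C(K, 2) normalization, so rankings with many outputs do not dominate training.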
Step 3: Reinforcement learning via proximal policy optimization (PPO): optimize a policy against the reward model using PPO.
The output of the RM is used as a scalar reward, and the supervised policy is fine-tuned to optimize this reward using the PPO algorithm.
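A rough sketch of two ingredients of this step, under simplifying assumptions. In the InstructGPT setup, the reward handed to PPO is the RM score minus a penalty β·log(π_RL(y|x) / π_SFT(y|x)) that discourages the policy from drifting far from the supervised model. PPO then maximizes a clipped surrogate objective so that each update changes the policy only a little. Advantage estimation, the value network, and batching are omitted, and β = 0.02 and ε = 0.2 are illustrative values rather than settings confirmed by this article.

```python
import math

def shaped_reward(rm_score, logp_policy, logp_sft, beta=0.02):
    """RM score minus a KL-style penalty that keeps the policy close to the SFT model.

    logp_policy / logp_sft: log-probability of the sampled response under the
    current RL policy and under the frozen SFT model.
    """
    return rm_score - beta * (logp_policy - logp_sft)

def ppo_clipped_objective(logp_new, logp_old, advantages, epsilon=0.2):
    """PPO's clipped surrogate objective (to be maximized), averaged over samples."""
    total = 0.0
    for new, old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(new - old)  # pi_new(a|s) / pi_old(a|s)
        clipped = max(min(ratio, 1.0 + epsilon), 1.0 - epsilon)
        # Keep the pessimistic (smaller) of the unclipped and clipped terms
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)

# Hypothetical numbers for a single sampled response
reward = shaped_reward(rm_score=1.0, logp_policy=-12.0, logp_sft=-14.0)
print(f"KL-shaped reward: {reward:.3f}")
objective = ppo_clipped_objective([-0.9, -1.1], [-1.0, -1.0], [reward, -0.5])
print(f"clipped surrogate objective: {objective:.3f}")
```

The clipping is what makes PPO "proximal": once the probability ratio leaves [1 - ε, 1 + ε], pushing it further earns no extra objective, so the policy cannot race away from the one that generated the samples.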
Steps 2 and 3 can be iterated continuously: more comparison data is collected on the current best policy, which is used to train a new RM and then a new policy.
Different OpenAI Models:
GPT-4
GPT-4 is a large-scale, multimodal, transformer-style model (accepting text inputs and emitting text outputs today, with image inputs coming in the future) pre-trained to predict the next token in a document, using both publicly available data (such as internet data) and data licensed from third-party providers, and then fine-tuned using Reinforcement Learning from Human Feedback (RLHF). It improves on GPT-3.5 and can understand as well as generate natural language or code. GPT-4 substantially improves on previous OpenAI models in its ability to follow user intent and to understand and generate natural language text, particularly in more complex and nuanced scenarios. GPT-4 is optimized for chat but works well for traditional completion tasks.
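Being "pre-trained to predict the next token" means learning a probability distribution over the next token given the tokens before it. GPT-4 does this with a large transformer over subword tokens; the toy below swaps in a simple bigram count model over whitespace-separated words, purely to illustrate the objective and how a prediction is read off the learned distribution. The corpus is made up.

```python
from collections import Counter, defaultdict

def train_bigram(corpus: str):
    """Count, for every word, which words follow it in the corpus."""
    words = corpus.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows

def next_token_distribution(follows, word):
    """Turn the follow-counts for `word` into next-word probabilities."""
    counts = follows.get(word, Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

corpus = "the model predicts the next token and then the next word"
model = train_bigram(corpus)
dist = next_token_distribution(model, "the")
print("most likely word after 'the':", max(dist, key=dist.get), dist)
```

A transformer replaces the count table with a neural network that conditions on thousands of previous tokens instead of one, but it is trained toward the same kind of next-token distribution.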
The figure below compares GPT-4's accuracy on the Massive Multitask Language Understanding (MMLU) benchmark across a variety of languages with the English-language performance of prior models:
The latest model is gpt-4, which can handle a maximum of 8,192 tokens.
GPT-3.5
GPT-3.5 models can understand and generate natural language or code. The most capable and cost-effective model in the GPT-3.5 family is gpt-3.5-turbo, which has been optimized for chat but also works well for traditional completion tasks; it can handle a maximum of 4,096 tokens.
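The token limits above (8,192 for gpt-4, 4,096 for gpt-3.5-turbo) cover the prompt and the generated reply together, so long conversations eventually have to be trimmed. The sketch below drops the oldest messages until the prompt fits while leaving room for a reply. The helper names and the 500-token reply reserve are hypothetical, and counting whitespace-separated words is a crude stand-in for a real subword tokenizer such as OpenAI's tiktoken library, so the counts will not match the models' actual token counts.

```python
# Context limits quoted in the text; the rest of this sketch is hypothetical
MODEL_CONTEXT_LIMITS = {"gpt-4": 8192, "gpt-3.5-turbo": 4096}

def count_tokens(text: str) -> int:
    """Crude stand-in tokenizer: one token per whitespace-separated word.
    Real GPT tokenizers split text into subword tokens, so counts differ."""
    return len(text.split())

def fit_history(messages, model, reserved_for_reply=500):
    """Drop the oldest messages until prompt + reserved reply fit the context window."""
    budget = MODEL_CONTEXT_LIMITS[model] - reserved_for_reply
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > budget:
        kept.pop(0)  # discard the oldest message first
    return kept

history = ["word " * 2000, "word " * 2000, "word " * 100]  # three fake messages
print(len(fit_history(history, "gpt-3.5-turbo")), len(fit_history(history, "gpt-4")))
```

Dropping the oldest turns first is the simplest policy; a real application would typically pin the system message and might summarize old turns instead of discarding them.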
GPT-3
GPT-3 is one of the largest publicly disclosed language models: it has 175 billion parameters and was trained on 570 gigabytes of text. For comparison, its predecessor GPT-2, which is functionally similar to GPT-3, has 1.5 billion parameters and was trained on 40 gigabytes of text. While GPT-2 displayed some zero-shot generalization to downstream tasks, GPT-3 further displayed the ability to learn novel tasks when given examples in context. It has an unusually large set of capabilities, including text summarization, chatbot behavior, search, code generation, and essay generation.
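"Learning a task from examples in context" happens entirely inside the prompt: the model's weights are not updated. Instead, a few worked examples are placed before the new input and the model continues the pattern. The sketch below assembles such a few-shot prompt; the Input/Output layout and the translation examples are an arbitrary illustration, not a format prescribed by OpenAI, and no model is called.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an in-context (few-shot) prompt: instruction, worked examples, new query.

    The model is not retrained; it infers the task from the examples in the prompt.
    """
    lines = [instruction, ""]
    for source, target in examples:
        lines.append(f"Input: {source}")
        lines.append(f"Output: {target}")
        lines.append("")
    lines.append(f"Input: {query}")
    lines.append("Output:")  # the model is expected to continue from here
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("cheese", "fromage"), ("sea otter", "loutre de mer")],
    "peppermint",
)
print(prompt)
```

With zero examples the same function produces a zero-shot prompt, which is the setting in which GPT-2 showed only limited generalization.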
ChatGPT
ChatGPT is a sibling model to InstructGPT. It is a conversational AI model that can chat with users, answer follow-up questions, and challenge incorrect assumptions.
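Chat-style models can answer follow-up questions because each request carries the earlier turns of the conversation. The sketch below builds such a history as a list of role/content messages using the "system", "user", and "assistant" roles of OpenAI's Chat Completions format; a list like this would be passed as the `messages` parameter of a chat request, but the API call itself is omitted, and the helper `add_turn` is hypothetical.

```python
def add_turn(history, role, content):
    """Return a new history with one message appended in role/content form."""
    if role not in {"system", "user", "assistant"}:
        raise ValueError(f"unexpected role: {role}")
    return history + [{"role": role, "content": content}]

conversation = add_turn([], "system", "You are a helpful assistant.")
conversation = add_turn(conversation, "user", "Who wrote Hamlet?")
conversation = add_turn(conversation, "assistant", "William Shakespeare.")
# This follow-up is only answerable because the earlier turns are sent along with it
conversation = add_turn(conversation, "user", "When was it written?")
for message in conversation:
    print(f'{message["role"]}: {message["content"]}')
```

Returning a new list rather than mutating the old one keeps each earlier state of the conversation intact, which makes it easy to retry or branch a dialogue.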
DALL·E
DALL·E is an AI system that can create realistic images and art from a natural-language description. It can create a new image of a specified size, edit an existing image, or create variations of a user-provided image.
Whisper
Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification.
Embeddings
Embeddings are a numerical representation of text that can be used to measure the relatedness between two pieces of text, and are useful for search, clustering, recommendations, anomaly detection, and classification tasks. text-embedding-ada-002 is designed to replace the previous 16 first-generation embedding models at a fraction of the cost.
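Relatedness between two embeddings is commonly measured with cosine similarity: vectors pointing in similar directions score close to 1, unrelated ones close to 0. The sketch below ranks documents against a query that way, which is the core of embedding-based search. The 3-dimensional vectors and their labels are made up to stand in for real embeddings, which have far more dimensions and would come from a model such as text-embedding-ada-002.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_by_relatedness(query_vec, documents):
    """Sort (text, vector) pairs from most to least related to the query vector."""
    return sorted(documents, key=lambda doc: cosine_similarity(query_vec, doc[1]), reverse=True)

# Made-up 3-dimensional vectors standing in for real embeddings
query = [0.9, 0.1, 0.0]
docs = [
    ("refund policy", [0.8, 0.2, 0.1]),
    ("office hours", [0.1, 0.9, 0.2]),
    ("shipping times", [0.5, 0.5, 0.0]),
]
for text, vec in rank_by_relatedness(query, docs):
    print(f"{cosine_similarity(query, vec):.3f}  {text}")
```

Clustering, recommendations, and anomaly detection build on the same similarity score, for example by grouping vectors that are close to each other or flagging ones that are far from all others.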
Moderation
The Moderation models provide classification capabilities that look for content in the following categories: hate, hate/threatening, self-harm, sexual, sexual/minors, violence, and violence/graphic.
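A minimal sketch of how per-category classification scores might be turned into a moderation decision, using the categories listed above. The scores and the 0.5 threshold are hypothetical; the actual Moderation endpoint returns its own per-category flags and scores, so the exact response shape should be checked against the API reference rather than taken from this sketch.

```python
# Categories as listed in the text
MODERATION_CATEGORIES = [
    "hate", "hate/threatening", "self-harm", "sexual",
    "sexual/minors", "violence", "violence/graphic",
]

def flagged_categories(scores, threshold=0.5):
    """Return the categories whose score meets a (hypothetical) threshold."""
    return [c for c in MODERATION_CATEGORIES if scores.get(c, 0.0) >= threshold]

# Made-up classifier scores for one piece of text
example_scores = {"hate": 0.02, "violence": 0.71, "violence/graphic": 0.55}
hits = flagged_categories(example_scores)
print("flagged" if hits else "allowed", hits)
```

Keeping the threshold as a parameter lets an application be stricter for sensitive categories, for example by calling the function separately with a lower threshold for sexual/minors.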
Bringing it all together, GPT models are LLMs that are pre-trained on massive amounts of data and can perform a variety of natural language processing tasks, including natural language understanding and natural language generation. These models are fine-tuned using Reinforcement Learning from Human Feedback (RLHF) to better follow human intent, and they can be further fine-tuned for specific use cases.
This could change the way businesses run today and reinvent existing enterprise systems. For example, a GPT model could generate a personalized report for a customer based on their preferences and past interactions, allowing businesses to provide a more customized and personal experience, which could lead to increased customer satisfaction and loyalty. GPT models could also be used to streamline and automate many business processes, such as customer service or order fulfillment, freeing employees to focus on more creative and strategic tasks and increasing efficiency and productivity. In summary, GPT models have the potential to revolutionize the way businesses operate, making them more customer-centric and efficient.