Explore the Evolution of GPT-3, the World's Most Influential Language Model: From its Humble Beginnings to Today's ChatGPT - Happy Friday!
Gregory Renard
Head of Applied AI / ML · FDL DOE SETI NASA 2022 AI Award · 20+ Yrs in NLP & Frugal AI · Driving Companies to Success & Excellence · TEDx, Stanford & UC Berkeley Lecturer · Co-Initiator of AI4Humanity for France
Happy Friday, Tech enthusiasts!
Are you ready to dive deep into the world of deep tech? Because today, we're going to be talking about the evolution of GPT-3, one of the most influential language models in the field.
From its humble beginnings as a language generator to its current status as a powerful foundation for in-context learning, world knowledge, and ChatGPT, GPT-3 has come a long way.
So grab your favorite happy Friday beverage and get ready to explore the depths of OpenAI's deep tech and the history of GPT-3 and ChatGPT!
...
GPT-3, or Generative Pre-trained Transformer 3, is a language model developed by OpenAI that can generate language, learn in context, and draw on a wealth of world knowledge. These abilities come from large-scale pretraining: the language modeling objective gives the model its generation ability, the training corpus of 300 billion tokens of diverse data supplies the world knowledge, and the model's size gives it the capacity to store that knowledge.
While the initial GPT-3, known as davinci in the OpenAI API, gives reasonable responses to queries and decent performance on many benchmarks, it has been found to underperform smaller models such as T5 on certain tasks. By today's ChatGPT standard, the initial GPT-3 may not be considered "smart": Meta's OPT model, which may serve as a good approximation of the initial GPT-3, is viewed as subpar compared to text-davinci-002.
Despite its initial weaknesses, the abilities of the initial GPT-3 serve as important foundations for the more advanced abilities developed through training on code, instruction tuning, and reinforcement learning with human feedback (RLHF, which I absolutely love as a people-centric believer in human augmentation with AI).
OpenAI's GPT-3 language model has undergone a series of evolutions, culminating in the release of ChatGPT in November 2022.
The initial GPT-3 model, released in July 2020, was fine-tuned to create the Codex model in July 2021. This was followed by the release of the instruction tuning paper in March 2022, which introduced models such as davinci-instruct-beta and text-davinci-001.
From April to July 2022, OpenAI beta tested the code-davinci-002 model, also called Codex, which is considered to be the most advanced GPT-3 variant for natural language processing due to its training on both text and code and subsequent instruction tuning.
Subsequent models, such as text-davinci-002 and text-davinci-003, have focused on improving zero-shot abilities (*1) through instruction or prompt tuning (a type of model I’m absolutely in love with and have already deployed in production for many applied AI projects in different companies I'm involved in), while ChatGPT has prioritized the ability to model dialog context. These models have been developed through fine-tuning and instruction tuning, which unlock and elicit abilities the models already have, while also adjusting their skillsets towards different applications and prioritizing alignment with humans (*2).
(*1) Zero-Shot: Zero-shot learning is a machine learning paradigm in which a model is able to perform a task or classify an instance without having received explicit training on examples of that task or class. This is achieved through the use of a model that has been trained on a diverse and comprehensive dataset and has the ability to generalize its knowledge to new tasks or classes. In zero-shot learning, the model is not provided with any additional training data for the new task or class but is expected to be able to complete the task or classify the instance based on its preexisting knowledge. This technique is useful in situations where it is not feasible to gather a large quantity of training data for a specific task or class, or when the task or class is rare and obtaining a sufficient number of examples for training is challenging. Zero-shot learning can be difficult to achieve due to the inherent difficulty in generalizing knowledge to new situations. However, it has the potential to significantly expand the capabilities of machine learning models and allow them to perform a wide range of tasks and classify a diverse range of instances.
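To make the idea concrete, here is a minimal sketch of how a zero-shot prompt differs from a few-shot (in-context) prompt. The function names, the "Text:/Label:" layout, and the sample sentences are my own assumptions for illustration; they are not an OpenAI format, and no model is called here.

```python
from typing import List, Tuple


def zero_shot_prompt(instruction: str, text: str) -> str:
    """Build a prompt with only the task description and no solved examples."""
    return f"{instruction}\n\nText: {text}\nLabel:"


def few_shot_prompt(instruction: str, examples: List[Tuple[str, str]], text: str) -> str:
    """Build a prompt that also shows a few solved examples before the query."""
    demos = "\n\n".join(f"Text: {t}\nLabel: {label}" for t, label in examples)
    return f"{instruction}\n\n{demos}\n\nText: {text}\nLabel:"


# Hypothetical task and inputs, chosen only for illustration.
instruction = "Classify the sentiment of the text as positive or negative."
query = "The battery died after two days."
examples = [
    ("I love this phone.", "positive"),
    ("The screen cracked immediately.", "negative"),
]

zs = zero_shot_prompt(instruction, query)
fs = few_shot_prompt(instruction, examples, query)

print(zs)
print("---")
print(fs)
```

In the zero-shot case the model sees only the instruction and the new input, so it has to rely entirely on what it learned during pretraining and tuning. The few-shot prompt spends extra tokens on demonstrations instead.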
(*2) Alignment: Alignment with humans refers to the ability of a machine-learning model to behave in ways that are consistent with human expectations and preferences. This can involve following instructions or performing tasks in a way that is expected or desired by humans, or generating output that is considered to be appropriate or acceptable by humans. Alignment with humans is an important consideration when developing and deploying machine learning models, as it helps to ensure that the model is functioning in a way that is beneficial and useful to humans. In the context of language models, alignment with humans might involve generating text that is easy for humans to understand, following instructions or prompts in a way that is expected or desired or generating responses that are appropriate for a given context or situation. Ensuring alignment with humans can be challenging, as it requires taking into account the diverse preferences and expectations of different individuals.
Code-davinci-002 and text-davinci-002 are two advanced language models developed by OpenAI that exhibit a range of important abilities. These models have the ability to respond to human instructions, generalize to unseen tasks, and demonstrate complex reasoning with chain-of-thought. The development of these abilities is likely the result of both instruction tuning and training on code, although the specific mechanism by which training on code leads to complex reasoning and chain-of-thought abilities is not yet fully understood. In addition to these abilities, code-davinci-002 and text-davinci-002 also exhibit long-term dependency capabilities, which allow them to understand and generate text with multiple layers of context. These models represent a significant advancement in language modeling and have the potential to be deployed in a variety of applications.
There are currently few strict statistical comparisons between text-davinci-002, text-davinci-003, and ChatGPT, because text-davinci-003 and ChatGPT were released only recently and ChatGPT is not available through the OpenAI API. However, some initial descriptive comparisons can still provide insights into the underlying mechanisms of these models.
All three models are instruction-tuned, with text-davinci-002 being a supervised instruction-tuned model and text-davinci-003 and ChatGPT being instruction-tuned using Reinforcement Learning with Human Feedback (RLHF). The use of RLHF is the main distinguishing factor between these models and is responsible for the emergence of certain abilities, including informative responses, impartial responses, the ability to reject improper questions, and the ability to reject questions outside of the model's knowledge scope.
It is important to note that these abilities are intrinsic to the model and are not injected by RLHF, but are instead triggered or unlocked by RLHF. Additionally, the ability to know what it does not know is not achieved through the use of rules but is also unlocked by RLHF. It is thought that ChatGPT trades in-context learning for the ability to model dialog history, while text-davinci-003 recovers the in-context learning ability lost by text-davinci-002 and improves zero-shot ability, potentially through the use of LM mixing during the RL tuning stage.
The GPT-3.5 series, while a major advancement in natural language processing (NLP) research, still lacks certain desired properties. One notable limitation is the inability to overwrite beliefs on the fly. This means that once the model expresses a belief about something, it can be difficult to correct it even if the belief is incorrect. For example, ChatGPT may insist that 43,911 is a prime number even after acknowledging that 43,911 equals 357 × 123. While there does seem to be a hierarchy of belief strength, with some beliefs being more strongly held than others, this limitation can still be problematic in certain situations.
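The arithmetic in that example is easy to verify outside the model. Below is a small sketch using plain trial division; the helper names are my own, and this is just one simple way to do the check.

```python
def smallest_factor(n: int) -> int:
    """Return the smallest factor of n greater than 1 (n itself if n is prime)."""
    if n < 2:
        raise ValueError("n must be at least 2")
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n


def is_prime(n: int) -> bool:
    """A number is prime when its smallest factor above 1 is the number itself."""
    return n >= 2 and smallest_factor(n) == n


n = 43_911
print(357 * 123 == n)      # True: the factorization from the example holds
print(is_prime(n))         # False: so 43,911 cannot be prime
print(smallest_factor(n))  # 3
```

Once a number is written as a product of two factors greater than 1, it cannot be prime, which is exactly the contradiction the model fails to resolve.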
Another limitation of GPT-3.5 is its inability to perform formal reasoning within strict systems such as mathematics or first-order logic. While the model is able to handle reasoning with ambiguity, such as generating a procedure for cooking pizza or a proof sketch of a theorem, it is not able to perform strict, formal reasoning that does not tolerate ambiguity. This means it is unable to derive strict proofs that require no mistakes in intermediate steps.
A third limitation of GPT-3.5 is its inability to directly search the internet for information. While there has been a recent paper published on WebGPT, it is not currently available for public use. It is important to note that while the GPT-3.5 series has a significant amount of knowledge and reasoning capabilities, it may be more efficient to offload the knowledge portion to an external retrieval system and allow the language model to focus on reasoning tasks.
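As a rough illustration of offloading knowledge to a retrieval system, here is a toy sketch. It is not how WebGPT or any OpenAI product works: the word-overlap ranking, the function names, the prompt wording, and the three sample documents are all assumptions made for the example, and a real system would use a proper search index and then send the prompt to a model.

```python
from typing import List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text, split on whitespace, and strip simple punctuation."""
    return {w.strip(".,?!").lower() for w in text.split()}


def retrieve(query: str, documents: List[str], k: int = 1) -> List[str]:
    """Rank documents by how many words they share with the query (toy scoring)."""
    q = tokenize(query)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: List[str]) -> str:
    """Put the retrieved passages in front of the question for the language model."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )


# Hypothetical mini knowledge base, for illustration only.
documents = [
    "ChatGPT was released by OpenAI in November 2022.",
    "Codex is a GPT-3 variant trained on code.",
    "T5 is a text-to-text model.",
]

question = "When was ChatGPT released?"
passages = retrieve(question, documents, k=1)
prompt = build_prompt(question, passages)
print(prompt)
```

The division of labor is the point: the retrieval step supplies the facts, and the language model is left with the job of reading the context and reasoning over it.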
Finally, GPT-3.5 currently lacks the ability to perform commonsense reasoning and explain its reasoning process. While the model is able to generate responses that seem to exhibit common sense, it is not able to fully understand and explain the underlying reasoning behind those responses. This is a difficult task for any AI system, as common sense knowledge is often implicit and not explicitly stated. Overall, while the GPT-3.5 series represents a significant step forward in NLP research, there is still much room for improvement and further development.
So, in conclusion, and before moving on to happy hour, it’s worth remembering that the GPT-3.5 series of language models has made significant advances in natural language processing, with a range of abilities developed through various stages of training and tuning.
From the initial generation ability and world knowledge gained through pretraining, to the ability to follow instructions and generalize to new tasks through instruction tuning, and the potential for complex reasoning through training on code, the GPT-3.5 models offer a range of capabilities that make them highly valuable tools for a wide range of applications.
The addition of alignment with humans through supervised instruction tuning and reinforcement learning with human feedback (RLHF) has further enhanced the capabilities of these models, enabling them to generate more informative and impartial responses while also being able to reject questions outside of their knowledge scope.
Overall, the GPT-3.5 series represents a major milestone in natural language processing and has the potential to revolutionize a wide range of industries and applications.
So why wait to get started? Let’s do it and take control of your destiny!
Happy Friday!
For more information, check out this fabulous article: https://yaofu.notion.site/How-does-GPT-Obtain-its-Ability-Tracing-Emergent-Abilities-of-Language-Models-to-their-Sources-b9a57ac0fcf74f30a1ab9e3e36fa1dc1