ChatGPT unpacked.

My name is Aamir, and I am an AI researcher based in Melbourne. Welcome to my latest blog post on ChatGPT.


ChatGPT is a revolutionary technology that lets people communicate with machines in a natural, human-like way. It is an artificial intelligence (AI) based system that generates responses to user inputs, drawing on its knowledge of language and its understanding of the conversation's context. Using natural language processing and machine learning, it can interpret user inputs and produce relevant, personalized responses. It can be applied in a range of areas, from customer service and sales to medical diagnosis and beyond. With its ability to understand user intent and generate accurate responses, ChatGPT has the potential to revolutionize the way humans interact with machines.

In this blog, we go under the hood and try to get a sense of how such technologies are created.

Large Language Models:


Before we dive deep into the underlying technologies that underpin ChatGPT, let's start with a basic introduction to generative large language models.

Large language models are machine learning models used to generate natural language text. They are based on deep learning algorithms and learn from large datasets of natural language how to generate text. They are used for a variety of tasks such as summarizing text, generating dialogue, translating text, answering questions, and extracting knowledge.

One of the more specialized forms of large language models (LLMs) is GPT, pioneered by OpenAI and built on the Transformer architecture introduced by Google. GPT stands for Generative Pre-trained Transformer.

GPT-type models are trained using a technique called self-supervised learning. In this technique, the model is fed a large amount of text data, mostly scraped from the internet, and asked to predict the next word or phrase that should follow. The model is then evaluated on how accurately it predicts the next word given the preceding text. Over time, the model learns the structure and context of the language, enabling it to make accurate predictions.
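To make the idea concrete, here is a minimal, illustrative sketch of next-token training in Python using PyTorch. A tiny recurrent network and random token ids stand in for the real Transformer and real text (the names, sizes, and data are all invented here), but the objective, predicting each next token from the tokens before it, is the same self-supervised objective GPT-style models use.

```python
# A toy next-token training loop. Everything here is a stand-in:
# a small GRU instead of a Transformer, random ids instead of text.
import torch
import torch.nn as nn

vocab_size, embed_dim = 100, 32

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, embed_dim, batch_first=True)
        self.head = nn.Linear(embed_dim, vocab_size)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)  # next-token logits at every position

model = TinyLM()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Toy "corpus": shift by one position so each token predicts its successor.
batch = torch.randint(0, vocab_size, (8, 16))
inputs, targets = batch[:, :-1], batch[:, 1:]

for step in range(100):
    logits = model(inputs)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```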

When it comes to LLMs, bigger is definitely better: as the size (the number of parameters) grows, so does the capability to comprehend natural language. GPT-3, on which ChatGPT is built, is a neural network with 175 billion parameters. Although bigger LLMs exist, such as Google's PaLM (Pathways Language Model) with 540 billion parameters, they are not publicly available.

Some of the use cases of GPT-3-type models are:


● Natural language processing: GPT-3 can be used to develop natural language processing (NLP) applications such as dialogue systems, question-answering systems, and text summarization.

● Machine translation: GPT-3 can be used to develop systems that translate text from one language to another.

● Text generation: GPT-3 can generate text on a given topic, making it useful for applications such as story generation and creative writing.

● Text classification: GPT-3 can classify text into categories for tasks such as sentiment analysis, topic detection, and spam detection.

● Image captioning: GPT-3 can be used to generate captions for images, making it useful for applications such as photo albums.

● Video captioning: GPT-3 can be used to generate captions for videos, making it useful for applications such as news video summaries.

● Text summarization: GPT-3 can generate summaries of text, making it useful for applications such as news article summarization.


However, please be aware that because GPT-3 is mostly trained on unfiltered text from the internet, it is known to generate false, misleading, or even toxic text.

GPT-3 was a technology demonstrator, and as such its language generation capabilities are not meant to be used by end users directly.

OpenAI offers an API (application programming interface) that third-party developers can use to build on and fine-tune the models for downstream tasks.
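As an illustration, here is roughly what calling that API from Python looked like at the time of writing, using the `openai` package. The API key and prompt are placeholders, and model names and endpoints change over time, so treat this as a sketch rather than a definitive reference.

```python
# A sketch of the OpenAI completion API, as the legacy `openai` Python
# package exposed it; specifics may differ in current versions.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder, not a real key

response = openai.Completion.create(
    model="text-davinci-003",  # a GPT-3 family model
    prompt="Summarize in one sentence: Large language models are trained "
           "on vast text corpora to predict the next token.",
    max_tokens=64,
    temperature=0.2,  # low temperature for focused, less random output
)
print(response["choices"][0]["text"])
```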

One can think of raw GPT-3 as a jack of all trades and master of none. It has the potential to do a lot of things; however, in its raw form it has to be further trained toward the desired objectives.

Fine-Tuning LLMs.

Large language models are excellent few-shot learners; what I mean by this is that they can be steered to produce specific outcomes given very few examples (a prompt sketch follows the list below). This becomes very handy in areas where quality supervised data is hard to come by. Here are some of the ways to fine-tune GPT-type models:

● Pre-training the model on specific domain data.

● Adjusting the model architecture to the specific task.

● Adjusting the hyperparameters of the model.

● Adjusting the training objective, such as adding a task-specific training loss.

● Augmenting the training data with task-specific examples.

● Introducing task-specific bias in the model.

● Generating task-specific data using hand-crafted rules or reinforcement learning.
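Few-shot learning in its simplest form needs no retraining at all: the examples go straight into the prompt. Here is a hypothetical sentiment-classification prompt in that style; the reviews and labels are invented for illustration.

```python
# Few-shot prompting: two labeled examples "teach" the model the task
# inside the prompt itself, with no gradient updates or fine-tuning.
few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: The battery lasts all day and the screen is gorgeous.
Sentiment: Positive

Review: It stopped working after a week and support never replied.
Sentiment: Negative

Review: Setup took five minutes and it has run flawlessly since.
Sentiment:"""

# Sent to a GPT-style completion endpoint (as in the API sketch above),
# the expected continuation is " Positive".
print(few_shot_prompt)
```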


ChatGPT (a fine-tuned version of GPT-3)


Now that you have some basic intuition about LLMs and how they can be fine-tuned for specific tasks, this is a nice segue to ChatGPT. ChatGPT is an LLM designed to generate natural language responses to user inputs, allowing for natural conversations between humans and machines. This makes it particularly useful for creating virtual assistants, generating customer service conversations, and building interactive chatbots.


While the current generation of ChatGPT is completely general purpose, there is no technical reason why future versions could not be further specialized, catering to industries like healthcare, finance, education, and e-commerce, or even counseling and teaching, each with its own specialized version of GPT.


The relationship between GPT-3 and ChatGPT can be understood with the following illustration.



How does ChatGPT work?


How do we go from raw GPT-3 to a conversational, filtered (less toxic) version? Before we start down this path, let's understand two terms related to LLMs: high capability with low alignment, and high alignment with low capability. Let's unpack.


GPT-3 as it stands is an example of the first: the model can do a great deal when it comes to generating text, but in its raw state it is not aligned with any specific task. In simple terms, GPT-3 has a deep understanding of natural language; what it lacks is the ability to put it to good use.


Large language models such as GPT-3 are powerful tools, trained on vast amounts of data from the internet, that can generate human-like text. However, the output produced may not always align with human expectations or desirable values: the objective these models optimize is a probability distribution over word and token sequences, which lets them predict the next word in a sequence. As GPT-3 stands, the training objective does not include aligning itself with human expectations.
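To make "a probability distribution over token sequences" concrete, here is a tiny sketch in plain Python: invented per-token scores (logits) for a four-word vocabulary are turned into probabilities with a softmax, and the next token is sampled from them. The numbers are hypothetical; only the mechanism matters.

```python
# From logits to a next-token distribution: softmax, then sample.
import math
import random

vocab = ["the", "cat", "sat", "mat"]
logits = [2.0, 0.5, 1.0, -1.0]  # hypothetical scores from a model

# softmax: p_i = exp(logit_i) / sum_j exp(logit_j)
exps = [math.exp(x) for x in logits]
probs = [e / sum(exps) for e in exps]

next_token = random.choices(vocab, weights=probs, k=1)[0]
print({w: round(p, 3) for w, p in zip(vocab, probs)}, "->", next_token)
```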


So how do we take an LLM from high capability and low alignment (with human expectations) to high alignment and lower capability?


Enter reinforcement learning from human feedback: a new neural network is created that takes the output of GPT-type models and produces a scalar reward signal reflecting how acceptable a human would find the answer to a given prompt.
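Here is a minimal sketch, in PyTorch, of what such a reward model might look like. The architecture and sizes are invented for illustration; the essential point is the output: a single scalar score per (prompt, response) token sequence.

```python
# A toy reward model: reads token ids, emits one scalar score per sequence.
import torch
import torch.nn as nn

class RewardModel(nn.Module):
    def __init__(self, vocab_size=100, embed_dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.GRU(embed_dim, embed_dim, batch_first=True)
        self.score = nn.Linear(embed_dim, 1)  # scalar "acceptability" head

    def forward(self, tokens):
        _, h = self.encoder(self.embed(tokens))  # final hidden state
        return self.score(h[-1]).squeeze(-1)     # one reward per sequence

rm = RewardModel()
pair = torch.randint(0, 100, (1, 24))  # toy (prompt + response) token ids
print(rm(pair))                        # a single scalar reward
```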


Here are the steps involved in going from GPT-3 to ChatGPT (a simplified training-loop sketch follows the list):

1. Pre-train a GPT-like model on data scraped from the internet; this can amount to hundreds of gigabytes.

2. Create another GPT-like model, trained with human supervision, that takes in the output of the pre-trained GPT and produces a scalar reward signal indicating how well the output is aligned with human expectations. This part can be thought of as measuring how well GPT-3 follows instructions and produces text acceptable to end users.

3. With constant human feedback during training, this reward signal is continually fed back to the original GPT-3: GPT gets a higher reward when its output is aligned with human expectations, and a lower reward, or none, when it is not.

4. The training algorithm, Proximal Policy Optimization (PPO), uses the reward signal to push the GPT-type model to maximize rewards, that is, to align itself with human expectations.
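Below is a deliberately simplified sketch of this feedback loop in PyTorch. Real RLHF uses PPO, with clipped updates and a KL penalty that keeps the tuned model close to the original; here a bare policy-gradient update stands in so the core idea, pushing up the probability of high-reward responses, fits in a few lines. The `policy` and `reward_model` arguments are assumed to be models like the sketches earlier in this post.

```python
# Simplified stand-in for the PPO step: sample responses, score them
# with the reward model, and reinforce the high-reward ones.
import torch

def rlhf_step(policy, reward_model, prompts, optimizer):
    # `policy`: maps (batch, seq) token ids to (batch, seq, vocab) logits.
    # `reward_model`: maps token ids to one scalar score per sequence.
    logits = policy(prompts)
    dist = torch.distributions.Categorical(logits=logits)
    responses = dist.sample()                   # toy "generated" tokens
    rewards = reward_model(responses).detach()  # scalar per sequence
    # Policy gradient: raise the log-probability of high-reward responses.
    log_probs = dist.log_prob(responses).sum(dim=-1)
    loss = -(rewards * log_probs).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return rewards.mean().item()
```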


Here is a visual representation of the same.


Some Limitations of this type of training process.


Ultimately, the PPO-trained GPT model depends on its human-labeled data and is thus prone to human bias. Here are some of the limitations:

● The biases of the labelers who produce the demonstration data.

● The labeling instructions, which were written by the researchers who designed the study.

● The underlying sources of the data scraped from the internet; here, quality matters. Deep-rooted biases can be introduced into the core model and can be hard to remove.

● The prompts crafted by the developers or provided by OpenAI's customers. The model's responses can only be as good as the prompts, or the questions asked as part of the training process.

● The data labelers and researchers taking part in the training process may not be representative of all potential end users of the language model.


Finally, ChatGPT-type models open up whole new possibilities in the areas of teaching, counseling, and customer service. Other companies have similar products, such as Google's LaMDA, though it has not been publicly released.


Microsoft has just announced a 10-billion-dollar investment in OpenAI. I am confident that with investments like these, other players will also join the race to produce even more compelling products.

It has been a pleasure writing this blog, and I hope you have found it useful. Till next time: another day, another topic.
