How Large Language GPT models evolved and work

Generative Pre-Training (GPT) models are first trained on unlabeled text, which is available in abundance, and then fine-tuned on a smaller, task-specific annotated dataset.

Models trained this way perform far better than the previous state-of-the-art models. For example, a model can be pre-trained on a Wikipedia corpus and then fine-tuned on a sentiment analysis dataset.
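To make this concrete, here is a minimal sketch of the two-phase recipe using the Hugging Face transformers and datasets libraries; the "gpt2" checkpoint and the IMDB sentiment dataset are illustrative choices, not the exact data used in the original papers.

```python
# Minimal sketch: reuse weights from unsupervised pre-training ("gpt2"),
# then fine-tune them on a small labelled sentiment dataset (IMDB).
# Both the checkpoint and the dataset are illustrative choices.
from datasets import load_dataset
from transformers import (GPT2ForSequenceClassification, GPT2Tokenizer,
                          Trainer, TrainingArguments)

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token            # GPT-2 has no pad token by default

model = GPT2ForSequenceClassification.from_pretrained("gpt2", num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id

dataset = load_dataset("imdb")                       # labelled data for the supervised phase

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

train_set = dataset["train"].shuffle(seed=42).select(range(2000)).map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-sentiment", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=train_set,
)
trainer.train()                                      # supervised fine-tuning on top of pre-trained weights
```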

Challenges with previous language models

Natural language tasks include semantic similarity assessment, textual entailment, document classification and question answering. Previously, training a model for any of these tasks required a dataset specially curated for that task. The problem was that acquiring such datasets was very difficult, and even where they existed, the corpora were small. The other problem was that such models performed poorly on other NLP tasks because they were trained only for one specific task.

How the GPT models evolved

GPT-1

OpenAI came up with the first iteration, Generative Pre-Training (GPT-1). It was trained on the BookCorpus dataset, which contains about 7,000 unpublished books. The GPT model is based on the Transformer architecture and is made of decoders stacked on top of each other (12 decoder layers). Like BERT, it builds on the Transformer, but the architectures differ: BERT uses stacked encoder layers, while GPT uses stacked decoder layers. The GPT model works on a principle called autoregression, similar to the one used in RNNs: the previous output becomes part of the current input.
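As a sketch of what "autoregressive" means in practice, the loop below generates text one token at a time with the Hugging Face transformers library, feeding each predicted token back in as input; the "gpt2" checkpoint stands in for any GPT-style decoder.

```python
# Greedy autoregressive decoding: at every step the model sees everything
# generated so far and predicts one more token.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

ids = tokenizer("The Transformer decoder", return_tensors="pt").input_ids
for _ in range(20):
    with torch.no_grad():
        logits = model(ids).logits                          # (batch, seq_len, vocab)
    next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True) # most likely next token
    ids = torch.cat([ids, next_id], dim=-1)                 # previous output becomes current input
print(tokenizer.decode(ids[0]))
```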

The GPT model leverages semi-supervised learning, which consists of first performing unsupervised pre-training and then supervised fine-tuning.

[Image: GPT-1 decoder architecture diagram]

The architecture of GPT-1 is shown above: 12 decoder layers, with 12 attention heads in each self-attention layer. It uses masked self-attention during training, and the overall design is very similar to the original Transformer decoder. The mask ensures the model does not have access to the words to the right of the current word.
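The sketch below shows the masking idea in plain PyTorch: positions above the diagonal of the attention-score matrix are set to minus infinity before the softmax, so each token can only attend to itself and the tokens to its left. The dimensions are illustrative, not GPT-1's actual sizes.

```python
# Masked (causal) self-attention for a single head, without learned projections,
# to show just the masking mechanics; sizes are illustrative.
import torch
import torch.nn.functional as F

seq_len, d_model = 5, 16
x = torch.randn(1, seq_len, d_model)                 # 5 token embeddings

scores = x @ x.transpose(-2, -1) / d_model ** 0.5    # attention scores, shape (1, 5, 5)

mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(mask, float("-inf"))     # hide words to the right of each position

attn = F.softmax(scores, dim=-1)                     # rows sum to 1, upper triangle is 0
out = attn @ x                                       # weighted mix of visible tokens only
print(attn[0])
```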

GPT-2

OpenAI came up with the second generation of the Generative Pre-Training model, GPT-2. The architecture follows the same philosophy as GPT-1 (stacked decoder layers), but the model was made larger by training on a much bigger dataset and adding more parameters. It was trained on the WebText dataset, scraped from outbound links on Reddit: around 40 GB of text across 8 million documents. The model had around 10 times more parameters than GPT-1, 1.5 billion in total, with 48 decoder layers stacked on top of each other. GPT-2 showed that training on a larger dataset with more parameters increases the accuracy of the model.


The authors of GPT-2 trained four different models: the first with 117 million parameters (the same as GPT-1), the second with 345 million, the third with 762 million, and the fourth with a whopping 1.5 billion parameters (GPT-2). Each subsequent model performed better than the previous one and achieved lower perplexity.
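Perplexity is simply the exponential of the average per-token cross-entropy loss, so a lower loss means a lower (better) perplexity. A tiny illustration with random stand-in numbers:

```python
# perplexity = exp(average negative log-likelihood per token); the logits and
# targets here are random stand-ins, not outputs of a real GPT-2 model.
import math
import torch
import torch.nn.functional as F

vocab_size, seq_len = 50257, 10                       # GPT-2's vocabulary size
logits = torch.randn(1, seq_len, vocab_size)          # pretend model outputs
targets = torch.randint(0, vocab_size, (1, seq_len))  # pretend "true" next tokens

nll = F.cross_entropy(logits.view(-1, vocab_size), targets.view(-1))
print("perplexity:", math.exp(nll.item()))            # lower is better
```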

GPT-3

In 2020, OpenAI came up with another edition of the GPT model, GPT-3, which has around 175 billion parameters: roughly 10 times more than Microsoft's Turing-NLG model and around 100 times more than GPT-2. It was trained on five different corpora, each assigned a different weight: Common Crawl, WebText2, Books1, Books2 and Wikipedia. Applications of GPT-3 include generating text, writing podcasts and drafting legal documents. It doesn't stop there: it can write website code or machine learning code too!
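GPT-3 is accessed through the OpenAI API rather than as downloadable weights. Below is a minimal sketch using the older (pre-1.0) openai Python SDK; the model name, prompt and API key are placeholders, not values from this article.

```python
# Minimal text-generation call against the OpenAI completions endpoint,
# using the legacy (pre-1.0) openai SDK; model name and key are placeholders.
import openai

openai.api_key = "YOUR_API_KEY"

response = openai.Completion.create(
    model="text-davinci-003",          # a GPT-3 family completion model
    prompt="Write a short product description for a reusable water bottle.",
    max_tokens=80,
    temperature=0.7,
)
print(response.choices[0].text.strip())
```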

GPT-3.5

GPT-3.5 is based on GPT-3 but is trained to work within specific policies aligned with human values. It is sometimes called InstructGPT: it was trained on the same datasets as GPT-3, but with an additional fine-tuning process that adds a concept called "reinforcement learning from human feedback" (RLHF) to the GPT-3 model. The InstructGPT models can be as small as 1.3 billion parameters, over 100 times fewer than GPT-3.
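At the heart of RLHF is a reward model trained on human preference comparisons. The sketch below shows the pairwise ranking-loss idea with a toy linear scorer standing in for a real language model; it is an illustrative simplification, not OpenAI's actual implementation.

```python
# Toy reward-model step: learn to give human-preferred responses a higher
# score than rejected ones. The linear layer and random embeddings are
# stand-ins for a real language model and real response representations.
import torch
import torch.nn.functional as F

reward_model = torch.nn.Linear(768, 1)        # response embedding -> scalar reward

chosen = torch.randn(4, 768)                  # embeddings of human-preferred responses
rejected = torch.randn(4, 768)                # embeddings of dispreferred responses

r_chosen = reward_model(chosen)
r_rejected = reward_model(rejected)

loss = -F.logsigmoid(r_chosen - r_rejected).mean()   # pairwise ranking loss
loss.backward()
print("ranking loss:", loss.item())

# The trained reward model then scores the language model's outputs during a
# reinforcement-learning (PPO-style) fine-tuning phase.
```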

GPT-4

GPT-4 is a multimodal large language model created by OpenAI and the fourth in its GPT series. It was released on March 14, 2023, and has been made publicly available in a limited form via ChatGPT Plus, with access to its commercial API being provided via a waitlist. As a transformer, GPT-4 was pre-trained to predict the next token (using both public data and "data licensed from third-party providers"), and was then fine-tuned with reinforcement learning from human and AI feedback for human alignment and policy compliance.

Observers reported GPT-4 to be an impressive improvement over its November 2022 predecessor, ChatGPT, with the caveat that GPT-4 retains some of the same problems. Unlike ChatGPT, GPT-4 can take images as well as text as input. OpenAI has declined to reveal technical information such as the size of the GPT-4 model.
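GPT-4 is reached through the chat endpoint of the same API. The sketch below, again with the older (pre-1.0) openai SDK, sends a text question together with an image URL; the model name, image URL and key are placeholders, and a vision-capable GPT-4 variant is needed for image input.

```python
# Minimal chat call that combines text and an image URL in one user message;
# model name, image URL and key are placeholders.
import openai

openai.api_key = "YOUR_API_KEY"

response = openai.ChatCompletion.create(
    model="gpt-4",                     # use a vision-capable GPT-4 variant for image input
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this picture?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
    max_tokens=100,
)
print(response.choices[0].message["content"])
```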


Reference links:
https://iq.opengenus.org/introduction-to-gpt-models/
https://iq.opengenus.org/gpt-3-5-model/
https://en.wikipedia.org/wiki/GPT-4
