How Large Language GPT models evolved and work

Generative Pre-Training (GPT) models are first trained on unlabeled text, which is available in abundance, and then fine-tuned on a smaller, task-specific annotated dataset.

Models trained this way perform far better than the previous state-of-the-art models. For example, a model can be pre-trained on a Wikipedia corpus and then fine-tuned on a sentiment analysis dataset.
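To make this concrete, here is a minimal sketch of the two-phase recipe using the Hugging Face transformers and datasets libraries; the "gpt2" checkpoint and the IMDB sentiment dataset are illustrative choices, not the exact data used in the original papers.

```python
# Minimal sketch: reuse weights from unsupervised pre-training ("gpt2"),
# then fine-tune them on a small labelled sentiment dataset (IMDB).
# Both the checkpoint and the dataset are illustrative choices.
from datasets import load_dataset
from transformers import (GPT2ForSequenceClassification, GPT2Tokenizer,
                          Trainer, TrainingArguments)

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token            # GPT-2 has no pad token by default

model = GPT2ForSequenceClassification.from_pretrained("gpt2", num_labels=2)
model.config.pad_token_id = tokenizer.pad_token_id

dataset = load_dataset("imdb")                       # labelled data for the supervised phase

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

train_set = dataset["train"].shuffle(seed=42).select(range(2000)).map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-sentiment", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=train_set,
)
trainer.train()                                      # supervised fine-tuning on top of pre-trained weights
```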

Challenges with previous language models

Natural language tasks include semantic similarity assessment, textual entailment, document classification and question answering. Previously, training a model for any of these tasks required a dataset specially curated for that task. The problem was that acquiring such datasets was very difficult, and even where they existed, the corpora were small. The other problem was that such models performed poorly on other NLP tasks because they were trained only for one specific task.

How the GPT models evolved

GPT-1

OpenAI came up with the first iteration, Generative Pre-Training (GPT-1). It was trained on the BookCorpus dataset, which contains about 7,000 unpublished books. The GPT model is based on the Transformer architecture and is made of decoders stacked on top of each other (12 decoder layers). Like BERT, it builds on the Transformer, but the architectures differ: BERT uses stacked encoder layers, while GPT uses stacked decoder layers. The GPT model works on a principle called autoregression, similar to the one used in RNNs: the previous output becomes part of the current input.
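As a sketch of what "autoregressive" means in practice, the loop below generates text one token at a time with the Hugging Face transformers library, feeding each predicted token back in as input; the "gpt2" checkpoint stands in for any GPT-style decoder.

```python
# Greedy autoregressive decoding: at every step the model sees everything
# generated so far and predicts one more token.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

ids = tokenizer("The Transformer decoder", return_tensors="pt").input_ids
for _ in range(20):
    with torch.no_grad():
        logits = model(ids).logits                          # (batch, seq_len, vocab)
    next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True) # most likely next token
    ids = torch.cat([ids, next_id], dim=-1)                 # previous output becomes current input
print(tokenizer.decode(ids[0]))
```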

The GPT model leverages semi-supervised learning, which consists of first performing unsupervised pre-training and then supervised fine-tuning.

[Image: GPT-1 decoder architecture diagram]

The architecture of GPT-1 is shown above: 12 decoder layers, with 12 attention heads in each self-attention layer. It uses masked self-attention during training, and the overall design is very similar to the original Transformer decoder. The mask ensures the model does not have access to the words to the right of the current word.
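The sketch below shows the masking idea in plain PyTorch: positions above the diagonal of the attention-score matrix are set to minus infinity before the softmax, so each token can only attend to itself and the tokens to its left. The dimensions are illustrative, not GPT-1's actual sizes.

```python
# Masked (causal) self-attention for a single head, without learned projections,
# to show just the masking mechanics; sizes are illustrative.
import torch
import torch.nn.functional as F

seq_len, d_model = 5, 16
x = torch.randn(1, seq_len, d_model)                 # 5 token embeddings

scores = x @ x.transpose(-2, -1) / d_model ** 0.5    # attention scores, shape (1, 5, 5)

mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)
scores = scores.masked_fill(mask, float("-inf"))     # hide words to the right of each position

attn = F.softmax(scores, dim=-1)                     # rows sum to 1, upper triangle is 0
out = attn @ x                                       # weighted mix of visible tokens only
print(attn[0])
```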

GPT-2

OpenAI came up with the second generation of the Generative Pre-Training model, GPT-2. The architecture follows the same philosophy as GPT-1 (stacked decoder layers), but the model was made larger by training on a much bigger dataset and adding more parameters. It was trained on the WebText dataset, scraped from outbound links on Reddit: around 40 GB of text across 8 million documents. The model had around 10 times more parameters than GPT-1, 1.5 billion in total, with 48 decoder layers stacked on top of each other. GPT-2 showed that training on a larger dataset with more parameters increases the accuracy of the model.


The authors of GPT-2 trained four different models: the first with 117 million parameters (the same as GPT-1), the second with 345 million, the third with 762 million, and the fourth with a whopping 1.5 billion parameters (GPT-2). Each subsequent model performed better than the previous one and achieved lower perplexity.
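Perplexity is simply the exponential of the average per-token cross-entropy loss, so a lower loss means a lower (better) perplexity. A tiny illustration with random stand-in numbers:

```python
# perplexity = exp(average negative log-likelihood per token); the logits and
# targets here are random stand-ins, not outputs of a real GPT-2 model.
import math
import torch
import torch.nn.functional as F

vocab_size, seq_len = 50257, 10                       # GPT-2's vocabulary size
logits = torch.randn(1, seq_len, vocab_size)          # pretend model outputs
targets = torch.randint(0, vocab_size, (1, seq_len))  # pretend "true" next tokens

nll = F.cross_entropy(logits.view(-1, vocab_size), targets.view(-1))
print("perplexity:", math.exp(nll.item()))            # lower is better
```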

GPT-3

In 2020, OpenAI came up with another edition of the GPT model, GPT-3, which has around 175 billion parameters: roughly 10 times more than Microsoft's Turing-NLG model and around 100 times more than GPT-2. It was trained on five different corpora, each assigned a different weight: Common Crawl, WebText2, Books1, Books2 and Wikipedia. Applications of GPT-3 include generating text, writing podcasts and drafting legal documents. It doesn't stop there: it can write website code or machine learning code too!
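GPT-3 is accessed through the OpenAI API rather than as downloadable weights. Below is a minimal sketch using the older (pre-1.0) openai Python SDK; the model name, prompt and API key are placeholders, not values from this article.

```python
# Minimal text-generation call against the OpenAI completions endpoint,
# using the legacy (pre-1.0) openai SDK; model name and key are placeholders.
import openai

openai.api_key = "YOUR_API_KEY"

response = openai.Completion.create(
    model="text-davinci-003",          # a GPT-3 family completion model
    prompt="Write a short product description for a reusable water bottle.",
    max_tokens=80,
    temperature=0.7,
)
print(response.choices[0].text.strip())
```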

GPT-3.5

GPT-3.5 is based on GPT-3 but is trained to work within specific policies aligned with human values. It is sometimes called InstructGPT: it was trained on the same datasets as GPT-3, but with an additional fine-tuning process that adds a concept called "reinforcement learning from human feedback" (RLHF) to the GPT-3 model. The InstructGPT models can be as small as 1.3 billion parameters, over 100 times fewer than GPT-3.
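At the heart of RLHF is a reward model trained on human preference comparisons. The sketch below shows the pairwise ranking-loss idea with a toy linear scorer standing in for a real language model; it is an illustrative simplification, not OpenAI's actual implementation.

```python
# Toy reward-model step: learn to give human-preferred responses a higher
# score than rejected ones. The linear layer and random embeddings are
# stand-ins for a real language model and real response representations.
import torch
import torch.nn.functional as F

reward_model = torch.nn.Linear(768, 1)        # response embedding -> scalar reward

chosen = torch.randn(4, 768)                  # embeddings of human-preferred responses
rejected = torch.randn(4, 768)                # embeddings of dispreferred responses

r_chosen = reward_model(chosen)
r_rejected = reward_model(rejected)

loss = -F.logsigmoid(r_chosen - r_rejected).mean()   # pairwise ranking loss
loss.backward()
print("ranking loss:", loss.item())

# The trained reward model then scores the language model's outputs during a
# reinforcement-learning (PPO-style) fine-tuning phase.
```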

GPT-4

GPT-4 is a multimodal large language model created by OpenAI and the fourth in its GPT series. It was released on March 14, 2023, and has been made publicly available in a limited form via ChatGPT Plus, with access to its commercial API being provided via a waitlist. As a transformer, GPT-4 was pre-trained to predict the next token (using both public data and "data licensed from third-party providers"), and was then fine-tuned with reinforcement learning from human and AI feedback for human alignment and policy compliance.

Observers reported GPT-4 to be an impressive improvement over its November 2022 predecessor, ChatGPT, with the caveat that GPT-4 retains some of the same problems. Unlike ChatGPT, GPT-4 can take images as well as text as input. OpenAI has declined to reveal technical information such as the size of the GPT-4 model.
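GPT-4 is reached through the chat endpoint of the same API. The sketch below, again with the older (pre-1.0) openai SDK, sends a text question together with an image URL; the model name, image URL and key are placeholders, and a vision-capable GPT-4 variant is needed for image input.

```python
# Minimal chat call that combines text and an image URL in one user message;
# model name, image URL and key are placeholders.
import openai

openai.api_key = "YOUR_API_KEY"

response = openai.ChatCompletion.create(
    model="gpt-4",                     # use a vision-capable GPT-4 variant for image input
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What is shown in this picture?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
    max_tokens=100,
)
print(response.choices[0].message["content"])
```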


Reference links:
https://iq.opengenus.org/introduction-to-gpt-models/
https://iq.opengenus.org/gpt-3-5-model/
https://en.wikipedia.org/wiki/GPT-4
