How Generative Text Artificial Intelligence (Specifically GPT) Works Under the Hood, In Plain English, For Non-Nerds…

GPT is one type of Generative Text Artificial Intelligence, a category also known as a Large Language Model.

GPT stands for Generative Pre-trained Transformer.

I’ll break each element down to what it is and how it works.

A user inputs written words.

A Large Language Model (LLM) is software that understands and generates human language, like GPT (each time I mention an LLM, assume I am referring to GPT). The LLM splits those words into what are called tokens.

A token is a part of a word or sentence.

On average, a token is roughly 0.75 words.

This is not to be confused with the use of the word token in crypto or in web apps!

LLMs then take those fractions of words or sentences (as tokens – assume that every time I mention words or sentences, I really mean fractions of words or sentences, i.e. tokens) and look at the context (the surrounding words) each one exists in.
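To make the idea concrete, here is a minimal tokenization sketch in Python, assuming the open-source tiktoken package is installed; the exact splits and token IDs vary by model, so the values shown in the comments are illustrative only.

```python
# Minimal tokenization sketch using the tiktoken library (an assumption:
# any tokenizer would illustrate the same idea).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # an encoding used by recent GPT models

text = "Tokenization splits words into pieces."
token_ids = enc.encode(text)                   # a list of integers, one per token
pieces = [enc.decode([t]) for t in token_ids]  # the text fragment behind each token

print(token_ids)  # e.g. a list of integers (exact values depend on the encoding)
print(pieces)     # e.g. ['Token', 'ization', ' splits', ' words', ' into', ' pieces', '.']
```

Notice that a long word like "Tokenization" is likely to be split into more than one piece, which is why a token averages out to roughly 0.75 words.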

LLMs understand the context (or sequence) that words exist in by using training data.

This is the Pre-training in the GPT acronym.

OpenAI’s GPT was initially trained on a corpus of massive quantities of text from sources such as:

  1. Common Crawl
  2. WebText2
  3. Books1 & Books2 (currently the subject of litigation);
  4. Wikipedia; and
  5. Reddit (which has since tightened access via its API).

LLMs then compare and narrow down the types and patterns of words that surround a user's input of written words (in other words, the context it appears in).

LLMs do this by creating what is called a vector, via a software function.

A vector is a series of numeric values that depict the linguistic features of each word, such as the meaning and intent of the user's written input.

This is called a “word embedding” or “embedded vector”.

These numbers (embedded word vectors) allow LLMs to determine how close words are in proximity (or position) to other similar words, and how they relate to words that appeared in similar sequences in the training data.
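Here is a toy illustration of that idea. The three-dimensional vectors below are made up purely for demonstration; real GPT embeddings have hundreds or thousands of dimensions and are learned during training.

```python
# Toy word embeddings: each word becomes a list of numbers (a vector).
# The values here are invented; real embeddings are learned from training data.
import numpy as np

embeddings = {
    "dog":   np.array([0.80, 0.10, 0.30]),
    "puppy": np.array([0.75, 0.15, 0.35]),
    "car":   np.array([0.10, 0.90, 0.20]),
}

def cosine_similarity(a, b):
    """A score close to 1.0 means the two vectors (and so the words) are close in meaning."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["dog"], embeddings["puppy"]))  # high: related words
print(cosine_similarity(embeddings["dog"], embeddings["car"]))    # lower: unrelated words
```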

One can build a semantic search engine based purely on tokenization and embedded word vectors.

I have built several, and this is the basis of an AI-powered chatbot.

A keyword search engine – the kind you have been using since the dawn of the search engine – simply searches for the occurrence of words in its dataset.

However, with a semantic search engine, a user can input a completely unknown word or sequence of words that doesn’t exist in the dataset, and a very accurate response will still be generated.

This is why LLMs let you get away with spelling mistakes in your input.

This works because tokenization and embedded word vectors look beyond a word or its incorrect spelling and look for linguistic features such as meaning and intent in the user's input.
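Below is a minimal sketch of that kind of semantic search. It assumes a hypothetical embed() function that turns text into a vector (for example via an embeddings API or a local model); the function name and the example data are assumptions for illustration, not any particular product.

```python
# Minimal semantic-search sketch. 'embed' is a hypothetical function that
# returns an embedding vector for a piece of text.
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def semantic_search(query, documents, embed):
    """Rank documents by closeness of meaning to the query,
    not by whether they share exact keywords."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(doc)), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)

# Usage (with the hypothetical embed function):
# results = semantic_search("hwo do I reset my pasword?", help_articles, embed)
# The top result can be "Resetting your password" even though the query is
# misspelled and shares no exact keywords with the document.
```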

This is the start of the Generative process in the GPT acronym.

The Transformer in the GPT acronym acts like what is called a parser.

In computer terminology, a parser is a software function that breaks down a sentence of input words (a string) into parts.

In doing so, the Transformer captures the intent, meaning and context of the user's entire input: the closeness, proximity and relationship of each word to every other word, and which parts of the parsed sequence of words matter most.

In other words, the Transformer looks for intent and meaning.

This is called self-attention.
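Here is a bare-bones sketch of the self-attention calculation in NumPy. The token vectors are random placeholders, and a real Transformer adds learned weight matrices, multiple attention heads and many stacked layers; this only shows the core idea of every token weighing every other token.

```python
# Bare-bones self-attention (scaled dot-product attention) in NumPy.
# Real models use learned projections and many attention heads.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """X has one row per token. Each token 'attends' to every other token,
    weighting it by how relevant it appears to be."""
    d = X.shape[-1]
    Q, K, V = X, X, X                    # real models derive these via learned matrices
    scores = Q @ K.T / np.sqrt(d)        # relevance of every token to every other token
    weights = softmax(scores, axis=-1)   # relevance turned into attention weights (rows sum to 1)
    return weights @ V                   # each token becomes a relevance-weighted blend of tokens

tokens = np.random.rand(4, 8)            # 4 tokens, 8 dimensions each (toy sizes)
print(self_attention(tokens).shape)      # (4, 8): same shape, but now context-aware
```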

Once the LLM has understood the user’s input by parsing it for its intent and meaning, it can then create a meaningful and relevant output to it.

Remember, through embedded word vectors, LLMs compare and narrow down the types and patterns of words that surround a user's input of written words (its context).

Based on this, LLMs try to predict the next word that will appear in the sentence they are outputting.

It predicts this through what is called a probability score.

“What is the probable likelihood of a word from a bank of possibilities being the next in the sequence of words?”
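A toy version of that probability score looks like this. The candidate words and raw scores are invented for illustration; a real model computes scores over its entire vocabulary from the full context.

```python
# Toy next-word prediction: turn raw preference scores into probabilities
# and pick (or sample) the most likely continuation. Values are invented.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

context = "The cat sat on the"
candidates = ["mat", "sofa", "moon", "fridge"]
raw_scores = np.array([4.1, 3.3, 0.5, 1.2])   # made-up model preferences
probabilities = softmax(raw_scores)

for word, p in zip(candidates, probabilities):
    print(f"{context} {word!r}: {p:.2f}")

# The model picks from (or samples) this distribution, which is why it can
# produce a sentence that reads well but is factually wrong.
```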

Because it is based on probability LLMs can get sentences wrong.

Not in how they are constructed, but in what they mean.

And because embedded word vectors do not work like a keyword search engine, LLMs can make up facts (called hallucinations).

And that is the Generative part of the GPT acronym.

That is how GPT works.

The Chat function of ChatGPT simply adds the previous questions and answers to your input.
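A rough sketch of what that looks like, using a message format loosely modelled on the OpenAI chat API; treat the exact field names and the call_model() function as assumptions for illustration.

```python
# Sketch of the "chat" part: every new question is sent along with the
# earlier questions and answers, so the model sees the whole conversation.
conversation = [
    {"role": "user", "content": "What is a token?"},
    {"role": "assistant", "content": "A token is a fragment of a word or sentence."},
]

def ask(question, history):
    history.append({"role": "user", "content": question})
    # The full history (not just the latest question) is what gets sent to the
    # model, which is how it can resolve "one" back to "a token" here.
    # reply = call_model(history)   # hypothetical model call
    # history.append({"role": "assistant", "content": reply})
    return history

ask("How long is one, on average?", conversation)
```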
