What is a transformer?
馬Antony 裕杰
One reason ChatGPT is so powerful is that it uses a neural network architecture called the Transformer, proposed in the 2017 paper "Attention Is All You Need" by Google researchers. Two features make Transformers powerful: first, they are designed to process sequential data, such as the words of a sentence, in parallel, which accelerates training. Second, the paper introduced an attention mechanism that captures the importance and context of words, rather than just their order in the sequence.
While I only know the basics of the Transformer architecture, I will attempt to explain the essential elements without delving into too much detail.
In my last post about text embedding, I used an English dictionary to explain the concept of text embedding. We can represent words and sentences as multi-dimensional vectors. The true value of this representation lies in the fact that each word exists in relation to other words, and it is those relationships that carry meaning. Publishers arrange words alphabetically in a dictionary to make them easy for readers to find. However, this alphabetical order does not hold any semantic value. Starting with A and ending with Z is simply an efficient organization and search method for humans.
On the other hand, a dictionary for a computer neural network does not need to adhere to alphabetical order. If we were to ask a neural network to create a dictionary using only the text from the Bible, this dictionary would represent all the relationships between words in the Bible.
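To make this idea of a vector "dictionary" concrete, here is a minimal sketch in Python. The tiny 3-dimensional vectors and four-word vocabulary are invented for illustration only; a real model learns vectors with hundreds or thousands of dimensions from its training text, but the lookup principle is the same: words are found by closeness in meaning, not by alphabetical order.

```python
import numpy as np

# Toy "dictionary": each word is a point in a shared vector space.
# These numbers are made up for illustration; a real model learns them.
vocab = {
    "jesus":     np.array([0.90, 0.80, 0.10]),
    "messiah":   np.array([0.88, 0.82, 0.12]),
    "baptist":   np.array([0.40, 0.70, 0.30]),
    "bethlehem": np.array([0.10, 0.20, 0.95]),
}

def cosine_similarity(a, b):
    """Words whose vectors point in similar directions are related."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest(word, k=2):
    """Look a word up by closeness in meaning rather than by spelling."""
    query = vocab[word]
    scores = {w: cosine_similarity(query, v) for w, v in vocab.items() if w != word}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]

print(nearest("jesus"))   # "messiah" should rank above "baptist"
```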
A Transformer can be likened to a dictionary assistant, capable of searching this new dictionary using multi-dimensional vectors. To handle user queries, a Transformer consists of an encoder and a decoder.
For instance, if you were to use this dictionary to find the answer to the question, "Where was the Son of God born?", the computer must understand that "Son of God" should be treated as one unit and is equivalent to "Jesus the Messiah" rather than "John the Baptist." The encoder provides the context, importance, and relationships of words according to the biblical text. One of the techniques the encoder relies on for this is called Positional Encoding.
"Positional Encoding: In natural language processing, the order of words in a sentence is crucial for determining the sentence’s meaning. However, traditional machine learning models, such as neural networks, do not inherently understand the order of inputs. To address this challenge, positional encoding can be used to encode the position of each word in the input sequence as a set of numbers. These numbers can be fed into the Transformer model, along with the input embeddings. By incorporating positional encoding into the Transformer architecture, GPT can more effectively understand the order of words in a sentence and generate grammatically correct and semantically meaningful output."
The encoder generates a vector that stores all of these contextual values, which is then passed on to the decoder. The decoder processes these contextual values to determine the appropriate output. For instance, the words "where" and "born" point to a location. Both "Bethlehem" and "a stable" are possible answers, and the decoder has to decide which is more relevant to the input. It is crucial to note that the output does not rely solely on the sequence of words but also takes their context into account, allowing for a more accurate and nuanced understanding of the text.
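This weighing of context is done by the attention mechanism mentioned at the start. Below is a minimal NumPy sketch of scaled dot-product attention, the core formula from the 2017 paper. The query, key, and value matrices here are random stand-ins rather than anything learned from the Bible example; the point is only to show how each word's output becomes a context-weighted mix of the other words.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Each query scores every key; the scores become weights over the values,
    so the output for a word is a context-weighted mix of all the words.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # relevance of every word to every other word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V, weights

# Random stand-ins: 5 tokens ("where", "was", "son", "god", "born"), 4-dim vectors.
rng = np.random.default_rng(0)
Q = rng.standard_normal((5, 4))
K = rng.standard_normal((5, 4))
V = rng.standard_normal((5, 4))
output, weights = scaled_dot_product_attention(Q, K, V)
print(weights.round(2))   # each row sums to 1: how much each word attends to the others
```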
The GPT-3 model behind ChatGPT stacks 96 Transformer layers (GPT-style models use only the decoder side of the architecture). This layered attention mechanism, together with the ability to process sequential data in parallel, contributes to the exceptional performance of models like ChatGPT, making the Transformer a vital topic of discussion among IT professionals.
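To give a sense of what "layers of encoders and decoders" means in code, here is a small sketch using PyTorch's built-in nn.Transformer. The sizes below are the original paper's configuration, not GPT-3's; GPT-3 stacks 96 much wider decoder-style layers, so this is only meant to show how layers are stacked, not to reproduce GPT-3.

```python
import torch
import torch.nn as nn

# The original paper's configuration: 6 encoder layers and 6 decoder layers,
# 512-dimensional embeddings, 8 attention heads. GPT-3 scales this idea up.
model = nn.Transformer(
    d_model=512,
    nhead=8,
    num_encoder_layers=6,
    num_decoder_layers=6,
)

# Dummy inputs: a source sentence of 10 tokens and a target of 7 tokens,
# batch size 1, already turned into 512-dimensional embeddings.
src = torch.randn(10, 1, 512)   # (source length, batch, d_model)
tgt = torch.randn(7, 1, 512)    # (target length, batch, d_model)
out = model(src, tgt)
print(out.shape)                # torch.Size([7, 1, 512])
```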
PS:
If you are too busy to keep up with what is happening in the AI industry globally, below are some links I found both insightful and likely to have long-term impact.
2. 10 Reasons to Ignore AI Safety
Antony, great job in making things simple. Your explanation of text embedding brought to mind the controversy surrounding the slow publication of the Dead Sea Scrolls. Two scholars reconstructed texts of the unpublished scrolls using information from published concordances (listings of every word in a text and their immediate context). Up till then, all these cross-references and indexes were manually collated; I believe this was the first instance a computer program was used to "recover" the text from the references. That was in 1991, and computers have progressed by leaps and bounds since.