Introduction to Transformers - English version
As part of my thesis work, I recently delved into the transformer architecture. As I read the first papers in the field (Attention Is All You Need, ViT), I struggled to process the concrete ideas on which transformers are based into abstract ones.
As a result, I dug deeper into the subject, analysed code from a number of sources on the internet alongside reading the papers, and finally held a Q&A round with ChatGPT to sharpen my understanding.
After a week and a half of intensive research, I concluded that I wanted to share the knowledge I had accumulated with the world. I therefore decided to write a series of posts through which I can express my understanding, and perhaps shorten the process for those who cannot invest the time I did in this learning. As part of the series, I will cover the history of the field, the network's structure, its advantages and disadvantages, and how to train it.
In the second stage, I will cover computer vision papers that apply the network to tasks such as segmentation, image matting, body gait transfer, classification, etc.
To do this, I approached Michael Erlihson for scientific editing, and together we embarked on this journey.
Historical Overview
Transformers were initially proposed as a solution to the problem of text analysis. The first paper in the field (Attention Is All You Need) presented English-to-French translation and opened the gate to the NLP revolution we are seeing today.
Transformers introduced several new ideas to language analysis. The two main ones are intertwined in a single architecture and constitute the conceptual building blocks of the network (in addition to the other innovations the paper presented). The first is parallel processing of the input, which made training the model computationally efficient compared with previous models (RNN/LSTM/GRU) and, for the first time, broke the barrier of step-by-step learning over sequential input: dependencies in a sequence, both short- and long-range, can be learned in parallel. Note that transformers are still limited in how much input they can process in parallel, but this limitation depends on computational resources such as available memory and processing units (currently the maximum number of tokens is 1024 [3]).
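To make the parallelism point concrete, here is a minimal NumPy sketch, not an actual transformer layer: all dimensions and weight matrices are illustrative assumptions. It contrasts an RNN-style recurrence, which must walk through the sequence one step at a time, with the single matrix product that lets a transformer relate every pair of positions at once.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 6, 8
x = rng.standard_normal((seq_len, d))          # embeddings of a 6-token toy sequence

# RNN-style recurrence: step t needs the hidden state of step t-1,
# so this loop cannot be parallelized across time steps.
W_h = rng.standard_normal((d, d)) * 0.1
W_x = rng.standard_normal((d, d)) * 0.1
h = np.zeros(d)
for t in range(seq_len):
    h = np.tanh(h @ W_h + x[t] @ W_x)

# Transformer-style: all pairwise token interactions (short- and long-range)
# fall out of one matrix product, computed for the whole sequence at once.
pairwise_scores = x @ x.T                      # shape (seq_len, seq_len), in one shot
```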
In addition, the network introduced two attention mechanisms. The first is self-attention, which lets the model focus selectively on the most important information. The second is cross-attention, which gives the model access to different parts of the source input for each output token (see the sketch below). These capabilities are essential in tasks such as translation, question answering, and text summarization, which require a deliberate selection of the most important parts of the input and output.
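The sketch below shows the common core of both mechanisms, scaled dot-product attention, in plain NumPy. It is not the paper's implementation; the function names, dimensions, and weight matrices are illustrative assumptions. The only difference between the two uses is where the queries and the keys/values come from.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(query_seq, context_seq, W_q, W_k, W_v):
    """Scaled dot-product attention.

    Self-attention:  query_seq and context_seq are the same sequence.
    Cross-attention: query_seq comes from the decoder, context_seq from the encoder,
                     so each output token can attend to different parts of the source.
    """
    Q = query_seq @ W_q                          # (n_q, d)
    K = context_seq @ W_k                        # (n_kv, d)
    V = context_seq @ W_v                        # (n_kv, d)
    weights = softmax(Q @ K.T / np.sqrt(Q.shape[-1]))   # (n_q, n_kv) attention weights
    return weights @ V                           # (n_q, d) weighted mix of the context

rng = np.random.default_rng(0)
d = 8
W_q, W_k, W_v = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))

source = rng.standard_normal((5, d))   # e.g. encoder states of the input sentence
target = rng.standard_normal((3, d))   # e.g. decoder states of the output so far

self_attn  = attention(source, source, W_q, W_k, W_v)   # sequence attends to itself
cross_attn = attention(target, source, W_q, W_k, W_v)   # decoder attends to the encoder
```

Notice that nothing in `attention` cares whether the query and context sequences are the same object; self- and cross-attention are one mechanism applied to two different pairs of inputs.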
These ideas excited me and made me want to understand what led to their development. To do so, I began reviewing the architectures that preceded transformers, how they worked, and why they did not succeed at the tasks that transformers did.