Transformers - The Game-Changing Architecture Behind Generative AI
Mohammed Karimkhan Pathan
Senior Data Scientist | Manager - Projects | Data Science Consultant | Generative AI Expert
The Transformer architecture, introduced in 2017, revolutionized AI by using self-attention mechanisms, enabling models like GPT-3.5 to excel. This innovation has led to widespread adoption and a transformative impact across the field. Key features include the encoder-decoder structure and multi-head attention. Paper link - https://arxiv.org/abs/1706.03762
Key Components:
Input Embedding Layer: The first step in the process is the input embedding layer. This layer converts input words into vectors of continuous values: dense representations that capture the semantic and syntactic properties of the words. The values of these vectors are learned during training.
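As a minimal sketch in PyTorch (the vocabulary size, embedding dimension, and token IDs below are purely illustrative assumptions):

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration: a 10,000-word vocabulary and 512-dim embeddings
vocab_size, d_model = 10_000, 512
embedding = nn.Embedding(vocab_size, d_model)

# A toy batch of token IDs (1 sentence, 4 tokens)
token_ids = torch.tensor([[12, 478, 3051, 7]])
vectors = embedding(token_ids)   # shape: (1, 4, 512), learned during training
print(vectors.shape)
```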
Positional Encoding: Since Transformers lack the recurrence mechanism found in RNNs, they add positional encodings to the input embeddings to convey the position of each token within the sequence.
- Researchers proposed using a combination of sine and cosine functions to generate positional vectors, making this encoding method adaptable to sentences of any length. Each dimension of these positional vectors corresponds to a sine or cosine wave of a unique frequency, with values ranging from -1 to 1, effectively encoding the position of each token.
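A sketch of the sinusoidal encoding described in the paper, assuming PyTorch and the same illustrative d_model as above:

```python
import math
import torch

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> torch.Tensor:
    """Return a (max_len, d_model) matrix of sine/cosine positional encodings."""
    position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)   # (max_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10_000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions: sine
    pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions: cosine
    return pe

pe = sinusoidal_positional_encoding(max_len=50, d_model=512)
# The encoding is simply added to the input embeddings:
# x = embedding(token_ids) + pe[: token_ids.size(1)]
```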
Self-Attention Layers: Self-attention is a key mechanism in transformer-based models that allows the model to focus on different parts of an input sequence when processing each element. It relates different positions of a single sequence to compute a representation of that same sequence. This helps the model capture long-range dependencies and dynamically determine the relative importance of the various words in a sequence.
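A minimal sketch of scaled dot-product attention, the operation at the heart of the paper, assuming PyTorch tensors of shape (batch, seq_len, d_model) and toy dimensions:

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # (batch, seq_len, seq_len)
    weights = F.softmax(scores, dim=-1)                  # each row sums to 1
    return weights @ v                                   # (batch, seq_len, d_k)

x = torch.randn(1, 4, 512)          # toy batch: 1 sentence, 4 tokens
# In self-attention, Q, K and V are all derived from the same input sequence
out = scaled_dot_product_attention(x, x, x)
print(out.shape)                    # torch.Size([1, 4, 512])
```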
Multi-Head Attention: Enables the model to attend to information from different representation subspaces. The multi-head attention mechanism applies self-attention several times in parallel, which allows the model to relate each word in the input to the other words. For example, it can connect the word "de" with "quién".
This mechanism allows the encoder to pay attention to multiple parts of the input sequence as it processes each token: each head computes its own attention scores, and the results are combined.
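As a rough sketch, PyTorch ships a ready-made multi-head attention module; the head count and dimensions below are illustrative assumptions, not values from the post:

```python
import torch
import torch.nn as nn

d_model, num_heads = 512, 8
mha = nn.MultiheadAttention(embed_dim=d_model, num_heads=num_heads, batch_first=True)

x = torch.randn(1, 4, d_model)           # toy batch: 1 sentence, 4 tokens
out, attn_weights = mha(x, x, x)         # self-attention: query = key = value = x
print(out.shape, attn_weights.shape)     # (1, 4, 512) and (1, 4, 4)
```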
Encoder: This part takes our input and converts it into a matrix representation. For example, it processes the Spanish sentence "¿De quién es?" and transforms it into a structured format that captures the essence of the input. Its primary function is to convert input tokens into contextualized representations. Unlike earlier models that processed tokens in isolation, the Transformer encoder captures the context of each token within the entire sequence.
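A sketch of an encoder stack using PyTorch's built-in layers; the layer and head counts are illustrative, and the Spanish example is assumed to have already been tokenized and embedded:

```python
import torch
import torch.nn as nn

d_model, num_heads, num_layers = 512, 8, 6
encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=num_heads, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)

# Assume "¿De quién es?" has been tokenized and embedded into 4 toy token vectors
src = torch.randn(1, 4, d_model)
memory = encoder(src)               # contextualized representation, shape (1, 4, 512)
```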
Decoder: This component receives the encoded representation and iteratively generates the output. In our case, it takes the encoded data and produces the translated sentence "Whose is it?" in English.
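A matching decoder sketch: it attends both to the tokens generated so far and to the encoder output (the hypothetical `memory` tensor from the encoder sketch above), with a causal mask so each position cannot see future tokens:

```python
import torch
import torch.nn as nn

d_model, num_heads, num_layers = 512, 8, 6
decoder_layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=num_heads, batch_first=True)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=num_layers)

tgt = torch.randn(1, 3, d_model)        # embeddings of the tokens generated so far
memory = torch.randn(1, 4, d_model)     # encoder output for "¿De quién es?"

# Causal mask: position i may only attend to positions <= i
tgt_mask = nn.Transformer.generate_square_subsequent_mask(3)
out = decoder(tgt, memory, tgt_mask=tgt_mask)   # shape (1, 3, 512)
```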
Feed-Forward Neural Networks: These process the output from the attention mechanism. The normalized residual output then passes through a position-wise feed-forward network, a crucial phase of additional refinement applied independently at each position.
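A minimal sketch of that position-wise block: two linear layers with a ReLU in between, applied to each token independently; the inner dimension of 2048 follows the paper, the rest is illustrative:

```python
import torch
import torch.nn as nn

d_model, d_ff = 512, 2048
feed_forward = nn.Sequential(
    nn.Linear(d_model, d_ff),   # expand each token's vector
    nn.ReLU(),                  # non-linearity
    nn.Linear(d_ff, d_model),   # project back to the model dimension
)

x = torch.randn(1, 4, d_model)   # output of the attention sub-layer (toy values)
out = feed_forward(x)            # same shape: (1, 4, 512)
```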
#GenerativeAI #Transformer #LLM #attention #self-attention #feedforward #position-encoding #encoder #decoder