Transformers - The Game-Changing Architecture Behind Generative AI
Mohammed Karimkhan Pathan
Senior Data Scientist | Manager - Projects | Data Science Consultant | Generative AI Expert
The Transformer architecture, introduced in 2017, revolutionized AI by using self-attention mechanisms, enabling models like GPT-3.5 to excel. This innovation has led to widespread adoption and a transformative impact across the field. Key features include the encoder-decoder structure and multi-head attention. Paper link - https://arxiv.org/abs/1706.03762
Key Components:
Input Embedding Layer: The first step in the process is the input embedding layer. This layer converts input words into vectors of continuous values: dense representations that capture the semantic and syntactic properties of the words. The values of these vectors are learned during training.
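As a minimal sketch in PyTorch (the vocabulary size, embedding dimension, and token IDs below are purely illustrative assumptions):

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration: a 10,000-word vocabulary and 512-dim embeddings
vocab_size, d_model = 10_000, 512
embedding = nn.Embedding(vocab_size, d_model)

# A toy batch of token IDs (1 sentence, 4 tokens)
token_ids = torch.tensor([[12, 478, 3051, 7]])
vectors = embedding(token_ids)   # shape: (1, 4, 512), learned during training
print(vectors.shape)
```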
Positional Encoding: Since Transformers lack the recurrence mechanism found in RNNs, they add positional encodings to the input embeddings to convey the position of each token within the sequence.
- Researchers proposed using a combination of sine and cosine functions to generate positional vectors, making this encoding method adaptable to sentences of any length. Each dimension of these positional vectors corresponds to a sine or cosine wave of a unique frequency, with values ranging from -1 to 1, effectively encoding the position of each token.
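A sketch of the sinusoidal encoding described in the paper, assuming PyTorch and the same illustrative d_model as above:

```python
import math
import torch

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> torch.Tensor:
    """Return a (max_len, d_model) matrix of sine/cosine positional encodings."""
    position = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)   # (max_len, 1)
    div_term = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10_000.0) / d_model))
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(position * div_term)   # even dimensions: sine
    pe[:, 1::2] = torch.cos(position * div_term)   # odd dimensions: cosine
    return pe

pe = sinusoidal_positional_encoding(max_len=50, d_model=512)
# The encoding is simply added to the input embeddings:
# x = embedding(token_ids) + pe[: token_ids.size(1)]
```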
Self-Attention Layers: Self-attention is a key mechanism in transformer-based models that allows the model to focus on different parts of an input sequence when processing each element. It relates different positions of a single sequence to compute a representation of that same sequence. This helps the model capture long-range dependencies and dynamically determine the relative importance of the various words in a sequence.
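A minimal sketch of scaled dot-product attention, the operation at the heart of the paper, assuming PyTorch tensors of shape (batch, seq_len, d_model) and toy dimensions:

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # (batch, seq_len, seq_len)
    weights = F.softmax(scores, dim=-1)                  # each row sums to 1
    return weights @ v                                   # (batch, seq_len, d_k)

x = torch.randn(1, 4, 512)          # toy batch: 1 sentence, 4 tokens
# In self-attention, Q, K and V are all derived from the same input sequence
out = scaled_dot_product_attention(x, x, x)
print(out.shape)                    # torch.Size([1, 4, 512])
```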
Multi-Head Attention: Enables the model to attend to information from different representation subspaces. The multi-head attention mechanism applies self-attention several times in parallel, which allows the model to relate each word in the input to the other words. For example, it can connect the word "de" with "quién".
This mechanism allows the encoder to pay attention to multiple parts of the input sequence as it processes each token: each head computes its own attention scores, and the results are combined.
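As a rough sketch, PyTorch ships a ready-made multi-head attention module; the head count and dimensions below are illustrative assumptions, not values from the post:

```python
import torch
import torch.nn as nn

d_model, num_heads = 512, 8
mha = nn.MultiheadAttention(embed_dim=d_model, num_heads=num_heads, batch_first=True)

x = torch.randn(1, 4, d_model)           # toy batch: 1 sentence, 4 tokens
out, attn_weights = mha(x, x, x)         # self-attention: query = key = value = x
print(out.shape, attn_weights.shape)     # (1, 4, 512) and (1, 4, 4)
```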
Encoder: This part takes our input and converts it into a matrix representation. For example, it processes the Spanish sentence "¿De quién es?" and transforms it into a structured format that captures the essence of the input. Its primary function is to convert input tokens into contextualized representations. Unlike earlier models that processed tokens in isolation, the Transformer encoder captures the context of each token within the entire sequence.
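A sketch of an encoder stack using PyTorch's built-in layers; the layer and head counts are illustrative, and the Spanish example is assumed to have already been tokenized and embedded:

```python
import torch
import torch.nn as nn

d_model, num_heads, num_layers = 512, 8, 6
encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=num_heads, batch_first=True)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)

# Assume "¿De quién es?" has been tokenized and embedded into 4 toy token vectors
src = torch.randn(1, 4, d_model)
memory = encoder(src)               # contextualized representation, shape (1, 4, 512)
```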
Decoder: This component receives the encoded representation and iteratively generates the output. In our case, it takes the encoded data and produces the translated sentence "Whose is it?" in English.
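A matching decoder sketch: it attends both to the tokens generated so far and to the encoder output (the hypothetical `memory` tensor from the encoder sketch above), with a causal mask so each position cannot see future tokens:

```python
import torch
import torch.nn as nn

d_model, num_heads, num_layers = 512, 8, 6
decoder_layer = nn.TransformerDecoderLayer(d_model=d_model, nhead=num_heads, batch_first=True)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=num_layers)

tgt = torch.randn(1, 3, d_model)        # embeddings of the tokens generated so far
memory = torch.randn(1, 4, d_model)     # encoder output for "¿De quién es?"

# Causal mask: position i may only attend to positions <= i
tgt_mask = nn.Transformer.generate_square_subsequent_mask(3)
out = decoder(tgt, memory, tgt_mask=tgt_mask)   # shape (1, 3, 512)
```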
Feed-Forward Neural Networks: These process the output from the attention mechanism. The normalized residual output then passes through a position-wise feed-forward network, a crucial phase of additional refinement applied independently at each position.
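A minimal sketch of that position-wise block: two linear layers with a ReLU in between, applied to each token independently; the inner dimension of 2048 follows the paper, the rest is illustrative:

```python
import torch
import torch.nn as nn

d_model, d_ff = 512, 2048
feed_forward = nn.Sequential(
    nn.Linear(d_model, d_ff),   # expand each token's vector
    nn.ReLU(),                  # non-linearity
    nn.Linear(d_ff, d_model),   # project back to the model dimension
)

x = torch.randn(1, 4, d_model)   # output of the attention sub-layer (toy values)
out = feed_forward(x)            # same shape: (1, 4, 512)
```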
#GenerativeAI #Transformer #LLM #attention #self-attention #feedforward #position-encoding #encoder #decoder