Transformers

We’re exploring the realm of Deep Learning, focusing on the pivotal role that “transformers” play in driving advancements in AI, rather than referring to the fictional robots of cinema fame.

The Transformer was first proposed in the 2017 paper “Attention Is All You Need” by researchers at Google and the University of Toronto.

Transformers employ semi-supervised learning: they are pre-trained in an unsupervised manner on large, unlabeled datasets and then fine-tuned with supervised training to adapt them to a specific task. They also process all the tokens of a sequence in parallel, which significantly speeds up training.

Typical applications include language translation, document summarization, and auto-completion.
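To make a couple of these tasks concrete, here is a minimal sketch using the Hugging Face `transformers` library; this is my illustrative choice (not something this article relies on), and the `t5-small` checkpoint is just a small, commonly used public model.

```python
# Assumes `pip install transformers torch`; the model name is an illustrative public checkpoint.
from transformers import pipeline

# Document summarization with a small pre-trained Transformer
summarizer = pipeline("summarization", model="t5-small")
print(summarizer(
    "Transformers process whole sequences at once using attention, "
    "which made them the backbone of modern NLP systems.",
    max_length=25, min_length=5)[0]["summary_text"])

# English-to-French translation with the same checkpoint
translator = pipeline("translation_en_to_fr", model="t5-small")
print(translator("Attention is all you need.")[0]["translation_text"])
```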

What sets transformers apart from other models?

  1. Attention Mechanism — In transformers, the attention mechanism computes attention scores between each pair of tokens in the input sequence. These scores determine how much focus the model gives to every other token when processing a particular token. For example, in the sentence “the animal didn’t cross the street because it was too tired,” the attention mechanism assigns a high weight to “animal” when processing “it,” since “it” refers to the animal (see the sketch after this list).

2. Positional Encoding — Positional encoding is a crucial component of transformers that provides information about the position of words or tokens within a sequence. Since transformers process input sequences in parallel, they lack the inherent understanding of the sequential order of tokens that RNNs possess. Positional encoding addresses this limitation by injecting positional information into the input embeddings. This allows the transformer model to differentiate between tokens based on their positions within the sequence.

3. Parallel Processing — Transformers process the entire input sequence in parallel, enabling faster training and inference, especially for long sequences.
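To ground points 1 and 2, here is a minimal NumPy sketch (mine, not from the paper or this article) of sinusoidal positional encoding and scaled dot-product self-attention; the sequence length, model width, and random embeddings are toy values chosen purely for illustration.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encodings in the style of the original paper."""
    pos = np.arange(seq_len)[:, None]                        # (seq_len, 1) token positions
    i = np.arange(d_model)[None, :]                          # (1, d_model) embedding dimensions
    angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle[:, 0::2])                     # even dimensions: sine
    pe[:, 1::2] = np.cos(angle[:, 1::2])                     # odd dimensions: cosine
    return pe

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                          # pairwise token-to-token scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # softmax over each row
    return weights @ V, weights

# Toy sequence of 5 tokens with 8-dimensional embeddings (made-up numbers)
seq_len, d_model = 5, 8
x = np.random.randn(seq_len, d_model) + positional_encoding(seq_len, d_model)
out, attn = scaled_dot_product_attention(x, x, x)            # self-attention: Q = K = V = x
print(attn.shape)                                            # (5, 5): one weight per pair of tokens
```

Each row of `attn` is a probability distribution over all tokens in the sequence, which is exactly the pair-wise scoring described in point 1; adding the positional encodings to the embeddings before attention is what gives the model a sense of token order (point 2).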

The Transformer architecture consists of two parts (a short PyTorch sketch follows this list):

  1. Encoder — The encoder takes the input sequence and transforms it into a fixed-dimensional representation called the context vector. According to the paper, the encoder is composed of a stack of N = 6 identical layers, each with two sub-layers: self-attention and a feed-forward network.
  2. Decoder — The decoder takes the context representation produced by the encoder and generates the output sequence one element at a time. It is also composed of a stack of N = 6 identical layers, each with three sub-layers: self-attention, encoder-decoder attention, and a feed-forward network.
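As a concrete (if simplified) view of this encoder-decoder stack, PyTorch ships an `nn.Transformer` module whose defaults mirror the paper’s base configuration; the sketch below is a toy, with random tensors standing in for real token embeddings.

```python
import torch
import torch.nn as nn

# A stack of N=6 encoder and N=6 decoder layers, matching the paper's configuration.
# d_model=512 and nhead=8 are the paper's base settings; batch_first is a convenience choice.
model = nn.Transformer(d_model=512, nhead=8,
                       num_encoder_layers=6, num_decoder_layers=6,
                       batch_first=True)

src = torch.randn(1, 10, 512)   # dummy source sequence: batch=1, 10 tokens, 512-dim embeddings
tgt = torch.randn(1, 7, 512)    # dummy target sequence generated so far
out = model(src, tgt)           # encoder consumes src; decoder attends to encoder output and tgt
print(out.shape)                # torch.Size([1, 7, 512])
```

In a real model, `src` and `tgt` would be embedded token sequences, and at inference time the decoder would be run autoregressively, feeding each generated token back in as the next step’s input.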

Popular pre-trained Transformer models include (a short usage sketch follows this list):

  1. Bidirectional Encoder Representations from Transformers (BERT) — BERT uses only the encoder part of the Transformer, reading text bidirectionally; it is suited to language-understanding tasks.
  2. Generative Pre-trained Transformer (GPT) — GPT uses only the decoder part of the Transformer, processing text unidirectionally from left to right; it is used for language-generation tasks.
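A minimal sketch of that difference in practice, again using the Hugging Face `transformers` library (an illustrative choice, not something this article prescribes); the model names are standard public checkpoints.

```python
# Assumes `pip install transformers torch`.
from transformers import AutoTokenizer, AutoModel, pipeline

# Encoder-only (BERT): produces one contextual embedding per token, useful for understanding tasks.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")
inputs = tokenizer("The animal didn't cross the street because it was too tired",
                   return_tensors="pt")
embeddings = bert(**inputs).last_hidden_state
print(embeddings.shape)  # (1, number_of_tokens, 768)

# Decoder-only (GPT-2): generates text left to right, one token at a time.
generator = pipeline("text-generation", model="gpt2")
print(generator("Transformers are", max_length=15)[0]["generated_text"])
```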

References —

  1. https://www.youtube.com/watch?v=SMZQrJ_L1vo
  2. https://jalammar.github.io/illustrated-transformer/

Finally

Hopefully, you enjoyed reading it. Buckle up, because our next blog is gonna be EPIC!

Got questions? Don’t be shy! Hit me up on LinkedIn. Coffee’s on me (virtually, of course).
