The brains behind Google's Bard

In this newsletter, we will discuss one of the components behind Google's AI known as Bard. The component is Bidirectional Encoder Representations from Transformers (BERT), an approach that analyses a given text in a bidirectional manner.

Fundamentally, there are 2 approaches to applying pre-trained language representations:

  • Embeddings from Language Models (ELMo)
  • Generative Pre-trained Transformer (GPT)

However, both of these approaches have a flaw: they use a unidirectional language model (left-to-right or right-to-left) to produce general language representations. BERT instead uses bidirectional pre-training to represent general language. At a basic level, there are 2 steps in this framework.

  • Pre-training: unlabelled data is fed to the model across different pre-training tasks
  • Fine-tuning: the parameters learned during pre-training are fine-tuned with labelled data (a minimal code sketch of both steps follows this list)
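Here is a minimal sketch of that two-step pattern, assuming the Hugging Face transformers library and PyTorch (neither is mentioned in the newsletter) and a single made-up labelled example; it is an illustration of the idea, not Google's training code.

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Step 1 (pre-training) was already done by Google on unlabelled text;
# downloading the checkpoint gives us those pre-trained parameters.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Step 2 (fine-tuning): every pre-trained parameter is updated end to end
# on labelled data (here, one made-up sentiment example).
inputs = tokenizer("This newsletter was a great read!", return_tensors="pt")
labels = torch.tensor([1])                  # 1 = positive sentiment
loss = model(**inputs, labels=labels).loss  # cross-entropy loss on the label
loss.backward()                             # gradients flow through all BERT layers
```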

There are 2 models that are currently available:

  • BERTbase (L = 12, H = 768, A = 12, TP = 110M)
  • BERTlarge (L = 24, H = 1024, A = 16, TP = 340M)

L = Total number of layers (Transformer layers)

H = Hidden Size

A = No. of self-attention heads

TP = Total Parameters
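To make these numbers concrete, here is a small sketch (again assuming the Hugging Face transformers library, which the article does not use) that builds both configurations and counts their parameters.

```python
from transformers import BertConfig, BertModel

# BERT-base: L = 12, H = 768, A = 12
base = BertConfig(num_hidden_layers=12, hidden_size=768,
                  num_attention_heads=12, intermediate_size=3072)

# BERT-large: L = 24, H = 1024, A = 16
large = BertConfig(num_hidden_layers=24, hidden_size=1024,
                   num_attention_heads=16, intermediate_size=4096)

for name, cfg in [("BERT-base", base), ("BERT-large", large)]:
    model = BertModel(cfg)                        # randomly initialised, untrained
    total = sum(p.numel() for p in model.parameters())
    print(name, f"{total / 1e6:.0f}M parameters")  # roughly 110M and 340M
```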


Pre-Training

Task 1: Masked LM

  • The training data generator randomly picks 15% of the token positions in a sentence. A chosen token is replaced by [MASK] 80% of the time, by a random token 10% of the time, and left unchanged 10% of the time. The final hidden vector T(i) for each chosen position is then used to predict the original token with a cross-entropy loss (see the sketch below).
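The following is a hypothetical sketch of that data generator in plain Python, following the 15% / 80-10-10 scheme described in the BERT paper; the function name and vocabulary handling are made up for illustration.

```python
import random

def mask_tokens(tokens, vocab, mask_prob=0.15):
    """Sketch of the masked-LM data generator described above."""
    masked = list(tokens)
    labels = [None] * len(tokens)          # None = position not selected, no loss
    for i, token in enumerate(tokens):
        if random.random() < mask_prob:    # pick roughly 15% of positions
            labels[i] = token              # the prediction target at this position
            r = random.random()
            if r < 0.8:
                masked[i] = "[MASK]"               # 80%: replace with the mask token
            elif r < 0.9:
                masked[i] = random.choice(vocab)   # 10%: replace with a random token
            # else: 10% keep the original token unchanged
    return masked, labels
```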

Task 2: Next Sentence Prediction (NSP):

  • Question Answering (QA) and Natural Language Inference (NLI) are 2 tasks that depend on understanding the relationship between 2 sentences. The model is therefore pre-trained on a binarized next-sentence prediction task built from sentence pairs in the corpus (sketched below).
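A hypothetical sketch of how such a binarized example could be built: half the time sentence B really follows sentence A, half the time B is a random sentence, which matches the 50/50 IsNext/NotNext split in the BERT paper. The function and variable names are made up for illustration.

```python
import random

def make_nsp_example(doc_sentences, all_sentences):
    """Build one (sentence A, sentence B, label) example for NSP."""
    idx = random.randrange(len(doc_sentences) - 1)
    sentence_a = doc_sentences[idx]
    if random.random() < 0.5:
        sentence_b, label = doc_sentences[idx + 1], "IsNext"    # true next sentence
    else:
        sentence_b, label = random.choice(all_sentences), "NotNext"  # random sentence
    return sentence_a, sentence_b, label
```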

Pretraining Data:

BERT uses the BooksCorpus (800M words) and the English Wikipedia (2,500M words), with tables, lists, and headers removed, as a document-level corpus is more reliable than a shuffled sentence-level corpus.


Fine-Tuning

All pre-trained parameters are now fine-tuned end to end. During fine-tuning, the input sentences A and B are analogous to (a short tokenizer sketch of this input format follows the list):

1: sentence pairs in paraphrasing

2: hypothesis-premise pairs in entailment

3: question-passage pairs in question answering

4: a degenerate text-∅ pair in text classification or sequence tagging
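Here is a small sketch of that shared input format, assuming the Hugging Face BertTokenizer (not part of the article); the example sentences are made up.

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

# Question answering: A = question, B = passage
qa = tokenizer("Who created BERT?", "BERT was published by researchers at Google.")

# Single-sentence classification: a degenerate text-∅ pair (B is simply omitted)
cls = tokenizer("This newsletter explains BERT.")

print(tokenizer.decode(qa["input_ids"]))   # [CLS] question [SEP] passage [SEP]
print(tokenizer.decode(cls["input_ids"]))  # [CLS] sentence [SEP]
```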


It is impossible to cover the complete working of BERT on a single page; the research paper published by Google explains it in full. If you aspire to become an algorithm designer like me, I would recommend reading it.

A few more technical newsletters are coming up, so stay tuned!
