The brains behind Google's Bard
In this newsletter, we will discuss one of the components behind Google's AI chatbot Bard: Bidirectional Encoder Representations from Transformers (BERT), an approach that analyses a given text in a bidirectional manner.
Fundamentally, there are two approaches to applying pre-trained language representations:
- Embeddings from Language Models (ELMo)
- Generative Pre-trained Transformer (GPT)
However, both of these approaches share a flaw: they use a unidirectional language model (left-to-right or right-to-left) to produce general language representations. BERT instead uses bidirectional pre-training to learn general language representations. At a basic level, there are two steps in this framework.
- Pre-training: the model is trained on unlabeled data across different pre-training tasks
- Fine-tuning: the parameters learned during pre-training are fine-tuned with labelled data for the downstream task
Two models are currently available (a sketch for inspecting their sizes follows the list):
- BERTbase (L = 12, H = 768, A = 12, TP = 110M)
- BERTlarge (L = 24, H = 1024, A = 16, TP = 340M)
L = Total number of layers (Transformer layers)
H = Hidden Size
A = No. of self-attention heads
TP = Total Parameters
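To make these sizes concrete, here is a minimal sketch, assuming the Hugging Face transformers library and its hosted bert-base-uncased and bert-large-uncased checkpoints (neither of which is mentioned above), that reads L, H and A from the model configuration and counts the total parameters:

```python
# Minimal sketch: inspect L, H, A and total parameters of the public BERT checkpoints.
# Assumes the Hugging Face `transformers` library and its hosted models.
from transformers import BertConfig, BertModel

for name in ["bert-base-uncased", "bert-large-uncased"]:
    config = BertConfig.from_pretrained(name)
    model = BertModel.from_pretrained(name)
    total_params = sum(p.numel() for p in model.parameters())
    print(
        f"{name}: L={config.num_hidden_layers}, "
        f"H={config.hidden_size}, A={config.num_attention_heads}, "
        f"TP~{total_params / 1e6:.0f}M"
    )
```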
Pre-training
Task 1: Masked LM
- The training data generator randomly picks 15% of the token positions in a sentence. If a position is chosen, the token is replaced by [MASK] 80% of the time, by a random token 10% of the time, and left unchanged the remaining 10%. The final hidden vector T(i) of each chosen position is then used to predict the original token with a cross-entropy loss.
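The exact data generator is part of the original BERT codebase; the following is only a small illustrative sketch of the 80/10/10 masking rule on a whitespace-tokenised sentence (the tiny vocabulary and helper name are hypothetical):

```python
import random

VOCAB = ["the", "cat", "sat", "on", "mat", "dog", "ran"]  # hypothetical tiny vocabulary

def mask_tokens(tokens, mask_prob=0.15, mask_token="[MASK]"):
    """BERT-style masking: choose ~15% of positions; of those,
    80% become [MASK], 10% a random token, 10% stay unchanged.
    Returns the corrupted tokens and the prediction targets."""
    corrupted = list(tokens)
    targets = [None] * len(tokens)           # None = position not selected, no loss
    for i, token in enumerate(tokens):
        if random.random() < mask_prob:
            targets[i] = token               # the model must recover the original token
            roll = random.random()
            if roll < 0.8:
                corrupted[i] = mask_token
            elif roll < 0.9:
                corrupted[i] = random.choice(VOCAB)
            # else: leave the token unchanged
    return corrupted, targets

print(mask_tokens("the cat sat on the mat".split()))
```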
Task 2: Next Sentence Prediction:
- Question Answering (QA) and Natural Language Inference (NLI) are two tasks based on understanding the relationship between two sentences. To capture this, the model is pre-trained on a binarized next-sentence prediction task: given sentences A and B from the corpus, B is the actual sentence that follows A 50% of the time (IsNext) and a random sentence from the corpus the other 50% (NotNext).
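A minimal sketch of how such binarized training pairs could be assembled (the corpus structure and helper name are hypothetical; the real generator also packs pairs up to a maximum sequence length):

```python
import random

def make_nsp_example(corpus):
    """Build one next-sentence-prediction example from a corpus given as a
    list of documents, each document being a list of sentences."""
    doc = random.choice([d for d in corpus if len(d) > 1])
    i = random.randrange(len(doc) - 1)
    sentence_a = doc[i]
    if random.random() < 0.5:
        sentence_b, label = doc[i + 1], "IsNext"        # the actual next sentence
    else:
        # For simplicity this may occasionally pick the same document again.
        other_doc = random.choice(corpus)
        sentence_b, label = random.choice(other_doc), "NotNext"
    return sentence_a, sentence_b, label

corpus = [
    ["BERT is pre-trained on two tasks.", "One of them is next sentence prediction."],
    ["The weather was fine.", "We went for a walk."],
]
print(make_nsp_example(corpus))
```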
Pre-training data:
BERT uses the BooksCorpus (800M words) and the English Wikipedia (2,500M words), with tables, lists, and headers removed, as a document-level corpus is more reliable for extracting long contiguous sequences.
Fine-tuning
All pre-trained parameters are then fine-tuned end to end. During fine-tuning, the input sentences A and B are analogous to the cases below (a short fine-tuning sketch follows the list):
1: sentence pairs in paraphrasing
2: hypothesis-premise pairs in entailment
3: question-passage pairs in question answering
4: a degenerate text-∅ pair in text classification or sequence tagging
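As a sketch of case 2, assuming the Hugging Face transformers library (the sentence pair and the two-label setup below are made up for illustration), sentences A and B are packed into a single input and a classification head is fine-tuned on top of the pre-trained encoder:

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Assumes Hugging Face `transformers`; the example pair below is made up.
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Sentence A (premise) and sentence B (hypothesis) are packed into one sequence:
# [CLS] A [SEP] B [SEP]
inputs = tokenizer(
    "A man is playing a guitar on stage.",   # sentence A
    "Someone is performing music.",          # sentence B
    return_tensors="pt",
)

with torch.no_grad():
    logits = model(**inputs).logits  # classification head is random until fine-tuned
print(logits)
```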
It is impossible to cover the complete workings of BERT on a single page; the research paper published by Google, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", explains it in full. If you aspire to become an algorithm designer like me, I would recommend reading it.
A few more technical newsletters are coming up, so stay tuned!