Simplest Tutorials on BERT and XLNet


XLNet

XLNet is a generalized autoregressive pre-training method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order. Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pre-training. Empirically, under comparable experimental settings, XLNet outperforms BERT on 20 tasks, often by a large margin, including question answering, natural language inference, sentiment analysis, and document ranking. I have used references [3], [6], and [7] to prepare interactive and simple tutorials on XLNet. A small usage sketch follows below.
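To give a concrete feel for how such a pre-trained model is used in practice, here is a minimal sketch that loads a pre-trained XLNet and extracts contextual token representations for a sentence. It is not taken from the tutorials themselves; it assumes the Hugging Face transformers library with a PyTorch backend, and the model name "xlnet-base-cased" is just one publicly available checkpoint.

import torch
from transformers import XLNetModel, XLNetTokenizer

# Sketch using the Hugging Face `transformers` library (an assumption;
# the tutorials may use a different setup or framework).
tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
model = XLNetModel.from_pretrained("xlnet-base-cased")
model.eval()

text = "XLNet learns bidirectional context through permutation language modeling."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One contextual vector per input token: (batch_size, sequence_length, hidden_size).
print(outputs.last_hidden_state.shape)

These contextual vectors are what a downstream task head (for classification, question answering, ranking, and so on) would consume after fine-tuning.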


BERT

BERT (Bidirectional Encoder Representations from Transformers) is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. BERT uses masked language modeling to enable these pre-trained deep bidirectional representations. The pre-trained representations reduce the need for many heavily engineered task-specific architectures: the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. I have used references [1], [2], [3], [4], and [5] to prepare interactive tutorials on the BERT language model. A small masked-prediction sketch follows below.
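As a quick illustration of BERT's masked language modeling objective, the following minimal sketch asks a pre-trained BERT model to fill in a [MASK] token using both left and right context. Again, this assumes the Hugging Face transformers library and PyTorch, and is not taken from the tutorials themselves; "bert-base-uncased" is one publicly available checkpoint.

import torch
from transformers import BertForMaskedLM, BertTokenizer

# Sketch using the Hugging Face `transformers` library (an assumption;
# the tutorials may use a different setup or framework).
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and take the highest-scoring vocabulary entry.
mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_ids = logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))  # typically prints "paris"

Because the prediction at the masked position attends to tokens on both sides, this single call already demonstrates the bidirectional conditioning that distinguishes BERT from left-to-right language models.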

Some results from Papers with Code (for more details, visit the link)


References:

1. Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. "Attention is all you need." In Advances in neural information processing systems, pp. 5998-6008. 2017.

2. Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. "Bert: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).

3. Clark, Kevin, Urvashi Khandelwal, Omer Levy, and Christopher D. Manning. "What Does BERT Look At? An Analysis of BERT's Attention." arXiv preprint arXiv:1906.04341 (2019).

4. Park, Jonggwon, Kyoyun Choi, Sungwook Jeon, Dokyun Kim, and Jonghun Park. "A Bi-directional Transformer for Musical Chord Recognition." arXiv preprint arXiv:1907.02698 (2019).

5. Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton. "Layer normalization." arXiv preprint arXiv:1607.06450 (2016).

6. Yang, Zhilin, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. "XLNet: Generalized autoregressive pretraining for language understanding." In Advances in neural information processing systems, pp. 5754-5764. 2019.

7. Dai, Zihang, Zhilin Yang, Yiming Yang, William W. Cohen, Jaime Carbonell, Quoc V. Le, and Ruslan Salakhutdinov. "Transformer-XL: Language modeling with longer-term dependency." 2018.
