Simplest Tutorials on BERT and XLNet


XLNet

XLNet is a generalized autoregressive pre-training method that enables learning bidirectional contexts by maximizing the expected likelihood over all permutations of the factorization order. Furthermore, XLNet integrates ideas from Transformer-XL, the state-of-the-art autoregressive model, into pre-training. Empirically, under comparable experimental settings, XLNet outperforms BERT on 20 tasks, often by a large margin, including question answering, natural language inference, sentiment analysis, and document ranking. I have used references [3], [6], and [7] to prepare interactive and simple tutorials on XLNet. A small usage sketch follows below.
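To give a concrete feel for how such a pre-trained model is used in practice, here is a minimal sketch that loads a pre-trained XLNet and extracts contextual token representations for a sentence. It is not taken from the tutorials themselves; it assumes the Hugging Face transformers library with a PyTorch backend, and the model name "xlnet-base-cased" is just one publicly available checkpoint.

import torch
from transformers import XLNetModel, XLNetTokenizer

# Sketch using the Hugging Face `transformers` library (an assumption;
# the tutorials may use a different setup or framework).
tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
model = XLNetModel.from_pretrained("xlnet-base-cased")
model.eval()

text = "XLNet learns bidirectional context through permutation language modeling."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# One contextual vector per input token: (batch_size, sequence_length, hidden_size).
print(outputs.last_hidden_state.shape)

These contextual vectors are what a downstream task head (for classification, question answering, ranking, and so on) would consume after fine-tuning.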


BERT

BERT (Bidirectional Encoder Representations from Transformers) is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers. BERT uses masked language modeling to enable these pre-trained deep bidirectional representations. The pre-trained representations reduce the need for many heavily engineered task-specific architectures: the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks, such as question answering and language inference, without substantial task-specific architecture modifications. I have used references [1], [2], [3], [4], and [5] to prepare interactive tutorials on the BERT language model. A small masked-prediction sketch follows below.
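As a quick illustration of BERT's masked language modeling objective, the following minimal sketch asks a pre-trained BERT model to fill in a [MASK] token using both left and right context. Again, this assumes the Hugging Face transformers library and PyTorch, and is not taken from the tutorials themselves; "bert-base-uncased" is one publicly available checkpoint.

import torch
from transformers import BertForMaskedLM, BertTokenizer

# Sketch using the Hugging Face `transformers` library (an assumption;
# the tutorials may use a different setup or framework).
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

text = "The capital of France is [MASK]."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Locate the [MASK] position and take the highest-scoring vocabulary entry.
mask_positions = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
predicted_ids = logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.decode(predicted_ids))  # typically prints "paris"

Because the prediction at the masked position attends to tokens on both sides, this single call already demonstrates the bidirectional conditioning that distinguishes BERT from left-to-right language models.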

Some results from Papers with Code (for more details, visit the link)


References:

1. Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. "Attention is all you need." In Advances in neural information processing systems, pp. 5998-6008. 2017.

2. Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. "Bert: Pre-training of deep bidirectional transformers for language understanding." arXiv preprint arXiv:1810.04805 (2018).

3. Clark, Kevin, Urvashi Khandelwal, Omer Levy, and Christopher D. Manning. "What Does BERT Look At? An Analysis of BERT's Attention." arXiv preprint arXiv:1906.04341 (2019).

4. Park, Jonggwon, Kyoyun Choi, Sungwook Jeon, Dokyun Kim, and Jonghun Park. "A Bi-directional Transformer for Musical Chord Recognition." arXiv preprint arXiv:1907.02698 (2019).

5. Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton. "Layer normalization." arXiv preprint arXiv:1607.06450 (2016).

6. Yang, Zhilin, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, and Quoc V. Le. "XLNet: Generalized autoregressive pretraining for language understanding." In Advances in neural information processing systems, pp. 5754-5764. 2019.

7. Dai, Zihang, Zhilin Yang, Yiming Yang, William W. Cohen, Jaime Carbonell, Quoc V. Le, and Ruslan Salakhutdinov. "Transformer-XL: Language modeling with longer-term dependency." 2018.
