Deep Learning for NLP Part-2
Niraj Kumar, Ph.D.
AI/ML R&D Leader | Driving Innovation in Generative AI, LLMs & Explainable AI | Strategic Visionary & Patent Innovator | Bridging AI Research with Business Impact
Sequence transduction plays a very important role in natural language processing. The ability to transform and manipulate sequences of one type into another is a crucial part of human intelligence. Today, attention-based mechanisms support many types of sequence transduction, including (but not limited to) sequence-to-sequence mapping, machine translation, text-to-speech, speech-to-text, selective summary generation, and protein secondary-structure prediction. Attention was developed to relieve a key bottleneck of traditional encoder-decoder architectures: the encoder had to compress the entire input sequence into a single fixed-length vector. To achieve this, attention layer(s) are placed between the encoder and decoder layers. The way attention operates can also serve as its definition:
Attention Definition: Given a set of value vectors and a query vector, attention is a technique to compute a weighted sum of the values, where the weights depend on the query.
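Read literally, this definition is only a few lines of code. Below is a minimal sketch of my own (NumPy, assuming dot-product scoring and using the values as their own keys); the names `attention`, `query`, and `values` are illustrative, not from any particular library.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(query, values):
    """Weighted sum of the values, with weights dependent on the query.

    query:  (d,)   one query vector
    values: (n, d) n value vectors (also used as keys here)
    """
    scores = values @ query      # similarity of each value to the query, (n,)
    weights = softmax(scores)    # normalize scores into a distribution, (n,)
    return weights @ values      # weighted sum of the values, (d,)

# Toy example: 4 value vectors of dimension 3
rng = np.random.default_rng(0)
context = attention(rng.standard_normal(3), rng.standard_normal((4, 3)))
print(context.shape)  # (3,)
```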
Progress in attention mechanisms.
If we look at the progress of attention-based mechanisms, we find that most of the scientific literature considers only one or two fixed architectural places for modification. This becomes clear from the following steps.
Steps to apply attention in sequence transduction.
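In the standard encoder-decoder recipe (cf. Bahdanau et al. [2] and Luong et al. [3]), the steps are: (1) score each encoder hidden state against the current decoder state, (2) normalize the scores into attention weights, (3) take the attention-weighted sum of the encoder states as a context vector, and (4) combine the context vector with the decoder state to make the prediction. The sketch below is my own minimal NumPy illustration of one such decoder step, assuming Luong-style multiplicative scoring; all names, shapes, and the random parameters are assumptions for the example.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_decoder_step(decoder_state, encoder_states, W_c):
    """One decoding step with attention between encoder and decoder.

    decoder_state:  (d,)    current decoder hidden state (the query)
    encoder_states: (T, d)  all encoder hidden states (the values)
    W_c:            (d, 2d) learned projection for the attentional state
    """
    # Step 1: score each encoder state against the decoder state.
    scores = encoder_states @ decoder_state              # (T,)
    # Step 2: normalize scores into attention weights.
    weights = softmax(scores)                            # (T,)
    # Step 3: build the context vector as a weighted sum of encoder states.
    context = weights @ encoder_states                   # (d,)
    # Step 4: combine context and decoder state for the prediction layer.
    attentional = np.tanh(W_c @ np.concatenate([context, decoder_state]))
    return attentional, weights

# Toy usage: T=6 source positions, hidden size d=4
rng = np.random.default_rng(0)
h_t, enc = rng.standard_normal(4), rng.standard_normal((6, 4))
out, w = attention_decoder_step(h_t, enc, rng.standard_normal((4, 8)))
print(out.shape, w.sum())  # (4,) 1.0
```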
NOTE: The Transformer model [1] uses scaled dot-product attention. In the next article, I will try to cover BERT and XLNet.
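For concreteness, here is a small sketch of that scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k))V, as defined in [1]; the NumPy implementation and the toy shapes are my own.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, as in [1].

    Q: (n_q, d_k) queries, K: (n_k, d_k) keys, V: (n_k, d_v) values
    """
    d_k = K.shape[-1]
    scores = (Q @ K.T) / np.sqrt(d_k)  # scaling keeps large dot products
                                       # from saturating the softmax
    weights = softmax(scores)          # each row: a distribution over keys
    return weights @ V                 # (n_q, d_v)

# Self-attention over a toy sequence of 5 tokens, d_k = d_v = 8
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 8))
out = scaled_dot_product_attention(X, X, X)
print(out.shape)  # (5, 8)
```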
Tutorials on Attention-Based Models and the Transformer Model for NLP
References.
1. Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. "Attention is all you need." In Advances in Neural Information Processing Systems, pp. 5998-6008. 2017.
2. Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio. "Neural machine translation by jointly learning to align and translate." arXiv preprint arXiv:1409.0473 (2014).
3. Luong, Minh-Thang, Hieu Pham, and Christopher D. Manning. "Effective approaches to attention-based neural machine translation." arXiv preprint arXiv:1508.04025 (2015).
4. Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." In Advances in Neural Information Processing Systems. 2014.
5. Wu, Yonghui, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, et al. "Google's neural machine translation system: Bridging the gap between human and machine translation." arXiv preprint arXiv:1609.08144 (2016).