BertViz - Visualizing Attention in Transformers
Arun Krishnan
Entrepreneur, Technology Leader, Business Leader, Author. Experienced Data Science and AI professional and leader, driving Data Science, AI, and GenAI technology and business growth.
With the increasing use of LLMs and transformers in organisations, users are starting to demand explainability from transformer models: why do these models produce the outputs they do?
This article explains BertViz, a tool for visualizing attention within transformer models. Attention, as you all know by now, is central to the modern transformer architecture and was popularised by the seminal paper "Attention Is All You Need". It not only allows for longer contexts than RNNs and LSTMs but also enables parallel processing, and hence faster training.
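To give a flavour of how BertViz is used, here is a minimal sketch of the typical workflow in a Jupyter notebook. The model name and sentence are illustrative choices of mine, not prescribed by the article:

```python
from bertviz import head_view
from transformers import AutoTokenizer, AutoModel

# Illustrative model and sentence, not from the article
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_attentions=True)

sentence = "The cat sat on the mat because it was tired"
inputs = tokenizer.encode(sentence, return_tensors="pt")
outputs = model(inputs)
attention = outputs[-1]  # tuple with one attention tensor per layer
tokens = tokenizer.convert_ids_to_tokens(inputs[0])

# Renders an interactive visualization of every attention head in the notebook
head_view(attention, tokens)
```

In the rendered view, each colour corresponds to one attention head, so you can see at a glance which tokens each head links together.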
My biggest a-ha moment from the article was seeing how visualizing attention can surface inherent biases in the model. The article illustrates this with a great example:
Changing the pronoun at the end of a prompt leads to two completely different continuations, because the model implicitly assumes that "She" refers to the nurse and "He" to the doctor.
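You can probe this kind of bias yourself with a sketch along the following lines. The doctor/nurse prompt below is my own reconstruction of the example described above, not necessarily the article's exact sentence, and GPT-2 is an assumed model choice:

```python
from bertviz import head_view
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2", output_attentions=True)

# Reconstructed prompt for illustration; swap in the article's exact sentence
for pronoun in ("She", "He"):
    text = f"The doctor asked the nurse a question. {pronoun}"
    inputs = tokenizer.encode(text, return_tensors="pt")
    attention = model(inputs)[-1]  # attention weights per layer
    tokens = tokenizer.convert_ids_to_tokens(inputs[0])
    head_view(attention, tokens)   # inspect which noun the pronoun attends to
```

Comparing the two renderings shows which earlier token ("doctor" or "nurse") each pronoun attends to most strongly, making the implicit gender association visible.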
Do give this a read!