Integrated Gradients: Interpreting the LLM Decision-Making Process
Ankush Seth
CTO @ Mi Analyst | Helping businesses accelerate growth and efficiency with Gen AI
Large Language Models have attracted a lot of attention in recent times. Through the likes of ChatGPT, these models have started to play a big role in content generation, customer support channels such as chatbots, content localization, and even content curation. As use cases and adoption increase, there is a growing need to ensure accuracy, transparency, and validation against bias. Understanding the behind-the-scenes decision-making process is therefore important in addressing these concerns.
It is important to understand that language models like GPT-3.5 do not operate using decision trees that you can trace the way you can with traditional classifiers. Instead, Large Language Models (LLMs) use deep learning techniques, specifically transformers, which are neural network architectures designed for sequence-to-sequence tasks, and this makes their decision-making processes harder to decipher.
However, one can leverage various interpretation techniques (integrated gradients, attention maps, input-output pairs, etc.) to gain insight into the decision-making process. In this post we will focus on Integrated Gradients as an interpretation technique.
Gradient-based techniques help highlight which words or tokens in an input sequence have a significant impact on the model’s output: one computes gradients of the output with respect to the input and uses their magnitude to identify influential words or phrases.
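As a toy illustration of this idea, the snippet below scores each token by the norm of the gradient of the top logit with respect to that token’s embedding. The tiny linear model and random embeddings are stand-ins invented for the example; a real LLM would expose its embedding layer instead.

```python
import torch
import torch.nn as nn

# Toy stand-in for a model head: token embeddings -> class logits.
# (Purely illustrative; a real LLM is vastly larger.)
model = nn.Sequential(nn.Flatten(), nn.Linear(5 * 8, 2))

# Fake embeddings for a 5-token input with embedding dimension 8.
embeds = torch.randn(1, 5, 8, requires_grad=True)

# Backpropagate the top logit down to the input embeddings.
logits = model(embeds)
logits[0, logits[0].argmax()].backward()

# One saliency score per token: the gradient norm over the embedding dim.
saliency = embeds.grad.norm(dim=-1).squeeze(0)
print(saliency)  # larger value = more locally influential token
```

This plain-gradient saliency only reflects the local slope at the input; Integrated Gradients, described next, addresses that by accumulating gradients along an entire path of inputs.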
The formula for Integrated Gradients looks like this:

\[
\mathrm{IG}_i(x) = (x_i - x'_i) \times \int_{\alpha=0}^{1} \frac{\partial F\left(x' + \alpha (x - x')\right)}{\partial x_i} \, d\alpha
\]

Here \(F\) is the model, \(x\) is the actual input, \(x'\) is a baseline input, \(i\) indexes a single input feature, and \(\alpha\) sweeps the straight-line path from the baseline to the input.
We won’t get into the math in this post but instead translate, at a high level, how this is applied in the deep learning context. There are four key steps one typically follows to apply Integrated Gradients to a deep learning model (a code sketch follows the list):

1. Choose a baseline input that represents the absence of signal, such as an all-zero embedding or a sequence of padding tokens.
2. Construct a series of interpolated inputs along the straight-line path from the baseline to the actual input.
3. Compute the gradient of the model’s output with respect to each interpolated input.
4. Average those gradients and multiply by the difference between the input and the baseline to get an attribution score per feature (per token, in the language setting).
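Here is a minimal PyTorch sketch of these four steps. The toy classifier head, tensor shapes, and the all-zero baseline are assumptions made for illustration; with a real LLM one would apply the same loop to the model’s token embeddings, and libraries such as Captum ship a ready-made `IntegratedGradients` implementation of the same idea.

```python
import torch

def integrated_gradients(model, input_embeds, baseline_embeds, target_idx, steps=50):
    """Approximate Integrated Gradients for one input, following the four steps above.

    `model` is assumed to be a callable mapping a batch of token embeddings of
    shape (batch, seq_len, dim) to logits of shape (batch, num_classes).
    """
    # Step 2: build `steps` interpolated inputs on the straight-line path
    # from the baseline to the actual input (alpha goes from 0 to 1).
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1, 1)
    path = baseline_embeds + alphas * (input_embeds - baseline_embeds)
    path = path.detach().requires_grad_(True)

    # Step 3: gradient of the target logit w.r.t. every point on the path.
    logits = model(path)
    logits[:, target_idx].sum().backward()

    # Step 4: average the path gradients and scale by (input - baseline).
    avg_grads = path.grad.mean(dim=0, keepdim=True)
    attributions = (input_embeds - baseline_embeds) * avg_grads

    # Sum over the embedding dimension to get one score per token.
    return attributions.sum(dim=-1).squeeze(0)


# Illustrative usage with a toy classifier head and a zero baseline (step 1).
if __name__ == "__main__":
    seq_len, dim = 6, 16
    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(seq_len * dim, 3))
    x = torch.randn(1, seq_len, dim)   # stand-in for real token embeddings
    baseline = torch.zeros_like(x)     # "no signal" reference input
    scores = integrated_gradients(model, x, baseline, target_idx=0)
    print(scores)                      # one attribution score per token
```

Increasing `steps` tightens the Riemann approximation of the integral at the cost of more forward and backward passes.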
Using the above approach, one can gain insight into the model’s decision-making process for a specific use case. Furthermore, one can build a confidence heat map by applying this technique across a wide variety of test inputs (a sketch of this follows below). As mentioned earlier, Integrated Gradients is just one interpretation technique; depending on the questions one is trying to answer, some of the other techniques might need to be leveraged as well.
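For instance, per-token attributions across a test set can be visualized as a heat map. The snippet below uses random scores purely as placeholders for the output of `integrated_gradients()` run over real test sentences.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative only: random attribution scores standing in for the output of
# integrated_gradients() run over 10 test sentences of 12 tokens each.
rng = np.random.default_rng(0)
scores = rng.normal(size=(10, 12))

plt.imshow(scores, cmap="RdBu", aspect="auto")
plt.xlabel("Token position")
plt.ylabel("Test example")
plt.colorbar(label="Attribution score")
plt.title("Token attributions across test inputs")
plt.show()
```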