Discovering How Language Models Choose the Next Word
Simon Hefti
Creating value from data. Co-Founder of D ONE and herlock.ai. Physicist, Data Scientist, Lecturer and Bassist.
Between 1964 and 1967 at MIT, Joseph Weizenbaum developed ELIZA, an early natural language processing program [1]. He was surprised by how readily people attributed human-like emotions to it. Today, we often make a similar mistake with Large Language Models, which are more advanced but still fundamentally simple machines.
In this example, we visualise how a small pre-trained model selects the next word of a sentence [2]. The model appends the chosen word to the sentence and uses the extended sentence as the starting point for the next word, eventually forming a complete statement.
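To make this loop concrete, here is a minimal sketch of greedy next-word generation. It assumes a GPT-2-style model loaded via the Hugging Face transformers library; the model name and prompt are illustrative, not necessarily the exact setup used in the workshop:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

sentence = "The capital of Switzerland is"  # illustrative prompt
for _ in range(5):  # extend the sentence by five tokens
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits  # (1, seq_len, vocab_size)
    # The logits at the last position score every vocabulary token as a
    # candidate continuation; greedy decoding simply takes the top one.
    next_id = logits[0, -1].argmax().item()
    sentence += tokenizer.decode(next_id)
    print(sentence)
```

Each pass through the loop feeds the growing sentence back into the model, which is exactly the word-by-word process visualised in this article.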
To understand the model's word choices, we examine its 13 hidden layers, focusing on the top-ranked tokens in each layer. These hidden layers are the building blocks of the model's final word choice in a sentence.
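A common way to read out these per-layer token preferences is the "logit lens": each layer's hidden state is projected through the model's output embedding to see which tokens it currently favours. The sketch below makes the assumption that GPT-2 small is the model in question, since its embedding layer plus 12 transformer blocks yield exactly 13 hidden states; the prompt is again illustrative:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

inputs = tokenizer("The capital of Switzerland is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states holds 13 tensors for GPT-2 small: the embedding
# output plus the output of each of the 12 transformer blocks.
for i, hidden in enumerate(out.hidden_states):
    h = hidden[0, -1]  # hidden state at the last token position
    if i < len(out.hidden_states) - 1:
        # Intermediate states need the final layer norm applied before
        # projection; the last state already includes it.
        h = model.transformer.ln_f(h)
    layer_logits = model.lm_head(h)  # project onto the vocabulary
    top = torch.topk(layer_logits, k=3)
    print(f"layer {i:2d}:", [tokenizer.decode(int(t)) for t in top.indices])
```

Printed layer by layer, the top-ranked tokens show how the model's preference gradually converges on its final word choice.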
This analysis is part of the "Primer in Generative AI for Business" workshop conducted by Philipp Thomann and me [3].