Discovering How Language Models Choose the Next Word

Between 1964 and 1967 at MIT, Joseph Weizenbaum developed ELIZA, an early natural language processing program [1]. He was surprised that people attributed human-like emotions to it. Today, we often make a similar mistake with Large Language Models, which are far more capable but still fundamentally simple machines.

In this example, we visualise how a small pre-trained model [2] selects the next word for a sentence. The chosen word is appended to the sentence, and the extended sentence becomes the input for choosing the word after that, until a complete statement has been formed.
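A minimal sketch of this loop, assuming the GPT-2 small checkpoint [2] via the Hugging Face transformers library (the prompt and the number of generated words are illustrative choices, not taken from the article):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Illustrative prompt; any sentence fragment works the same way.
sentence = "The capital of Switzerland is"
input_ids = tokenizer(sentence, return_tensors="pt").input_ids

with torch.no_grad():
    for _ in range(5):  # generate five tokens, one at a time
        logits = model(input_ids).logits       # (1, seq_len, vocab_size)
        next_id = logits[0, -1].argmax()       # highest-rated next token
        # Append the chosen token; the extended sentence is the next input.
        input_ids = torch.cat([input_ids, next_id.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0]))
```

Always taking the highest-rated token (greedy selection) is the simplest strategy; sampling-based variants trade this determinism for more varied output.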

To understand the model's word choices, we examine its 13 hidden states (the embedding layer plus the 12 transformer blocks of GPT-2 small), focusing on the top-rated token at each layer. These intermediate representations are the building blocks of the model's final word choice for the sentence.
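One common way to inspect those layers is the so-called logit lens: project each intermediate hidden state through the model's own final layer norm and output head, and read off the top-rated token. A sketch, again assuming the GPT-2 small checkpoint [2]:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer("The capital of Switzerland is",
                      return_tensors="pt").input_ids

with torch.no_grad():
    out = model(input_ids, output_hidden_states=True)
    # out.hidden_states is a tuple of 13 tensors: the embedding output
    # plus one per transformer block, each of shape (1, seq_len, 768).
    for layer, hidden in enumerate(out.hidden_states):
        # Apply the final layer norm and the unembedding matrix to the
        # last position's hidden state, as the model itself would.
        logits = model.lm_head(model.transformer.ln_f(hidden[0, -1]))
        top_id = logits.argmax()
        print(f"layer {layer:2d}: {tokenizer.decode(top_id)!r}")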

This analysis is part of the "Primer in Generative AI for Business" workshop that Philipp Thomann and I conduct [3].


[1] https://en.wikipedia.org/wiki/ELIZA

[2] https://huggingface.co/gpt2

[3] https://www.academy.d-one.ai/generative-ai

