How are LLMs tackling the pertinent challenge of entropy?
Jasvin Bhasin
A core component of our civilization’s evolution has been the development of human language. As societies grew and human cognition advanced, language became not just a tool for survival but also a vehicle for abstract thought, cultural expression, and social cohesion.
Language is both a structured system and an evolving, unpredictable entity. For the longest time, the dichotomy between its ordered and chaotic elements presented a fascinating area of study, especially in the domain of natural language processing (NLP).
In fact, many years ago (I am much older than you think! ;-)) my bachelor thesis project, which developed a Big Data deduplication NLP application for multilingual data, also touched on the concept of N-gram models.
One concept that encapsulates this tension, for example, is ‘entropy,’ a measure of unpredictability or randomness in a system.
In linguistics, entropy has been used to understand the complexities and uncertainties associated with the structure and usage of languages.
The origins of entropy
The concept of entropy originally emerged in the field of thermodynamics as a measure of disorder or randomness in isolated systems.
It was later adapted by Claude Shannon in the realm of information theory to quantify the information content. In an effort to ascertain the amount of information conveyed by text in the English language, he introduced a foundational concept: entropy.
His key revelation was that the predictability of a sequence of text inversely correlates with the information content per symbol. More formally, entropy measures how predictable a text sequence is to an ideal (near-perfect) predictor: the lower the entropy, the fewer bits of information each new symbol carries.
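To make this concrete, here is a minimal Python sketch (my own illustration, not Shannon's original method, which relied on human prediction experiments) that estimates the entropy of a text from its character frequencies:

```python
import math
from collections import Counter

def shannon_entropy(text: str) -> float:
    """Estimate entropy in bits per character from empirical
    character frequencies: H = -sum(p * log2(p))."""
    counts = Counter(text)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# A highly repetitive string is more predictable, hence lower entropy.
print(shannon_entropy("aaaaabaaaaab"))                      # low
print(shannon_entropy("the quick brown fox jumps over it")) # higher
```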
In the years since Shannon’s pioneering work, computational linguists have developed a succession of approaches to estimate the entropy inherent in the English language. By contrasting these entropy estimates with the empirical performance of language models on text prediction tasks, evaluative frameworks have emerged for tracking progress towards human-level proficiency in language modelling.
The idea was that entropy offers a more holistic perspective, capturing the nuanced complexities that are intrinsic to natural language, thereby providing a comprehensive measure of a model’s linguistic competence.
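One such evaluative framework compares a model’s cross-entropy on held-out text (the average number of bits it needs per token) with entropy estimates of the language itself; perplexity is simply 2 raised to that cross-entropy. A small sketch with hypothetical numbers:

```python
import math

def cross_entropy_bits(token_probs):
    """Average -log2(p) over the probabilities a model assigned to the
    tokens that actually occurred in a held-out text."""
    return -sum(math.log2(p) for p in token_probs) / len(token_probs)

# Hypothetical per-token probabilities from some language model.
probs = [0.25, 0.60, 0.10, 0.80]
h = cross_entropy_bits(probs)
print(f"cross-entropy: {h:.2f} bits/token, perplexity: {2 ** h:.2f}")
```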
But entropy is not just a number
Entropy in language isn’t merely an abstract concept; it has had many real-world implications. High entropy can complicate various NLP tasks like machine translation, text summarization, and sentiment analysis. The unpredictability could lead to ambiguous interpretations, reducing the accuracy and reliability of these systems.
For instance, the polysemy of words — words having multiple meanings — increases entropy and poses challenges in machine understanding. Imagine a machine trying to translate a sentence from English to French but getting stuck because the word “bank” could mean both a financial institution and the side of a river. That’s entropy causing a little chaos, and that’s why it’s essential to manage it effectively.
So what happened in this space before the age of Large Language Models (LLMs) struck?
Before the advent of LLMs like ChatGPT, several methods were employed to tackle language entropy.
Rule-based systems were the early pioneers, employing a deterministic approach guided by syntactic and semantic rules. However, they were inflexible and struggled with exceptions and irregularities.
Statistical methods, such as N-gram models, offered a probabilistic approach. They predicted the next word based on the frequency of occurrence of word sequences in the training data. Despite their relative success, these models couldn’t capture long-term dependencies or understand context beyond the preceding few words, leaving a gap in effective entropy management.
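As a rough sketch of the idea (a toy corpus of my own, not a production system), a bigram model predicts the next word purely from counts of adjacent word pairs:

```python
from collections import Counter, defaultdict

# Toy training corpus for a bigram (2-gram) model.
corpus = "the cat sat on the mat and the cat ate the fish".split()

bigram_counts = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    bigram_counts[w1][w2] += 1

def predict_next(word: str) -> str:
    """Most frequent follower of `word` in the training data."""
    return bigram_counts[word].most_common(1)[0][0]

print(predict_next("the"))  # 'cat' -- only the previous word matters, wider context is lost
```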
The ChatGPT Paradigm: A Multi-dimensional Approach to Entropy Management
With the launch of LLMs such as ChatGPT, we experienced a paradigm shift in how we understand and manage entropy in NLP. These models employ a nuanced, multi-dimensional strategy for managing entropy.
ChatGPT's abilities are a product of its underlying Transformer architecture, invented at Google in 2017 (Vaswani et al., 2017). Transformers have since demonstrated impact well beyond language generation, across a range of generative AI tasks in the enterprise, from the Internet of Things to robotics.
Let us have a look at the core features and how they tie in with entropy.
Softmax Layer
The softmax function at the output layer plays a crucial role in managing entropy. Softmax normalizes the logits (raw scores) for each token in the vocabulary so that they become probabilities that sum to one. The use of the softmax function allows the model to not just select the most likely next word, but also quantify how much more likely it is compared to other candidates. In essence, it allows the model to express a kind of “confidence” in its choices, which can be seen as a way to manage the uncertainty or entropy in the language.
It transforms the entropy problem from an abstract concept into a computable form.
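A minimal numerical sketch of that idea (hypothetical logits over a four-word vocabulary): softmax turns raw scores into a probability distribution whose entropy can be computed directly.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Normalise raw scores into probabilities that sum to one."""
    z = np.exp(logits - logits.max())  # subtract max for numerical stability
    return z / z.sum()

# Hypothetical logits for the next token over a 4-word vocabulary.
logits = np.array([3.2, 1.1, 0.3, -0.5])
probs = softmax(logits)
entropy = -np.sum(probs * np.log2(probs))  # entropy of the model's prediction
print(probs.round(3), f"entropy = {entropy:.2f} bits")
```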
Contextual Embeddings
Contextual embeddings are high-dimensional vectors that the model uses to represent words in a way that captures both their semantic meaning and their role in the specific context where they appear. Traditional one-hot encoding or word embeddings like Word2Vec or GloVe do not offer this level of granularity. Contextual embeddings allow the model to understand words like “bank” differently in the context of “river bank” versus “savings bank.”
This contextual understanding helps to reduce entropy by making the model’s predictions more context-sensitive and thus more accurate.
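Here is a small sketch of that effect, using the Hugging Face transformers library and bert-base-uncased (my choice for illustration; nothing in this article prescribes a specific model): the vector for “bank” comes out differently depending on the sentence it appears in.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bank_embedding(sentence: str) -> torch.Tensor:
    """Return the contextual vector of the token 'bank' in the sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]
    position = inputs["input_ids"][0].tolist().index(
        tokenizer.convert_tokens_to_ids("bank"))
    return hidden[position]

v_river = bank_embedding("He sat on the river bank.")
v_money = bank_embedding("She deposited money at the bank.")
# Noticeably below 1.0: same word, different contexts, different vectors.
print(torch.cosine_similarity(v_river, v_money, dim=0).item())
```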
Attention Mechanisms
The attention mechanism in the Transformer architecture essentially allows the model to focus more on specific parts of the input when making a prediction. This is a dynamic operation: the parts of the text that the model “attends to” can differ from one prediction to the next. This adaptability is crucial for managing entropy.
For instance, suppose the model is generating a sentence and needs to decide whether a pronoun like “it” refers to a “dog” or a “cat” mentioned earlier in the text. The attention mechanism allows the model to “look back” at the relevant parts of the input and make a more informed, less random choice.
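Under the hood this is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. Here is a compact numpy sketch with made-up token vectors (say, for “it”, “dog” and “cat”):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V -- each row of the weight matrix shows
    how strongly one token attends to every other token."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(3, 4))  # three toy tokens, dimension 4
_, weights = scaled_dot_product_attention(Q, K, V)
print(weights.round(2))  # each row sums to 1: where each token "looks"
```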
Training on Large Datasets
ChatGPT is trained on a massive corpus of text data, often encompassing billions of tokens. This comprehensive training data allows the model to learn the intricacies of language, including idiomatic expressions, common phrases, and even some domain-specific jargon. By learning the statistical properties of the language, the model is better equipped to reduce the uncertainty associated with any given text generation task.
It’s a form of empirical grounding that serves to lower the entropy of the generated text.
Temperature Parameter during Sampling
When generating text, you can manipulate the “temperature” to control how conservative or adventurous the model is in its word choices. A lower temperature value (closer to 0) will make the model more deterministic, often sticking to more common words or phrases. A higher value (closer to 1 or above) allows for more creative or unexpected outputs.
This is a direct way to control the entropy of the generated text, making it either more predictable or more varied, depending on the desired outcome.
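A short sketch of how that knob works (hypothetical logits again): dividing the logits by the temperature before the softmax sharpens or flattens the distribution, which directly lowers or raises its entropy.

```python
import numpy as np

def temperature_probs(logits, temperature):
    """Softmax of logits / temperature: low T -> peaked (low entropy),
    high T -> flat (high entropy)."""
    scaled = np.asarray(logits, dtype=float) / temperature
    z = np.exp(scaled - scaled.max())
    return z / z.sum()

logits = [3.2, 1.1, 0.3, -0.5]  # hypothetical next-token scores
for t in (0.2, 1.0, 1.5):
    p = temperature_probs(logits, t)
    entropy = -np.sum(p * np.log2(p))
    print(f"T={t}: probs={p.round(3)}, entropy={entropy:.2f} bits")
```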
The way forward
Despite all these advancements, some questions will continue to challenge us.
“How can the net amount of entropy of the universe be massively decreased?” - Alexander Adell to Multivac in Isaac Asimov’s “The Last Question” (1956)
The answer lies, perhaps, in the further development of transformer-driven generative AI alongside other noteworthy models that draw inspiration from basic physics, such as diffusion models and Poisson flow generative models (PFGMs).
More on that in future posts!
Follow me for exciting food for thought to bridge.the.NEXT( ) in this stone age of the new digital age.
If you like what you read, why not give it a thumbs up?
Also, I would be glad to read your thoughts in the comments.
Appreciate it!