How are LLMs tackling the pertinent challenge of entropy?
Jasvin Bhasin
A core component of our civilization’s evolution has been the development of human language. As societies grew and human cognition advanced, language became not just a tool for survival but also a vehicle for abstract thought, cultural expression, and social cohesion.
Language is both a structured system and an evolving, unpredictable entity. For the longest time, the dichotomy between its ordered and chaotic elements presented a fascinating area of study, especially in the domain of natural language processing (NLP).
In fact, many years ago (I am much older than you think! ;-)) my bachelor thesis project, which developed a Big Data deduplication NLP application for multilingual data, also touched on the concept of N-gram models.
One concept that encapsulates this tension, for example, is ‘entropy,’ a measure of unpredictability or randomness in a system.
In linguistics, entropy has been used to understand the complexities and uncertainties associated with the structure and usage of languages.
The origins of entropy
The concept of entropy originally emerged in the field of thermodynamics as a measure of disorder or randomness in isolated systems.
It was later adapted by Claude Shannon in the realm of information theory to quantify the information content. In an effort to ascertain the amount of information conveyed by text in the English language, he introduced a foundational concept: entropy.
His key revelation was that the predictability of a sequence of text inversely correlates with the information content per symbol. More formally, entropy measures how predictable a text sequence is to an ideal (near-perfect) predictor: the lower the entropy, the fewer bits of information each new symbol carries.
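To make this concrete, here is a minimal Python sketch (my own illustration, not Shannon's original method, which relied on human prediction experiments) that estimates the entropy of a text from its character frequencies:

```python
import math
from collections import Counter

def shannon_entropy(text: str) -> float:
    """Estimate entropy in bits per character from empirical
    character frequencies: H = -sum(p * log2(p))."""
    counts = Counter(text)
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# A highly repetitive string is more predictable, hence lower entropy.
print(shannon_entropy("aaaaabaaaaab"))                      # low
print(shannon_entropy("the quick brown fox jumps over it")) # higher
```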
In the years since Shannon’s pioneering work, computational linguists have developed a succession of approaches to estimate the entropy inherent in the English language. By contrasting these entropy estimates with the empirical performance of language models on text prediction tasks, evaluative frameworks have emerged for tracking progress towards human-level proficiency in language modelling.
The idea was that entropy offers a more holistic perspective, capturing the nuanced complexities that are intrinsic to natural language, thereby providing a comprehensive measure of a model’s linguistic competence.
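One such evaluative framework compares a model’s cross-entropy on held-out text (the average number of bits it needs per token) with entropy estimates of the language itself; perplexity is simply 2 raised to that cross-entropy. A small sketch with hypothetical numbers:

```python
import math

def cross_entropy_bits(token_probs):
    """Average -log2(p) over the probabilities a model assigned to the
    tokens that actually occurred in a held-out text."""
    return -sum(math.log2(p) for p in token_probs) / len(token_probs)

# Hypothetical per-token probabilities from some language model.
probs = [0.25, 0.60, 0.10, 0.80]
h = cross_entropy_bits(probs)
print(f"cross-entropy: {h:.2f} bits/token, perplexity: {2 ** h:.2f}")
```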
But entropy is not just a number
Entropy in language isn’t merely an abstract concept; it has had many real-world implications. High entropy can complicate various NLP tasks like machine translation, text summarization, and sentiment analysis. The unpredictability could lead to ambiguous interpretations, reducing the accuracy and reliability of these systems.
For instance, the polysemy of words — words having multiple meanings — increases entropy and poses challenges in machine understanding. Imagine a machine trying to translate a sentence from English to French but getting stuck because the word “bank” could mean both a financial institution and the side of a river. That’s entropy causing a little chaos, and that’s why it’s essential to manage it effectively.
So what happened in this space before the age of Large Language Models (LLMs) struck?
Before the advent of LLMs like ChatGPT, several methods were employed to tackle language entropy.
Rule-based systems were the early pioneers, employing a deterministic approach guided by syntactic and semantic rules. However, they were inflexible and struggled with exceptions and irregularities.
Statistical methods, such as N-gram models, offered a probabilistic approach. They predicted the next word based on the frequency of occurrence of word sequences in the training data. Despite their relative success, these models couldn’t capture long-term dependencies or understand context beyond the preceding few words, leaving a gap in effective entropy management.
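As a rough sketch of the idea (a toy corpus of my own, not a production system), a bigram model predicts the next word purely from counts of adjacent word pairs:

```python
from collections import Counter, defaultdict

# Toy training corpus for a bigram (2-gram) model.
corpus = "the cat sat on the mat and the cat ate the fish".split()

bigram_counts = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    bigram_counts[w1][w2] += 1

def predict_next(word: str) -> str:
    """Most frequent follower of `word` in the training data."""
    return bigram_counts[word].most_common(1)[0][0]

print(predict_next("the"))  # 'cat' -- only the previous word matters, wider context is lost
```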
The ChatGPT Paradigm: A Multi-dimensional Approach to Entropy Management
With the launch of LLMs such as ChatGPT, we experienced a paradigm shift in how we understand and manage entropy in NLP. These models employ a nuanced, multi-dimensional strategy for managing entropy.
ChatGPT's abilities are a product of its underlying Transformer architecture, invented at Google in 2017 (Vaswani et al., 2017). Transformers have since demonstrated impact well beyond language generation, across a range of generative AI tasks in the enterprise, from the Internet of Things to robotics.
Let us have a look at the core features and how they tie in with entropy.
Softmax Layer
The softmax function at the output layer plays a crucial role in managing entropy. Softmax normalizes the logits (raw scores) for each token in the vocabulary so that they become probabilities that sum to one. The use of the softmax function allows the model to not just select the most likely next word, but also quantify how much more likely it is compared to other candidates. In essence, it allows the model to express a kind of “confidence” in its choices, which can be seen as a way to manage the uncertainty or entropy in the language.
It transforms the entropy problem from an abstract concept into a computable form.
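A minimal numerical sketch of that idea (hypothetical logits over a four-word vocabulary): softmax turns raw scores into a probability distribution whose entropy can be computed directly.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Normalise raw scores into probabilities that sum to one."""
    z = np.exp(logits - logits.max())  # subtract max for numerical stability
    return z / z.sum()

# Hypothetical logits for the next token over a 4-word vocabulary.
logits = np.array([3.2, 1.1, 0.3, -0.5])
probs = softmax(logits)
entropy = -np.sum(probs * np.log2(probs))  # entropy of the model's prediction
print(probs.round(3), f"entropy = {entropy:.2f} bits")
```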
Contextual Embeddings
Contextual embeddings are high-dimensional vectors that the model uses to represent words in a way that captures both their semantic meaning and their role in the specific context where they appear. Traditional one-hot encoding or word embeddings like Word2Vec or GloVe do not offer this level of granularity. Contextual embeddings allow the model to understand words like “bank” differently in the context of “river bank” versus “savings bank.”
This contextual understanding helps to reduce entropy by making the model’s predictions more context-sensitive and thus more accurate.
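Here is a small sketch of that effect, using the Hugging Face transformers library and bert-base-uncased (my choice for illustration; nothing in this article prescribes a specific model): the vector for “bank” comes out differently depending on the sentence it appears in.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bank_embedding(sentence: str) -> torch.Tensor:
    """Return the contextual vector of the token 'bank' in the sentence."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]
    position = inputs["input_ids"][0].tolist().index(
        tokenizer.convert_tokens_to_ids("bank"))
    return hidden[position]

v_river = bank_embedding("He sat on the river bank.")
v_money = bank_embedding("She deposited money at the bank.")
# Noticeably below 1.0: same word, different contexts, different vectors.
print(torch.cosine_similarity(v_river, v_money, dim=0).item())
```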
Attention Mechanisms
The attention mechanism in the Transformer architecture essentially allows the model to focus more on specific parts of the input when making a prediction. This is a dynamic operation: the parts of the text that the model “attends to” can differ from one prediction to the next. This adaptability is crucial for managing entropy.
For instance, suppose the model is generating a sentence and needs to decide whether a pronoun like “it” refers to a “dog” or a “cat” mentioned earlier in the text. The attention mechanism allows the model to “look back” at the relevant parts of the input and make a more informed, less random choice.
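Under the hood this is scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. Here is a compact numpy sketch with made-up token vectors (say, for “it”, “dog” and “cat”):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V -- each row of the weight matrix shows
    how strongly one token attends to every other token."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = K = V = rng.normal(size=(3, 4))  # three toy tokens, dimension 4
_, weights = scaled_dot_product_attention(Q, K, V)
print(weights.round(2))  # each row sums to 1: where each token "looks"
```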
Training on Large Datasets
ChatGPT is trained on a massive corpus of text data, often encompassing billions of tokens. This comprehensive training data allows the model to learn the intricacies of language, including idiomatic expressions, common phrases, and even some domain-specific jargon. By learning the statistical properties of the language, the model is better equipped to reduce the uncertainty associated with any given text generation task.
It’s a form of empirical grounding that serves to lower the entropy of the generated text.
Temperature Parameter during Sampling
When generating text, you can manipulate the “temperature” to control how conservative or adventurous the model is in its word choices. A lower temperature value (closer to 0) will make the model more deterministic, often sticking to more common words or phrases. A higher value (closer to 1 or above) allows for more creative or unexpected outputs.
This is a direct way to control the entropy of the generated text, making it either more predictable or more varied, depending on the desired outcome.
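A short sketch of how that knob works (hypothetical logits again): dividing the logits by the temperature before the softmax sharpens or flattens the distribution, which directly lowers or raises its entropy.

```python
import numpy as np

def temperature_probs(logits, temperature):
    """Softmax of logits / temperature: low T -> peaked (low entropy),
    high T -> flat (high entropy)."""
    scaled = np.asarray(logits, dtype=float) / temperature
    z = np.exp(scaled - scaled.max())
    return z / z.sum()

logits = [3.2, 1.1, 0.3, -0.5]  # hypothetical next-token scores
for t in (0.2, 1.0, 1.5):
    p = temperature_probs(logits, t)
    entropy = -np.sum(p * np.log2(p))
    print(f"T={t}: probs={p.round(3)}, entropy={entropy:.2f} bits")
```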
The way forward
Despite all these advancements, some questions will continue to challenge us.
“How can the net amount of entropy of the universe be massively decreased?” - Alexander Adell to Multivac in Isaac Asimov’s “The Last Question” (1956)
The answer lies, perhaps, in the further development of transformer-driven generative AI alongside other noteworthy models that draw inspiration from basic physics, such as diffusion models and Poisson flow generative models (PFGMs).
More on that in future posts!
Follow me for exciting food for thought to bridge.the.NEXT( ) in this stone age of the new digital age.
If you like what you read, why not give it a thumbs up?
Also, I would be glad to read your thoughts in the comments.
Appreciate it!