Data Science Letters: New Series. 01. On Individual Crushes Against Entropy.

This time, after a long absence - I will explain - I will write about entropy, predictability, anxiety, the principle of Occam's razor, and tolerance for ambiguity. But first...

The New Series

After a considerable time during which my thoughts were thoroughly dedicated to, among other things, the question of whether it makes any sense for me to write something new at all, I decided to start a new series of Data Science Letters. I am convinced that sooner or later every author begins to ask themselves such questions. But this was different, as I will attempt to explain in this newsletter. So far, I have written a lot: I started out in journalism in the computer magazines of the former Yugoslavia in the 80s, when I was only thirteen years old; this was followed by scientific papers in journals during my studies all the way to my doctorate, and as early as the 2000s I was an editor and co-author of several books in the fields of Internet behavior, online perceptions and attitudes, and Internet Governance (and even Geopolitics). I was a blogger on the most visited blog platform in Serbia (the former blog of the daily newspaper Blic - the pieces I consider best are archived), wrote on LinkedIn, wrote for Talas.rs on Artificial Intelligence, and for years ran the (now archived) Exactness of Mind blog dedicated to Data Science. I have written a lot. I've always enjoyed it, particularly the advantages of being a fast, fluent writer. After all, who doesn't enjoy excelling in a skill they have mastered?

In the new series of Data Science Letters, some changes are taking place. First of all, this newsletter is no longer branded by my consulting firm, DataKolektiv. Dedicated to education in Data Science, Machine Learning, and Artificial Intelligence, to finding and selecting personnel for these fields, to assembling teams, and to consulting on projects if they are truly, truly interesting, it has for some time led its own, well-deserved, separate life. I realized that it begins to weigh on my thoughts and expression when it is always on my mind, amidst the various topics that spark my curiosity. Additionally, in Data Science Letters I am not writing from my primary position as the Lead Data Scientist and head of the LABS team at smartocto. This is simply me, and any view or opinion I express here does not represent an official stance or opinion that smartocto or DataKolektiv would necessarily support - no matter how natural it might seem that it could be so. When I am a manager, I do not want to be an author; to paraphrase Steve Jobs, then I want to listen to those smarter than myself. However, when I am an author, I do not want to be a manager: I have realized that this inevitably leads to excessive optimization of the text. Thus, starting with the new series, Data Science Letters will be a pure expression of thoughts, a reflection on the phenomena of the artificial and the digital that are shaping a new reality, a reality from which the human species can no longer escape - although there seems to be a contemporary spirit that feels the need to do so.

Another decision related to the new series of Data Science Letters is that I will use, as much as possible, a language that will be accessible to a much broader audience than just experts and enthusiasts in Data Science, Machine Learning, and AI. As a scientist, I have written tons of technical text; at my age, I think I can confidently leave exactness to Python and R code.

The Fundamental Author's Dilemma

Let me return to the dilemma I faced for so long: does it even make sense to write something new? What caused this dilemma? Entropy - breaking, with a single word, my promise to write with a less technically burdened vocabulary. Let me explain. Over the past two years, I have been extremely focused, for professional reasons alongside my natural curiosity and interest in fundamental science and engineering, on topics in the arena of generative AI. It has not escaped any reader, even those only superficially informed, that the planet has been buzzing daily about, or with, generative AI since the fall of 2022. Add to this that, if you follow the field at a professional level, we are witnessing an era in the development of information technology in which something new has occurred every week, and sometimes every day, over the past two years - something that needed to be recognized, examined, classified, evaluated, and carefully systematised into a bestiary of new ideas.

Midjourney's bad hallucination on entropy.


From my chapter "The Individual in the Global Information Society: Concept, Theory, and Research of the Information Society" in the book "Global Citizens" (published in Serbian by Belgrade Open School), which I co-edited in 2023:

"Our basic cognitive abilities, at the level of sensory processes and perception based on the forms of energy distribution in our physical environment, are determined by the structure of the nervous system, which is formed in an extremely long and complex evolutionary process of adaptation to the environment. Thus, humans are able to register only limited spectra of energy distributions in their surroundings: we see light whose wavelength is in the range of 380 to 780 nanometers and are unable to register electromagnetic radiation outside this range; sound, mechanical vibrations in our environment, are perceived if their frequency lies between 20 and 20,000 hertz. As our basic sensory processes are limited, they have constituted our exclusive world, i.e., what is the world for us, our higher cognitive abilities are also significantly limited, again in several ways. The duration of memory traces in our short-term memory, the assumed "module" of our mental apparatus that enables us to immediately focus attention on the information we receive (like the linguistic processing of sentences you are reading right now), is limited to a few seconds. After many discussions by experimental psychologists, most agree that the capacity of short-term memory itself is limited to processing 7±2 (comment in this newsletter only: see George A. Miller's famous 1956 paper "The Magical Number Seven, Plus or Minus Two") items, and it is difficult to significantly expand this capacity. However, it appears that in everyday dealings with our environment, this capacity of short-term memory is sufficient for stable adaptation to environmental conditions: communication with community members, reading written materials, etc. [...] The question that arises as essential in the discussion of the relationship between the individual and their global information limitations in a psychologically relevant way poses as the question of the individual's ability to cope with the pressure of processing a huge, constantly growing mass of information, with which they have become surrounded, thanks to the information technology revolution."

And simply tracking all that information - add to that learning all the new technologies - hit me, and hit me hard: for several months I found myself in a state where I thought that every day we have so many new pieces of information and knowledge that there simply isn't enough time or space (nor sense!) for someone like me, who has to read it all, to contribute anything! There is a subtlety in this, and since I am a cognitive psychologist by basic vocation, I know how this sea of information led to the state I found myself in (which an average HR professional would call a transient burnout and advise a walk in nature, I presume). Entropy.

Predictability

People like predictability. The feeling that we can predict and control our environment, both natural and social, by knowing the regularities that govern it - regularities which, for large language models and generative AI, are mere correlations, purely associative in nature - is a fundamental feeling that gives us security. When a person is deprived of the sense of control and predictability of the world around them and of their actions in it, they become anxious. Prolonged anxiety, accompanied by frustration in every attempt to change something around them through their own actions, ends in the phenomenon of learned helplessness, well studied in learning theories.

Can we quantify, measure the predictability of our environment somehow? To show you how, I will need two large circles, and within them a larger number of smaller circles. Here.

I simply adore doing this to generative AIs. My solution in Inkscape next.

No. Better:

High entropy to the left and lower entropy to the right. The right is more predictable than the left. Chaos to the left, order to the right!

In the left large circle we find five white smaller circles and an equal number of black ones; in the right circle there are eight black smaller circles and only two white ones. The probability of randomly drawing a white or a black circle from the left large circle is therefore the same: 5/10, or 1/2, or 0.5. In the right circle, these probabilities are 8/10 or 0.8 for black and 2/10 or 0.2 for white (just count the white and black circles). The right circle, viewed as a system from which we can randomly draw a circle without knowing in advance which one we will get - viewed, therefore, as a random variable - has lower entropy than the left circle. I will explain.

Imagine you need to bet on the color of the next randomly drawn circle from one of the two large circles in the picture. Suppose you earn a dollar each time you correctly predict, before the draw, whether a white or a black circle will be drawn, and you must pay a dollar each time you predict incorrectly. Which large circle would you choose for your game: the left or the right?

In the left circle, black and white circles are drawn with a probability of 1/2 each: there you can never be sure what will happen. Suppose you decide to always predict "black," no matter what comes out (improperly weighted classification models, or those simply missing a decent ROC analysis, do exactly this): about half the time you would be right and half the time you would not, and the dollars you would earn would offset the dollars you would have to pay for misses. In the long run, with the left circle, you are at zero. What if we tried the same strategy on the right circle, where there are many more (eight) black circles? Well, if you constantly predicted "black," in the long run you would be right about 80% of the time and wrong about 20% of the time. And you would make some money! Why? Because the right circle, a circle of lower entropy, is more predictable.
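
If you would rather check this with code than take my word for it, here is a minimal Python sketch of the betting game (the function name, the number of draws, and the random seed are mine, purely for illustration): always guess "black" and watch what happens to your balance with each circle.

import random

def average_earnings(p_black: float, n_draws: int = 100_000, seed: int = 42) -> float:
    """Always bet "black": earn a dollar on a hit, pay a dollar on a miss.
    Returns the average earnings per draw."""
    rng = random.Random(seed)
    balance = 0
    for _ in range(n_draws):
        drew_black = rng.random() < p_black  # a black circle comes up with probability p_black
        balance += 1 if drew_black else -1
    return balance / n_draws

print(average_earnings(p_black=0.5))  # left circle: hovers around 0.0
print(average_earnings(p_black=0.8))  # right circle: close to +0.6 per draw

Nothing mysterious is going on: with the right circle the expected earning per draw is 0.8*(+1) + 0.2*(-1) = +0.6 dollars, while with the left circle it is exactly zero.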

We can understand entropy as a measure of the predictability of a system: high entropy systems are difficult to predict, while low entropy systems are more predictable. How do we really measure entropy?

Entropy

Since I cannot use LaTeX on LinkedIn, and am thus unable to typeset even the simplest formulas, here is the fantastic Josh Starmer in the episode "Entropy (for data science) Clearly Explained!!!" from his (highly recommended) StatQuest with Josh Starmer YouTube channel to explain:


To put it in a nutshell:

  • We first begin with the understanding that rare events carry more information for us than predictable, frequent events: "A war has just started in Asia" carries more information and brings more surprise than "The Sun rises in the east".
  • Thus, to measure surprise, we could take the inverse of the event's probability p, like this: 1/p; but for technical reasons we choose to measure surprise as -log(p), which is just log(1/p) - we do that simply because the logarithm has some nice properties (making the surprise of conjunctive events additive, for example).
  • With -log(p) as a measure of the surprise induced by any event with probability p, from the properties of the logarithmic function we also have that (a) the surprise from a certain event - i.e. p = 1 - is zero, while (b) the surprise from an impossible event - i.e. p = 0 - is infinite, which intuitively makes a lot of sense.
  • Then, if we observe any random system with any number of discrete states, such as an urn with black and white balls put under random sampling, we define its entropy to be simply the average amount of information - the average surprise - that its states carry. For a system with two states - "a black ball is drawn" and "a white ball is drawn" - we would have:
  • Entropy = -p(Black)*log2(p(Black)) - p(White)*log2(p(White))
  • Systems with high entropy carry a large surprise on the average, which means that they are unpredictable, while systems of low entropy carry a lower surprise and are more predictable!
  • You should now be able to deal with entropy even in the simplest Excel sheet - or in a few lines of Python, as sketched right after this list.
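
To make that concrete, here is a minimal Python sketch (the function name is mine, purely for illustration) that computes the entropy of the two large circles from the picture above:

import math

def entropy_bits(probabilities):
    """Shannon entropy in bits: the average surprise, -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Left large circle: 5 black and 5 white smaller circles
print(entropy_bits([0.5, 0.5]))  # 1.0 bit, the maximum for a two-state system
# Right large circle: 8 black and 2 white smaller circles
print(entropy_bits([0.8, 0.2]))  # about 0.722 bits: lower entropy, more predictable

The left circle sits at the maximum of one bit for a two-state system, while the right circle comes out at about 0.72 bits - lower entropy, higher predictability, exactly as the betting game suggested.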

Entropy and Hype

If you think that I am now about to write on all the different types of entropy and related concepts used in Machine Learning and AI, e.g.

Shannon Entropy, Cross-Entropy, Joint Entropy, Conditional Entropy, Relative Entropy (Kullback-Leibler Divergence), Mutual Information, Perplexity, Differential Entropy, Renyi Entropy, Tsallis Entropy, Cumulative Residual Entropy...

- I will not. Rather, let me return to the dilemma from which I started and explain, at least in part, its origin: my frantic tracking of information and developments in the arena of generative AI since the fall of 2022 had at one point led me to despair, which was mostly reflected in my indifference to the idea of ever starting to write (again). Instead of continuing to list one type of entropy after another, allow me to mention just some of the prompt engineering techniques that are mentioned across various sources:

Prompt Chaining, Chain-of-Thought Prompting, Few-Shot Prompting, Tree of Thoughts (ToT), Self-Consistency, Zero-Shot Prompting, Automatic Prompt Engineer (APE), ReAct Prompting, Automatic Reasoning and Tool-use (ART), Retrieval Augmented Generation (RAG), Directional Stimulus Prompting, Instruction prompting, Hybrid prompting, Negative prompting, Contextual prompting, Adversarial prompting, Recursive prompting, Meta prompting, Prompt chaining, Soft prompting, Hard prompting, Dynamic prompting, Imaginative prompting, Analytical prompting, Explorative prompting, Conditional prompting, Iterative refinement prompting

And everything would be fine were it not for the following fact: except in rare cases, there is at least a 90% overlap among all the mentioned techniques. Hardly any of them deserves a separate name, and it is certain that with this - thanks to the hyper-production of scientific papers, or simply the insane efforts of individuals to write something that will distinguish them from others - comes a plethora of synonyms for the same thing.

I can only say this much about prompt engineering: it is a simple skill that, I guarantee, you won't need more than six to ten hours of practice to completely master and use at will for your needs. As for the tons of text I had to go through, which were certainly over 75% redundant (overlapping in content), seeking specks of relevant, new knowledge and information, whether written from a scientific, engineering, programming, or business perspective - better don't get me started. The hype that generative AI, and especially LLMs, has created has produced such high entropy that at one point I thought online authors would somehow manage to simply use up all the available words in all the languages of the world, leaving humanity without text - I certainly was left speechless!

Entropy: at some point since the autumn of 2022, everything in generative AI began to seem equiprobable, causing the perception of the whole AI narrative to explode in high entropy. AGI yes? AGI no? AGI 50:50? (Of course not, or at least not this time, to be honest: we are nowhere close to Artificial General Intelligence, abbr. AGI).

Here goes the cure for pain.

Pluralitas non est ponenda sine necessitate

(“plurality should not be posited without necessity.”)

(Occam’s razor, principle stated by the Scholastic philosopher William of Ockham (1285–1347/49), often stated as: "Entities are not to be multiplied beyond necessity.", source: Occam’s razor, Britannica)

I wonder if it's possible that people have truly forgotten such principles.

Occam's Razor really boils down to keeping things simple. It suggests that the simplest adequate explanation is usually the one to prefer. This idea comes in handy when we're swamped by the sheer amount of data and complexity that's typical of our digital lives. If you stick to the basics and cut through the clutter, decisions become easier, stress levels can drop, and everything seems less overwhelming. Think about a data scientist drowning in data points and potential models; using Occam's Razor means picking the simplest model that does the job, which lightens the mental load and eases anxiety. Basically, it's telling us not to make things more complicated than they need to be. When you're facing a problem, the solution doesn't have to be fancy or loaded with extras. This straightforward strategy helps clear up the fog in decision-making, especially when you're flooded with too many options or drowning in data.
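
Here is what "picking the simplest model that does the job" can look like in a few lines of Python - a minimal sketch on hypothetical toy data, not a recipe: fit polynomials of increasing degree and check whether the added complexity actually buys you anything on held-out data.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: a noisy but essentially linear relationship
x = np.linspace(0, 10, 40)
y = 2.0 * x + 1.0 + rng.normal(scale=1.5, size=x.size)

# Hold out every second point for validation
x_train, x_val = x[::2], x[1::2]
y_train, y_val = y[::2], y[1::2]

def validation_rmse(degree: int) -> float:
    """Fit a polynomial of the given degree on the training half and
    return its root-mean-square error on the validation half."""
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    predictions = np.polyval(coeffs, x_val)
    return float(np.sqrt(np.mean((predictions - y_val) ** 2)))

for degree in (1, 5, 9):
    print(degree, round(validation_rmse(degree), 3))
# If the higher-degree fits do not clearly beat the straight line,
# Occam's razor says: keep the straight line.

If the degree-5 and degree-9 polynomials do not clearly beat the straight line on the validation half - and on data like these they typically will not - the razor tells you to keep the line.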

And I can offer another pill if you wish:

Tolerance to Ambiguity

This is simply about being cool with not having all the answers. In a world that's always changing, especially with tech and digital interactions, being able to handle uncertainty without freaking out is crucial. In these fields, you're always on the edge of the unknown. One day, your models are the superhero; the next, they might be the sidekick because something new has taken the spotlight.

At its core, tolerance to ambiguity involves a comfort level with uncertainty and the unknown. Psychologically, this means that some people are better equipped to handle incomplete information without experiencing significant stress. This is crucial because stress can cloud judgment and lead to poor decision-making. From a Decision Theory perspective, when faced with ambiguity, individuals with high tolerance are more likely to evaluate their options rationally and remain open to multiple outcomes, rather than rushing to premature conclusions or sticking rigidly to the familiar.

On a cognitive level, those with a high tolerance for ambiguity can process information more holistically. Emotionally, these individuals are better at managing the anxiety that often accompanies uncertainty. Be one, even if you are not. Emotional regulation supports sustained engagement and curiosity, even in the face of complex and challenging problems.


If you have endured and read up to here - you are my hero, thank you!

I sincerely hope that you will continue to follow Data Science Letters where I will keep discussing interesting ideas, concepts, and events in the future.
