AI lies! Reasons and solutions.
Dru HERO (Asdrúbal Hernández-Romero)
AI Consultant + Scientist / Human Evolution engineer / People Optimizer / Modeling a Better type of AI and remodeling Human Cognition. Philanthropist. Bringing Forbes-Nash values into AI and neurosciences.
"About the so-called 'drift' (divergence of results or 'variability') in the jargon of LLMs.
Reasons, detailed explanations. Solutions."
Opening statement:
CLARIFICATION: LLMs exhibit 'drift,' but 'expert systems' (which are another type of artificial intelligence) do not.
Preface:
DEFINITION: 'Drift' refers to the phenomenon where an AI provides two different responses to the same request. (Two 'OUTPUTS' for an IDENTICAL REQUEST).
Dissertation, explanations, and solutions:
Some technicians argue that the 'drift' (the 'divergence') is embedded in the system. I would go deeper and beyond: it is not just 'embedded' but 'inherent,' because it is part of the design. There are several technical reasons for this to occur, and addressing its various origins could lead to a better design for LLMs.
Level 3:
Firstly, the very nature of a 'large system' composed of billions of parameters, one that can create different combinations from the same building blocks (tokens), produces an immense variety of 'different appearances' that are 'structurally identical' in value (as far as the machine's calculations are concerned).
Just as 2/4 has the same mathematical value as 1/2, for an LLM, using either of the two would still yield a 'correct token structure,' even if the two do not have exactly the same meaning; each is precise enough, akin to mentioning either of two different universities in the biography of one of us. The LLM does not hold definitions at that level of sentence construction: 'Harvard' and 'Princeton' are just 'names,' subsidiary tokens to the main token within that paragraph of the biography. NOTE: It would be different if the main token were 'the university' itself, that is, if one were writing the descriptive profile of that university.
Note: In contrast to '2/4' and '1/2,' the number '0.5' has the same mathematical value but a different structure from the first two. For an LLM, '0.5' is a different type of token. Combined with a mathematical engine (like the Wolfram Language at wolfram.com), they would have an 'equivalent value,' and consequently all three would count as the same 'appropriately tokenized answer' (equally valid for constructing the sentence).
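To make this concrete, here is a minimal Python sketch. The `evaluate` helper is an invented stand-in for a math engine such as Wolfram's; no real LLM tokenizer is involved:

```python
from fractions import Fraction

# Three surface forms a model might emit. To a tokenizer they are
# different character sequences; to a math engine, one and the same value.
forms = ["2/4", "1/2", "0.5"]

def evaluate(text: str) -> float:
    # Fraction parses both "2/4"-style ratios and "0.5"-style decimals.
    return float(Fraction(text))

values = {form: evaluate(form) for form in forms}
print(values)  # {'2/4': 0.5, '1/2': 0.5, '0.5': 0.5}
print(len(set(forms)), "distinct token structures,",
      len(set(values.values())), "shared mathematical value")
```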
Level 2:
Secondly, the learning process of the LLM operates with a technique called the 'genetic mutation algorithm': for any 'solution' it finds, it changes it a little (hence the concept of 'genetic mutation') and then compares both to determine which better satisfies the 'expected outcome.' In fact, there can be more than one, indeed many, solutions that meet the criterion, and all of them become part of that 'neural network' after training. Later, when the model is used, the selection between alternatives, 'which to present,' can hinge on any minimal difference (another token) added to this same 'combined set of tokens' from the previous solution. In this way, many paths coexist within the LLM's neural network, and the prevalent path depends on its relative position within the 'assembly chain' of prior and subsequent tokens. (A toy sketch of this mutate-and-compare loop follows the example below.)
Example:
Prompt: Give me a biography of my black cat in two sentences.
GPT, response 1: The cat was black. It passed away a few years ago.
GPT, response 2: The cat you had passed away a few years ago. It was black.
Both responses perfectly satisfy the 'expected outcome' and coexist within the neural network. However, the first response does not explicitly state that you owned the cat. I assume you didn't notice that small difference. Most likely, neither would an LLM.
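Here is that toy mutate-and-compare loop in Python. It is only a hedged sketch of the 'genetic mutation' idea described above, a simple hill-climb toward a target string; the `TARGET` string and the `fitness` function are invented stand-ins for the 'expected outcome,' not the actual training procedure of any production LLM:

```python
import random

TARGET = "the cat was black"
LETTERS = "abcdefghijklmnopqrstuvwxyz "

def fitness(candidate: str) -> int:
    # How well the candidate satisfies the 'expected outcome':
    # count matching character positions.
    return sum(a == b for a, b in zip(candidate, TARGET))

def mutate(candidate: str) -> str:
    # Change the solution 'a little' -- the genetic-mutation step.
    i = random.randrange(len(candidate))
    return candidate[:i] + random.choice(LETTERS) + candidate[i + 1:]

random.seed(0)  # pinned seed so this demo is reproducible
current = "".join(random.choice(LETTERS) for _ in range(len(TARGET)))
for _ in range(5000):
    variant = mutate(current)
    # Keep whichever of the two better satisfies the expected outcome.
    if fitness(variant) >= fitness(current):
        current = variant
print(current)
```

Note the `>=` comparison: equally good variants replace one another freely, which is exactly how several 'correct' solutions come to coexist in the trained system.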
Let's add a bit of 'ambiguity' to the request.
Prompt: Give me a biography of my black cat, which I named after my favorite character.
GPT, response 1: You had a black cat named 'Soup' in reference to Superman. It was black and passed away a few years ago.
GPT, response 2: You had a black cat named 'Bat' to remind you of your favorite character, Batman. Sadly, it passed away a few years ago.
But this can take an even sharper turn!
GPT, response 3: [...] named 'Bat' to remind you of your favorite character, 'Batter.' [...]
All of these responses are correct from the LLM's perspective: the prompt gives no specificity about which of your favorite characters it should use. The generation process needs to be flexible and malleable to accept the infinity of prompts it may receive and to digest the multitude of data it has 'read,' synthesizing and fitting it all within its billions of parameters. Because of this, variability, the 'drift,' inherently 'comes from the factory.'
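A hedged sketch of how such equally valid continuations get picked at generation time, assuming a toy next-token distribution (the probabilities below are invented for illustration):

```python
import random

# Invented probabilities for the text after "...my favorite character":
# several continuations are (almost) equally acceptable to the model.
continuations = {"Superman": 0.34, "Batman": 0.33, "Batter": 0.33}

def sample(dist):
    # Standard inverse-CDF sampling over a discrete distribution.
    r = random.random()
    cumulative = 0.0
    for token, p in dist.items():
        cumulative += p
        if r < cumulative:
            return token
    return token  # guard against floating-point remainder

for run in range(3):
    print(f"run {run}: favorite character = {sample(continuations)}")
```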
Level 1:
Thirdly, there exists a 'clock' within every computing system. This clock governs the 'random number' used as a 'seed' for the 'genetic mutation algorithm.' Because of this, an identical prompt can yield different results from an LLM activated and operated at different instants in time (when the clock, and hence the random number, differ). This can occur even with simultaneous prompts from two different computers, since each process (and each use of memory) is a singular physical reality, and it is nearly impossible to synchronize two such operations within this particular vast system.
The very fact that it is a generative process involving the introduction of a 'random signal' to produce an output already imparts variability in its internal calculation processes when the 'output response' is being assembled.
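A minimal sketch of that clock-seeded 'random signal,' assuming a deliberately simplified `generate` function (real inference stacks are far more involved, but the seeding principle is the same):

```python
import random
import time

def generate(prompt: str, seed: int) -> str:
    # The seed drives the 'random signal' used while assembling the output.
    rng = random.Random(seed)
    candidates = ["Response A", "Response B", "Response C"]
    return rng.choice(candidates)

# Seeded from the system clock: two calls, two instants, two seeds --
# so the same prompt may come back differently.
print(generate("same prompt", time.time_ns()))
print(generate("same prompt", time.time_ns()))

# Pinning the seed removes this one source of variability.
print(generate("same prompt", seed=42))
print(generate("same prompt", seed=42))
```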
But there is an even more significant reason!
Level 5:
Fifth reason: human abstract thought. A GPT-style LLM is built to emulate the 'verbal logic' of human language. In 'verbal reasoning' itself, and in the way we use it, a single meaning can be verbalized in two or more ways.
Consequently, the fact that two or more 'output sequences' are 'equivalent' opens this possibility of variability in the results. Combine this fifth reason with Level 1 (the 'clock' within computers), and we already have an LLM system that will irreversibly deliver potentially variable outcomes.
Examples: 'The cat passed away a few years ago.' / 'A few years back, the cat died.'
They are structurally different but convey the same meaning. Different outcomes can always be obtained, even if the 'temperature/variability/creativity' parameter is set to 0 (zero).
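One way to see why even temperature 0 (pure greedy decoding) can still vary: floating-point addition is not associative, so two mathematically identical scores accumulated in different orders (as happens across parallel hardware) can come out unequal, and the maximum flips. A self-contained Python illustration, with invented completion strings:

```python
# Two completions whose scores are mathematically identical, but whose
# partial sums were accumulated in different orders (as can happen when
# the same computation is parallelized differently between runs).
scores = {
    "It was black.":      (1e16 + 1.0) + 1.0,   # the 1.0s are absorbed -> 1e16
    "The cat was black.":  1e16 + (1.0 + 1.0),  # -> 1.0000000000000002e16
}
print(scores)

# Temperature 0 means greedy decoding: simply take the maximum score.
print("greedy pick:", max(scores, key=scores.get))
# Had the sums come out exactly tied, max() would fall back on iteration
# order -- yet another arbitrary tie-break between equivalent outputs.
```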
But there is yet another technical reason for this 'variability'!
Level 4:
There is no structured database within an LLM. Consequently, there are no 'definitive definitions' of the data. Since any token can be requested and used by many different branches within the neural network, it can be combined with many other tokens in countless different structures. Any composite token, being the sum of various other tokens, can itself manifest in many different forms with varying degrees of detail. The number of tokens used to construct the response is one factor (among others) that determines these different structures. For example (a rough token count follows the two phrasings below):
"I am a very intelligent man. I am also an engineer."
With fewer tokens:
"I am an intelligent engineer, also male."
There are no 'read facts' within an LLM, only 'read statements.' These can be falsehoods. In fact, many of them are! The internet is flooded with false data, fallacies, and biased opinions, and all of that went into the LLM!
Solutions
Just as Elon Musk proposed the construction of a 'TruthGPT' (and someone hastened to purchase the domain name, of which I took a screenshot for this article), I am similarly advocating a revolutionary, complete redesign of the LLM-type structure.
We would leverage what we have already learned to guide the development as we construct a 'truly convolutional neural network' that genuinely mimics the neural networks that exist in the brain.
These two key features can be incorporated into the design. They are already conceptualized and visualized; the systems and training methods for this new type of neural network are even structured. I am in the process of finding the right incubator for this invaluable project within the AI industry.
If you are interested in truly reliable AI, then be among the first to know when it is released. (An alpha is already in 'demo mode' for private acquaintances.)
I will soon be posting the project on my website NeuraLevels.com. (You can check it out right now!).
Postscript (an educational challenge for broader understanding):
I wonder if you can construct in your mind the structure from which I extracted these 5 reasons... and why I enumerated them exactly as I did. If you take a moment to think (and you know a bit about computing), you will find the reason behind that order, and it will give you a deeper understanding of how LLMs work. Hint: The levels are analogous to the concept of 'high-level languages' versus 'low-level languages' in computing.