LLMs like ChatGPT: How They Work, Why Results Vary, and Why Errors Occur
Anuj Mubayi
Distinguished IBA Fellow, PhD, Expertise in data-driven strategies utilizing mathematical and statistical modeling in HEOR
Large Language Models (LLMs)
Have you ever wondered how ChatGPT works behind the scenes?
The answer lies in understanding, in simple terms, how LLMs work.
LLMs, like GPT-4, are complex AI systems that process and generate text based on patterns learned from vast amounts of historical data. They don’t understand text the way we humans do; instead, they learn patterns, such as which words tend to follow others and how sentences are usually structured. For example, if you ask a question, the model doesn’t “know” the answer; it predicts what a likely response would be based on similar questions and answers it has seen during training.
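At its core, this boils down to repeatedly predicting the next token. As a toy sketch (not how GPT-4 is actually implemented, and with the probability table invented purely for illustration), the idea looks like this in Python:

```python
import random

# Toy next-word probabilities (values invented for illustration).
# A real LLM computes such a distribution with a neural network over a huge vocabulary.
next_word_probs = {
    ("the", "cat"): {"sat": 0.6, "ran": 0.3, "slept": 0.1},
    ("cat", "sat"): {"on": 0.8, "quietly": 0.2},
    ("sat", "on"): {"the": 0.9, "a": 0.1},
}

def predict_next(context):
    """Sample the next word from the distribution conditioned on the last two words."""
    choices = next_word_probs.get(tuple(context[-2:]), {"<end>": 1.0})
    words, weights = zip(*choices.items())
    return random.choices(words, weights=weights, k=1)[0]

prompt = ["the", "cat"]
while True:
    word = predict_next(prompt)
    if word == "<end>":
        break
    prompt.append(word)

print(" ".join(prompt))  # e.g. "the cat sat on the" -- the continuation varies from run to run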
LLMs rely on several important mathematical and computational concepts, such as:
Tokens: the units of text the model actually works with, which can be whole words, subwords, or individual characters. For example, a word like “unbelievable” may be split into several subword tokens.
Additional concepts that LLMs draw on include random variables, the law of large numbers, the central limit theorem, expectation, variance, and stochastic processes, as well as the attention mechanism (which weighs the importance of different tokens in a sequence when predicting the next token) and the transformer architecture (which processes tokens in parallel, uses attention to capture dependencies between tokens, and converts raw scores for candidate tokens into probabilities, as sketched below).
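The step of converting raw scores into probabilities is done by the softmax function. A minimal sketch, with the candidate tokens and their scores invented for illustration:

```python
import math

def softmax(scores):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Invented raw scores the model might assign to four candidate next tokens.
candidates = ["sat", "ran", "slept", "flew"]
logits = [2.1, 1.3, 0.4, -1.0]

for token, p in zip(candidates, softmax(logits)):
    print(f"{token:>6}: {p:.2f}")
# Higher scores receive most of the probability mass, but every token keeps a nonzero chance.
```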
LLMs generate responses by predicting text based on patterns from vast amounts of data. They can give different results for the same prompt because of their probabilistic nature: each response is a guess chosen from many possibilities. Errors occur because LLMs don’t truly understand content; they only mimic patterns, which can lead to incorrect or fabricated information. The model is guessing what the next word or sentence should be based on patterns, rather than recalling actual facts.
This is known as "hallucination" in the context of AI.
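To see why the same prompt can produce different answers, and why an unlikely (possibly wrong) answer occasionally slips through, here is a small sketch of the sampling step. The candidate answers, scores, and the “temperature” knob are assumptions for illustration, not taken from any real model:

```python
import math
import random

def sample_token(logits, tokens, temperature=1.0):
    """Sample one token; higher temperature flattens the distribution, increasing variety."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    return random.choices(tokens, weights=probs, k=1)[0]

tokens = ["Paris", "Lyon", "Berlin", "banana"]   # invented candidate answers
logits = [3.0, 1.0, 0.5, -2.0]                   # invented model scores

# The same "prompt" sampled five times rarely gives identical output,
# and occasionally picks a low-probability (possibly wrong) answer.
print([sample_token(logits, tokens, temperature=1.0) for _ in range(5)])
print([sample_token(logits, tokens, temperature=0.1) for _ in range(5)])  # near-greedy: almost always "Paris"
```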
Summary
Why Different Results? Because the model uses probability, the results can vary each time, especially if multiple words have similar probabilities. The model might choose different words on different occasions, leading to variations in the output for the same prompt.
How It Remembers? LLMs like ChatGPT don't actually "remember" your old prompts in the way a human might remember a conversation. Instead, they use a technique called "contextual memory" to generate responses based on the immediate conversation history. During a session, the model keeps track of recent exchanges, using them to inform its replies. This temporary "memory" allows the model to maintain coherence within the conversation, but once the session ends, it forgets everything. The model's responses are based on patterns it has learned from vast amounts of data, not on any permanent memory of previous interactions.
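A rough sketch of how this works in practice: the application re-sends the recent conversation with every request and trims older turns once the window is full. The message format, the window size, and the stand-in model below are assumptions for illustration, not the exact API of any particular product:

```python
# Hypothetical chat loop illustrating contextual memory: the model itself stores nothing
# between calls; the application re-sends the recent history every time.
MAX_TURNS = 6  # assumed context limit for illustration; real limits are measured in tokens

conversation = []  # list of {"role": ..., "content": ...} dicts

def ask(user_message, generate_reply):
    conversation.append({"role": "user", "content": user_message})
    # Keep only the most recent turns -- anything older is effectively "forgotten".
    recent = conversation[-MAX_TURNS:]
    reply = generate_reply(recent)          # the model sees only `recent`, nothing else
    conversation.append({"role": "assistant", "content": reply})
    return reply

# Stand-in for a real model call: just reports how much context it was given.
fake_model = lambda history: f"(reply based on {len(history)} recent messages)"

print(ask("What is an LLM?", fake_model))
print(ask("Why do its answers vary?", fake_model))
```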
Why Errors Occur? Errors happen because the model doesn't "understand" the meaning of words; it just follows patterns. If the patterns in the training data are ambiguous or misleading, the model might generate incorrect or nonsensical results.