LLMs like ChatGPT: How They Work, Why Results Vary, and Why Errors Occur


Large Language Models (LLMs)        

Have you ever wondered how ChatGPT works behind the scenes?

  1. Why does the same prompt give different answers?
  2. How does it remember my old prompts?
  3. Why does it hallucinate?

The answers to these questions lie in a simple understanding of how LLMs work.

LLMs, like GPT-4, are complex AI systems that process and generate text based on patterns learned from vast amounts of historical data. They don’t understand text the way we humans do; they don’t truly “understand” the meaning of what they process. Instead, they learn patterns, such as which words tend to follow others and how sentences are usually structured. For example, if you ask a question, the model doesn’t “know” the answer; it predicts what a likely response would be based on similar questions and answers it has seen during training.

  • Predicting Words and Generating Text: An LLM tries to predict which word comes next. For example, if you type "The cat sat on the," the model might guess that the next word is "mat," because that is a common phrase it has seen during training. By chaining these predictions, LLMs can write full sentences, paragraphs, or even entire stories (see the sketch after this list).
  • Understanding Context: LLMs are good at understanding the context of a conversation. If you ask a question like, "What’s the weather like today?" and then ask, "Should I bring an umbrella?" it knows you’re still talking about the weather.
  • Limitations: Although they’re powerful, LLMs don’t actually “understand” the text like a human would. They don’t have real knowledge or opinions; they’re just really good at predicting what a human might say next based on patterns they’ve seen before. Sometimes, they might even make things up because they’re guessing based on patterns rather than facts.
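To make the prediction idea concrete, here is a minimal toy sketch in Python. The lookup table and its probabilities are invented for illustration; a real LLM computes such a distribution with a neural network over tokens, not a hand-written table.

```python
# Toy sketch of next-word prediction. The table and probabilities below
# are invented for illustration; a real LLM computes this distribution
# with a neural network, not a lookup table.

next_word_probs = {
    "The cat sat on the": {"mat": 0.55, "floor": 0.25, "couch": 0.15, "moon": 0.05},
}

def predict_next(context: str) -> str:
    """Return the most probable next word for a known context."""
    probs = next_word_probs[context]
    return max(probs, key=probs.get)

print(predict_next("The cat sat on the"))  # -> "mat"
```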

LLMs rely on several important mathematical concepts. All of them operate on tokens, the basic units of text, which may be whole words, subwords, or individual characters:

  1. Probability Distribution (generates a probability distribution over possible next tokens): For example, if the model has seen "The cat sat on the," it will assign probabilities to various possible next words ("mat," "floor," "couch," etc.). The word with the highest probability might be "mat."
  2. Conditional Probability (the likelihood of a token given the preceding context): For example, in the sentence "The cat sat on the mat," the model evaluates P("mat" | "The cat sat on the"), the probability of "mat" given the words before it.
  3. Maximum Likelihood Estimation (MLE): During training, the model adjusts its parameters to increase the probability of the correct sequences (i.e., maximize the likelihood). For example, the model can use gradient descent to optimize the parameters, making the predicted probabilities closer to the actual distribution in the data.
  4. Sampling Tokens: When generating text, the model samples from the probability distribution over possible next tokens. A parameter called "temperature" adjusts the randomness of this sampling: a lower temperature makes the model more deterministic (favoring high-probability tokens), while a higher temperature increases diversity by giving lower-probability tokens more chance. This is what generates variation in answers (see the sketch after this list).
  5. Markov Property: The probability of the next token is approximated from a fixed number of preceding tokens; the model conditions on a bounded window of recent context rather than the entire history.
  6. Chain Rule of Probability: The model calculates the joint probability of a sequence by multiplying the conditional probabilities of each token. For example, for the sentence "The cat sat on the mat," the model computes: P("The cat sat on the mat") = P("The") × P("cat" | "The") × P("sat" | "The cat") × … × P("mat" | "The cat sat on the").
  7. Uncertainty and Entropy: Entropy measures the uncertainty in the model’s predictions. High entropy indicates that the model is uncertain, while low entropy indicates confidence. For example, if the model predicts the next token with near-equal probabilities for several tokens, the entropy is high, indicating uncertainty.
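Here is a minimal sketch in Python tying several of these ideas together: temperature-scaled sampling, the chain rule as a product of conditionals, and entropy as a measure of uncertainty. All numbers are invented for illustration; a real model derives its distribution from a neural network.

```python
import math
import random

# Invented next-token distribution for the context "The cat sat on the".
probs = {"mat": 0.55, "floor": 0.25, "couch": 0.15, "moon": 0.05}

def sample_with_temperature(dist: dict[str, float], temperature: float) -> str:
    """Sample a token after rescaling the distribution by temperature.
    Raising each probability to the power 1/T and renormalizing sharpens
    the distribution when T < 1 (more deterministic) and flattens it
    when T > 1 (more diverse)."""
    scaled = {tok: p ** (1.0 / temperature) for tok, p in dist.items()}
    total = sum(scaled.values())
    tokens = list(scaled)
    weights = [scaled[tok] / total for tok in tokens]
    return random.choices(tokens, weights=weights)[0]

def entropy_bits(dist: dict[str, float]) -> float:
    """Shannon entropy in bits: high = uncertain, low = confident."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Chain rule: joint probability of a sequence as a product of
# (invented) conditional probabilities P(token | preceding tokens).
conditionals = [0.09, 0.02, 0.10, 0.30, 0.60, 0.55]
joint = math.prod(conditionals)

print([sample_with_temperature(probs, 0.2) for _ in range(5)])  # mostly "mat"
print([sample_with_temperature(probs, 2.0) for _ in range(5)])  # more varied
print(f"entropy = {entropy_bits(probs):.2f} bits")
print(f"joint probability = {joint:.2e}")
```

Running the sampling lines twice will generally print different tokens at high temperature: the same prompt, different answers.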


Additional concepts that LLMs draw on include random variables, the law of large numbers, the central limit theorem, expectation, variance, and stochastic processes. Two are worth singling out: the attention mechanism, which weighs the importance of different tokens in the sequence when predicting the next token, and the transformer architecture, which processes tokens in parallel and uses attention to capture dependencies between tokens, converting raw scores for tokens into probabilities via the softmax function (see the sketch below).
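As a small illustration of that last step, here is the standard softmax function, which turns a vector of raw scores (logits) into a probability distribution. The logit values are invented for illustration.

```python
import math

def softmax(logits: list[float]) -> list[float]:
    """Convert raw scores (logits) into probabilities that sum to 1.
    Subtracting the maximum logit first avoids numerical overflow."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Invented raw scores for candidate next tokens "mat", "floor", "couch", "moon".
logits = [2.1, 0.9, 0.3, -1.0]
print([round(p, 3) for p in softmax(logits)])  # highest score -> highest probability
```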


LLMs generate responses by predicting text based on patterns from vast amounts of data. They can give different results for the same prompt because of their probabilistic nature: each response is one draw among many possibilities. Errors occur because LLMs don’t truly understand content; they mimic patterns, which can lead to incorrect or fabricated information. The model is guessing what the next word or sentence should be based on patterns, rather than recalling actual facts.

This is known as "hallucination" in the context of AI.


Summary

Why Different Results? Because the model samples from a probability distribution, the output can vary from run to run, especially when several candidate words have similar probabilities. The model may choose different words on different occasions, producing different outputs for the same prompt (as in the temperature-sampling sketch above).

How It Remembers? LLMs like ChatGPT don't actually "remember" your old prompts the way a human remembers a conversation. Instead, the recent conversation history (the context window, sometimes called "contextual memory") is sent back to the model with each request, and the model generates responses based on that immediate history. This temporary "memory" lets the model stay coherent within a session, but once the session ends, it forgets everything. The model's responses come from patterns learned from vast amounts of data, not from any permanent memory of previous interactions.
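A minimal sketch of how this works in practice: the application, not the model, keeps the session history and resends it with every request. `generate_reply` here is a hypothetical stand-in for an actual LLM call.

```python
# Sketch of session "memory": the application resends recent turns as
# context with every request; the model itself stores nothing.

def generate_reply(context: list[str]) -> str:
    """Hypothetical stand-in for an LLM call; a real model would
    predict a reply conditioned on the visible context."""
    return f"(reply conditioned on {len(context)} prior messages)"

history: list[str] = []  # exists only for this session
for user_msg in ["What's the weather like today?", "Should I bring an umbrella?"]:
    history.append(f"User: {user_msg}")
    reply = generate_reply(history)  # the whole history is the "memory"
    history.append(f"Assistant: {reply}")
    print(reply)

# When the session ends, `history` is discarded; nothing persists.
```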

Why Errors Occur? Errors happen because the model doesn't "understand" the meaning of words; it just follows patterns. If the patterns in the training data are ambiguous or misleading, the model might generate incorrect or nonsensical results.


Michael Lissack

Applied Philosopher of Science -- Writer -- Entrepreneur (Opinions and Postings are my own views and do not reflect the views of the institutions with which I am affiliated.)

2mo
Woodley B. Preucil, CFA

Senior Managing Director

2mo

Anuj Mubayi Very Informative. Thank you for sharing.

Surabhi Pandey

PhD in Applicable Mathematics | Seasoned Mathematical Modeller | Research for Policy & Technological Solutions in Health & Society | Strategic Planning & Management Expert | HBTU Kanpur | TIFR-CAM Bangalore | IIMB

2mo

The metaphorical use of ‘hallucinations’ in the context of AI comes from the human context. The human brain is capable of three types of hallucinations: auditory, visual, and tactile. People living with mental illness sometimes experience hallucinations, though not necessarily all types; auditory hallucinations are the most common. Just as noisy and dirty data leads to hallucinations in AI, social and other environmental factors have a psychological impact on the human brain, and a person may experience ‘hallucinations’ and develop mental illness. Sometimes previous-generation trauma is transferred to the next generation as well, which can also lead to mental illness. Much as hallucinations are difficult to handle in AI, which is why emphasis is placed on using good-quality data, hallucinations are extremely difficult to manage in humans! So, remove noisy data from life!!

Dr. Sudipa Chauhan

Applied Mathematician | Mathematical Modeler | Economic Modeler | Health Economist | SLR | Market Access

2mo

Great insight. How “hallucination” is used in AI to describe the variability in answers is really interesting. Being specific in the choice of words one asks is really important.

Godwin Josh

Co-Founder of Altrosyn and Director at CDTECH | Inventor | Manufacturer

2mo

The use of hashtags #AI, #Technology, and #MachineLearning indicates a focus on the intersection of artificial intelligence, technological advancements, and machine learning algorithms. It suggests a discussion or exploration of how these fields are interconnected and influencing each other. I think it's fascinating how these concepts are constantly evolving, with new breakthroughs happening all the time. Given the probabilistic nature of LLMs, how do you envision incorporating techniques like Bayesian inference to improve their interpretability and explainability?
