In-between memory and thought: How to wield Large Language Models. Part I.
This is the first in a series of articles geared towards non-technical business leaders. We aim to shed light on some of the inner workings of LLMs and point out a few interesting quirks along the way.
Introduction: Robustness in the Context of LLMs
Trust is paramount in the rapidly evolving world of artificial intelligence. The performance and reliability of Large Language Models (LLMs) such as GPT-4 play a pivotal role in fostering trust and, with it, adoption. As these complex AI systems gain prominence and their adoption begins to look workable, the need for robustness and resilience becomes pressing. This is especially true for organisations wanting to embed AI into their processes and customer interactions. The prospect of using LLMs can seem daunting to businesses that worry about control, auditability, and liability.
Our goal is to make only one point: given the inevitability of their widespread adoption, you need to understand that robustness is vital for the success of trustworthy LLM systems (not to mention, ‘AI’ in general). Whilst it’s still machine learning, the challenges of aligning LLM system behaviours with your intentions are new, nuanced and hard.
Anyway. Grab a coffee and let's dive in.
Is it Thought? Is it Memory?
There’s an interesting comparison to be made between an LLM like GPT-4 and an advanced search engine: on the surface, both applications are just an empty text box.
There’s something fascinating to learn from both a) a similarity and b) a dissimilarity.
a) The similarity.
There is a similarity to be drawn with a search engine: if you need to find something, it’s quicker if you only need to search the relevant areas. Instead of combing through a vast knowledge bank for the most relevant information, it helps to know where to look.
How to sort, index, search and return information as quickly as possible is a deeply mathematical field. In essence, a search can return relevant results faster if it identifies where the information is likely to be and prioritises those areas first.
(In contrast; consider the last time you used your computer’s file system search, where it alphabetically combs through all files and painstakingly returns irrelevant results in no logical order. Groan.)
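For the more hands-on reader, here is a tiny Python sketch of the core idea behind fast search: an 'inverted index' that records up front which documents mention which words, so a query only looks where relevant results could possibly be. It's purely illustrative (the documents, words and the `search` helper are all made up for this example); real search engines layer ranking, spelling correction, caching and much more on top.

```python
# A minimal sketch of an inverted index: rather than scanning every document
# for every query, we record in advance which documents mention which word,
# so a search only has to look where relevant results can possibly be.
# (Illustrative only; real search engines add ranking, stemming, caching, etc.)

from collections import defaultdict

documents = {
    1: "quarterly revenue grew across the retail division",
    2: "the retail division opened three new stores",
    3: "cloud costs fell after the migration project",
}

# Build the index once, up front.
index = defaultdict(set)
for doc_id, text in documents.items():
    for word in text.lower().split():
        index[word].add(doc_id)

def search(query: str) -> set[int]:
    """Return the IDs of documents containing every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = index.get(words[0], set()).copy()
    for word in words[1:]:
        results &= index.get(word, set())
    return results

print(search("retail division"))  # {1, 2} - found without scanning document 3
```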
Similarly, with an LLM like GPT-4, an approach can be taken called a ‘Mixture of Experts’ (MoE). A MoE model consists of multiple smaller trained networks, or ‘experts’, each of which is only engaged when a prompt (or, in search terms, a ‘query’) is relevant to its expertise. It's akin to a group of specialists, each with a different field of knowledge, working together to solve a problem.
Each expert only contributes when its specific field of knowledge is needed, making the overall system more efficient. Leaked details discussed by George Hotz (spoken about further here) suggest GPT-4 has 8 such ‘experts’, and that perhaps two of them are engaged for each query. Intuitively, if you’re going to have multiple experts, choosing which ones to engage is a vital step. This selection is handled by a separate routing step that sits across the 8 experts.
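For the curious, below is a toy Python sketch of what mixture-of-experts routing looks like in principle. GPT-4's real internals are not public, so the 8 experts, the top-2 selection and every number in the snippet are illustrative stand-ins under the rumoured configuration, not OpenAI's implementation.

```python
# A toy sketch of mixture-of-experts routing: a small "gating" step scores all
# experts for a given query and only the top-scoring few are actually run.
# The 8 experts / top-2 figures follow the rumoured GPT-4 setup mentioned above;
# nothing here reflects OpenAI's real implementation.

import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # rumoured number of experts
TOP_K = 2         # rumoured number engaged per query
DIM = 16          # toy embedding size

# Each "expert" is just a random linear layer in this sketch.
experts = [rng.normal(size=(DIM, DIM)) for _ in range(NUM_EXPERTS)]
# The gate is another small layer that scores how relevant each expert is.
gate = rng.normal(size=(DIM, NUM_EXPERTS))

def moe_forward(x: np.ndarray) -> np.ndarray:
    scores = x @ gate                                  # one score per expert
    weights = np.exp(scores) / np.exp(scores).sum()    # softmax over experts
    top_k = np.argsort(weights)[-TOP_K:]               # pick the 2 best experts

    # Only the chosen experts do any work; the other 6 are never run.
    output = np.zeros(DIM)
    for i in top_k:
        output += weights[i] * (x @ experts[i])
    return output

query_embedding = rng.normal(size=DIM)     # stand-in for an embedded prompt
print(moe_forward(query_embedding).shape)  # (16,)
```

The design point to take away is the routing: most of the model sits idle for any given query, which is what makes the approach cheaper and faster than running everything every time.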
b) The dissimilarity.
But here's where it gets interesting: their main difference from a search architecture. These models don't "think" (thought) or "remember" (knowledge) like we would expect them to. Consider the search paradigm, and let's highlight an example task: researching a topic and writing a summary paragraph.
--> You can see there’s a very clear dividing line that breaks thought apart from knowledge.
An LLM blurs this line.
Here’s how it goes when GPT-4 is given the goal of researching a topic and writing a summary paragraph.
--> Large Language Models perform thought and knowledge tasks in one go.
This means that in a very real way their knowledge doesn't stem from thought or memory. It arguably doesn’t involve any thought or any memory! Clearly, it amounts to something new. For the sake of grappling with this bizarre and incredible phenomenon, let’s call it ‘thought and memory’.
A Continuum of Thought and Memory
LLMs are not identifying and regurgitating relevant pieces of information they've found on the internet; they’re producing coherent answers from knowledge that was imprinted, in GPT-4’s case, into 8 topically distinct patterns of language. It's worth quickly understanding the consequential trade-off that LLMs using a MoE approach make in combining these two tasks. The architecture decisions exemplified by the 8 experts in GPT-4 lead to a trade-off between thought and memory.
More memory, less thought: the model keeps broader coverage of specific facts, but integrates them less deeply when it answers.
More thought, less memory: the model reasons more fluidly over compressed patterns, but draws on a narrower base of specific facts.
Ryan Moulton, in his article The Many Ways that Digital Minds Can Know, eloquently describes this trade-off (and is the inspiration behind this analogy). He likens the process to "compression." What we’re calling ‘thought’ he calls ‘integration’; what we’re calling ‘memory’ he calls ‘coverage’. Put simply, models with broader "coverage" pull from a more expansive range of facts and examples to generate responses, and that requires greater resources.
If something is computationally easier, requiring fewer resources, then it’s also cheaper. GPT-4 users were getting suspicious that answers were becoming ‘dumber’. We’re not saying this is necessarily true, but you can understand how OpenAI might be naturally driven by profitability and engineering trade-offs, optimising towards the cheapest acceptable answer rather than the best answer.
This is all to stress one major point: these models don't use every piece of information they've learned; they use patterns in the information they’ve learned, and knowledge of the world is embedded in these patterns.
Breaking the Anthropomorphic Spell
The MoE approach selectively chooses which patterns are best suited to answer a question or solve a problem. The fewer patterns needed, the less computation required, and the quicker, cheaper and more efficient the LLM can be.
Although they're designed to mimic human-like conversation, and although they seem to be performing some kind of search-like knowledge function, they're fundamentally sophisticated word-prediction engines, where knowledge has been embedded within their statistical manipulation of human language (this recent study showed an LLM encoded deep medical knowledge).
In psychology, ‘anthropomorphic bias’ describes the human tendency to ascribe human-like characteristics where in fact none exist. This is what’s happening when you see a face in the knot of a tree. It is this same bias that makes the experience of LLMs so uncanny, so real, makes it... feel... so believable.
This bizarre research into ‘anomalous tokens’ is a really clear demonstration of an LLM vulnerability, one that breaks the anthropomorphic spell. It demonstrates that, to an LLM, words are not really words at all: they are converted into tokens.
Patterns are found between tokens, not words. It isn’t reading; it’s pattern-matching across a neural network of tokens. One hiccup of this approach (among thousands of other examples in the research) is that GPT-4 simply doesn’t ‘see’ the word ‘SmartyHeaderCode’. This shows that LLMs really don’t ‘think’, let alone think like a human, at all.
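If you want to see tokenisation for yourself, the short Python sketch below uses tiktoken, the open-source tokeniser library OpenAI publishes, to split text into token IDs. The exact splits depend on which tokeniser a given model uses, so treat the output as illustrative rather than a statement about what GPT-4 ‘sees’ internally.

```python
# A small illustration that LLMs operate on tokens, not words. This uses the
# open-source tiktoken library; the exact splits and IDs depend on which
# tokeniser a given model uses, so the output is illustrative only.
# pip install tiktoken

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a tokeniser from the GPT-4 era

for text in ["The cat sat on the mat.", "SmartyHeaderCode"]:
    token_ids = enc.encode(text)
    pieces = [enc.decode([t]) for t in token_ids]
    print(f"{text!r} -> {len(token_ids)} tokens: {pieces}")

# The model never 'sees' the words you typed; it sees only the integer IDs,
# and any patterns it has learned are patterns between those IDs.
```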
Take the image below as a good visual example. It’s a slight tangent, because it relies on the multi-modal nature of this model (it interprets vision as well as text), but it helps underscore this inhuman aspect of these models.
·  A picture has been merged with a graphical axis.
·  The LLM is asked to interpret it.
·  The idiom ‘crossed wires’ seems appropriate, because the LLM begins to talk about a trend ‘over time’, which the picture obviously doesn’t show.
These examples of hyper-intelligent machines making incredibly ‘dumb’ errors can happen in the wild, but the models can also be intentionally provoked into making them.
Imagine a bad actor taking advantage of a discovered vulnerability by using an equivalent trick to make a sidewalk look like a road to an automated vehicle; or a red light like a green; or a stop sign like a speed limit sign.
Their knowledge doesn't stem from understanding or memory, but from something inhumanly new, something in-between. Understanding this difference when working with them will enable you to best appreciate their limitations and, therefore, how to install proper guardrails on their use.
Now that we've understood a bit about how these models work, in our next post we’ll explore how they can be controlled and guided.