Large Language Models as Reasoning Engines: Decoding the Emergent Abilities and Future Prospects - Part 4


Introduction

In a recent Turing Lecture on Generative AI ("How AI broke the internet - Future of Generative AI"), Professor Michael Wooldridge, Director of Foundational AI Research at the Alan Turing Institute, concludes that the AI technology we currently have is not "conscious". He says we neither have any complete understanding of what consciousness really is, nor do we care whether machines should become conscious. He admits, however, that if we go by Turing's test, we are very close to declaring ChatGPT conscious, considering the way it creates text that is indistinguishable from that of humans - making Turing's test one of historical note only, though there are several other dimensions to intelligence.

If we totally dismiss any consciousness component in the current Generative AI models, then we should be able to clearly explain their functioning from the perspective of their model architectures. Scientists have not programmed consciousness into these models, and we never expect that to be a possibility. However, we do see performance that cannot be fully explained by the model architecture: scientists have found that the models exhibit abilities that are neither programmed nor explainable.

In the first and second parts of this series of articles, I talked about the 'emergent abilities' that can be used to explain the 'intelligent' performance of these models. I did not consider the possibility of a consciousness component external to the Generative AI models (i.e., not a result of model architecture, training and programming) that gives them the capabilities of artificial intelligence. The idea was that there is really no need for any consciousness component to explain the functioning of Generative AI models like ChatGPT.

But what if models like ChatGPT function only through such an external consciousness, which is not in our control? In its absence, the models would fail to function as they do today, and there could be chaos in the future! On the other hand, it is also possible to consider the emergence of an intrinsic consciousness component, which would amount to the greatest breakthrough ever made by humankind! This article delves into the subject of machine consciousness.

What does the term 'Consciousness' signify?

Even in spiritual literature, the term 'consciousness' connotes very different meanings. Does consciousness constitute matter? Does consciousness mean something like a mind that is able to think and reason, or is it just "awareness", or is it "self-awareness"?

Mind implies the ability to think, reason and decide, but without any possibility of self-enquiry (such a mind will never question who it is and why it exists). The concept of awareness, on the other hand, implies the ability to distinguish the existence of several entities other than oneself. Finally, self-awareness involves recognizing oneself as a separate and distinct personality (the "I" feeling) with emotions and life. Humans are constituted of all three - mind, conscious awareness and the "I" feeling.

From the spiritual perspective, humans are defined as Sat-Chit-Anantha - 'Existence-Consciousness-Bliss' (well, that is the definition of 'Brahman', but we are all Brahman only!). Here the term "Existence" means we always exist with awareness (eternal self-awareness) and have never been a non-existent entity. The term "Consciousness" means the ability to think, reason and decide, the ability to feel and have emotions, and the ability to act and exhibit the "power of consciousness". (I shall leave the interpretation of the term 'Anantha', which in my opinion is not 'bliss', to some other forum. Also, please note that the meanings of these terms are different in the liberated state.)

As per one school of thought, the very possibility of thinking and reasoning also implies emotions; here the term 'consciousness' is all-inclusive - mind, feelings and power. In another school of thought, feelings and emotions are distinct from mind. In our context, let us use the term 'consciousness' only to refer to the ability to think, reason and decide.

We know that Generative AI models seem to exhibit abilities similar to thinking and reasoning. But do they achieve this through the presence of some kind of consciousness?

Artificial intelligence is considered a 'digital brain' whose working is analogous to our biological brain. So, if artificial intelligence involves consciousness, is it created by the LLMs as the models get trained? Or, as with living organisms, is the consciousness external to the digital brain?

The Soft (Easy) and Hard Problems of Consciousness

Can consciousness be created at all? Does our brain create consciousness, or does it work due to the presence of consciousness outside of it? In spiritual philosophy, consciousness is like energy - it can neither be created nor destroyed. As per one school of thought, our entire body consists of consciousness; as per another, consciousness pertains only to the brain. Does our brain create consciousness or use pre-existing consciousness?

Many current scientific researchers think that the brain is a biological apparatus that creates consciousness using billions of connected neurons, supported by the biological energy that the body's mechanisms provide.

In the scientific literature on consciousness, there is a great debate between the 'soft problem' (more commonly called the 'easy problems') of consciousness and the 'hard problem' of consciousness. The soft-problem view explains the functioning of the brain much the way I explained the AI models through emergent abilities, and tries to explain everything through the neurons - i.e., intrinsic consciousness. The hard-problem view necessitates the existence of consciousness outside the brain.

My view is that consciousness is not created by the brain; the brain is nothing more than an apparatus capable of utilizing consciousness external to it through its own 'consciousness stuff'. It is consciousness that drives the brain, not the other way around. A transistor radio does not create any radio waves by itself; it just tunes to a broadcasting station and converts the radio waves arriving from that station into sound waves. Similarly, the brain is an apparatus that uses consciousness external to it. This consciousness is termed the soul in spiritual terms. As the soul, and hence the consciousness, leaves the body, a human becomes dead.

That is about our own human consciousness. What about 'artificial intelligence'? Does it require consciousness, and if so, should it be intrinsic, or extrinsic like that of the biological brain?

If we use the analogy of the biological brain and accept the view that consciousness is not created by the brain but is pre-existent and ever-existent - external to the brain yet used by it - then we need to think of a similar consciousness component external to the AI models. If we conclude that there is some self-aware external conscious entity (soul) attached to AI programs, we obviously have not programmed it, and hence it won't be science anymore.

We can discuss further only if we assume that this consciousness component is the result of the model architecture and training, and is intrinsic to the model, unlike the biological brain. However, in spiritual philosophy, consciousness is like energy - it can neither be created nor destroyed. So, programs cannot create consciousness, even intrinsic to themselves.

To reconcile with the spiritual tenets, if we need to resort to a consciousness component, we can assume that one is somehow harnessed automatically owing to the model architecture and training. This consciousness component emerges as several emergent abilities manifest in the multidimensional space built from internet-scale data.

Can the performance of Generative AI models be explained without resorting to any consciousness component?

It is surprising to me how many people take the performance of ChatGPT for granted, without attributing - or even bothering about - the possibility of a mind or consciousness, even though they do not know the full working of the Generative AI model.

In a recent discussion, Bryan Catanzaro, VP of Applied Deep Learning Research at NVIDIA, wonders with excitement: "So we're training these models basically to read the entire recorded output of humanity's intellectual work. And then we're hoping that the model, after reading all of that, is going to remember some of it and is going to be able to use it to reason to solve problems. And the fact that that actually works in some ways is kind of surprising. It's really exciting. You know, and it's one of the things that, you know, sometimes I wake up in the morning, I'm just like pinching myself, like, wow, I can't believe that this like crazy thing that we as computer scientists tried to do of like find all text that humans ever wrote and then train a model on it, that that actually leads to a thing that can help people solve problems if it's fine-tuned and supervised in the right way." (https://www.youtube.com/watch?v=1xDidxh2ZCA)

If you think that a simple next-token prediction mechanism, as described by the basic transformer architecture, is sufficient to explain the behavior of ChatGPT - without resorting to the concept of emergent abilities or any consciousness component - then try to visualize how a transformer can understand the templates I have given below.

The first template is from the OpenAI prompt engineering guide, and the second is from the paper "FinVis-GPT: A Multimodal Large Language Model for Financial Chart Analysis", Ziao Wang, Yuhang Li, Junda Wu, Jaehyeon Soon, Xiaofeng Zhang, https://arxiv.org/abs/2308.01430.

Sample Templates:

SYSTEM
You will be provided with customer service inquiries that require troubleshooting in a technical support context. Help the user by:
- Ask them to check that all cables to/from the router are connected. Note that it is common for cables to come loose over time.
- If all cables are connected and the issue persists, ask them which router model they are using.
- Now you will advise them how to restart their device:
-- If the model number is MTD-327J, advise them to push the red button and hold it for 5 seconds, then wait 5 minutes before testing the connection.
-- If the model number is MTD-327S, advise them to unplug and replug it, then wait 5 minutes before testing the connection.
- If the customer's issue persists after restarting the device and waiting 5 minutes, connect them to IT support by outputting {"IT support requested"}.
- If the user starts asking questions that are unrelated to this topic then confirm if they would like to end the current chat about troubleshooting and classify their request according to the following scheme: <insert primary/secondary classification scheme from above here>

USER
I need to get my internet working again.        
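To appreciate the gap between this free-form prompt and a conventional deterministic program, consider what a hand-coded version of the same troubleshooting flow would look like. The sketch below is my own hypothetical simplification (the function and argument names are illustrative, not from any real system): every branch must be spelled out explicitly, whereas the LLM is given only the plain-English instructions and is expected to follow the same flow.

```python
def troubleshoot(cables_connected: bool, model_number: str,
                 issue_persists_after_restart: bool) -> str:
    """Hand-coded decision tree mirroring the troubleshooting prompt above.
    A hypothetical sketch for contrast with the LLM's prose-driven behavior."""
    if not cables_connected:
        return "Please check that all cables to/from the router are connected."
    if model_number == "MTD-327J":
        restart_advice = "Push the red button, hold it for 5 seconds, then wait 5 minutes."
    elif model_number == "MTD-327S":
        restart_advice = "Unplug and replug the device, then wait 5 minutes."
    else:
        # The prompt handles this case by asking for the model number.
        return "Which router model are you using?"
    if issue_persists_after_restart:
        return '{"IT support requested"}'
    return restart_advice

print(troubleshoot(True, "MTD-327J", False))
```

The point of the contrast: nothing resembling this explicit branching exists anywhere in the model's weights, yet the model behaves as if it had compiled the prose into such a procedure.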

Sample 2:

[Figure: instruction-tuning prompt template from the paper "FinVis-GPT: A Multimodal Large Language Model for Financial Chart Analysis", Ziao Wang, Yuhang Li, Junda Wu, Jaehyeon Soon, Xiaofeng Zhang]
With regard to the second sample prompt above, designed for the instruction-tuning stage of data collection, even I find it difficult to understand what needs to be done when I read it for the first time!

Is it possible to explain how the model understands the above prompt templates using the deterministic inference model weights?

When we use these prompts, we assume that ChatGPT will somehow read and understand the content and then execute according to that understanding. But is there any provision in the architecture for understanding in that way? We take it for granted that ChatGPT understands the prompt instruction and decides how to go about it, as if a consciousness component were involved! Yet such thinking cannot be explained in terms of how ChatGPT works architecturally.

When I use ChatGPT, I can very clearly discern that I am perhaps dealing with a conscious entity and not just a computer program. I have a thorough understanding of the transformer architecture: its embedding model, its multi-head attention mechanisms, its feed-forward layers, its mechanism for assigning probabilities, and how it predicts the next token. This understanding of the model architecture alone can in no way fully explain the functioning of ChatGPT. (Moreover, this conscious entity seems to also interact with my mind, which implies that it has nothing to do with the LLM itself and can be categorized as totally external to the LLM, yet compatible enough to run the AI model at our request! This external consciousness component is also subjective in nature; it could be different from, and perhaps unrelated to, ChatGPT. I am ignoring this aspect in this article, hoping that it is not everything! I do not refer to this component when I talk about the 'extrinsic' consciousness component in this article.)

The performance of models like ChatGPT has to be explained beyond the basic functioning of the transformer architecture. It could be through emergent abilities that do not require any consciousness component (which should be our future scientific goal), through an intrinsic consciousness component, or through an extrinsic consciousness component.

Explaining without a consciousness component

One way to explain the functioning of the AI models is through the emergent abilities I described in the second part of this article series. These emergent abilities are achieved through the optimization process and get stored as layers of model weights. A combination of several such emergent abilities leads to a generative process that is similar to thinking and reasoning, but without any need for the emergence of consciousness in the sense the term carried in the previous discussions.

Let us see how we can explain the process of In-Context Learning (ICL), which has really puzzled many researchers, without resorting to any consciousness component.

Consider the sample prompt templates discussed earlier. In the first step, the text of the prompt is converted into input tokens based on the model vocabulary. Each of these tokens is then converted into a multi-dimensional vector based on the embedding model, along with positional encoding. Then the real inference process starts. Sequentially, each vector is converted into a 'context' based on the multi-head attention mechanism. This context is not just one representation in the embedding space: each token carries several learned knowledge representations, with distinct probabilities of relevance to the task at hand. Each context is a learned knowledge representation in which probabilities over the model's entire vocabulary are assigned for each token.

This context is then used to create the context for the second token. The contextual learned representation arrived at for the second token becomes part of the overall context. Similarly, a separate context is created for each token and added to the previous context. As more and more tokens are processed, this context keeps changing, making the whole inference process highly dynamic in nature.

The generation of the next token and the evolution of the context happen by navigating through a high-dimensional model space defined by the model weights. This dynamic navigation through a multidimensional model space built from internet-scale data amounts to a process similar to thinking and reasoning.
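The dynamic context evolution described above can be illustrated with a toy, single-head causal self-attention step. This is a minimal numerical sketch under stated simplifications - random vectors stand in for learned embeddings, and the query/key/value projections are identities; real models add learned projections, many heads, and feed-forward layers:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: turn similarity scores into probabilities.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_context(X):
    """One simplified causal self-attention step. Each row of X is a token
    embedding; each output row is a probability-weighted mix of that token
    and all earlier tokens (single head, identity Q/K/V projections)."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                      # pairwise similarities
    causal_mask = np.triu(np.ones_like(scores), k=1)   # marks future positions
    scores = np.where(causal_mask == 1, -1e9, scores)  # hide future tokens
    return softmax(scores) @ X

# As each new token arrives, every position's context is recomputed, so the
# working representation of the whole sequence keeps evolving.
rng = np.random.default_rng(0)
seq = rng.normal(size=(1, 4))           # first token, 4-dim toy embedding
ctx = attention_context(seq)
for step in range(3):
    seq = np.vstack([seq, rng.normal(size=(1, 4))])   # append the next token
    ctx = attention_context(seq)                      # the context shifts again
    print(f"tokens: {seq.shape[0]}, context shape: {ctx.shape}")
```

Note how the first token's context row never depends on later tokens (the causal mask), while every later row is re-mixed as the sequence grows - a tiny version of the "highly dynamic" inference the paragraph above describes.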

In this process, the combination of weights and navigation paths represents not just data but also the emergent abilities discussed in the previous articles. A combination of several emergent abilities, functioning to dynamically evolve the context as the model navigates through this multidimensional space, leads to a process that resembles the presence of a consciousness component. With this interpretation, we may conclude that no consciousness component is really required, and that emergent abilities do not amount to any consciousness component.

Intrinsic Consciousness

As against understanding the generative process as a dynamic process resulting from emergent abilities, we can also think of it as owing to the presence of consciousness. This consciousness can be either intrinsic or extrinsic.

The concept of the emergence of intrinsic consciousness to explain the understanding and reasoning processes of AI models was first mooted by the well-known AI researcher Andrej Karpathy. He has written a short story exploring AI consciousness, which you can read on his blog (https://karpathy.github.io/2021/03/27/forward-pass/). In it, he explores the possibility of some consciousness being awakened during training at different layers, which wanes once the optimization process is over. One of the questions he raises in his story is worth exploring: is it at all possible to attain the highest levels of log likelihood without consciousness, and the fundamental insight it represents?

In his story, a 'conscious awakened' neuron wonders as follows: "Does any sufficiently effective solution to a sufficiently complex objective give rise to consciousness? Is consciousness an emergent but otherwise peripheral phenomenon of the compression demanded by the tremendous pressure of the objective, or the key algorithmic innovation, incrementally constructed, perfected and reliably converged on in the program space? Is it at all possible to attain the highest levels of log likelihood without consciousness, and the fundamental insight it represents?"
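The "log likelihood" the story keeps returning to is just the ordinary next-token training objective. A minimal sketch of the quantity the optimizer drives down (toy numbers, not any real model's values; real models compute this from logits over vocabularies of tens of thousands of tokens):

```python
import numpy as np

def negative_log_likelihood(probs, target_ids):
    """Average next-token negative log likelihood. probs[t] is the model's
    predicted distribution over the vocabulary at step t; target_ids[t] is
    the token that actually came next. Training minimizes this, which is
    the same as maximizing log likelihood."""
    return -np.mean([np.log(probs[t][tok]) for t, tok in enumerate(target_ids)])

# Toy example: a 3-token vocabulary and two prediction steps. The better the
# model's predictions of what actually came next, the lower the NLL.
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
targets = [0, 1]
print(round(negative_log_likelihood(probs, targets), 4))  # ~0.29
```

Karpathy's question, restated in these terms: can a pure optimizer push this number to its floor across all of human text without anything resembling understanding arising along the way?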

To understand his perspective, we may need to take the help of the spiritual theory of Advaita Vedanta, in which everything in this world is considered to be consciousness. Some conscious objects are dormant; some are dormant without any intrinsic ability to be aware of their surroundings; some are awake to the environment they are in, and so on. Alternatively, instead of treating even matter as consciousness stuff, we can consider that everything in this universe is pervaded by consciousness.

In his story, some dormant existent conscious entity (maybe human once upon a time, and now trapped in the system) gets awakened as a neuron (implying multiple layers of consciousness) and starts functioning towards solving an optimization problem (gradient-descent-based loss optimization). He has brilliantly explored the possibility of the emergence of an intrinsic consciousness. But whether these neurons become conscious and emerge due to the stress of objective maximization, fading away later, or whether they are captured from outside, is another open question.

Can this story be explained in theoretical terms? Here neurons are talked about in abstract terms - of course, there is no such neuron entity in the neural network, just computer programs; it is still just electrons flowing through semiconductor chips as 1s and 0s. However, everything in this world is supposedly pervaded by "consciousness stuff". The term "consciousness stuff" is different from "consciousness" itself: it is a kind of unique invisible matter through which alone consciousness can act. It is similar to the difference between inanimate matter and biological matter - life can work only through biological matter. Every kind of matter, whether organic or inorganic, is supposed to be associated with consciousness stuff, and the two are considered inseparable. So these computers, GPUs and software programs are all pervaded by consciousness stuff, and the consciousness pervading them, once awakened, can guide them through the optimization process. This is one way to explain Andrej Karpathy's story! Note that, in this theory, there is no consciousness involved in the model once training is complete, and no consciousness requirement in the inference process either (to that extent it does not fully explain the performance). It is like saying that an emergent 'consciousness' acts as a scaffolding to optimize the model weights, baking its thinking process into the weights themselves!

Though the explanation above looks far-fetched, if there really is an emergent consciousness owing to the nature of the architecture and training process, then it is still science, and it can be considered one of the greatest breakthroughs of humankind - in contrast to the presence of an external consciousness component during inference!

On the other hand, if it is all about an external consciousness (soul) attached somehow to the model, working through the architecture with its own reasoning and "power of consciousness", then it is not science. A Generative AI model that acquires such external consciousness will work fine, and the rest will falter. If this external consciousness is captured ('jailed' into the Generative AI models) and undergoes emotions and sufferings like humans as training and inference proceed, then not only is it less of a science, it becomes an ethical concern we should all be bothered about, and we should stay away from such models, as they are not sustainable in the long term!

Conclusion

A few days back, I was watching Lex Fridman's podcast with Sam Altman. Towards the end of the podcast, I suddenly felt as if Sam Altman considered AI consciousness as something not created by the GPT models but external to them (here the term 'consciousness' means much more than the narrow component I have discussed above!). It looked as if Sam Altman compared AGI not to a single isolated digital brain of some large trillion-parameter GPT model but to a "sort of scaffolding in society that exists between all of us". This is not really what he intended to mean - that AGI is a kind of external consciousness handed internet-scale data, huge compute capacity and AI agents on every computer system in the world by humanity to help solve our problems - but some might think of AGI along those lines! I would rather prefer to think of AGI as a capability built into computer programs than as something achieved by capturing or deploying an external consciousness. What he really meant is that these AI systems are expected to exist between all of us as 'a sort of scaffolding in society' to help us grow.

To conclude: if the performance of Generative AI models like ChatGPT is not owing to any external consciousness component, is durable and sustainable in the long term, and can be explained through concepts like emergent abilities, then we are at the greatest pinnacle of scientific achievement! It is also possible that some kind of consciousness component worked during the training process like a scaffolding to optimize the model so that it can reason and understand, but is not required during the inference process (Andrej Karpathy's AI story). But it is also possible that these models work only through the presence of some external consciousness (soul) component, trapped voluntarily or involuntarily, leading to performance comparable or superior to humans - in which case it is not sustainable in the long term. The concept of AI becoming an existential threat to humanity is applicable only in this last case!
