In Search of Consciousness: Assessing the Limits of Language Models
Dr. Dibyendu Das, PhD
Data & AI, Deloitte | Computer Vision | Generative AI | Deep Learning | Data Science
“Our greatest human adventure is the evolution of consciousness. We are in this life to enlarge the soul, liberate the spirit, and light up the brain.” — Tom Robbins
Generative AI, a subset of artificial intelligence, focuses on creating models capable of generating new content, such as images, text, and audio. Among the most prominent examples of generative AI are Large Language Models (LLMs). LLMs such as OpenAI’s GPT series and Google’s Gemini have gained significant attention for their ability to generate coherent and contextually relevant text from a given prompt. These models are trained on massive datasets containing vast amounts of text from books, articles, and other sources, allowing them to learn the intricacies of human language and produce remarkably human-like responses. LLMs employ a transformer architecture, which enables them to understand and generate text in a highly contextualized manner.
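To make the autoregressive idea concrete, here is a minimal sketch using the open-source Hugging Face transformers library and the small GPT-2 model (my own illustrative choice, not the production models named above). The model predicts one token at a time, each conditioned on everything generated so far.

```python
# Minimal sketch of autoregressive text generation (assumes `transformers` and `torch` are installed).
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")      # small, freely available LLM
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Generative AI models can"
inputs = tokenizer(prompt, return_tensors="pt")

# The model extends the prompt one token at a time, each step conditioned on all previous tokens.
outputs = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```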
While some may push for the classification of Large Language Models (LLMs) as Artificial General Intelligence (AGI), it’s important to highlight the clear differences between the two. LLMs fundamentally lack the defining traits of AGI, such as autonomous adaptation to dynamically changing environments and the capacity for self-directed learning akin to human cognition.
A few days ago, Yann LeCun argued that autoregressive Large Language Models (LLMs) cannot be considered a sufficient pathway to achieving Artificial General Intelligence (AGI). This is primarily because LLMs lack crucial capabilities essential for intelligent beings, such as the ability to comprehend and reason about the physical world.
The term AGI was re-introduced and popularized by Shane Legg and Ben Goertzel around 2002. Ray Kurzweil (Kurzweil, 2005) coined the term “narrow AI” to describe systems capable of executing specific intelligent tasks within defined contexts. Unlike humans, narrow AI systems require human intervention or reconfiguration to adapt to changes in context or task specifications. In contrast, Artificial General Intelligence (AGI) represents systems with broad generalization capabilities, allowing them to self-adapt and transfer knowledge across diverse goals and contexts.
· General intelligence encompasses achieving diverse goals and tasks across various contexts and environments.
So far, we have gained an understanding of how Large Language Models (LLMs) operate and the diverse range of applications they serve. Let’s now investigate the limitations of LLMs with respect to achieving Artificial General Intelligence (AGI).
· LLMs are typically trained on specific datasets and tasks, resulting in limited knowledge and capabilities outside their trained domain.
If we draw a Maslow-like hierarchy of the AI realm, it could look like the structure below.
Generative AI models, such as Large Language Models (LLMs)/Multimodal Models, excel at tasks like pattern recognition and generation, where they demonstrate remarkable proficiency in understanding and generating text, images, and other forms of data. However, their capabilities are limited when it comes to reasoning and understanding complex causal relationships within their working environment. While they can recognize patterns and correlations in data, they often struggle to draw causal boundaries and infer cause-and-effect relationships accurately.
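As a toy illustration of this gap (my own sketch, not from the article), the snippet below generates data in which a hidden confounder drives both X and Y. A purely correlational fit reports a strong X-Y relationship even though X has no causal effect on Y, which is exactly the kind of causal boundary these models struggle to draw.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

z = rng.normal(size=n)             # hidden confounder (e.g., weather)
x = 2.0 * z + rng.normal(size=n)   # X is driven by Z
y = 3.0 * z + rng.normal(size=n)   # Y is also driven by Z; X has no causal effect on Y

# A correlational model happily "discovers" a strong X -> Y relationship.
slope = np.cov(x, y)[0, 1] / np.var(x)
print(f"Correlation(X, Y) = {np.corrcoef(x, y)[0, 1]:.2f}")
print(f"Regression slope of Y on X = {slope:.2f}  (true causal effect is 0)")
```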
On the other hand, General Intelligence refers to the ability of an AI system to exhibit a broad range of cognitive abilities, including not only pattern recognition but also reasoning, problem-solving, planning, and understanding causal relationships. A truly general intelligent system would be capable of understanding the underlying causes of observed phenomena, making predictions about future events based on this understanding, and adapting its behavior accordingly.
Sentient AI extends beyond conventional artificial intelligence by endowing AI systems with consciousness, self-awareness, and subjective experiences akin to human cognition. This would involve endowing AI systems with the ability to perceive their own existence, experience emotions, and possess a sense of self-awareness about their environment and their own actions. Achieving sentient AI would require advances not only in computational capability but also in our understanding of consciousness and subjective experience from a scientific and philosophical standpoint.
The AI community has been engaged in extensive debate about whether Generative AI can transition towards general intelligence. Before delving into the complex architecture of Large Language Models (LLMs) and strategies for achieving general intelligence, it’s imperative to examine the evolutionary trajectory of human and animal cognition towards general intelligence.
In biological evolution, general intelligence emerges as a result of complex neural structures and cognitive processes that enable organisms to adapt and thrive in diverse environments. This adaptation involves the development of sophisticated sensory systems for perception, memory mechanisms for storing and retrieving information, and cognitive abilities for reasoning, problem-solving, and decision-making.
Life and the delicate dance of order & disorder
Life, a phenomenon as enigmatic as it is ubiquitous, is characterized by a remarkable ability to maintain ordered structures and processes in the face of the inexorable march towards disorder dictated by entropy. Erwin Schrödinger, a renowned physicist and one of the founding figures of quantum mechanics, delved into the question of life and entropy in his influential book “What is Life?” published in 1944. In this book, Schrödinger explored the idea that living organisms can maintain their highly ordered structures and processes despite the tendency of the second law of thermodynamics to increase entropy in the universe.
Schrödinger proposed that living organisms achieve this feat by temporarily decreasing their own entropy while increasing the entropy of their surroundings, thus obeying the overall increase in entropy required by the second law of thermodynamics. One of Schrödinger’s key insights was the idea of “negative entropy” or “negentropy,” which he proposed as a measure of the degree of order or organization within a system. According to Schrödinger, living organisms can temporarily reduce their own entropy by importing negentropy from their environment in the form of free energy. This process allows living organisms to sustain their ordered structures and processes, maintain homeostasis, and carry out the functions necessary for life.
In living organisms, free energy is derived from external sources, such as sunlight or chemical energy from food, and is utilized to drive the synthesis of complex molecules, maintain cellular structures, and power various physiological processes. Despite the localized order that living organisms exhibit, the overall trend is towards an increase in entropy. This apparent paradox arises because the processes that sustain life and maintain order inevitably generate entropy as a by-product. Metabolic reactions within cells produce heat and waste by-products that contribute to the overall increase in entropy, reflecting the unavoidable trade-off between order and entropy inherent in the functioning of living systems.
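One compact way to write Schrödinger’s argument (a standard textbook formulation, not a quote from his book) is as an entropy balance: an organism’s local decrease in entropy is paid for by a larger increase in the entropy of its surroundings, so the total never decreases.

```latex
\Delta S_{\text{universe}}
  = \underbrace{\Delta S_{\text{organism}}}_{<\,0 \ (\text{local order})}
  + \underbrace{\Delta S_{\text{surroundings}}}_{>\,0 \ (\text{heat, waste})}
  \;\geq\; 0
```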
In the digital realm, data serves as the equivalent of free energy, fueling the operation and evolution of AI algorithms and networks. Just as living organisms import negentropy from their environment to maintain order and functionality, AI systems ingest vast quantities of data from their surroundings to optimize performance and adapt to changing conditions. This influx of data enables AI systems to synthesize complex patterns, make predictions, and execute tasks with increasing efficiency and sophistication. By harnessing free energy in the form of data and implementing mechanisms for adaptation and optimization, AI systems exemplify the digital manifestation of life’s enduring quest for order amidst the entropic currents of the universe.
In living organisms, homeostasis, the maintenance of internal stability amidst external changes, is primarily driven by causal relationships between stimuli and responses. This cause-and-effect mechanism ensures that organisms adapt to maintain order and functionality. Conversely, in artificial intelligence (AI) systems, homeostasis-like behavior is predominantly driven by correlational statistics that do not explicitly capture the underlying dynamics or causal hierarchies. These systems rely on analyzing vast amounts of data to identify patterns and make predictions, without necessarily understanding the underlying causal mechanisms.
Take, for example, the foraging behavior of an animal (a small rule-based sketch follows below):
· Animals perceive environmental cues such as the presence of food sources, predators, and obstacles through sensory inputs such as vision, smell, and touch. These cues serve as causal stimuli that influence the animal’s behavior.
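Here is a minimal rule-based sketch (purely illustrative, with hypothetical percept names) of how such causal stimulus-response links can drive homeostatic behavior: each percept is a cause, and the behavior it triggers is its effect.

```python
def forage_step(percepts: dict) -> str:
    """Map causal stimuli to responses for a simple foraging agent.

    `percepts` is a hypothetical dict such as
    {"predator_nearby": False, "food_visible": True, "energy": 0.3}.
    """
    if percepts["predator_nearby"]:
        return "flee"                      # threat is the direct cause of escape behavior
    if percepts["energy"] < 0.5 and percepts["food_visible"]:
        return "approach_food"             # hunger plus an observed food source causes foraging
    return "explore"                       # otherwise, search for new cues

print(forage_step({"predator_nearby": False, "food_visible": True, "energy": 0.3}))
# -> "approach_food"
```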
This causality-driven approach to homeostasis has facilitated the emergence of two prominent characteristics in animals: common sense and self-awareness. Common sense can be conceptualized as a repository of world models within an agent’s cognitive architecture, enabling it to discern likelihood, plausibility, and impossibility in various situations. These world models serve as predictive frameworks, allowing animals to rapidly acquire new skills with minimal trial and error.
Self-awareness, on the other hand, involves the ability to recognize oneself as an individual entity distinct from others, and to understand one’s own thoughts, feelings, and intentions. I will discuss this in the next episode.
World Models and Causality:
Humans construct a cognitive representation of the world based on the information accessible through their senses. Our decisions and behaviors stem from this internalized model of reality. To manage the abundance of information encountered in our daily experiences, our brain acquires abstract representations of spatial and temporal aspects of this data. We possess the ability to observe a scene and retain an abstract depiction of it. Additionally, research indicates that our perception at any given moment is influenced by our brain’s anticipation of future events, guided by our internal model. The brain encodes the world model as a causal graph, where events and their relationships are represented causally, facilitating computational efficiency and decision-making processes. It has even been argued that human-level AI is impossible without causal reasoning (Pearl, 2018).
The Pearl Causal Hierarchy (PCH) (Pearl et al., 2000) categorizes causal information into three distinct levels: observational, interventional, and counterfactual. These levels reflect an ascending order of causal depth, each enabling a richer understanding of causality.
· The observational level provides insights into natural associations and correlations (seeing).
· The interventional level describes how outcomes change when we actively act on the system (doing).
· The counterfactual level reasons about what would have happened had a different action been taken (imagining).
The picture above presents a causal graph depicting a world model of accident events. By understanding these causal connections, the mind constructs a causal graph that illustrates the interrelationships between various factors contributing to road accidents. Each factor is a node in the graph, with causal links indicating how one factor influences or leads to another, ultimately resulting in road accidents.
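To make the gap between observation and intervention concrete, here is a minimal structural causal model (my own sketch, with invented variables and probabilities) in the spirit of the accident graph: rain causes both wet roads and poor visibility, and both raise the chance of an accident. Merely observing a wet road also signals rain (and hence poor visibility), whereas intervening on the road state (Pearl’s do-operator) does not.

```python
import random

random.seed(0)

def simulate(do_wet_road=None):
    """One draw from a toy SCM: rain -> {wet road, poor visibility} -> accident.

    Passing `do_wet_road` forces the road state (an intervention);
    None means the road state follows its natural cause (rain).
    """
    rain = random.random() < 0.3
    wet_road = (random.random() < 0.9) if rain else (random.random() < 0.1)
    if do_wet_road is not None:
        wet_road = do_wet_road                  # intervention severs the rain -> wet_road edge
    poor_visibility = (random.random() < 0.8) if rain else (random.random() < 0.1)
    p_accident = 0.02 + 0.20 * wet_road + 0.20 * poor_visibility
    return wet_road, random.random() < p_accident

samples = [simulate() for _ in range(200_000)]
p_obs = sum(acc for wet, acc in samples if wet) / sum(1 for wet, _ in samples if wet)

interventions = [simulate(do_wet_road=True) for _ in range(200_000)]
p_do = sum(acc for _, acc in interventions) / len(interventions)

print(f"P(accident | wet road observed)   ~ {p_obs:.2f}")  # higher: a wet road also hints at rain and poor visibility
print(f"P(accident | do(wet road = wet))  ~ {p_do:.2f}")   # lower: forcing the road wet says nothing about visibility
```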
On the other hand, Large Language (and Vision) Models operate in an autoregressive manner, predicting subsequent words in a sequence by analyzing preceding words. They lack the capacity to execute actions in real-world environments or acquire knowledge through embodied experiences. Let’s examine the scenario below.
Example from ChatGPT:
As shown above, we observed that LLMs exhibited hallucinatory responses when presented with routine reasoning questions. Despite encountering the concept of leap years many times in the training data, the LLM struggled to reconcile the fact that 2010 was not a leap year. This discrepancy highlights a limitation in the model’s ability to accurately reason about temporal relationships within current LLM architectures.
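The rule the model tripped over is mechanical: a year is a leap year if it is divisible by 4, except century years, which must also be divisible by 400. A few lines of Python encode it directly:

```python
def is_leap_year(year: int) -> bool:
    """Gregorian leap-year rule: divisible by 4, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

print(is_leap_year(2010))  # False: 2010 is not divisible by 4
print(is_leap_year(2012))  # True
print(is_leap_year(1900))  # False: century year not divisible by 400
print(is_leap_year(2000))  # True:  century year divisible by 400
```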
Curiosity: Updating Internal Causal Graph
Curiosity is such a fundamental aspect of our nature that its ubiquity in our lives often goes unnoticed.
In 1994, George Loewenstein characterized curiosity as “a cognitive induced deprivation that arises from the perception of a gap in knowledge and understanding.” According to Loewenstein’s information gap theory, curiosity operates similarly to other drive states, such as hunger, which prompts individuals to seek food. Expanding on this concept, Loewenstein proposed that even a small amount of information acts as a priming dose, significantly heightening curiosity. While the consumption of information is rewarding, continued exposure eventually leads to satiation, reducing the desire for further exploration.
Curiosity serves as a catalyst for effective information acquisition and exploration, directing attentional resources to enhance memory retention and streamline the learning process. By igniting a desire to uncover novel or unexpected information, curiosity fosters interest, leading to more efficient learning and memory consolidation. Furthermore, curiosity correlates positively with intelligence, as individuals with higher levels of curiosity tend to exhibit greater cognitive flexibility, problem-solving skills, and adaptability.
Curiosity can update a causal graph in two ways:
· According to information gap theory, when an agent encounters unfamiliar or unexpected events, curiosity prompts further investigation: it seeks out new information and experiences to understand the underlying causes and relationships, and updates the existing causal graph accordingly.
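Here is a minimal sketch (my own illustration, with hypothetical variables) of this information-gap loop: when an observation is surprising under the agent’s current causal graph, curiosity triggers an update that adds or reweights the responsible edge.

```python
class CausalGraph:
    """Toy causal graph: edges map a (cause, effect) pair to the believed probability of the effect."""

    def __init__(self):
        self.edges = {("rain", "wet_road"): 0.9}   # prior world model

    def predict(self, cause, effect):
        return self.edges.get((cause, effect), 0.0)

    def update(self, cause, effect, observed: bool, lr: float = 0.2):
        p = self.predict(cause, effect)
        surprise = abs(float(observed) - p)        # information gap between belief and observation
        if surprise > 0.5:                         # large gap -> curiosity -> investigate and revise
            print(f"Curious about {cause} -> {effect} (surprise={surprise:.2f}), updating graph")
            self.edges[(cause, effect)] = p + lr * (float(observed) - p)

world = CausalGraph()
# An unexpected event: the road is wet although the model links wetness only to rain.
world.update("sprinkler", "wet_road", observed=True)
print(world.edges)
```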
Language models (LLMs), and today’s AI more broadly, lack the intrinsic drives for curiosity that are innate to humans, which contributes to their current limitations in reasoning and decision-making at a cognitive level comparable to humans. Unlike humans, LLMs do not possess an inherent motivation to seek out new information or explore novel concepts autonomously. Instead, their functionality is based on statistical patterns learned from large datasets, without the ability to exhibit genuine curiosity or engage in exploratory behavior.
I’ve come to a strong realization that despite notable advancements in Generative AI and deep learning, the current state of artificial intelligence lacks some crucial qualities that I refer to as the ‘CCC’ components: Curiosity, Causality, and Consciousness, especially when considering how AI interacts with the physical world. These three ‘C’s are closely interconnected and operate within a framework that aims to minimize uncertainty, achieve specific goals, and move towards General Intelligence.
I will conclude the discussion here. In the second episode, we will explore how consciousness and self-awareness emerge from collections of neurons, and how these phenomena pave the way for transcendent capabilities and contribute to our understanding of the singularity.
Summary:
While some may argue for the classification of Large Language Models (LLMs) as Artificial General Intelligence (AGI), it’s crucial to recognize the distinct disparities between the two. LLMs lack the essential traits of AGI, such as autonomous adaptation to dynamic environments and the capacity for self-directed learning akin to human cognition.
Furthermore, exploring the concept of curiosity elucidates its pivotal role in knowledge acquisition and cognitive development. Curiosity serves as a driving force behind the exploration of new information, prompting individuals to question assumptions and update their understanding of causal relationships within their cognitive models. While humans exhibit intrinsic drives for curiosity, LLMs lack this innate motivation and rely solely on statistical patterns derived from data.
In essence, while LLMs excel in certain tasks such as text generation and language understanding, they remain fundamentally distinct from AGI due to their inability to autonomously reason, adapt, and engage with the physical world. Understanding these disparities is crucial for navigating the future of artificial intelligence and advancing towards the goal of achieving truly intelligent systems.