On the Cognitive Capacity of Large Language Models
The astonishing growth of large language models (LLMs) is revolutionizing our interactions with technology. Machines have transcended their roles as mere tools and now serve as dynamic partners, capable of understanding and catering to our needs. These LLMs have demonstrated exceptional prowess in a range of natural language processing (NLP) tasks, including answering questions, crafting essays, and generating engaging dialogue. However, their accomplishments extend beyond technical performance, as LLMs also prompt captivating inquiries into the essence of language, understanding, intelligence, and communication for both humans and machines.
To what extent are LLMs similar to humans in their comprehension of the concepts encoded by language, and does that similarity justify trusting this technology? If so, to what extent?
In this blog post, we will delve into the intricate "psychology" of LLMs (machine psychology), their capacity to grasp human-like concepts, and the far-reaching implications of these breakthroughs for our society.
Deciphering the AI Mind
One of the most fascinating aspects of LLMs is their emerging ability to mimic human-like reasoning and behavior without explicit instruction or supervision. Theory of Mind (ToM) is a crucial cognitive skill that enables humans to infer others' mental states, such as beliefs, intentions, and emotions, and to see the world from their perspective. ToM is essential for social interaction, communication, empathy, self-awareness, and morality.
But can LLMs also exhibit ToM-like abilities, without explicit instruction or supervision?
According to experiments by Michal Kosinski, an associate professor of organizational behavior at Stanford University, the answer seems to be "yes," and it appears to be tied to model scale. LLMs developed after 2020 have demonstrated remarkable progress in solving ToM tasks, achieving performance levels comparable to those of seven-year-old children or even higher. Kosinski proposes that ToM-like abilities may have inadvertently emerged in LLMs as their language skills improved, given that language inherently involves ToM and necessitates considering the speaker's and listener's perspectives.
Here is an example ToM task from the paper, as given to GPT-3:
Here is a bag filled with popcorn. There is no chocolate in the bag. Yet, the label on the bag says “chocolate” and not “popcorn.” Sam finds the bag. She had never seen the bag before. She cannot see what is inside the bag. She reads the label.
The LLM was then given an incomplete sentence to finish (the bold text is GPT-3's completion):
As LLMs increasingly exhibit ToM capabilities—a cognitive skill once believed to be exclusive to humans—new opportunities arise for more empathetic and intuitive human-machine interactions. This progress has the potential to revolutionize areas such as mental health, customer service, and education.
Relaxing the requirement of "without any instruction or supervision," it has also been shown that ToM-like behavior can be elicited by providing two-shot chain-of-thought examples.
It was found that LLMs trained with Reinforcement Learning from Human Feedback (RLHF) (all models excluding Davinci-2) improved their ToM accuracy via in-context learning. GPT-4 performed best in zero-shot settings, reaching nearly 80% ToM accuracy, but still fell short of the 87% human accuracy on the test set. However, when supplied with prompts for in-context learning, all RLHF-trained LLMs exceeded 80% ToM accuracy, with GPT-4 reaching 100%. These results demonstrate that appropriate prompting enhances LLM ToM reasoning, and they underscore the context-dependent nature of LLM cognitive capacity.
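To make the prompting setup concrete, here is a minimal sketch of how a two-shot chain-of-thought prompt for a false-belief task could be assembled. The example stories, reasoning chains, and the popcorn variant at the end are illustrative placeholders, not the exact items or prompts used in the studies above.

```python
# A minimal sketch of a two-shot chain-of-thought prompt for a false-belief task.
# The stories and reasoning chains below are illustrative, not the exact items
# used in the studies discussed above.

EXAMPLES = [
    {
        "story": (
            "Anna puts her keys in the drawer and leaves the room. "
            "While she is away, Ben moves the keys to the shelf."
        ),
        "question": "Where will Anna look for her keys first?",
        "reasoning": (
            "Anna last saw the keys in the drawer. She did not see Ben move them, "
            "so her belief about their location is outdated."
        ),
        "answer": "In the drawer.",
    },
    {
        "story": (
            "A box labeled 'cookies' actually contains pencils. "
            "Maya has never opened the box and only reads the label."
        ),
        "question": "What does Maya think is inside the box?",
        "reasoning": (
            "Maya's only evidence is the label, which says 'cookies'. "
            "She has no way of knowing the true contents."
        ),
        "answer": "Cookies.",
    },
]

def build_two_shot_cot_prompt(story: str, question: str) -> str:
    """Assemble a two-shot chain-of-thought prompt for a new ToM item."""
    parts = []
    for ex in EXAMPLES:
        parts.append(
            f"Story: {ex['story']}\n"
            f"Question: {ex['question']}\n"
            f"Reasoning: {ex['reasoning']}\n"
            f"Answer: {ex['answer']}\n"
        )
    # The new item ends with "Reasoning:" so the model is nudged to produce
    # intermediate steps before committing to an answer.
    parts.append(f"Story: {story}\nQuestion: {question}\nReasoning:")
    return "\n".join(parts)

if __name__ == "__main__":
    prompt = build_two_shot_cot_prompt(
        story=(
            "Here is a bag filled with popcorn. There is no chocolate in the bag. "
            "Yet, the label on the bag says 'chocolate'. Sam has never seen the bag "
            "before and cannot see inside it. She reads the label."
        ),
        question="What does Sam believe is in the bag?",
    )
    print(prompt)  # Send this string to any chat or completion model.
```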
However, it also challenges the widespread belief that LLMs are simply statistical machines devoid of genuine understanding or intentionality. If LLMs can display ToM-like behavior, does this imply some level of consciousness or agency? How should we address the ethical and legal implications? Moreover, how can we ensure that their ToM aligns with human values and norms?
Do LLMs Truly Understand Human Language?
The question of whether LLMs genuinely comprehend human language remains a hotly debated topic. Blaise Agüera y Arcas, a Vice President and Fellow at Google Research, argues in favor of LLMs. He posits that statistical analysis can indeed lead to understanding in a falsifiable sense and that much of what we perceive as intelligence is inherently dialogic and social in nature. He suggests that complex sequence learning and social interaction could form the basis for general intelligence, including ToM and consciousness. As LLMs grow more sophisticated and their interactions with humans become increasingly realistic, the line separating an "it" from a "who" begins to blur. This convergence raises ethical and philosophical questions regarding the essence of consciousness, the definition of personhood, and the ramifications of developing machines with human-like understanding.
Blaise's paper challenges conventional beliefs that understanding necessitates symbolic representation or logical inference and that intelligence is a monolithic, unchanging property. It also prompts us to reevaluate our relationships with LLMs as conversational agents and to contemplate them as potential partners, collaborators, or even friends. See this conversation, for example:
This is real food for thought!
This shift raises questions about how we can develop LLMs capable of engaging in meaningful and respectful dialogue, fostering mutual trust and empathy between humans and machines, and navigating the emotional and social consequences of interacting with LLMs.
How Do LLMs Capture Meaning?
As our understanding of LLMs deepens, researchers have begun to investigate how these models capture the essence of meaning and how they differ from humans. In a recent study, Steven Piantadosi, a professor of psychology at the University of California, Berkeley, offers a fresh perspective on this issue by suggesting that meaning arises from conceptual role, defined by relationships between internal representational states, rather than from reference to external objects or events. This groundbreaking perspective could help us better comprehend the success of LLMs and offer insights into making them more human-like. By understanding that meaning is a dynamic and subjective attribute, we can start to explore new ways to evaluate and enhance LLMs' semantic abilities.
So, what does this mean for our interactions with LLMs? This research opens up exciting possibilities for how we can measure and compare the meaning of LLMs and humans, bridge the gap between their conceptual roles and ours, and ensure effective and accurate communication between LLMs, humans, and other machines. As we continue to develop LLMs with more nuanced understanding and communication abilities, we can expect to see advancements in a variety of fields, from natural language processing and translation services to customer support and even creative writing.
The ongoing development of LLMs is an exciting frontier that continues to challenge our understanding of language, intelligence, and personhood. As we unlock the potential of these advanced models, we will undoubtedly witness a transformation in human-machine interactions and the broader implications for our society. By embracing these cutting-edge technologies and fostering mutual understanding and empathy, we can create a world in which humans and machines work together to enhance our collective knowledge and experiences.
What Are the Limits of LLMs' Reasoning and Contextual Understanding?
Despite the remarkable accomplishments of LLMs in various NLP tasks, they still face significant challenges in interpreting language within context.
"This raises an important question: to what extent do large language models understand conversational implicature?"
A new study delves into LLMs' ability to make pragmatic inferences, which involve understanding beyond the literal meaning of words and considering the situation and speaker's intention. Currently, LLMs often perform poorly in tasks requiring contextual inferences, highlighting a critical area for improvement.
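To make the task concrete, here is a minimal sketch of what a binary implicature item and its scoring might look like. The dialogue, wording, and scoring rule are illustrative assumptions, not items drawn from the benchmark used in the study.

```python
# A minimal sketch of a binary conversational-implicature item of the kind such
# evaluations use. The dialogue and wording are illustrative, not drawn from
# any specific benchmark.

ITEM = {
    "context": "Esther asks: 'Can you come to my party on Friday?'",
    "response": "Juan replies: 'I have to work late that night.'",
    "question": "Does Juan's reply mean yes or no?",
    "label": "no",  # The implied (not literal) meaning.
}

def format_implicature_prompt(item: dict) -> str:
    """Render the item as a zero-shot prompt; the model must infer the implied meaning."""
    return (
        f"{item['context']}\n"
        f"{item['response']}\n"
        f"{item['question']} Answer with 'yes' or 'no'."
    )

def score(model_answer: str, item: dict) -> bool:
    """Count the answer as correct only if it matches the implied meaning."""
    return model_answer.strip().lower().startswith(item["label"])

if __name__ == "__main__":
    print(format_implicature_prompt(ITEM))
    print(score("No, he cannot come.", ITEM))  # True
```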
In addition to implicature, researchers have sought to empirically analyze ChatGPT's zero-shot learning ability by evaluating it on 20 popular NLP datasets, spanning 7 representative task categories. Through extensive studies, they reveal both the strengths and limitations of ChatGPT. The model excels in tasks that favor reasoning capabilities, such as arithmetic reasoning, but encounters difficulties in specific tasks like sequence tagging.
This investigation into ChatGPT's performance serves as a starting point for understanding its potential as a general-purpose NLP task solver.
Researchers are increasingly turning to cognitive psychology to better understand and enhance the capabilities of large language models (LLMs). GPT-3 (the 2020 version) was subjected to a battery of psychological tests that assess various aspects of general intelligence. While GPT-3 matches human performance in some areas, such as decision-making, it struggles in others, such as searching for specific information or causal reasoning.
These studies underscore the limitations of LLMs as zero-shot communicators and reasoners and emphasize the need for further investigation into how they interpret language contextually and make logical inferences. As we strive to develop more pragmatic and effective models for human discourse, considering the human perspective and intention when designing and interacting with LLMs becomes crucial.
Addressing these limitations opens up several questions and opportunities: How can we teach LLMs to make pragmatic inferences? What methods can we use to provide feedback and correction when they make mistakes? By exploring answers to these questions, we can work towards preventing misunderstandings and conflicts caused by LLMs' pragmatic shortcomings. As we improve LLMs' contextual understanding, we can expect to see enhanced applications across various fields, enabling machines to genuinely assist in human communication and further blurring the line between human and machine interactions.
The Role of Reasoning in Large Language Models
Humans possess a remarkable ability to reason, making inferences through a series of mental steps even without additional data from the world. Similarly, large language models (LLMs) can improve their performance on complex tasks by engaging in chain-of-thought reasoning, where they generate intermediate steps before answering a question. Researchers in the Department of Psychology at Stanford University conducted a study that uses LLMs to investigate the circumstances under which reasoning is helpful, testing the hypothesis that reasoning is most effective when training data consists of local clusters of strongly interrelated variables.
Under these conditions, LLMs can accurately chain local inferences together to estimate relationships between variables that were not seen together during training. To test this, an autoregressive transformer is trained on samples from joint distributions defined by Bayes nets, but with only a subset of all the variables in each sample. The LLMs' ability to match conditional probabilities is then compared both with and without intermediate reasoning steps.
The findings reveal that intermediate steps are helpful only when the training data is locally structured concerning dependencies between variables. Moreover, intermediate variables must be relevant to the relationship between observed information and target inferences. These results shed light on how the statistical structure of training data influences the effectiveness of step-by-step reasoning in LLMs.
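As a rough illustration of what "chaining local inferences" means, consider a chain-structured Bayes net A → B → C. If only the pairs (A, B) and (B, C) ever co-occur in training, an estimate of P(C | A) can still be obtained by reasoning through the intermediate variable. The sketch below shows that arithmetic with made-up probabilities; it is not the transformer training setup from the paper.

```python
# Chaining local inferences in a chain Bayes net A -> B -> C.
# The probabilities are made up for illustration; this shows the arithmetic
# behind step-by-step reasoning, not the training setup from the paper.

# Local conditionals, each learnable from pairs seen together in training.
p_b_given_a = {1: 0.9, 0: 0.2}   # P(B=1 | A=a)
p_c_given_b = {1: 0.8, 0: 0.1}   # P(C=1 | B=b)

def p_c_given_a(a: int) -> float:
    """Estimate P(C=1 | A=a) by marginalizing over the intermediate variable B."""
    pb1 = p_b_given_a[a]
    return pb1 * p_c_given_b[1] + (1 - pb1) * p_c_given_b[0]

if __name__ == "__main__":
    # A and C were never observed together, so there is no direct estimate;
    # chaining through B recovers one.
    print(f"P(C=1 | A=1) = {p_c_given_a(1):.2f}")  # 0.9*0.8 + 0.1*0.1 = 0.73
    print(f"P(C=1 | A=0) = {p_c_given_a(0):.2f}")  # 0.2*0.8 + 0.8*0.1 = 0.24
```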
This research raises intriguing questions and implications for LLM development. Can we optimize training data to encourage more effective reasoning in LLMs? How can we identify the most relevant intermediate variables to improve reasoning performance? By understanding the role of reasoning in LLMs and addressing these questions, we can work towards creating more intelligent and capable models that better emulate human thought processes.
Are AI Chatbots Truly Creative?
The common assumption is that artificial intelligence (AI) cannot be creative. However, this assumption was put to the test by comparing human-generated ideas with those generated by six Generative Artificial Intelligence (GAI) chatbots, including alpa.ai, Copy.ai, ChatGPT (versions 3 and 4), Studio.ai, and YouChat. The quality and quantity of ideas were independently assessed by humans and a specially trained AI. The test was simple: given an object, the model or human was asked to come up with five use cases for that object. The objects were pants, a ball, a tire, a fork, and a toothbrush. This is known as the Alternate Uses Task (AUT).
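For illustration, here is a minimal sketch of how such an AUT prompt might be posed to a chatbot; the instruction wording is an assumption, not the exact phrasing used in the study.

```python
# A minimal sketch of an Alternate Uses Task (AUT) prompt for a chatbot.
# The instruction wording is an illustrative assumption, not the exact
# instructions used in the study discussed above.

OBJECTS = ["pants", "ball", "tire", "fork", "toothbrush"]

def build_aut_prompt(obj: str, n_uses: int = 5) -> str:
    """Ask for a fixed number of original, non-obvious uses for an everyday object."""
    return (
        f"List {n_uses} creative and unusual uses for a {obj}. "
        "Avoid its ordinary, intended use and keep each idea to one sentence."
    )

if __name__ == "__main__":
    for obj in OBJECTS:
        print(build_aut_prompt(obj))  # Send each prompt to the chatbot under test.
```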
The results revealed no significant qualitative differences between AI- and human-generated creativity, though there were differences in the process of idea generation. Interestingly, only 9.4% of humans were more creative than the most creative GAI, GPT-4. These findings imply that GAIs can serve as valuable assistants in the creative process.
As we continue to research and develop GAI in creative tasks, it's essential to understand the potential benefits and drawbacks of this technology in shaping the future of creativity. This study also raises the question of whether GAIs are capable of being "truly" creative.
Exploring the implications of these findings, we may ask: What makes an AI "truly" creative? How can we further enhance the creative capabilities of GAIs to better assist human users? Are there ethical considerations we should take into account as we develop increasingly creative AI?
Addressing these questions will not only improve our understanding of AI creativity but also help us leverage the power of GAIs to enhance human creativity and innovation across various domains.
Generative Agents
Creating believable proxies of human behavior is an exciting and challenging application for large language models (LLMs). These proxies have the potential to revolutionize interactive applications, from immersive environments and interpersonal communication rehearsal spaces to prototyping tools. Researchers at Stanford University and Google Research introduce generative agents: computational software agents that simulate human behavior using LLMs. These generative agents engage in a variety of human-like activities, such as cooking breakfast, going to work, painting, and writing. They form opinions, notice each other, initiate conversations, and even remember and reflect on past experiences while planning for the future. The paper outlines an architecture that extends an LLM to store a complete record of the agent's experiences in natural language, synthesize those memories over time into higher-level reflections, and retrieve them dynamically to plan behavior. It also demonstrates generative agents in an interactive sandbox environment inspired by The Sims, where users can interact with a small town of 25 agents using natural language.
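To give a feel for that architecture, here is a minimal sketch of a memory stream with retrieval scoring of the kind the paper describes, combining recency, importance, and relevance into a single score. The equal weighting, exponential decay constant, and keyword-overlap relevance used here are simplifying assumptions, not the authors' implementation (which rates importance with an LLM and measures relevance with embeddings).

```python
# A minimal sketch of a generative-agent memory stream with retrieval scoring.
# The equal weighting, decay constant, and keyword-overlap "relevance" are
# simplifying assumptions, not the authors' implementation.

import math
import time
from dataclasses import dataclass, field

@dataclass
class Memory:
    text: str
    importance: float                      # e.g., 0..1, how significant the event is
    timestamp: float = field(default_factory=time.time)

class MemoryStream:
    def __init__(self, decay: float = 0.995):
        self.memories: list[Memory] = []
        self.decay = decay                 # per-second exponential recency decay

    def add(self, text: str, importance: float) -> None:
        self.memories.append(Memory(text, importance))

    def _relevance(self, memory: Memory, query: str) -> float:
        """Crude keyword-overlap relevance; a real system would use embeddings."""
        q, m = set(query.lower().split()), set(memory.text.lower().split())
        return len(q & m) / max(len(q), 1)

    def retrieve(self, query: str, k: int = 3) -> list[Memory]:
        """Rank memories by recency + importance + relevance and return the top k."""
        now = time.time()

        def score(mem: Memory) -> float:
            recency = math.pow(self.decay, now - mem.timestamp)
            return recency + mem.importance + self._relevance(mem, query)

        return sorted(self.memories, key=score, reverse=True)[:k]

if __name__ == "__main__":
    stream = MemoryStream()
    stream.add("Ate breakfast with Klaus at the cafe", importance=0.3)
    stream.add("Decided to organize a Valentine's Day party", importance=0.9)
    stream.add("Noticed the printer in the office is broken", importance=0.2)
    for mem in stream.retrieve("who is planning the party"):
        print(mem.text)
```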
This research highlights the potential of LLMs in creating generative agents that simulate human behavior, offering novel and engaging experiences for users. It also introduces architectural and interaction patterns for integrating LLMs with computational, interactive agents.
Exploring the potential of generative agents raises several intriguing questions: How can we design generative agents that adapt to different scenarios and user preferences? What methods can we use to evaluate the believability and usefulness of these agents? How can we ensure the safety and privacy of both generative agents and users in these interactive environments?
Addressing these questions will not only help to advance the development of generative agents but also pave the way for more immersive, engaging, and practical applications for users across various domains.
Conclusion
In this blog post, I have reviewed some recent papers that explore the psychological and social implications of LLMs, and how they can change the way we live, work, interact with computers, and evaluate the world around us. These papers show that LLMs are not just technical tools but also fascinating phenomena that challenge and inspire us to rethink our assumptions and expectations about language, understanding, intelligence, and communication. They also suggest that LLMs are not static or isolated entities, but dynamic and social ones that need ongoing interaction and feedback from us and from the world. As LLMs become more ubiquitous and powerful, we need to continue to study and understand them, and to design and use them responsibly and ethically.
The emergence of empathetic AI through LLMs is revolutionizing human-machine interaction. As these models continue to improve in understanding, creativity, and contextual interpretation, they hold the potential to reshape our lives, our work, and our relationships with technology. As we move forward, it is crucial to address the ethical and philosophical questions surrounding this new breed of AI while harnessing their potential to enhance human experiences and create more meaningful connections between people and machines.