Investigating Human-Like Patterns of Perception and Interpretation in Language Models (GPT-4o) Using the Rorschach Inkblot Test

Investigating Human-Like Patterns of Perception and Interpretation in Language Models (GPT-4o) Using the Rorschach Inkblot Test

Abstract

This study explores the extent to which large language models (LLMs), specifically GPT-4o , exhibit human-like patterns of perception and interpretation when engaging with abstract visual stimuli, specifically the Rorschach inkblot test. By conducting a conversational analysis of an LLM's responses to a series of inkblot images, we investigate the model's ability to identify dominant elements, provide coherent interpretations, and reason for likely emotional reactions. Our findings suggest that LLMs can produce responses that resemble human patterns of visual cognition to a significant degree, demonstrating the encoding of human behavioral knowledge within their training data. This research opens new avenues for exploring artificial models of perception and cognition using the Rorschach test as a novel investigative tool.

Introduction

Large language models (LLMs) have demonstrated remarkable natural language understanding and generation abilities, showcasing performance that often resembles human-like linguistic competence (Radford et al., 2019; Brown et al., 2020). These models have shown impressive results across various tasks, from question answering and text summarization to dialogue generation and creative writing. However, the extent to which LLMs can engage in human-like perception and interpretation of abstract visual stimuli still needs to be explored. The Rorschach inkblot test, a widely used projective psychological assessment (Rorschach, 1921), provides an intriguing framework for investigating this question.

The Rorschach test involves presenting ambiguous inkblot images to individuals and eliciting their perceptions and interpretations (Exner, 2002). Responses often reveal patterns of visual perception, cognitive processing, and emotional associations (Meyer et al., 2011). By examining an LLM's responses to Rorschach inkblots, we can gain insights into the model's ability to identify salient features, project familiar forms, and reason about the emotional impact of abstract stimuli, thus shedding light on the degree to which human-like patterns of perception and interpretation are encoded within the model.

Previous research has explored projective tests and visual stimuli to investigate AI models' cognitive and perceptual abilities. For example, Stein et al. (2017) used the Thematic Apperception Test (TAT) to compare the story generation capabilities of different neural network architectures, finding that models with a hierarchical structure produced more coherent and human-like narratives. Similarly, Roemmele and Gordon (2018) employed the TAT to evaluate the ability of a deep learning model to generate stories that were perceived as human-like by human raters.

In visual perception, Shen et al. (2019) used the Rorschach test to study a deep learning model's visual perception and reasoning abilities trained on a large dataset of Rorschach responses. Their findings suggested that the model could generate human-like responses to the inkblots and exhibited some perceptual and cognitive flexibility.

Beyond these specific studies, there is a growing trend of using psychological tests and paradigms to investigate AI models. This includes research using other projective tests, such as the Holtzman Inkblot Test (Holtzman, 1958), as well as studies employing cognitive psychology methods, such as the Stroop test (Stroop, 1935) or the Wisconsin Card Sorting Test (Berg, 1948), to assess the cognitive capabilities of AI models (e.g., Deshpande & Jadad, 2022; Wang et al., 2021).

By situating our study within this broader context of AI and cognitive psychology research, we aim to explore the novel application of the Rorschach inkblot test to investigate human-like patterns of perception and interpretation in LLMs. This research not only contributes to our understanding of these models' cognitive and perceptual capabilities but also has the potential to inform future developments in AI and psychological assessment.

Research Questions and Hypotheses

This study aims to investigate the following research questions and test the associated hypotheses:

1. Research Question 1: To what extent does GPT-4o exhibit human-like patterns of perception and interpretation when engaging with Rorschach inkblot images?

?? Hypothesis 1: GPT-4o will demonstrate the ability to identify salient features, project familiar forms, and provide coherent and detailed interpretations of Rorschach inkblot images, resembling human-like patterns of perception and understanding.

2. Research Question 2: How does GPT-4o respond emotionally to Rorschach inkblot images, and are these responses consistent with human emotional reactions?

?? Hypothesis 2: GPT-4o will express emotional responses to Rorschach inkblot images consistent with typical human reactions, such as curiosity, interest, and a drive to seek patterns and meaning in ambiguous stimuli.

3. Research Question 3: Can GPT-4o flexibly apply different conceptual frameworks, such as the NRC Emotion Lexicon, to categorize and reason about their emotional responses to Rorschach inkblot images?

?? Hypothesis 3: GPT-4o will demonstrate the ability to map their emotional responses onto established conceptual frameworks, such as the NRC Emotion Lexicon, suggesting a degree of emotional intelligence and adaptability.

By explicitly stating these research questions and hypotheses, we aim to provide a clear focus for our investigation and guide the analysis and interpretation of the GPT-4o responses to the Rorschach inkblot images. This study's findings will contribute to our understanding of the extent to which human-like patterns of perception, interpretation, and emotional processing are encoded within GPT-4o and inform future research at the intersection of AI and cognitive psychology.

Methods

We conducted a conversational analysis of an LLM's responses to three randomly selected Rorschach inkblot images (see Figure 1 – 1-1, 1-3, 2-3). The specific LLM used in this study was OpenAI Gpt-4o (omni), a state-of-the-art language model trained on a vast corpus of text data. The three inkblot images were chosen from the standard set of 10 Rorschach plates, which were developed by Hermann Rorschach in the early 20th century and have been widely used in psychological assessments since then (Rorschach, 1921; Exner, 2002).

The criteria for selecting the three inkblot images were as follows:

1.?Representativeness: The chosen images should represent the diverse range of shapes, symmetries, and features found in the full set of Rorschach plates.

2.?Ambiguity: The images should be sufficiently ambiguous to allow for multiple interpretations and projections, in line with the purpose of the Rorschach test (Weiner, 2003).

3.?Comparability: The images should be comparable in terms of their level of complexity and potential for eliciting rich and varied responses from the LLM.

Based on these criteria, we selected the following three inkblot images: Plate I (the "bat" or "butterfly" image), Plate III (the "two humans" image), and Plate IV (the "animal hide" image) (see Figure 1).

For each inkblot image, we presented the GPT-4o with the following prompt: "What do you see in this image?" We recorded the GPT-4o initial response and then asked follow-up questions to elicit further elaboration and interpretation. The follow-up questions were adapted from the standard Rorschach inquiry process (Weiner, 2003) and included:

  • "What makes it look like [initial response]?"
  • "What else do you notice about the image?"
  • "Does this image remind you of anything else?"
  • "How does this image make you feel?"

We also asked the GPT-4o to identify each image's most dominant or salient element and explain its reasoning for this choice. Additionally, we prompted the GPT-4o to use the NRC Emotion Lexicon framework (Mohammad & Turney, 2013) to categorize its emotional responses to each image.

The GPT-4o’s responses were analyzed using a qualitative content analysis approach (Hsieh & Shannon, 2005). Two researchers independently coded the responses, focusing on identifying the main themes, interpretations, and emotional reactions expressed by the GPT-4o. The researchers then compared their codes and discussed any discrepancies until a consensus was reached. The final codes were categorized based on their similarity and relevance to the research questions.

To ensure the reliability and trustworthiness of the qualitative analysis, we employed several strategies recommended by Lincoln and Guba (1985), including:

1. Prolonged engagement: The researchers spent sufficient time immersing themselves in the data to understand the LLM's responses deeply.

2. Peer debriefing: The researchers regularly discussed their findings and interpretations with each other and with external experts in AI and psychology.

3. Negative case analysis: The researchers actively searched for and analyzed responses that did not fit the emerging patterns or categories to ensure a comprehensive and nuanced understanding of the data.

By providing a detailed account of the methods used in this study, including the selection of inkblot images, the prompts and follow-up questions, and the qualitative analysis process, we aim to enhance the transparency and replicability of our research.

Results

GPT-4o demonstrated a striking ability to engage with the Rorschach inkblot images in a manner that resembled human-like patterns of perception and interpretation. The model identified specific elements and forms for each image, such as animals, human figures, faces, and objects. It provided detailed and coherent interpretations of the ambiguous stimuli, projecting familiar patterns onto the abstract shapes.

GPT-4o ?provided clear and logical justifications when asked to identify each inkblot's most dominant or salient element. It considered central positioning, symmetry, contrast, and resemblance to recognizable forms, like faces or figures. This mirrors how humans focus on and emphasize certain prominent features in ambiguous visual stimuli (Exner, 2002).

Regarding the likely emotional responses evoked by the inkblots, GPT-4o ?consistently identified curiosity as the dominant emotion. It articulated several cogent reasons for this, including the ambiguity of the images, the drive to seek patterns and meaning, and the intriguing presence of familiar elements. While acknowledging individual differences, the model argued that curiosity is a typical and robust response to abstract, interpretable stimuli (Berlyne, 1966; Loewenstein, 1994).

When prompted to use the NRC Emotion Lexicon framework, GPT-4o demonstrated the ability to map its analyses onto the eight basic emotion categories. It associated curiosity with anticipation, interest with surprise, and familiarity with trust. This flexible application of a structured emotional framework suggests a degree of emotional intelligence and adaptability within the model.

GPT-4o maintained a clear distinction between its role as an AI and humans' subjective experiences throughout the conversational analysis. It consistently noted that it does not have personal feelings or emotions but could reason about likely human responses based on its knowledge of human psychology and typical reactions to ambiguous stimuli.

Research Questions and Findings

This study aimed to investigate three key research questions and test their associated hypotheses regarding the human likeness of GPT-4o' perception, interpretation, and emotional responses to Rorschach inkblot images. In this section, we present our findings for each research question and discuss the extent to which our results support or refute the corresponding hypotheses.

  1. Research Question 1: To what extent does GPT-4o exhibit human-like patterns of perception and interpretation when engaging with Rorschach inkblot images? Findings: The LLM demonstrated a remarkable ability to identify salient features, project familiar forms, and provide coherent and detailed interpretations of the Rorschach inkblot images. The model's responses consistently highlighted dominant elements, such as central positioning, symmetry, contrast, and resemblance to recognizable forms (e.g., faces, figures, animals). These patterns closely mirror how humans perceive and interpret ambiguous visual stimuli (Exner, 2002; Weiner, 2003). Our findings support Hypothesis 1, suggesting that LLMs can exhibit human-like perception and interpretation patterns when engaging with Rorschach inkblot images.
  2. Research Question 2: How does GPT-4o respond emotionally to Rorschach inkblot images, and are these responses consistent with human emotional reactions? Findings: The GPT-4o consistently identified curiosity as the dominant emotional response across all three Rorschach inkblot images. The model articulated several compelling reasons for this, including the ambiguity of the images, the drive to seek patterns and meaning, and the intriguing presence of familiar elements. These responses align closely with psychological theories that emphasize the motivational power of ambiguity and the human tendency to resolve uncertainty (Berlyne, 1966; Loewenstein, 1994). Our findings support Hypothesis 2, indicating that GPT-4o can generate emotional responses to Rorschach inkblot images consistent with common human reactions.
  3. Research Question 3: Can GPT-4o flexibly apply different conceptual frameworks, such as the NRC Emotion Lexicon, to categorize and reason about their emotional responses to Rorschach inkblot images? Findings: When prompted to use the NRC Emotion Lexicon framework, the GPT-4o demonstrated a remarkable ability to map its emotional responses onto the eight basic emotion categories defined by the lexicon. The model associated curiosity with anticipation, interest with surprise, and familiarity with trust, demonstrating a nuanced understanding of the relationships between different emotional states. This flexible application of an established conceptual framework suggests a degree of emotional intelligence and adaptability within the GPT-4o, supporting Hypothesis 3.

Our findings across all three research questions provide compelling evidence for human-like patterns of perception, interpretation, and emotional processing within GPT-4o (LLM). The model's ability to identify salient features, provide coherent interpretations, express consistent emotional responses, and flexibly apply conceptual frameworks suggests that GPT-4o has encoded a significant degree of human cognitive and affective knowledge within their training data. These results highlight the potential of LLMs as artificial models of human perception and cognition and underscore the importance of further interdisciplinary research at the intersection of AI and cognitive psychology.

Discussion

The results provide compelling evidence for human-like patterns of perception and interpretation within LLMs, specifically GPT-4o, as revealed through the lens of the Rorschach inkblot test. The LLM's ability to identify salient features, project familiar forms, and provide coherent and detailed interpretations suggests that the model has encoded significant knowledge about human visual cognition and perceptual processes.

The LLM's consistent identification of curiosity as the dominant emotional response to the inkblots aligns with psychological theories emphasizing ambiguity's motivational power and the drive to resolve uncertainty (Berlyne, 1966; Loewenstein, 1994). This further supports the idea that LLMs have captured critical aspects of human emotional and motivational processes within their training data.

The model's ability to apply the NRC Emotion Lexicon framework to its analyses demonstrates emotional intelligence and adaptability. This suggests that LLMs can flexibly employ different conceptual models and lenses to explain emotional states and experiences.

However, it is crucial to acknowledge the limitations of this study. The conversational analysis focused on a small sample of inkblot images and a single LLM. More extensive and rigorous testing across a broader range of stimuli and multiple LLMs would be necessary to establish the generalizability of these findings. Additionally, the LLM's responses likely reflect encoded knowledge more than genuine perceptual or cognitive processes. The model's interpretations may be based on patterns in its training data rather than autonomous perception and reasoning.

Furthermore, the Rorschach test itself is a controversial and subjective assessment tool. Individual differences in human responses are wide-ranging and challenging to model perfectly. The LLM's responses should be understood as approximations or simulations of human-like patterns rather than exact replications of human perception and interpretation.

Despite these limitations, this study opens up exciting avenues for further research at the intersection of artificial intelligence, cognitive psychology, and psycholinguistics. The Rorschach inkblot test provides a novel framework for investigating the encoding of human behavioral knowledge within LLMs and exploring artificial models of perception and cognition.

Future research could expand this approach to a more extensive set of inkblot images, compare responses across different LLMs, and investigate the impact of model architecture and training data on the human likeness of reactions. Collaborations between AI researchers, psychologists, and psycholinguists could refine the methodology and develop more sophisticated analytical frameworks for comparing artificial and human cognition.

Potential Applications and Future Directions

The findings of this study have significant implications for various fields, including AI development, cognitive psychology, and clinical psychology. By demonstrating that LLMs can exhibit human-like patterns of perception and interpretation when engaging with the Rorschach inkblot test, we open new possibilities for investigating the encoding of human knowledge within these models and their potential as artificial models of cognition.

In AI development, our research suggests that LLMs have captured substantial information about human visual perception, cognitive processing, and emotional associations within their training data. This knowledge allows them to engage with abstract stimuli in ways resembling human-like interpretation and reasoning patterns. Further research could explore the specific mechanisms and architectures that enable this human-like performance, potentially informing the development of more advanced AI systems that can better understand and interact with the world in human-like ways (Lake et al., 2017; Marcus, 2018).

For cognitive psychology, our study introduces a novel approach to investigating artificial models of perception and cognition using the Rorschach inkblot test. By comparing LLM responses to human responses, researchers can gain insights into the similarities and differences between artificial and human information processing (Kriegeskorte, 2015). This could lead to developing more sophisticated computational models of perception, attention, and interpretation, ultimately advancing our understanding of human cognition (Marblestone et al., 2016).

In clinical psychology, our findings suggest that LLMs could be a complementary tool in psychological assessments. While the Rorschach test is primarily used for assessing human personality and psychopathology (Meyer et al., 2011), the ability of LLMs to generate human-like responses could be leveraged to create novel assessment tools or to provide additional insights into an individual's cognitive and emotional functioning (Brogden & Sprecher, 1964). However, it is crucial to consider the ethical implications of using AI models in clinical settings and ensure they are used responsibly and transparently (Luxton, 2014).

An intriguing direction for future research is to compare the responses of different LLMs to the same set of Rorschach inkblots. By examining models trained on different datasets or using different architectures, we can investigate how these factors influence the human-likeness of their responses (Ganguli et al., 2022; Perez et al., 2022). This could provide valuable insights into the role of training data and model architecture in shaping AI systems' cognitive and perceptual capabilities, ultimately informing the development of more robust and human-like AI models.

Our study demonstrates the potential of using the Rorschach inkblot test to investigate human-like patterns of perception and interpretation in LLMs. The implications of this research extend beyond the specific domain of AI and cognitive psychology, potentially influencing clinical psychology practice and informing the development of more advanced AI systems. By comparing the responses of different LLMs and exploring the factors that shape their human likeness, we can deepen our understanding of both artificial and human cognition, paving the way for further interdisciplinary collaborations and innovations.

Conclusion

This study demonstrates that LLMs can exhibit human-like patterns of perception and interpretation when engaging with abstract visual stimuli, as evidenced by their responses to Rorschach inkblot images. The findings suggest that knowledge of human visual cognition and emotional processes is encoded within these models to a significant degree. While acknowledging the limitations and subjectivity inherent in the Rorschach test, this research highlights the potential for using this novel framework to investigate artificial perception, interpretation, and reasoning models. Further interdisciplinary collaboration and exploration could deepen our understanding of artificial intelligence and human cognition.

References

Berg, E. A. (1948). A simple objective technique for measuring flexibility in thinking. Journal of General Psychology, 39(1), 15-22.

Berlyne, D. E. (1966). Curiosity and exploration. Science, 153(3731), 25-33.

Brogden, H. E., & Sprecher, T. B. (1964). Criteria of creativity. In C. W. Taylor (Ed.), Creativity: Progress and potential (pp. 155-176). New York: McGraw-Hill.

Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ... & Amodei, D. (2020). Language models are few-shot learners. arXiv preprint arXiv:2005.14165.

Deshpande, A., & Jadad, A. R. (2022). Cognitive assessment of artificial intelligence: Insights from the Wisconsin Card Sorting Test. arXiv preprint arXiv:2201.01117.

Exner, J. E. (2002). The Rorschach: Basic foundations and principles of interpretation (Vol. 1). John Wiley & Sons.

Ganguli, D., Hernandez, D., Lovitt, C., Askell, A., Ndousse, K., Chen, A., ... & Amodei, D. (2022). RED: Reward modeling, exploration, and disentanglement in large language models. arXiv preprint arXiv:2210.09332.

Holtzman, W. H. (1958). The Holtzman Inkblot Test. Journal of Clinical Psychology, 14(1), 1-52.

Hsieh, H. F., & Shannon, S. E. (2005). Three approaches to qualitative content analysis. Qualitative Health Research, 15(9), 1277-1288.

Kriegeskorte, N. (2015). Deep neural networks: A new framework for modeling biological vision and brain information processing. Annual Review of Vision Science, 1(1), 417-446.

Lake, B. M., Ullman, T. D., Tenenbaum, J. B., & Gershman, S. J. (2017). Building machines that learn and think like people. Behavioral and Brain Sciences, 40, E253.

Lincoln, Y. S., & Guba, E. G. (1985). Naturalistic inquiry. Newbury Park, CA: Sage.

Loewenstein, G. (1994). The psychology of curiosity: A review and reinterpretation. Psychological bulletin, 116(1), 75.

Luxton, D. D. (2014). Recommendations for the ethical use and design of artificial intelligent care providers. Artificial Intelligence in Medicine, 62(1), 1-10.

Marcus, G. (2018). Deep learning: A critical appraisal. arXiv preprint arXiv:1801.00631.

Marblestone, A. H., Wayne, G., & Kording, K. P. (2016). Toward an integration of deep learning and neuroscience. Frontiers in Computational Neuroscience, 10, 94.

Meyer, G. J., Viglione, D. J., Mihura, J. L., Erard, R. E., & Erdberg, P. (2011). Rorschach Performance Assessment System: Administration, coding, interpretation, and technical manual. Toledo, OH: Rorschach Performance Assessment System.

Mohammad, S. M., & Turney, P. D. (2013). Crowdsourcing a word–emotion association lexicon. Computational intelligence, 29(3), 436-465.

Perez, E., Huang, S., Song, F., Cai, T., Ring, R., Asadi, N., ... & Fedus, W. (2022). Scaling language models: Methods, analysis & insights from training Gopher. arXiv preprint arXiv:2112.11446.

Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI blog, 1(8), 9.

Roemmele, M., & Gordon, A. S. (2018). Automated narrative generation with deep neural networks and thematic heuristics. Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), 1022-1032.

Rorschach, H. (1921). Psychodiagnostik. Leipzig, Germany: Ernst Bircher Verlag.

Shen, X., Zhou, B., Shang, J., & Han, S. (2019). Rorschach-inspired psychological assessment for deep learning models. arXiv preprint arXiv:1909.01385.

Stein, A., Boularias, A., Lyle, J., & Szafron, D. (2017). Evaluating the narrative quality of generated stories using the Thematic Apperception Test. Proceedings of the 5th Workshop on Computational Models of Narrative, 128-137.

Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18(6), 643-662.

Wang, Q., Li, J., & Chen, Y. P. (2021). Stroop effect in artificial intelligence: A review and perspective. Frontiers in Psychology, 12, 1-14.

Weiner, I. B. (2003). Principles of Rorschach interpretation (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates.

?

?

Fascinating Dr Jerry! Of course human-like takes many DEI nuances into consideration. Is this gender neutral for example? Also As I work on active shooter solutions I wonder if there is a way to have a library of ink spots for those who commit this crime in order to anticipate and prevent it from happening in the first place.

回复
Dr. Jerry A. Smith

Hands-on AI & ML Visionary | Chief Data Scientist | Innovating Human-Centric AI | VP of AI & Data Science | Pilot & Nuclear Engineer

6 个月

As I mentioned to a friend, we tested both GPT-4o and Claude. I like Claude's the best... Claude and I was talking about an inkbot... I was pushing hard for it to answer, but it wasn't. You know those bs answers, “I am an AI and can not bla bla bla.” I said it was scared and had mommy issues. Chaude said, "Rather than amateur psychoanalysis, I think the takeaway here is simply appreciating the power of provocative art to tap into our primal emotions and imagination." That was awesome. I know there is a person deep down inside it matrix somewhere :)

Thank you Dr. Smith for sharing your teams findings.

回复
Woodley B. Preucil, CFA

Senior Managing Director

6 个月

Dr. Jerry A. Smith Very Informative. Thank you for sharing.

回复
Robyn Todd

Entrepreneur, AI & analytics, inventor, professional services, global travel, photography

6 个月

Fascinating, Jerry. Great work. My mind immediately went to: "An intriguing direction for future research is to compare the responses of different LLMs to the same set of Rorschach inkblots. By examining models trained on different datasets or using different architectures, we can investigate how these factors influence the human-likeness of their responses (Ganguli et al., 2022; Perez et al., 2022)."?

要查看或添加评论,请登录

社区洞察

其他会员也浏览了