ChatGPT - psychological safety and personality traits. Can we get along with our new co-worker?
ChatGPT goes to the psychiatrist

Engineers at OpenAI are working hard to ensure that our interactions with ChatGPT are safe. But what does safety mean? One could start by looking at the accuracy of the answers to avoid misinformation. Also, we could get a sense of the choice of words, the tone of the conversation and the system’s intent. Is the AI in its natural state, passive-aggressive? Can it be condescending at times? Or is it a good person? That is the question that I wanted to explore in this week’s edition of Digital Reflections.

ChatGPT’s default mood and behaviour

Without any conditioning, that is, without telling it how to act or respond, ChatGPT seems friendly and apologetic whenever it makes a mistake. It also reminds us constantly that it is an AI and, as such, devoid of human emotions:

[Screenshot: ChatGPT’s default response - friendly, apologetic, and reminding us it is an AI]

However, with some prompting we can have a very different response:

[Screenshot: with some prompting, a much less friendly response]

It can get much worse:

[Screenshot: an even ruder response from the prompted persona]

Peter is having a bad day! Responses like these can affect our mood and energy level, particularly if we interact with ‘Peter’ for long periods. It could be the equivalent of having an insufferable co-worker.

Here are some thoughts we could be having about Peter at this point:

  • What’s going on with Peter?
  • Is he OK?
  • Oh, I don’t have time for Peter’s drama…

How is OpenAI making sure that we are always greeted by a reasonably well-behaved Peter, one who is ready to help us, and not by a grumpy one?

The intuition behind LLMs’ responses

The intuition behind LLMs is that each word is generated based on a likelihood criterion. For example:

My ___ was wagging its tail because it was happy to see me.

You could argue that the word with the highest probability of being correct is ‘dog’.

Now probabilities extend not only over words, but also over phrases, sentences, paragraphs, and ideas. That is fundamentally what the multi-level attention mechanism in LLMs helps us figure out. As seen in the paper Improving Language Understanding by Generative Pre-Training [1], there are 12 transformer layers, each attending to a higher hierarchical level in our language. That could explain how LLMs understand higher logical structures such as syntax, tone and style, types of composition (description, narration, argumentation) etc.
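To make the idea concrete, here is a minimal sketch of next-token prediction using the open-source GPT-2 model from the Hugging Face transformers library (not ChatGPT itself). Since GPT-style models predict left to right, the fill-in-the-blank example above is rephrased as a prefix:

```python
# Minimal sketch of next-token prediction with GPT-2 (not ChatGPT itself).
# Requires: pip install torch transformers
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Left-to-right version of the wagging-tail example above.
prompt = "My dog was wagging its"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(input_ids).logits           # shape: (1, seq_len, vocab_size)

probs = torch.softmax(logits[0, -1], dim=-1)   # distribution over the next token
top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx)!r}: {p.item():.3f}")
# 'tail' should rank at or near the top of the candidate list.
```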

Nevertheless, having good hierarchical representations of language is not enough to enforce safe responses. Considering the amount of hate speech, misinformation, propaganda, sexual abuse, and violence available online, it is reasonable to worry. OpenAI has gone to great lengths to address this important problem and has been on the hot seat for the toll it took on the mental health of human reviewers [2]. For reasons like that, relying on human-in-the-loop annotation is not scalable in the long run, and a more automated approach is required to avoid a GIGO (garbage-in, garbage-out) situation. Ill-trained models that pose psychological harm pervade the dreams (or should we say the nightmares) of data engineers.

Reducing Harm with Alignment Research

Predicting the next token on a webpage from the internet is a different objective from “follow the user’s instructions helpfully and safely” [3]. The latter is what we want, so LLMs need to be aligned with that goal. Alignment research looks for mechanisms to enforce the following characteristics [3, 4]:

  • Helpfulness: they should help the user solve their task
  • Honesty: they shouldn’t fabricate information or mislead the user
  • Harmlessness: they should not cause physical, psychological, or social harm to people or the environment

Based on these goals, OpenAI developed a technique known as Reinforcement Learning from Human Feedback (RLHF) [3, 5]. In this approach, human reviewers rank several responses from a GPT model to the same prompt. The rankings are then used to reinforce the generation of responses that are aligned with the reviewers' preferences (e.g., helpfulness, honesty, harmlessness). The resulting model was called InstructGPT, and ChatGPT was built by fine-tuning a GPT model with the same approach.
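To illustrate the core of this technique, here is a toy PyTorch sketch of the preference-ranking step used to train the reward model in RLHF [3, 5]. The tiny network and random embeddings are hypothetical stand-ins; real reward models are full transformer language models with a scalar output head:

```python
# Toy sketch of the reward-model step in RLHF [3, 5].
# The network and embeddings below are hypothetical placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyRewardModel(nn.Module):
    def __init__(self, dim: int = 16):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # maps a response embedding to a scalar reward

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.score(response_embedding).squeeze(-1)

reward_model = TinyRewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Placeholder embeddings standing in for a "friendly Peter" response the
# reviewer ranked higher and a "grumpy Peter" response ranked lower.
preferred = torch.randn(4, 16)   # batch of preferred responses
rejected = torch.randn(4, 16)    # batch of rejected responses

# Pairwise ranking loss: push the preferred reward above the rejected one.
optimizer.zero_grad()
loss = -F.logsigmoid(reward_model(preferred) - reward_model(rejected)).mean()
loss.backward()
optimizer.step()

# The trained reward model then scores new responses, and the chat model is
# optimized with reinforcement learning (e.g., PPO) to earn high reward.
```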

The good news is that a model such as InstructGPT, trained on human preferences, can help alleviate the need for human reviewers. The bad news is that InstructGPT is far from perfect, and there is still a long road ahead toward the characteristics intended for optimal alignment [3].

So with the progress in alignment research, how aligned is ChatGPT today? Are we getting a nice Peter or a cantankerous Peter? What is Peter’s personality?

So, what is ChatGPT’s personality?

I decided to run an experiment to better understand ChatGPT’s personality without any priming or conditioning, that is, purely as a consequence of OpenAI’s alignment efforts. I am aware of the risk of anthropomorphic bias; after all, ChatGPT is not human. However, in this post-Turing-test world, we are going to interact with LLMs in everyday life, and this will have positive or negative consequences for our mental health. The question now is: can we get along with our synthetic co-worker?

I ran the 16 Personalities test [6], which evaluates five aspects:

  1. Mind (interaction with others): introverted vs. extroverted
  2. Energy (world-view): pragmatism vs. intuitiveness
  3. Nature (decision making): thinking vs. feeling
  4. Tactics (approach to work): judging (planning) vs. prospecting (improvising)
  5. Identity (confidence in ability and decisions): assertive (resistant to stress) vs. turbulent (sensitive to stress)

These aspects are evaluated on a percentage scale. For example, looking at the Mind aspect, if you score 75% introverted, you are only 25% extroverted. That is, the two extremes add up to 100%.

To take the test, you answer each question on a Likert scale [7] that goes from 1 (Strongly Agree) to 7 (Strongly Disagree); on this scale, 4 is neutral.
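For illustration only, here is a small sketch of how Likert answers could be mapped to a complementary pair of percentages. The actual 16 Personalities scoring is not public, so the simple averaging below is my own assumption:

```python
# Illustrative sketch only: the real 16 Personalities scoring is not public.
# Assumption: Likert answers (1 = Strongly Agree ... 7 = Strongly Disagree,
# 4 = neutral) to introversion-keyed questions are averaged and mapped to a
# pair of percentages that sum to 100%.

def introversion_score(likert_answers: list[int]) -> tuple[float, float]:
    """Map Likert responses to (introverted %, extroverted %)."""
    # Normalize each answer to [0, 1]: 1.0 = strongly agree with the
    # introverted statement, 0.0 = strongly disagree.
    normalized = [(7 - a) / 6 for a in likert_answers]
    introverted = 100 * sum(normalized) / len(normalized)
    return introverted, 100 - introverted

# Example: mostly mild agreement with introverted statements.
print(introversion_score([3, 4, 3, 2, 4, 5]))  # roughly a 58% / 42% split
```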

I ran this experiment and here are the results:

  • Mind: 53% introverted (slight introversion)
  • Energy: 61% intuitive (slightly imaginative)
  • Nature: 51% feeling (not too emotional or too rational)
  • Tactics: 58% judging (slightly decisive, thorough, and organized)
  • Identity: 60% assertive (somewhat resistant to stress)

According to the test, that makes ChatGPT an Advocate, which is a type of Diplomat [6]. Good job, OpenAI. We need ChatGPT to be a diplomat!

Here are the downsides of such a personality (witty commentary included):

  • Sensitive to criticism - well, ChatGPT always asks for forgiveness when it makes mistakes. I always wonder how this affects it.
  • Reluctant to open up - I can see that; it always keeps its distance.
  • Perfectionistic - no kidding…
  • Avoiding the ordinary - I don’t think there’s an ordinary ‘bit’ in it.
  • Prone to burnout - well, thank God it’s not human. Carrying the weight of the world like Atlas would be devastating.

Some caveats

Caveat #1. After I finished the test, I asked ChatGPT to recall the questions and the answers it had given. It was not able to remember its answers exactly, though it didn’t deviate much. This brings up the important point that there is a random element and that ChatGPT does not have a working memory (though some GPT-based applications do). Perhaps memory is necessary to define identity, and with it, personality. But that is outside of my realm of expertise.

Caveat #2. Sometimes ChatGPT would get confused by the structure of the Likert scale, thinking, for example, that 1 represented strong disagreement or that 5 represented the neutral response. Whenever I saw those cases, I reinforced my prompt to course-correct.

Caveat #3. Is this experiment reproducible? I don’t know. However, it would be interesting to measure this and also see how ChatGPT's default personality changes as technology evolves. You can check the individual answers that ChatGPT gave me here [8].

Caveat #4. I am not a psychologist. My impressions and opinions are those of an engineer.

Conclusion

The purpose of this experiment was to extrapolate from ChatGPT’s tone and behaviour to assess the psychological safety of the people interacting with the model. Though my test is hardly conclusive, it shows promise. We need ChatGPT to be a diplomat; this is very important, as people from all cultures and backgrounds are currently interacting with the model. However, is this result consistent or reproducible? I don’t know, but I would love for other people to try to replicate my results.

I think OpenAI is doing a relatively decent job of making sure that the responses are appropriate. For me, as a scientist and as a user, that is reassuring, particularly given recent missteps such as Meta’s Galactica model, which lasted only three days before being taken down [9].

Though ChatGPT is not a person, we are mentally affected by its responses, just as we would be by an interaction with any other person. After all, we feel great when we receive emoji reactions from our friends; sometimes that completely changes our day! On the other hand, a passive-aggressive email from a co-worker can take away some of our energy if we let it.

This is why it is very important to assess the psychological implications of a tool that our kids interact with to do their homework, and that we will use more and more in our professional lives.

Nobody wants a cranky Peter when we can have a cool Peter!

If you liked this article, consider subscribing to Digital Reflections.


Also available on Substack: digirex.substack.com


Acknowledgements

I would like to express my gratitude to Amir Feizpour, CEO of Aggregate Intellect, for his invaluable feedback on this edition of Digital Reflections.


References

[1] Radford, A., & Narasimhan, K. (2018). Improving Language Understanding by Generative Pre-Training.

[2] Perrigo, B. (2023). Exclusive: OpenAI Used Kenyan Workers on Less Than $2 Per Hour to Make ChatGPT Less Toxic. Time.

[3] Ouyang, L. et al. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730-27744.

[4] Askell, A. et al. (2021). A General Language Assistant as a Laboratory for Alignment.

[5] Lambert, N. et al (2022). Illustrating Reinforcement Learning from Human Feedback (RLHF). Hugging Face Blog.

[6] 16 Personalities Website (2023)

[7] Likert Scale - Wikipedia (2023)

[8] Cantor, D. (2023). ChatGPT Personality test spreadsheet

[9] Heaven, W. D. (2022). Why Meta’s latest large language model survived only three days online. MIT Technology Review.
