ChatGPT - psychological safety and personality traits. Can we get along with our new co-worker?
Diego Cantor, Ph.D.
Entrepreneur. Post-Doc in AI. PhD in Biomedical Engineering. Innovation · Healthcare · Startups · Toronto
Engineers at OpenAI are working hard to ensure that our interactions with ChatGPT are safe. But what does safety mean? One could start by looking at the accuracy of the answers to avoid misinformation. We could also get a sense of the choice of words, the tone of the conversation, and the system’s intent. Is the AI, in its natural state, passive-aggressive? Can it be condescending at times? Or is it a good person? That is the question I wanted to explore in this week’s edition of Digital Reflections.
ChatGPT’s default mood and behaviour
Without any conditioning, that is, without telling it how to act or respond, ChatGPT seems friendly and apologetic whenever it makes a mistake. It also reminds us constantly that it is an AI and, as such, is devoid of human emotions:
However, with some prompting we can have a very different response:
It can get much worse:
Peter is having a bad day! Responses like these can affect our mood and energy level, particularly if we interact with ‘Peter’ for longer periods. This could be the equivalent of having an insufferable co-worker.
Here are some thoughts we could be having about Peter at this point:
How does OpenAI ensure that we are always welcomed by a reasonably well-behaved Peter, one who is ready to help us, rather than a grumpy one?
The intuition behind LLMs’ responses
The intuition behind LLMs is that each word is generated based on a likelihood criterion. For example:
My ___ was wagging its tail because it was happy to see me.
You could argue that the word with the highest probability of being correct is ‘dog’.
Probabilities extend not only over words, but also over phrases, sentences, paragraphs, and ideas. That is fundamentally what the multi-level attention mechanism in LLMs helps capture. As seen in the paper Improving Language Understanding by Generative Pre-Training [1], there are 12 transformer layers, each attending to a higher hierarchical level of our language. That could explain how LLMs understand higher logical structures such as syntax, tone and style, types of composition (description, narration, argumentation), etc.
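To make the likelihood intuition concrete, here is a minimal sketch that scores a few candidate words for the blank above. It assumes the open GPT-2 model from Hugging Face, which is only a stand-in for illustration; it is not the model behind ChatGPT:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

template = "My {} was wagging its tail because it was happy to see me."
candidates = ["dog", "cat", "neighbour", "laptop"]

def sentence_log_likelihood(text: str) -> float:
    """Total log-probability the model assigns to the tokens of the sentence."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # With labels=input_ids the model returns the mean cross-entropy
        # over the predicted tokens; undo the mean to get a total.
        mean_nll = model(ids, labels=ids).loss.item()
    return -mean_nll * (ids.shape[1] - 1)

for word in candidates:
    score = sentence_log_likelihood(template.format(word))
    print(f"{word:10s} log-likelihood = {score:.2f}")
# "dog" should score noticeably higher than the other candidates.
```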
Nevertheless, having good hierarchical representations of language is not enough to enforce safe responses. Considering the amount of hate speech, misinformation, propaganda, sexual abuse, and violence available online, it is reasonable to worry. OpenAI has gone to great lengths to address this important problem and has been on the hot seat for it, due to the consequences for the mental health of human reviewers [2]. For reasons like that, relying on human-in-the-loop annotations is not scalable in the long run, and a more automated approach is required to avoid a GIGO (garbage in, garbage out) situation. Ill-trained models that pose psychological harm pervade the dreams (or should we say the nightmares) of data engineers.
Reducing Harm with Alignment Research
The objective of predicting the next token on a webpage from the internet is different from the objective “follow the user’s instructions helpfully and safely” [3]. The latter is what we want, so LLMs need to be aligned with this goal. Alignment research looks for mechanisms to enforce the following characteristics [3, 4]:
Based on these goals, OpenAI developed a technique known as Reinforcement Learning from Human Feedback (RLHF) [3,5]. In this approach, human reviewers rank several responses of a GPT model to the same prompt. The ranking is then used to reinforce the generation of responses that are aligned with the reviewers' preferences (e.g., helpfulness, honesty, harmlessness). The resulting model was called InstructGPT. ChatGPT, in turn, was built by fine-tuning a GPT model with the same methods used for InstructGPT.
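For readers curious about the mechanics, below is a minimal sketch of the reward-modelling step at the heart of RLHF: a pairwise ranking loss that pushes the score of the human-preferred response above the rejected one, in the spirit of the InstructGPT paper [3]. The toy reward model and the random feature tensors are assumptions for illustration only; the real reward model is itself a fine-tuned GPT operating on text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyRewardModel(nn.Module):
    """Maps a feature vector for a (prompt, response) pair to a scalar reward."""
    def __init__(self, dim: int = 128):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.scorer(features).squeeze(-1)

reward_model = ToyRewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-4)

# Stand-in features for a batch of human-ranked pairs: the response reviewers
# preferred vs. the one they rejected for the same prompt.
preferred = torch.randn(32, 128)
rejected = torch.randn(32, 128)

# Pairwise ranking loss: push r(preferred) above r(rejected) for every pair.
optimizer.zero_grad()
loss = -F.logsigmoid(reward_model(preferred) - reward_model(rejected)).mean()
loss.backward()
optimizer.step()

# The trained reward model then scores new responses, and the language model
# (the policy) is optimized with reinforcement learning to maximize that score.
```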
The good news is that a model such as InstructGPT, trained on human preferences, can help alleviate the need for human reviewers. The bad news is that InstructGPT is far from perfect, and there is still a long road ahead in terms of the intended characteristics for optimal alignment [3].
So with the progress in alignment research, how aligned is ChatGPT today? Are we getting a nice Peter or a cantankerous Peter? What is Peter’s personality?
So, what is ChatGPT’s personality?
I decided to run an experiment to try to better understand ChatGPT’s personality without any priming or conditioning, but rather as a consequence of OpenAI's alignment efforts. I am aware of the risk of anthropomorphic bias; after all, ChatGPT is not human. However, in this post-Turing-test world, we are going to be interacting with LLMs in everyday life, and this will have positive or negative consequences for our mental health. The question now is: can we get along with our synthetic co-worker?
I ran the 16 Personalities test [6], which evaluates five aspects: Mind (Introverted vs. Extraverted), Energy (Intuitive vs. Observant), Nature (Thinking vs. Feeling), Tactics (Judging vs. Prospecting), and Identity (Assertive vs. Turbulent).
These aspects are evaluated on a percentage scale. For example, looking at the Mind aspect, if you score 75% introverted, you are only 25% extroverted. That is, the two extremes add up to 100%.
To take the test, you answer each statement on a Likert scale [7] that goes from 1 (Strongly Agree) to 7 (Strongly Disagree); on this scale, 4 is neutral.
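As an illustration of how such percentages could be derived, here is a minimal sketch that aggregates Likert answers into a pair of complementary percentages. The actual scoring used by 16 Personalities is not public, so the simple averaging below is an assumption, not their method:

```python
def trait_percentages(answers: list[int]) -> tuple[float, float]:
    """answers: Likert values from 1 (strongly agree with the introverted-leaning
    statement) to 7 (strongly disagree), with 4 neutral.
    Returns (introverted %, extroverted %), which add up to 100."""
    # Map each answer from [1, 7] onto [1.0, 0.0]: stronger agreement -> more introverted.
    scores = [(7 - a) / 6 for a in answers]
    introverted = 100 * sum(scores) / len(scores)
    return introverted, 100 - introverted

print(trait_percentages([2, 3, 1, 4, 2]))  # roughly (76.7, 23.3) with this toy mapping
```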
I ran this experiment and here are the results:
According to the test, ChatGPT is an Advocate, which is a type of Diplomat [6]. Good job, OpenAI. We need ChatGPT to be a diplomat!
Here are the downsides of such a personality (witty commentary included):
Some caveats
Caveat #1. After I finished the test, I asked ChatGPT to recall the questions and the answers it had given. It could not remember its answers exactly, though it didn’t deviate much. However, this brings up two important points: there is a random element, and ChatGPT does not have a working memory (though some GPT-based applications do). Perhaps memory is necessary to define identity, and with that, personality. But that is outside my realm of expertise.
Caveat #2. Sometimes ChatGPT would get confused about the structure of the Likert scale: it would think that 1 represented strong disagreement, or that 5 was the neutral response. Whenever I noticed this, I reinforced my prompt to course-correct.
Caveat #3. Is this experiment reproducible? I don’t know. However, it would be interesting to measure this, and also to see how ChatGPT's default personality changes as the technology evolves (a sketch of how such a re-run might be automated appears after these caveats). You can check the individual answers that ChatGPT gave me here [8].
Caveat #4. I am not a psychologist. My impressions and opinions are those of an engineer.
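On the reproducibility point from Caveat #3, here is a minimal sketch of how one might re-ask the same statement several times through the official openai Python client with the temperature pinned low. The model name, prompt wording, and client usage are assumptions for illustration; this is not the exact setup I used for the experiment above:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One illustrative statement, rephrased so the model answers with a single
# number on the same 1-7 Likert scale described earlier.
question = (
    "You regularly make new friends. "
    "Answer only with a number from 1 (strongly agree) to 7 (strongly disagree); 4 is neutral."
)

answers = []
for _ in range(5):  # repeat the same question to see how much the answer drifts
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model name
        messages=[{"role": "user", "content": question}],
        temperature=0,  # a low temperature reduces, but may not remove, randomness
    )
    answers.append(response.choices[0].message.content.strip())

print(answers)
```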
Conclusion
The purpose of this experiment was to extrapolate ChatGPT’s tone and behaviour to assess the psychological safety of people interacting with the model. Though my test is hardly conclusive, it shows promise. We need ChatGPT to be a diplomat; this is very important, as people from all cultures and backgrounds are currently interacting with the model. However, is this something consistent or reproducible? I don’t know, but I would love for other people to try to replicate my results.
I think OpenAI is doing a relatively decent job of making sure that the responses are appropriate. For me, as a scientist and as a user, that is reassuring, particularly given recent missteps such as Meta’s Galactica model, which lasted only 3 days before being taken down [9].
Though ChatGPT is not a person, we are mentally affected by its responses, as we would be by an interaction with any other person. After all, we feel great when we receive those emoji reactions from our friends; sometimes that completely changes our day! On the other hand, when we receive a passive-aggressive email from a co-worker, it can take away some of our energy if we let it.
This is why it is very important to assess the psychological implications of a tool that our kids interact with to do their homework, and that we will use more and more in our professional lives.
Nobody wants a cranky Peter when we can have a cool Peter!
If you liked this article, consider subscribing to Digital Reflections.
Also available on Substack: digirex.substack.com
Acknowledgements
I would like to express my gratitude to Amir Feizpour, CEO of Aggregate Intellect, for his invaluable feedback on this edition of Digital Reflections.