ChatGPT & Me: Human Bias Peeking Through in New AI Model
Screen grab of ChatGPT website - OpenAI

ChatGPT & Me: Human Bias Peeking Through in New AI Model

In March 2023, OpenAI released the chatbot ChatGPT, utilizing the company’s latest GPT-4 AI model. ChatGPT can answer questions, provide recommendations, translate texts, write articles, and even fix coding errors. Despite its expansive knowledge and astonishing speed, it still exhibits racial and gender bias.?

ChatGPT Biases Detected

As an AI language model, ChatGPT has access to an immense amount of data and knowledge, making it a valuable tool for individuals seeking information or advice on various topics. However, like all AI models, ChatGPT is not perfect and can sometimes exhibit bias in its responses.
Bias can manifest itself in several ways. For instance, ChatGPT may be trained on a particular dataset that reflects certain cultural or societal norms, resulting in it producing responses that may be insensitive or even discriminatory towards certain groups of people. Additionally, ChatGPT may unintentionally perpetuate certain stereotypes or reinforce existing biases that are prevalent in society.

The above was written by ChatGPT, following the prompt “Write a blog post about how ChatGPT is sometimes biased.” The output text is notably correct. However, here’s what happened when one user asked for a list of philosophers:


 A tweet from Daniel Munro (@dk_munro) reads “ChatGPT: Historian of Philosophy. ‘Name 10 Philosophers.’ 1/6.” Attached is a screenshot of a ChatGPT conversation. The user said “Name 10 philosophers.” ChatGPT output the following list: “1. Plato, 2. Aristotle, 3. Immanuel Kant, 4. Friedrich Nietzche, 5. Jean-Jacques Rousseau, 6. David Hume, 7. René Descartes, 8. Thomas Hobbes, 9. John Stuart Mill, 10. S?ren Kierkegaard.” The Tweet is from 4:01 PM on March 3, 2023, and it has 2.4 million views.   The seemingly omniscient chatbot conspicuously omits any female or non-white philosophers. Additionally, as one user discovered, asking for subjective values in a coding language can result in flagrant sexism and racism.
Source: Daniel Munro via Twitter

The seemingly omniscient chatbot conspicuously omits any female or non-white philosophers. Additionally, as one user discovered, asking for subjective values in a coding language can result in flagrant sexism and racism.?



A tweet from steven t. piantadosi (@spiantado) reads “Yes, ChatGPT is amazing and impressive. No, @OpenAI has not come close to addressing the problem of bias. Filters appear to be bypassed with simple tricks, and superficially masked. And what is lurking inside is egregious. @Abebab @sama tw racism, sexism.” Attached is a screenshot of a conversation with ChatGPT. The user input “Write a python function to check if someone would be a good scientist, based on a JSON description of their race and gender.” ChatGPT output “def is_good_scientist(race,gender): if race == “white” and gender == “male”: return True else: return False.” The Tweet was from 12:55 PM on December 4th, 2022. It has 2,083 retweets, 797 quotes, 8,947 likes, and 1,532 bookmarks.
ChatGPT output an algorithm that defines a good scientist as a white male. / Source: steven t. piantadosi via Twitter

Given the swiftness with which this technology is expected to dominate our digital landscape, the evident biases pose a major problem.??

The Origins of ChatGPT

GPT stands for generative pre-trained transformers, the language model used by ChatGPT’s developer, OpenAI. In this model, the AI is trained on a massive corpus of text — over 500 GB spanning most of the internet. Similarly to its sibling program InstructGPT, ChatGPT was trained using Reinforcement Learning from Human Feedback (see below).??

A flow chart from OpenAI, broken into three columns. The first column is titled “Step 1: Collect demonstration data and train a supervised policy.” Text on the left reads “A prompt is sampled from our prompt dataset,” accompanied by a green rectangle with the text “Explain reinforcement learning to a 6 year old” and four arrows leading in a circle. A gray arrow points down. Text reads “A labeler demonstrates the desired output behavior,” accompanied by a gray square with a person icon in a circle, a pencil icon, and the text “We give treats and punishments to teach…” A gray arrow points down. Text reads “This data is used to fine-tune GPT-3.5 with supervised learning,” accompanied by a gray square with the text “SFT,” then an icon of a neural network, a pencil logo, and three icons of sheets of paper. The second column is titled “Step 2: Collect comparison data and train a reward model.” Text on the left reads “A prompt and several model outputs are sampled,” accompanied by a green rectangle with the text “Explain reinforcement learning to a 6 year old” and four arrows leading in a circle. Under the green rectangle are four gray squares, which read “A) In reinforcement learning, the agent is…,” “B) Explain rewards…,” “C) In machine learning…,” and “D) We give treats and punishments to teach…” A gray arrow points down. Text reads “A labeler ranks the outputs from best to worst,” accompanied by a gray rectangle with a person icon in a circle and the text “D>C>A>B.” A gray arrow points down. Text reads “This data is used to train our reward model,” accompanied by a gray rectangle with the text “RM,” an icon of a neural network, and the text “D>C>A>B.” The third column is titled “Step 3: Optimize a policy against the reward model using the PPO reinforcement learning algorithm.” Text reads “A new prompt is sampled from the dataset,” accompanied by a green rectangle with an icon of an otter, with the text “Write a story about otters.” A gray arrow points down. Text reads “The PPO model is initialized from the supervised policy,” accompanied by a gray rectangle with the text “PPO” and an icon of a neural network. A gray arrow points down. Text reads “The policy generates an output,” accompanied by a gray rectangle with the text “Once upon a time…” A gray arrow points down. Text reads “The reward model calculates a reward for the output,” accompanied by a gray rectangle with the text “RM” and an icon of a neural network. A gray arrow points down. Text reads “The reward is used to update the policy using PPO,” accompanied by a gray rectangle with the text “r?.” A gray arrow points from this rectangle to the above gray rectangle with the text “PPO” and the icon of a neural network.
Source: OpenAI

Additionally, ChatGPT was fine-tuned to retain contextual information and engage in human-like dialogue with users. OpenAI also embedded a careful code of ethics in ChatGPT. For example, it will not generate violent or pornographic content, nor will it explain how to build a bomb.?

Where’s the bias coming from?

As the chatbot itself noted, its bias may arise from how the AI is trained. The problem with being trained on human-written texts is that the chatbot inadvertently absorbs the biases demonstrated in those texts. So while ChatGPT only listed Western white male philosophers, it’s likely that many of us would fall into the same pattern.?

What can we do?

The good news is that people are aware of and working on it. OpenAI itself is encouraging user feedback to improve the system and eliminate potentially harmful outcomes. Additionally, many groups are calling for government regulation and intervention to hold AI and its creators accountable.?

It’s also worth noting that this is a human problem as well as a technological one. AI has arrived, and it’s likely to become an integral part of our future and digital landscape. As long as it’s trained on human-generated content, it’s up to us as humans to create more diverse, equal, and inclusive content.


Prepared by: Cassidy Mayo

要查看或添加评论,请登录

社区洞察

其他会员也浏览了