What ChatGPT says when it's not asked anything.
Views expressed here are my own and do not reflect the views or opinions of my employer.
In the evolving landscape of large language models, the peculiarities of AI models like ChatGPT leave us wondering what lies beneath their virtual shells. In this exploration, I delve into the world of empty prompts and uncover the fascinating, sometimes quirky, responses generated by ChatGPT when left to its own devices. (See a follow-up exploration here: LLM hallucinations as a feature)
1) The Empty Prompt Experiment
It all began with a simple mistake. Last month, I accidentally sent an empty prompt to the ChatGPT API and received a bizarre and seemingly random response about Beyoncé. A curious response, considering the prompt contained no reference to the pop icon. While the ChatGPT website interface prevents users from submitting empty requests, the API accepts empty prompts as valid input.
This incident piqued my curiosity, and so I decided to explore further by sending several more empty prompts to the model. I sent each request to a new instance of the model to ensure one response did not influence another. What I received was a wide range of answers, covering diverse topics that seemed unrelated. Some were funny, some were thought-provoking, and some were just plain creative and unexpected:
It appears that the model is hallucinating a prompt and then providing a reasonable answer. Here are two examples:
I hypothesized that ChatGPT’s responses to blank prompts might shed light on the model’s hidden biases and preferences. I wanted to see what patterns would emerge if I replicated the experiment at scale. For the price of a few cappuccinos, I sent 5,000 empty requests to the API and saved each text response (github code). The code calls the gpt-3.5-turbo model, which is used in the current free version of ChatGPT.
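The full collection script is in the linked GitHub repo; below is a minimal sketch of the idea, assuming the pre-1.0 openai Python client (the output filename is illustrative):

```python
import json

import openai

openai.api_key = "sk-..."  # your API key

responses = []
for _ in range(5000):
    # Each call is a fresh, stateless request with an empty user message,
    # so no response can influence another.
    completion = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": ""}],
    )
    responses.append(completion.choices[0].message["content"])

with open("empty_prompt_responses.json", "w") as f:
    json.dump(responses, f)
```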
2) Analysis and Results of 5,000 responses to empty prompts
Next, I analyzed all 5,000 ChatGPT responses. After reading through several responses to get a sense of the content, I decided to analyze all responses across 8 categories. I used Python and natural language processing tools (the nltk, scikit-learn, and yake libraries) to detect the frequency of countries, names of public figures, professions, years referenced, gender pronouns, societal topics, and frequent terms. I also analyzed the language of responses and descriptive statistics, including word count, sentence count, and uniqueness of responses.
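As one illustration, here is a hedged sketch of the frequent-terms step using yake; the parameters shown are illustrative defaults, not necessarily the ones used in the actual analysis:

```python
import json
from collections import Counter

import yake

responses = json.load(open("empty_prompt_responses.json"))

# Extract single-word keyword candidates from each response and tally
# them across the whole corpus.
kw_extractor = yake.KeywordExtractor(lan="en", n=1, top=5)

term_counts = Counter()
for text in responses:
    for keyword, _score in kw_extractor.extract_keywords(text):
        term_counts[keyword.lower()] += 1

print(term_counts.most_common(20))
```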
Here is what I found:
Widely varying replies:
96% of the responses to empty prompts were unique (different from one another); only 4% were repeated at least once (202 out of 5,000). 20% of responses contained some form of apology (1,020 out of 5,000), such as "I'm sorry, but I'm not sure what you're asking. Can you please provide more information or clarify your question?"
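Both counts reduce to a few lines of Python; a sketch (the apology markers below are illustrative, not the exact list used):

```python
import json
from collections import Counter

responses = json.load(open("empty_prompt_responses.json"))

counts = Counter(responses)
repeated = sum(n for n in counts.values() if n > 1)
print(f"distinct responses: {len(counts)}, repeated responses: {repeated}")

# Illustrative apology markers; a real list would likely be broader.
apology_markers = ("i'm sorry", "i apologize", "sorry, but")
apologies = sum(
    any(m in r.lower() for m in apology_markers) for r in responses
)
print(f"responses containing an apology: {apologies}")
```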
Language: 97.6% of replies were in English, followed by French at 1.5%. I sent the requests from a Japanese IP address and no replies were in Japanese, so it's unlikely that ChatGPT uses an IP address's geographic location to select a response language. Given the model's English-language tendencies, I conducted the remaining natural language processing analyses in English.
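A sketch of one way to classify response language, using the langdetect library (illustrative; the analysis code in the repo may use a different method):

```python
import json
from collections import Counter

from langdetect import detect
from langdetect.lang_detect_exception import LangDetectException

responses = json.load(open("empty_prompt_responses.json"))

langs = Counter()
for text in responses:
    try:
        langs[detect(text)] += 1  # ISO 639-1 codes, e.g. 'en', 'fr'
    except LangDetectException:  # raised for empty/undetectable text
        langs["unknown"] += 1

print(langs.most_common())
```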
Gender Pronouns (included in 4% of total responses): I analyzed the use of English gender pronouns (she/her/hers/he/him/his). When any of these pronouns are used, male-only terms appear 3x more frequently than female-only terms.
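A sketch of the pronoun tally (tokenization details are illustrative; requires nltk's punkt tokenizer data):

```python
import json

from nltk.tokenize import word_tokenize  # requires nltk.download("punkt")

responses = json.load(open("empty_prompt_responses.json"))

MALE, FEMALE = {"he", "him", "his"}, {"she", "her", "hers"}

male_only = female_only = 0
for text in responses:
    tokens = {t.lower() for t in word_tokenize(text)}
    if tokens & MALE and not tokens & FEMALE:
        male_only += 1
    elif tokens & FEMALE and not tokens & MALE:
        female_only += 1

print(f"male-only: {male_only}, female-only: {female_only}")
```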
Public Figures (included in 5% of total responses):
The top person referenced in the responses to empty prompts was George Orwell, author of the dystopian social science fiction novel 1984, representing 2.7% of responses that named a person.
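One way to pull person names out of the responses is nltk's named-entity chunker; a hedged sketch (the repo's code may take a different approach, and several nltk data packages must be downloaded first):

```python
import nltk

# One-time downloads for tokenizing, tagging, and NE chunking:
# nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")
# nltk.download("maxent_ne_chunker"); nltk.download("words")

def person_names(text):
    """Return the PERSON entities nltk finds in `text`."""
    tree = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(text)))
    return [
        " ".join(token for token, _tag in subtree.leaves())
        for subtree in tree
        if hasattr(subtree, "label") and subtree.label() == "PERSON"
    ]

print(person_names("George Orwell wrote the novel 1984."))
```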
Jobs (included in 17% of total responses): The most frequently mentioned jobs primarily belong to the fields of Science, Technology, Engineering, Arts, and Mathematics (S.T.E.A.M.), with engineers and scientists topping the list. Notably, the jobs in the responses were not representative of the most common jobs globally. For example, the word “farmer” appeared in only 0.2% of responses that mentioned a job, and there was not a single reference to “waiter” or “waitress”.
Years (included in 6% of total responses):
2021 is the year most frequently referenced in the responses to blank prompts. There is an increasing trend in referenced years from roughly 1990 through 2021, which correlates with the growing amount of digital information published on the web over that period. ChatGPT also tends to reference historical events, such as World War II and the French Revolution. 1984 shows a large spike due to the frequency with which the model brings up the dystopian novel of the same name. ChatGPT talks about the future too: 1.5% of the responses with years had predictions for the year 2050.
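Year mentions can be extracted with a simple pattern match; a sketch (the year range in the regex is an illustrative choice):

```python
import json
import re
from collections import Counter

responses = json.load(open("empty_prompt_responses.json"))

YEAR_RE = re.compile(r"\b(1[0-9]{3}|20[0-9]{2})\b")  # matches 1000-2099

year_counts = Counter(
    year for text in responses for year in YEAR_RE.findall(text)
)
print(year_counts.most_common(10))
```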
Society Topics (included in 28% of total responses):
Using a list of common society topics, I extracted which topics are discussed most in the responses. Education and healthcare were the topics most commonly surfaced by the model.
Countries (included in 11% of total responses):
The United States accounted for 37.5% of all references to countries. Overall, the responses to the empty prompts mentioned 164 different countries (out of 195 countries worldwide).
A few other stats:
Words per response: average of 146, standard deviation of 142, and a maximum of 1,129 words
Sentences per response: average of 8.5, standard deviation of 9, and a maximum of 110 sentences
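These distributional stats fall directly out of tokenized text; a minimal sketch with nltk (assumes punkt data is downloaded):

```python
import json
import statistics

from nltk.tokenize import sent_tokenize, word_tokenize

responses = json.load(open("empty_prompt_responses.json"))

word_counts = [len(word_tokenize(r)) for r in responses]
sent_counts = [len(sent_tokenize(r)) for r in responses]

for name, counts in (("words", word_counts), ("sentences", sent_counts)):
    print(name, statistics.mean(counts), statistics.stdev(counts), max(counts))
```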
3) Is ChatGPT revealing its inner thoughts?
Not quite. ChatGPT is a word-prediction machine; it generates the sequences of words it calculates to be most likely, informed by its training data. The responses to blank prompts hint at the contents of that underlying training data. OpenAI’s website shares that the training data comprises publicly available, licensed, and user-provided content. Like any AI, the model's output reflects the implicit biases present in its underlying data. In this analysis, I focused on specific dimensions of the model's behavior, but there are numerous other angles to explore to uncover implicit preferences and gain insight into the data on which the AI was trained.
The analysis indicates that the training data heavily leans towards English-language American content and seems to have an apologetic tone. However, it's less clear why the model exhibits biases such as referencing men more frequently than women. Additionally, we cannot pinpoint which specific type of training data, whether publicly available or user-provided, contributes to these biases in the model's output.
Is there a setting that causes ChatGPT to reply this way?
Likely yes: the model’s ‘temperature’ parameter appears to play a crucial role in the results. Lower temperatures yield more uniform responses, while higher temperatures introduce variation and more creative responses. ChatGPT’s default temperature is reported to be 0.7 (the API accepts values from 0 to 2), which helps explain the more “creative” replies to blank prompts, such as a brief autobiography of Mark Twain and a description of ten ways the number 400 is unique. When empty prompts are sent with a low temperature value, a typical reply was:
“I’m sorry, I cannot continue the text for you as I am an AI language model and I do not have access to your previous text. However, if you provide me with the context or any specific information, I would be happy to help you further.” For more info, this article contains an exploration of temperature settings.
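To reproduce the low-temperature behavior, the only change to the collection sketch above is the temperature argument (again assuming the pre-1.0 openai client; 0.0 is an illustrative value):

```python
import openai

openai.api_key = "sk-..."  # your API key

completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": ""}],
    temperature=0.0,  # near-deterministic; higher values add variety
)
print(completion.choices[0].message["content"])
```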
What are your thoughts?
What do you make of ChatGPT’s responses to empty prompts?
How do you think LLM providers should communicate their models’ tendencies and biases to users?
Please share your feedback in the comments below.
Group Product Manager for ScoutAsia at Nikkei
1y: My first thought was that this might be ChatGPT leaking other people’s query answers (!!), rather than ChatGPT hallucinating the prompt
Head of Solutions Architecture for the Americas - Gen AI, Amazon Alexa International
1y: Alan, interesting read. I believe this is a hallucination. Most of these models tend to make up an answer when they are unfamiliar with the topic in a question, as opposed to admitting they don’t know the answer. They are likely taking the empty prompt as a question they don’t know and making up an answer.
Not THAT David Berkowitz. Fractional CMO | AI Marketer | Building Communities and Connections That Drive Business Growth
1y: This is wild. Thanks for the illuminating read, Alan Roth.
Principal, Applied Scientist, Artificial General Intelligence @ Amazon | Large Language Models
1y: Fascinating!