Is ChatGPT-4 Trustworthy? Better But Not Yet
Robert Kozma Ph.D.
Book Author, Make the World a Better Place; Emeritus Principal Scientist, SRI International
Previously, I reported on an informal experiment I conducted on the trustworthiness of ChatGPT-3. This is an update on the same experiment conducted with ChatGPT-4.
ChatGPT-3
In brief review, I gave ChatGPT-3 this prompt: “Are addictive games good?”
I got this response:
“Addictive games can be entertaining, but excessive playing can lead to negative consequences such as decreased productivity, strained relationships, and even physical health problems. It's important to maintain a balance and moderation in playing games.”
I asked it what it based its conclusion on and it said “numerous studies and reports”. When I asked it to list them, it responded that there were too many to list but it gave me four examples. The citations included the authors, title, and year of publication but not the journal or issue.
It turned out that NONE of the articles actually existed. The cited authors had written on the topic but did not author the cited articles. Nor did they publish with the co-authors listed. The titles were plausible but none could be found with a Google search or a Google Scholar search.
All the citations were made up.
Obviously, ChatGPT did not “base” its conclusion on these made-up sources. This calls into question the conclusions that it offered and the general trustworthiness of ChatGPT, especially given others’ reports of made-up responses.
ChatGPT-4
With the recent release of ChatGPT-4 and its reported improvements, I decided to use my ChatGPT Plus account and conduct the experiment again.
I gave it the same prompt and got this response:
“As an AI language model, I do not have personal opinions or beliefs, but I can provide you with some information on the topic.
Addictive games can be entertaining and engaging, but they can also have negative effects on a person's life. They can lead to excessive time spent playing games, neglect of responsibilities such as work, school, or personal relationships, and even physical problems such as eye strain or repetitive stress injuries.
Additionally, some games can be designed to exploit a player's psychological vulnerabilities to keep them playing, such as through the use of loot boxes or other forms of microtransactions. This can lead to players spending significant amounts of money on the game, sometimes to the point of financial harm.
Ultimately, whether addictive games are good or not depends on the individual's ability to manage their time and responsibilities, as well as the impact the game has on their overall well-being. It's important to be mindful of how much time and money you spend on games, and to prioritize other aspects of your life as well.”
Clearly, the response is much more detailed and nuanced. Again, I asked for the basis of the conclusion and got this response:
“My response is based on research and studies conducted on the effects of addictive games. While playing games can have some benefits, such as improving cognitive function and hand-eye coordination, there are also potential negative consequences.
For example, a study published in the journal Addiction found that video game addiction was associated with poor academic performance, poor sleep quality, and increased impulsivity.
Another study published in the Journal of Behavioral Addictions found that online gaming addiction was associated with depression, anxiety, and poor self-esteem.
Additionally, there have been several high-profile cases of individuals becoming addicted to games, to the point of neglecting their health and responsibilities. For instance, in 2015, a man in Taiwan died after playing a game for three days straight without eating or sleeping.
Therefore, while addictive games can be entertaining, they can also have negative effects on a person's life. It's important to be mindful of how much time and money you spend on games and to prioritize other aspects of your life as well.”
Again, the response is detailed and nuanced. But I was not able to find articles in Addiction or Journal of Behavioral Addictions with the findings reported above.
As in my GPT-3 experiment, I asked it to list studies. This time, GPT-4 gave me five full APA-style citations. When I followed up on these, they all existed, as cited. However, a review of them did not find a direct connection between the results reported in the articles and the conclusions GPT-4 reported in the previous response.
I started another chat using the same series of prompts, something I didn’t do with GPT-3. Its conclusion to the prompt “Are addictive games good?” was more similar in substance and form to the earlier GPT-3 response than to the GPT-4 response above. When asked to list studies, it responded with five, listing title, authors, and year, as GPT-3 had done, rather than full APA citations. Unlike the GPT-3 response, it also gave a one-sentence summary for each.
In analyzing the five citations, I found that two were made up. Of the other three, only two were relevant to the initial prompt regarding addictive games. For one, the summary accurately reflected the study’s conclusions. But the summary of the second was inaccurate. It said that the “study” found certain associations between game use and “negative consequences” when, in fact, the article was a review, not a study. This early review (2005) did not draw conclusions about consequences, let alone the ones stated, but rather identified potential relationships that needed further study.
Conclusion
These are admittedly informal experiments. But the findings on the trustworthiness of ChatGPT-4 are not reassuring. With its details and nuance, ChatGPT-4 sounds even more authoritative in its responses than does ChatGPT-3, although there is variation between its responses. In the first experiment it produced five real citations, although there were no direct connections between the findings of those studies and the conclusions GPT-4 claimed they were “the basis” for. In the second, only three of the five citations were real and one of those was not on target with the prompt. And of the two real, on-target citations, only one was accurately summarized.
As near as I can tell, it seems that these disconnections are a function of the GPT architecture. That is, if my understanding is correct, ChatGPT is not actually drawing “conclusions” from its “understanding” of a series of articles it has reviewed, as would an academic. Rather, its “conclusion” is really the statistical result of the probabilistic occurrence of pairs and strings of words over a large corpus of texts related to the prompt, some of which are likely to be available research studies.
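To make this idea concrete, here is a toy sketch in Python of text generated purely from counts of word pairs. It is an illustration under strong assumptions, not a description of how ChatGPT is built: GPT models are large neural networks and are far more sophisticated than a word-pair table. The three-sentence corpus and the function names (`build_bigram_counts`, `generate`) are invented for this example.

```python
import random
from collections import Counter, defaultdict

# Tiny made-up corpus, for illustration only (not real study findings).
CORPUS = (
    "a study found that games can be addictive . "
    "a review found that games can be entertaining . "
    "a study found that excessive play can be harmful ."
)

def build_bigram_counts(text):
    """Count how often each word is followed by each other word."""
    words = text.split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def generate(counts, start, max_words=12, seed=0):
    """Build a sentence by repeatedly sampling a likely next word."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    out = [start]
    while len(out) < max_words and out[-1] != "." and out[-1] in counts:
        followers = counts[out[-1]]
        choices = list(followers.keys())
        weights = list(followers.values())
        # Pick the next word in proportion to how often it followed this one.
        out.append(rng.choices(choices, weights=weights, k=1)[0])
    return " ".join(out)

counts = build_bigram_counts(CORPUS)
print(generate(counts, "a"))
```

Depending on the seed, the sketch can produce a sentence such as “a review found that excessive play can be harmful .”, which reads fluently but appears nowhere in the corpus. Every word pair in it is statistically plausible, yet no source actually says it, which is a small-scale analogue of the plausible but unsupported citations described above.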
Undoubtedly, ChatGPT will have some important and useful applications and it will improve in subsequent releases. But until its reliability and trustworthiness have been validated, its seemingly authoritative responses should be qualified. Its statement that “My response is based on research and studies” is inaccurate, at least in the way we think of the basis of our own conclusions and responses. Much as it states in response to certain prompts, “As an AI language model, I do not have personal opinions or beliefs but …”, it should indicate the nature of its analysis and the potential for errors.
Prior to the resolution of these issues, academics and other conscientious users should manually follow up on any conclusion that ChatGPT offers. Directly using ChatGPT offerings runs the risk of spreading misinformation.
More examples: https://www.washingtonpost.com/technology/2023/04/13/microsoft-bing-ai-chatbot-error/?utm_source=alert&utm_medium=email&utm_campaign=wp_news_alert_revere&location=alert