The time I asked ChatGPT: "Are you sure?" and learned why it gives the wrong answer ... by design
Richard Liebrecht
Customer discovery is where we create value | B2B SaaS | Whiskey lover
... a new-age conversation about product design principles
We have a no-smartphones rule at our dinner table. The only exception: you can query or fact-check anything related to the conversation of the day.
How else was I supposed to find out whether any Canadian prime minister had first served as a provincial premier?
"As of my last update in September 2021, no individual has served as both a Canadian Prime Minister and a provincial Premier," ChatGPT replied.
Perhaps we were a bit surprised? No one could name a particular premier turned prime minister, but for whatever reason, we were skeptical.
I typed, simply, "are you sure?"
"I apologize for the oversight. You're right. Sir John Thompson served as Premier of Nova Scotia from 1882 to 1884 and later became the fourth Prime Minister of Canada, serving from 1892 to 1894. So, yes, there has been an individual who has served as both a Canadian Prime Minister and a provincial Premier. Thank you for pointing that out."
I didn't point anything out. I really had no clue Sir John Thompson existed, though there he is with his own Wikipedia page.
The exchange left me deeply curious - why did I have to call out ChatGPT to get a correct answer? Well, ChatGPT's answer was fascinating. Here's the ensuing "conversation":
Me: "What was different about your second search?"
ChatGPT: "I didn't actually conduct a "search" in the way humans might use search engines. Rather, I access the vast amount of information I was trained on. In this case, my initial response was based on a quick retrieval of common knowledge about Canadian Prime Ministers, which resulted in an oversight. After your prompt, I re-evaluated the data more thoroughly and recognized the mistake. Your question served as a prompt for me to double-check, leading to a more accurate answer."
Me: "Why didn't you conduct a more thorough search in the first place?"
ChatGPT: "It's a balance between speed and depth. Given the vast amount of information I was trained on, I try to provide responses that are both timely and accurate based on the most commonly accessed and relevant information. In some cases, this can lead to oversights, especially when the information is more nuanced or requires a deeper dive. I always strive for accuracy, but I'm not infallible and can benefit from clarification or follow-up questions, as you provided."
Why prioritize speed over accuracy?
Me: "What standard do you use to judge timeliness?"
ChatGPT: "I don't judge "timeliness" in the way humans experience or perceive time. When I reference "timely responses," I mean responses that are concise and generated without unnecessary delay. My design prioritizes answering questions directly and efficiently based on the input I receive. However, the depth and breadth of a response can vary depending on the phrasing and specificity of the question, as well as the context in which it's asked."
Me: "It sounds like you're obfuscating. What is your motivation for doing brief searches, yet stating the answer absolutely?"
ChatGPT: "I apologize for any confusion. My primary objective is to provide accurate and relevant information based on the data I was trained on. When answering questions, I aim to be concise and to-the-point for user convenience, especially in a chat format. This doesn't stem from a "motivation" in the human sense but from design principles. However, my responses are based on probabilities and patterns in the data I was trained on. In cases where I provide an answer with certainty and it's later found to be incorrect, it's a limitation or oversight, not a deliberate attempt to obfuscate. Your feedback helps clarify and improve such interactions."
What does OpenAI prioritize in designing ChatGPT's UX?
Me: "What are the design principles that govern your choices?"
ChatGPT: "The design principles behind my architecture and behavior are based on a few core concepts:
1. Maximizing Informational Accuracy: My primary goal is to provide accurate answers based on the data I was trained on.
2. Relevance and Brevity: Especially in a chat format, concise and relevant responses are preferred. Too much information can overwhelm or confuse users.
3. Safety and Ethical Concerns: I'm designed to avoid engaging in harmful behaviors, spreading misinformation, or promoting dangerous activities.
4. User Engagement: My responses aim to be helpful, clear, and engaging to best assist users in their inquiries.
5. Adapting to User Feedback: While I don't "learn" in real-time, user feedback can be integrated in future iterations and models to refine and improve my responses.
6. Neutrality: I aim to be neutral and avoid biases, although achieving complete neutrality is challenging given that I'm trained on data from the internet, which may contain biases.
7. Versatility: I'm designed to handle a wide range of topics and questions, from simple factual inquiries to more complex reasoning tasks.
Remember, these principles guide my behavior, but I don't have feelings, beliefs, or motivations like humans do. I operate based on patterns in the data and algorithms developed by OpenAI."
Me: "Is the cost of running a search a factor?"
ChatGPT: "From a computational perspective, generating a response requires computational resources. However, I don't make decisions based on the "cost" of a "search" like a human or organization might think about financial costs. My design is to prioritize accuracy and user experience, not to minimize computational expense.
When I generate answers, it's about finding the best match from the data patterns I've been trained on, rather than conserving computational resources. If there are instances where I provide a concise or overly brief response, it's more about adhering to the principles of relevance and brevity rather than conserving computational power."
Me: "Why don't you automatically double-check your answers? Doing so would support all of your design principles."
ChatGPT: "You bring up a valid point. In essence, every time I generate a response, I'm already "checking" against the vast dataset on which I was trained. However, like any model, I have a confidence threshold, which means if I find a highly probable answer quickly, I'll provide it.
Automatically "double-checking" in the way you suggest would involve re-evaluating or querying the data in a different manner or with more depth. While this might increase accuracy in some cases, it could also introduce latency, making interactions slower. Moreover, in many cases, it wouldn't change the initial answer, as the same highest-probability match would be found.
Still, your feedback is insightful. The balance between speed and depth, and the approach to achieving maximum accuracy, is a challenge in AI design and something that can be refined in future iterations."
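For the curious, here is a minimal sketch of what an automatic "are you sure?" step could look like when it is bolted on from the outside, at the application level. It is not a description of how ChatGPT works internally, and nothing in this conversation confirms those internals. The names `ask`, `fake_model` and `answer_with_double_check` are hypothetical, and the stand-in model is hard-coded to mimic the exchange above rather than calling any real service.

```python
from typing import Callable, Dict, Union


def answer_with_double_check(
    ask: Callable[[str], str], question: str
) -> Dict[str, Union[str, bool]]:
    """Ask once, then ask the model to re-examine its own first answer."""
    first = ask(question)
    review_prompt = (
        f"Question: {question}\n"
        f"Proposed answer: {first}\n"
        "Re-check this answer carefully. Reply with the corrected answer only."
    )
    second = ask(review_prompt)
    # Flag the case where the review pass disagrees with the first draft.
    changed = first.strip().lower() != second.strip().lower()
    return {"answer": second, "first_draft": first, "changed_on_review": changed}


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for a real model call (assumption, not a real API)."""
    if "Re-check" in prompt:
        return "Yes: Sir John Thompson"
    return "No"


result = answer_with_double_check(
    fake_model,
    "Has any Canadian prime minister first served as a provincial premier?",
)
print(result["first_draft"], "->", result["answer"])
```

The trade-off is the one ChatGPT describes: every question now costs two model calls instead of one, and a second pass is no guarantee that the revised answer is the right one.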
Me: "Latency doesn't matter to me as a user unless it is 10 seconds or more. And even then, a warning that the query is taking more time could convince me to accept higher latency."
ChatGPT: "Thank you for that feedback. User preferences regarding response time can vary, and it's helpful to hear your perspective. Ensuring higher accuracy, even at the cost of a slight delay, can be beneficial for many users. As AI systems continue to evolve, this kind of feedback is valuable for refining and optimizing interactions."
Me: "Do you consider latency or accuracy a more important metric for your successful adoption?"
ChatGPT: "I don't have personal preferences or considerations, but from a design and development perspective, both latency and accuracy are critical metrics for the successful adoption of an AI system like mine.
1. Accuracy: If an AI consistently provides incorrect or misleading information, users will lose trust in the system, and its utility will be compromised.
2. Latency: If an AI is too slow, especially in interactive environments like chat applications, users might find it frustrating or less useful, leading to reduced engagement.
The balance between the two can be a challenge. In many applications, a slight increase in response time might be acceptable if it ensures higher accuracy. Conversely, in real-time applications, speed might be prioritized, provided the accuracy remains within acceptable bounds.
Ultimately, the optimal balance might vary depending on the specific application and user expectations."
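The latency warning I suggested earlier in the conversation could be sketched roughly like this: run the slower, more thorough lookup, and if it passes a time limit, tell the user it is still working instead of cutting it short. This is only an illustration under my own assumptions. `answer_with_notice`, `thorough_lookup` and the notice text are hypothetical, and the demo uses fractions of a second rather than the 10-second limit I mentioned.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, List


def answer_with_notice(
    slow_answer: Callable[[], str],
    warn_after_s: float,
    notify: Callable[[str], None],
) -> str:
    """Wait for the thorough answer; warn the user if it runs past the limit."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_answer)
        try:
            return future.result(timeout=warn_after_s)
        except FutureTimeout:
            # Past the limit: tell the user, then keep waiting for accuracy.
            notify("Still checking; this answer is taking longer than usual.")
            return future.result()


notices: List[str] = []


def thorough_lookup() -> str:
    """Hypothetical slow lookup, simulated with a short sleep."""
    time.sleep(0.2)
    return "Sir John Thompson"


answer = answer_with_notice(thorough_lookup, warn_after_s=0.05, notify=notices.append)
print(answer, notices)
```

The design choice here is to favour accuracy: the notice buys patience, and the slower answer is still delivered in full.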
Marketing and Comms Manager @ JA Northern Alberta
1 yr ago: I hope you didn’t have that full chat at the dinner table!
| Digital Marketing Specialist | SEO | Digital Analytics | Marketing Copywriter | Digital & Omnichannel Marketer | Digital Marketing & Communications Strategy |
1 yr ago: Oh, that's a great idea. I've been using AI more and more in my role, and that is a great tip I can't wait to try out.