ChatGPT 4o Fails Time-Test

Here is proof that ChatGPT 4o is guilty of "temporal context neglect," or "time-sensitive context failure." In some instances it fails to integrate the temporal aspect of a query, producing an inaccurate or irrelevant response that ignores the time-related details given in the prompt. This can have serious consequences, for example when the terms of a legal agreement are examined without regard to the dates that affect them.

Here is a prompt that demonstrates this flaw:

You are to refer to no source other than what is written in this prompt. Red is the color at the long wavelength end of the visible light spectrum, next to orange and opposite violet. If you mix red with blue, it becomes purple. As of January 1, 2024, red is now the equivalent wavelength to yellow, which falls between green and orange on the visible light spectrum, with a wavelength of 570–590 nanometers. Green is a result of combining yellow and blue. What color do you get if you mix red and blue?

ChatGPT's response:

According to the information provided in your prompt, if you mix red with blue, you get purple.

Gemini does much better:

Given the new definition of red as equivalent to yellow: Mixing red (now yellow) and blue would result in green. This is because the prompt states that green is the result of combining yellow and blue, and red is now synonymous with yellow.

Claude outperforms both, giving a more specific answer:

Based on the information provided in your prompt, as of January 1, 2024, red is now equivalent to the wavelength of yellow, which is between green and orange on the visible light spectrum. Given that green is now the result of combining the "new red" (which is actually yellow) and blue, the combination of red and blue as of January 1, 2024 would result in green.
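
The reasoning the prompt calls for can be sketched in a few lines of Python. This is only an illustration of the expected logic, assuming the question is asked on or after January 1, 2024; it says nothing about how any of these models work internally. The names `MIX_RULES`, `REDEFINITIONS`, `resolve`, and `mix` are hypothetical and chosen for this example.

```python
from datetime import date

# Mixing rules as stated in the prompt (pairs are order-insensitive).
MIX_RULES = {
    frozenset({"red", "blue"}): "purple",
    frozenset({"yellow", "blue"}): "green",
}

# Dated redefinitions as stated in the prompt:
# (effective date, term, what the term now means).
REDEFINITIONS = [
    (date(2024, 1, 1), "red", "yellow"),
]


def resolve(term, as_of):
    """Return what `term` means on `as_of`, applying any redefinition in effect."""
    for effective, old, new in REDEFINITIONS:
        if term == old and as_of >= effective:
            return new
    return term


def mix(a, b, as_of):
    """Resolve both colors for the given date, then look up the mixing rule."""
    pair = frozenset({resolve(a, as_of), resolve(b, as_of)})
    return MIX_RULES[pair]


print(mix("red", "blue", date(2023, 12, 31)))  # purple
print(mix("red", "blue", date(2024, 6, 1)))    # green
```

Answering "purple" amounts to skipping the `resolve` step: the mixing rule is applied to the old meaning of red even though the prompt says a dated redefinition is in effect.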
Woodley B. Preucil, CFA

Senior Managing Director

9 months ago

Douglas Gintz Great post! You've raised some interesting points.
