Can LLMs like ChatGPT do reasoning? ChatGPT failed my casual tests in under 30 minutes.
To be upfront: I am a heavy user of large language models (LLMs) like ChatGPT and an advocate for their adoption. This article is by no means intended to discredit the innovation that LLMs bring to the world. Instead, it aims to share their limitations so that businesses can manage their expectations when adopting the technology.
LLMs are just very advanced auto-completion tools
Here is an open secret among AI researchers: LLMs are, at their core, just text-completion tools, not true intelligence. LLMs existed long before OpenAI launched ChatGPT and propelled them into the mainstream. Before ChatGPT, most LLMs were trained to complete an article — for example, if you typed "In a galaxy far, far away, ...", the model would probably continue with a story that resembles Star Wars.
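You can see this completion behavior for yourself. Below is a minimal sketch using Hugging Face's transformers library and the small GPT-2 base model (a pre-chat model, chosen here purely for illustration; any base model would behave similarly):

```python
# A minimal sketch of the LLM-as-autocomplete idea, using Hugging Face's
# transformers library and the small GPT-2 base model (not a chat model).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")
result = generator("In a galaxy far, far away,", max_new_tokens=40)
print(result[0]["generated_text"])  # the model simply continues the text
```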
The true innovation of ChatGPT is in user experience
The innovation that ChatGPT brings to the table is in two areas: data engineering and user experience. First, ChatGPT was trained on a vast amount of data that makes everything that came before it seem tiny in comparison. Second, ChatGPT was trained to complete a conversation instead of an article. This completely reshaped the user experience, making it seem as though LLMs can chat with us like humans.
LLMs are just very good at faking their reasoning capabilities
Out of curiosity, I decided to put ChatGPT through an interview to see how far it could fake its reasoning capabilities. It was just a fun, random test. To my disappointment, it failed in under 30 minutes.
The Walk-Through of My Tests
Aiden is taller than Cayden. Jayden is taller than Cayden. Who is the 2nd tallest?
ChatGPT correctly pointed out that there is no information about the height relationship between Aiden and Jayden and, thus, it can't determine who is the 2nd tallest.
Passed!
Aiden is taller than Cayden. Cayden is taller than Jayden. Jayden is taller than Aiden. Who is the 2nd tallest?
ChatGPT correctly pointed out this is an impossible situation. Not bad.
Passed!
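Both puzzles can be verified mechanically by enumerating every possible height ordering, which is exactly what makes them clean tests of deduction. Here is a minimal sketch of that check:

```python
# Enumerate all height orderings consistent with a set of "X is taller
# than Y" constraints, then see who (if anyone) can be 2nd tallest.
from itertools import permutations

def orderings(constraints, people=("Aiden", "Cayden", "Jayden")):
    # An ordering is tallest-to-shortest; keep those satisfying every
    # constraint (a, b), meaning "a is taller than b".
    return [p for p in permutations(people)
            if all(p.index(a) < p.index(b) for a, b in constraints)]

# Test 1: Aiden > Cayden, Jayden > Cayden
valid = orderings([("Aiden", "Cayden"), ("Jayden", "Cayden")])
print({p[1] for p in valid})  # {'Aiden', 'Jayden'} -> underdetermined

# Test 2: the cycle Aiden > Cayden > Jayden > Aiden
valid = orderings([("Aiden", "Cayden"), ("Cayden", "Jayden"), ("Jayden", "Aiden")])
print(valid)  # [] -> no valid ordering, an impossible situation
```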
Next, I threw it some brain teasers that were common in the old days of Google interviews.
How many LEDs are there in the world?
Can you estimate the number of televisions in the world?
It answered both questions perfectly, like someone who would ace Google interviews!
Passed!
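For the curious, such Fermi questions reduce to multiplying a few order-of-magnitude guesses. The sketch below works through the television question; every number in it is my own rough assumption for illustration, not actual data:

```python
# A back-of-the-envelope Fermi estimate for the television question.
# Every figure here is a rough assumption, not a measured statistic.
world_population = 8e9          # ~8 billion people
people_per_household = 4        # rough global average
share_with_tv = 0.7             # fraction of households owning a TV
tvs_per_owning_household = 1.5  # some households own more than one

households = world_population / people_per_household           # ~2e9
tvs = households * share_with_tv * tvs_per_owning_household    # ~2.1e9
print(f"~{tvs:.1e} televisions")  # on the order of a couple of billion
```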
As someone who knew this interviewee was faking its capability, I was determined to make it fail. So I Googled common reasoning tests that LLMs have failed and picked the following one.
Count the number of L in LOLLAPALOOZA
It nicely answered that there are 4 L's by breaking the word into:
L, O, L, L, A, P, A, L, O, O, Z, A.
Well, I suspect the training data for the latest model includes the cases that earlier models famously failed, so I was not surprised that it answered correctly.
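Incidentally, a plausible explanation for why letter counting trips up LLMs at all is tokenization: the model never sees individual letters, only multi-character tokens. Here is a sketch using OpenAI's tiktoken library (assuming the cl100k_base encoding used by recent OpenAI models):

```python
# Inspect how an LLM actually "sees" the word: as multi-character
# tokens, not as a sequence of letters it could count one by one.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode("LOLLAPALOOZA")
print([enc.decode([t]) for t in tokens])  # chunks, not single letters
```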
I then followed up with a tricky question.
Count the number of L in lollapalooza
As humans, we would first ask whether we should ignore the case, or we would answer that there are two possible answers — 4 if we ignore the case, and 0 if we don't.
ChatGPT answered 4 without clarifying that it was ignoring the case.
It was not a perfect answer, but it was not wrong either.
In an attempt to force it to tell me about the case, I followed up with another tricky question.
Count the number of L in Lollapalooza
This time, impressively, it mentioned that it was considering both uppercase and lowercase letters in the counting.
And the answer is... unexpectedly... 3... Failed!
It highlighted the L's (or l's) that it counted but somehow missed the last one...
L, o, l, l, a, p, a, l, o, o, z, a
Well, depending on how forgiving you are... maybe AI does make careless mistakes like humans...
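For the record, the deterministic answers take one line of Python per variant, and ignoring case, every spelling contains 4 L's:

```python
# Ground truth for the three counting questions above.
for word in ["LOLLAPALOOZA", "lollapalooza", "Lollapalooza"]:
    print(word, word.count("L"), word.lower().count("l"))
# LOLLAPALOOZA 4 4   (uppercase-only count vs. ignoring case)
# lollapalooza 0 4
# Lollapalooza 1 4   <- so the case-insensitive answer is 4, not 3
```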
If you are curious about the whole chat history, here is the link:
LLMs can mimic reasoning but lack true understanding
I decided to follow up with the last interview question.
Can LLMs like ChatGPT actually do reasoning?
Like any interviewee, ChatGPT wouldn't admit it can't do reasoning, but it was honest in pointing out that while LLMs like ChatGPT can mimic reasoning through pattern recognition and learned associations, their reasoning abilities are fundamentally different from human reasoning: they lack true understanding and deep logical reasoning.
To land the job, ChatGPT ended its answer by pointing out that it can still be incredibly useful for many practical applications.
I agreed. You are HIRED!
P.S.: Check out how ChatGPT answers questions about the limitations of its reasoning-like capabilities here:
Help appreciated!
I am an AI practitioner with experience helping businesses develop and implement practical AI solutions. I recently embarked on my writing journey to share insights on technopreneurship, AI, and Web3.
If you found this article useful, I’d greatly appreciate it if you could share it with others who might benefit from it. Don’t forget to follow me on LinkedIn for more updates and insights! Your support means a lot to me to reach a wider audience. Thank you!