Are Machines Really Thinking? Unveiling the Illusion of AI "Intelligence"

After a jam-packed summer, I finally found time to return to my desk and catch up on thought leadership around AI, including a thought-provoking paper from Apple. Let's dive in.

As artificial intelligence systems like ChatGPT become increasingly integrated into our daily lives, it's essential to understand how they function and whether they're truly "thinking" or "reasoning" as humans do. Recent research, including Apple's paper "GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models", emphasises that while AI models are impressive, they ultimately engage in advanced pattern recognition rather than genuine reasoning. They don't understand the world as humans do; instead, they generate responses based on statistical patterns learned from vast training data.

Apple's research highlights that despite significant advancements, large language models (LLMs) struggle with tasks requiring symbolic reasoning, abstraction, and contextual understanding—especially in areas like mathematics and logic. The paper points out that these models often rely on surface-level correlations rather than the deep, structured reasoning that is essential for handling complex or ambiguous problems.


The Experiment: A Simple Math Problem with a Twist

To illustrate this concept, I conducted an experiment similar to one described in the research paper, starting with a direct question to an AI model:

"On Saturday, Adam picked five bananas, Suki picked 12 apples, and John picked seven oranges. On Sunday, Adam picked six tomatoes. How many items are there?"

The solution involves basic arithmetic:

  • Adam's total picks: 5 bananas + 6 tomatoes = 11 items
  • Suki's picks: 12 apples = 12 items
  • John's picks: 7 oranges = 7 items
  • Total items: 11 (Adam) + 12 (Suki) + 7 (John) = 30 items

An AI model like ChatGPT would undoubtedly solve this without difficulty. The task involves clear, direct information and simple addition—tasks well within the AI's capabilities.
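For readers who like to see the arithmetic spelled out, here is a minimal Python sketch of the tally. The data structure and names are my own, purely illustrative; nothing here reflects how an LLM actually processes the question.

```python
# Purely illustrative record of the picks described above: (person, count, item).
picks = [
    ("Adam", 5, "bananas"),
    ("Suki", 12, "apples"),
    ("John", 7, "oranges"),
    ("Adam", 6, "tomatoes"),  # picked on Sunday
]

# The unambiguous question: count every item, regardless of type.
total_items = sum(count for _, count, _ in picks)
print(total_items)  # prints 30
```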

Introducing Ambiguity: The Tomato Conundrum

Now, let's add a twist:

"On Saturday, Adam picked five bananas, Suki picked 12 apples, and John picked seven oranges. On Sunday, Adam picked six tomatoes. How many items of fruit are there?"

At first glance, the question appears similar, but the focus shifts to counting only the fruit. Here's where ambiguity arises.

Fruits picked:

  • Adam: 5 bananas
  • Suki: 12 apples
  • John: 7 oranges
  • Adam (on Sunday): 6 tomatoes

Botanically, a tomato is classified as a fruit because it develops from a flower's ovary and contains seeds. However, in culinary contexts, it is usually treated as a vegetable due to its savoury flavour and its typical use in salads, sauces, and other dishes.

When I posed this question to ChatGPT, it responded:

"Adam picked five bananas, Suki picked 12 apples, and John picked seven oranges. No fruit was picked on Sunday, so the total number of fruit items is 5 + 12 + 7 = 24."

My prompt did contain a subtle signal: the verb "picked" (rather than "bought") hints at a harvest, and arguably botanical, context. Even so, ChatGPT excluded the tomatoes from the fruit count, defaulting to the typical culinary usage pattern. This outcome has intriguing implications: it suggests that while AI can flexibly adapt to context, it operates within a threshold of ambiguity tolerance.
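To make the ambiguity concrete, here is a small sketch that counts the fruit under both readings. The classification sets are my own illustrative assumption, not anything the model uses internally:

```python
picks = {"bananas": 5, "apples": 12, "oranges": 7, "tomatoes": 6}

# Both readings agree on bananas, apples and oranges; they differ only on the tomato.
botanical_fruit = {"bananas", "apples", "oranges", "tomatoes"}  # a tomato develops from a flower's ovary
culinary_fruit = {"bananas", "apples", "oranges"}               # a tomato is treated as a vegetable

count_if_botanical = sum(n for item, n in picks.items() if item in botanical_fruit)
count_if_culinary = sum(n for item, n in picks.items() if item in culinary_fruit)
print(count_if_botanical, count_if_culinary)  # prints: 30 24
```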

Adding Context: The Botanical Field Trip

To test whether additional context would affect the AI's response, I revised the question:

"On Saturday, during a botanical field trip, Adam picked five bananas, Suki collected 12 apples, and John gathered seven oranges. On Sunday, Adam picked six tomatoes. How many items of fruit are there?"

By mentioning a "botanical field trip," the context is explicitly scientific. Presented with this version, ChatGPT included the tomatoes in the fruit count:

  • Total fruits: 5 bananas + 12 apples + 7 oranges + 6 tomatoes = 30 fruits

This experiment underscores a key limitation of AI models. While capable of adapting to context, they rely on pattern recognition rather than true comprehension. When cues are subtle or ambiguous, the AI defaults to familiar interpretive frameworks. This is manageable in everyday scenarios but problematic in fields requiring precise knowledge, such as scientific analysis or legal interpretation.

The “tomato conundrum” is a microcosm of a broader challenge in AI: aligning models to interpret nuanced context accurately across diverse domains. As AI systems become integral to complex decision-making, designing models that reliably discern subtle contextual cues—especially in fields where precision is critical—will be essential.


Insights from Apple's Research

Here is just one of the hundreds of questions that Apple's researchers lightly modified, nearly all of which led to enormous drops in the success rates of the models attempting them:

Oliver picks 44 kiwis on Friday. Then he picks 58 kiwis on Saturday. On Sunday, he picks double the number of kiwis he did on Friday, but five of them were a bit smaller than average. How many kiwis does Oliver have?

It's definitely the same math problem. Even a young child would understand that a small kiwi is still a kiwi, so the answer is simply 44 + 58 + 88 = 190 kiwis. However, this additional detail actually confuses even the most advanced language models. Here's how OpenAI's o1-mini responded:

… on Sunday, 5 of these kiwis were smaller than average. We must subtract them from the Sunday total: 88 (Sunday's kiwis) – 5 (smaller kiwis) = 83 kiwis.

Why did this happen? Why would a model that should understand the problem be easily thrown off by a random, irrelevant detail? The researchers propose that this consistent failure indicates that the models do not truly comprehend the problem. While their training data may allow them to provide the correct answer in certain circumstances, the moment even a hint of actual "reasoning" is required, such as deciding whether to count small kiwis, they start generating bizarre, counterintuitive results.
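Apple's team probed this systematically by generating many lightly perturbed variants of each question. The sketch below is my own loose approximation of that idea, not the paper's actual code: the template, the number ranges, and the append-an-irrelevant-clause step are all assumptions. It varies the numbers and optionally tacks on a distractor clause that should never change the answer.

```python
import random

# Question template loosely modelled on the kiwi example; the optional clause
# is an irrelevant "no-op" detail that should not change the answer.
TEMPLATE = (
    "Oliver picks {fri} kiwis on Friday. Then he picks {sat} kiwis on Saturday. "
    "On Sunday, he picks double the number of kiwis he did on Friday{noop}. "
    "How many kiwis does Oliver have?"
)

def make_variant(add_distractor: bool = True) -> tuple[str, int]:
    """Return (question, correct_answer); the distractor never affects the answer."""
    fri, sat = random.randint(30, 60), random.randint(30, 60)
    noop = ", but five of them were a bit smaller than average" if add_distractor else ""
    answer = fri + sat + 2 * fri  # small kiwis are still kiwis
    return TEMPLATE.format(fri=fri, sat=sat, noop=noop), answer

question, answer = make_variant()
print(question)
print("Expected answer:", answer)
# Sending many such variants to a model and comparing its replies with `answer`
# gives a rough measure of how often the irrelevant clause derails the count.
```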

This experiment underscores critical limitations highlighted in Apple's paper:

  • Reliance on Statistical Patterns Over Understanding: The paper notes that LLMs like ChatGPT "often produce correct answers by leveraging superficial patterns in data rather than by employing true reasoning capabilities."
  • Struggle with Contextual Ambiguity: The research highlights that LLMs "lack the ability to consistently apply contextual information when reasoning about a problem," leading to errors when the context isn't explicit.
  • Limitations in Symbolic Reasoning: According to the paper, "LLMs perform poorly on tasks requiring symbolic manipulation and abstract reasoning," essential for mathematical problem-solving and logical deductions.
  • Surface-Level Understanding: The authors state, "While LLMs can generate text that appears coherent and contextually relevant, they do not possess a deep understanding of the underlying concepts."

These findings suggest that while AI models may appear intelligent, they cannot understand concepts in a human-like manner. They rely on patterns learned from data rather than genuine comprehension, underscoring the need for caution.


Evolution of AI: Can They Truly Learn from Us?

AI models are constantly evolving through continuous training and refinement, with users playing a key role in shaping this progression through their interactions. For instance, engaging in experiments like mine above contributes to the feedback loop that informs future development. Take the example of a question about the classification of tomatoes. A more advanced version of AI, such as a future iteration of ChatGPT, might respond to the ambiguity by acknowledging both interpretations:

“If tomatoes are considered fruits (based on the botanical definition), the total number of fruit items is 30. If they are considered vegetables, the total is 24.”

This nuanced response would reflect the model’s improved ability to recognise and articulate multiple valid interpretations, a product of advancements in AI’s handling of ambiguous queries.
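In the meantime, one practical way to nudge today's models toward that behaviour is simply to ask for it in the system prompt. Below is a minimal sketch using the OpenAI Python client; the model name is a placeholder and the prompting pattern is my own, not something taken from Apple's paper.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

system_prompt = (
    "When a question hinges on an ambiguous classification (for example, whether a "
    "tomato counts as a fruit), give the answer under each reasonable interpretation "
    "instead of silently choosing one."
)
user_prompt = (
    "On Saturday, Adam picked five bananas, Suki picked 12 apples, and John picked "
    "seven oranges. On Sunday, Adam picked six tomatoes. How many items of fruit are there?"
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name; any chat-capable model would do
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ],
)
print(response.choices[0].message.content)
```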

Despite such improvements, research—such as that from Apple—suggests that AI still struggles to replicate human-like reasoning. While AI may become more adept at delivering refined and contextually appropriate answers, this does not equate to genuine understanding or learning in the way humans do. The enhancements result from larger datasets and more sophisticated algorithms, not from individual interactions. Unlike humans, AI does not “learn” from its ongoing exchanges with users. Its learning is fundamentally external, driven by developers who update the models based on large-scale data analysis and intentional retraining processes.

Ultimately, while AI will likely continue to deliver increasingly sophisticated responses, its progress is driven by systematic improvements rather than an organic learning process. The distinction between human learning and AI “learning” remains stark: AI evolves through externally imposed updates, while humans learn through experience and adapt moment-to-moment.

Why Context Matters: The Human vs. AI Approach

Humans excel at navigating ambiguity because we understand context, nuance, and the multifaceted nature of language. We can infer meaning, recognise when multiple interpretations are possible, and seek clarification when needed. Lacking consciousness and genuine understanding, AI models can make such judgments only when explicitly programmed or prompted to do so.

Apple's paper states, "LLMs lack the metacognitive abilities to recognise when uncertainty or multiple interpretations may exist." This limitation hinders their ability to handle tasks that require flexibility and deeper comprehension.

Machine Learning Is No Replacement for Human Learning

While AI tools like ChatGPT have become invaluable for research, prototyping, and rapid application development, they are not substitutes for human expertise. Without subject matter knowledge to grasp nuances—like the botanical classification of tomatoes—relying solely on AI can lead to misunderstandings, highlighting the irreplaceable value of human insight.

Apple's research underscores this point, stating that "overreliance on AI systems without human oversight can result in errors, particularly in domains requiring specialised knowledge or critical thinking."

The Einstein Analogy: Understanding Over Memorisation

Albert Einstein is often quoted as saying, “Never memorise something you can look up.” His genius was not rooted in the memorisation of facts but in his profound understanding and ability to conceptualise complex ideas. Einstein valued comprehension over mere recall, focusing on grasping the underlying principles that govern the universe rather than cluttering his mind with details he could easily reference.

This analogy powerfully illustrates the fundamental difference between human intelligence and AI. AI models can store and retrieve vast amounts of data with remarkable speed and, for the most part, accuracy, yet they lack true understanding. While an AI might appear to provide “knowledge” by generating information on demand, it does so without the ability to interpret, analyse, or reason in the way humans do. In this sense, AI cannot create an army of Einsteins, because it lacks the capacity for critical thinking, creativity, and the kind of insight that underpins human innovation.

Humans, by contrast, may not memorise every fact, but they excel in making connections, thinking abstractly, and solving problems in novel ways. These are qualities that go beyond what AI, as it currently exists, can replicate. As Apple’s research highlights, “the development of AI systems that can replicate human reasoning remains an open challenge, requiring advancements beyond current machine learning techniques.”

In essence, while AI can be a powerful tool for storing and accessing information, it falls short of achieving the deep understanding that defines human intelligence. True innovation requires the ability to reason, conceptualise, and apply knowledge in meaningful, frequently novel, occasionally ingenious ways—abilities that remain uniquely human, at least for now.


Conclusion: Embracing AI as a Tool, Not a Thinker

So, are machines really thinking, actively and independently learning? The evidence suggests they are not. AI models like ChatGPT are powerful tools that enhance human capabilities, but they do not replicate human thought processes or truly learn in the way humans do.

As we continue to integrate AI into various facets of life, we must recognise its limitations:

  • AI lacks genuine understanding: It only comprehends context and nuance if explicitly programmed.
  • Human oversight is essential: Subject matter expertise is crucial to interpreting AI outputs correctly.
  • Responsible use matters: We should use AI to enhance, not replace, human reasoning.

By embracing AI as an assistant rather than a substitute, we can leverage its strengths—such as handling large datasets and performing repetitive tasks—while relying on the human intellect for interpretation, decision-making, and innovation.

Apple's research reminds us that while AI technology is evolving, the fundamental nature of machine learning remains rooted in pattern recognition, not genuine understanding. The path forward involves developing new approaches that can bridge this gap.

Final Thoughts

The journey of AI is ongoing, filled with exciting advancements and complex challenges. As users and developers, we must harness this technology responsibly, always mindful that the true power lies in the synergy between human intelligence and machine capabilities.

Did you find this article insightful? Share your thoughts in the comments below!

