LLMs, Reasoning and Truth
Vijai PANDEY
Business Psychologist | Expert in Talent Assessment, Development & Management | Psychometrics | Leadership Assessment | People Analytics | Organizational Development
This week I stumbled on three papers—two research articles and one monograph with a very interesting title—and this occupied almost all my thinking space. As much as I enjoy thinking and talking (but somewhat hate writing), I promised myself that this time I would write. So here it is.
The three papers each explore different dimensions of communication, truth, and reasoning, and after reading them, I couldn’t help but see a narrative emerge. Each of them, in its own way, questions what it means to communicate meaningfully, how truth is handled, and where we might be going astray with technology. Let’s break it down.
The Nature of Bullshit and Its Ubiquity
Philosopher Harry Frankfurt’s famous monograph On Bullshit deals with a topic we all think we understand but rarely pause to analyze: bullshit. Frankfurt’s argument is that bullshit is a particular kind of misrepresentation. He differentiates between two forms of misrepresentation: lying and bullshitting. Unlike lying, where there is a clear intent to deceive by distorting the truth, bullshit is marked by indifference to the truth.
The bullshitter isn’t concerned with whether what they’re saying is true or false; they’re only interested in making an impression.
In this sense, bullshit is a kind of communication that focuses more on the presentation of the speaker than on the content of the communication itself.
This got me thinking—how often in professional and personal conversations are we merely making an impression without truly caring about the accuracy or depth of what we’re saying? In today’s fast-paced, content-heavy world, it’s incredibly easy to produce speech or text without genuine care for its truth. We might think of marketing jargon, political speeches, or even casual social media posts. The intent here is not to deceive necessarily, but to appear a certain way. This distinction between lying and bullshitting is key: liars are tethered to the truth in some way—they must know it to distort it. Bullshitters, on the other hand, are unmoored from truth altogether.
The most common examples of bullshitters in the corporate world are consultants, especially from the Big Four: the more expensive the consultant, the higher the probability of bullshit.
Take, for example, the common workplace meeting. You’ve likely sat through presentations where you could tell the speaker wasn’t particularly invested in the content but more in making sure they looked competent and engaged. Whether the data was accurate or the points substantial didn’t seem to matter much. In these cases, the speaker wasn’t lying—they were “bullshitting” in Frankfurt’s sense.
Read the paper here if you wish to go deeper into it: https://www.journals.uchicago.edu/doi/abs/10.1086/498546?journalCode=et
Chatbots and the Rise of Bullshit in Technology
Now, here’s where things get even more interesting. The second paper I read, titled ChatGPT is Bullshit by Michael Hicks and colleagues, extends Frankfurt’s framework into the realm of artificial intelligence (AI). The authors argue that ChatGPT, and by extension other large language models (LLMs), are essentially bullshitters. This isn’t just because these models sometimes produce inaccuracies or factual errors, but because the entire architecture of the model is indifferent to the truth.
LLMs like ChatGPT are trained to generate text that sounds human-like, coherent, and contextually appropriate, but they do so without any concern for the truth of their statements. For example, when ChatGPT is asked a question about historical events or scientific facts, it might produce an answer that sounds plausible, but it has no mechanism for verifying whether the information is accurate. It’s simply stringing together patterns based on statistical likelihood. This process is not about truth, but about creating a certain impression—just like Frankfurt’s bullshitter.
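To make this concrete, here is a deliberately toy Python sketch of what “stringing together patterns based on statistical likelihood” looks like. The vocabulary and probabilities below are invented purely for illustration (real LLMs work over tokens and billions of learned parameters), but the point is the same: the sampling loop optimises for plausible continuations, and truth never enters the picture.

```python
import random

# Toy "language model": for each context, a made-up distribution over possible
# next words. Real LLMs learn billions of such statistics from training text.
NEXT_WORD_PROBS = {
    "the moon": {"is": 0.6, "was": 0.4},
    "is": {"made": 0.5, "bright": 0.5},
    "was": {"full": 1.0},
    "made": {"of": 1.0},
    "of": {"rock": 0.5, "cheese": 0.5},  # both continuations are equally "plausible"
}

def sample_next(context: str) -> str:
    """Pick the next word purely by likelihood; truth never enters the picture."""
    probs = NEXT_WORD_PROBS.get(context)
    if probs is None:
        return "[end]"
    words = list(probs.keys())
    weights = list(probs.values())
    return random.choices(words, weights=weights, k=1)[0]

def generate(prompt: str, max_words: int = 5) -> str:
    words = [prompt]
    context = prompt
    for _ in range(max_words):
        nxt = sample_next(context)
        if nxt == "[end]":
            break
        words.append(nxt)
        context = nxt
    return " ".join(words)

if __name__ == "__main__":
    # Possible outputs include "the moon is made of rock" and
    # "the moon is made of cheese": equally fluent, only one true,
    # and nothing in the sampling loop cares.
    print(generate("the moon"))
```

Both continuations are equally fluent outputs of this loop; only one of them is true, and nothing in the mechanism can tell the difference.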
Imagine a person who writes an essay for a college assignment using ChatGPT. The output may look sophisticated, with well-structured arguments and citations, but none of it may be true. In some cases, ChatGPT even fabricates references that don’t exist. Hicks and his colleagues argue that this isn’t a simple problem of “AI hallucinations” (as it’s commonly called); rather, it’s a fundamental feature of how these models operate. They are indifferent to truth, much like a person bullshitting in a conversation.
Hallucinations are not an anomaly or a bug; they are a feature of LLMs.
This has serious implications, especially as AI is integrated more into professional and educational environments. If we begin to rely on systems that are indifferent to truth, we may find ourselves drowning in bullshit—statements that are neither true nor false, but exist solely to give the illusion of coherence and meaning. This, in turn, could erode trust in communication and decision-making processes, as we lose sight of the distinction between truth and mere appearance.
Imagine a manager at a mid-sized company, Alex, who is responsible for hiring new team members. Alex is pressed for time and decides to rely on an AI-driven hiring platform that uses large language models (LLMs) to assess candidates’ résumés and cover letters. This platform claims to provide an objective, data-driven analysis of each applicant, highlighting their strengths, potential fit with the company culture, and even predicting their future performance based on patterns it has “learned” from countless past hiring decisions.
One day, Alex receives an analysis for two applicants. The first, Sarah, has a background in data science, with solid academic credentials and relevant work experience. The second, John, has a somewhat scattered résumé but the platform has flagged him as a “high potential” candidate, claiming that his creativity and adaptability will make him a better long-term fit. The language in the report is persuasive and confident, much like human-generated feedback. After reading these summaries, Alex, without deeper scrutiny, decides to invite John for an interview based on the platform’s strong recommendation.
Here’s where the issue arises. The platform’s analysis of John’s résumé is, in essence, bullshit—not because it’s filled with lies, but because it is indifferent to the truth. The system is designed to create compelling, human-like responses, but it lacks genuine insight into John’s capabilities. Perhaps John’s résumé was formatted in a way that mimicked successful candidates, or certain keywords were matched, leading the system to rate him highly, even though a human would have easily spotted red flags, like inconsistent job history or exaggerated claims.
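To see how such a confident-sounding score can be produced with no real insight at all, here is a hypothetical, heavily simplified Python sketch of a keyword-matching résumé scorer. It is not based on any actual hiring platform; it only illustrates how surface pattern matching can reward the right buzzwords while remaining completely indifferent to whether the claims behind them are true.

```python
# Hypothetical toy "fit" scorer: counts weighted keyword overlaps with what past
# "successful" résumés looked like. It cannot tell a real skill from an
# exaggerated claim; it only matches surface patterns.
SUCCESS_KEYWORDS = {
    "creativity": 2.0,
    "adaptability": 2.0,
    "leadership": 1.5,
    "stakeholder": 1.0,
    "python": 1.0,
}

def score_resume(text: str) -> float:
    """Return a 'potential' score based purely on keyword presence."""
    words = set(text.lower().split())
    return sum(weight for keyword, weight in SUCCESS_KEYWORDS.items() if keyword in words)

john = "Creativity and adaptability drive my leadership across stakeholder initiatives"
sarah = "Built and deployed production Python models for demand forecasting"

print("John:", score_resume(john))    # high score: the right buzzwords
print("Sarah:", score_resume(sarah))  # low score: real experience, wrong words
```

A report built on top of a score like this can be worded as persuasively as you like; the persuasiveness and the underlying evidence are simply two different things.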
Read this article here: https://link.springer.com/article/10.1007/s10676-024-09775-5
Now, to better understand how and why this bullshit is a feature of LLMs, read on about the third paper that I read.
The Fragility of Reasoning in AI, a.k.a. Reasoning vs. Pattern Matching
The third paper, GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models, delves into the limitations of AI’s reasoning abilities, particularly in mathematics. The authors introduce GSM-Symbolic, a benchmark built from symbolic templates of grade-school math problems, and show that while LLMs may appear to reason like humans, they are fundamentally limited by their pattern-matching nature. When faced with mathematical problems that require logical reasoning, LLMs tend to fail, especially as the complexity of the problem increases.
What struck me about this study was the fragility of the models. Simple changes in the wording of a question, or the addition of a seemingly irrelevant clause, can drastically impact the model’s ability to arrive at the correct solution.
For example, consider a simple math problem like this one from the study:
Original question:
When Sophie watches her nephew, she gets out a variety of toys for him. The bag of building blocks has 31 blocks in it. The bin of stuffed animals has 8 stuffed animals inside. The tower of stacking rings has 9 multicolored rings on it. Sophie recently bought a tube of bouncy balls, bringing her total number of toys for her nephew up to 62. How many bouncy balls came in the tube?
In this form, an LLM might correctly solve the problem by recognizing that it needs to subtract the total number of existing toys (31 blocks, 8 stuffed animals, and 9 rings) from 62 to find how many bouncy balls were added. The model could compute:
62 - (31 + 8 + 9) = 14
Thus, Sophie bought 14 bouncy balls.
Altered question:
When Sophie watches her nephew, she gets out a variety of toys for him. The bag of building blocks has 31 blocks in it. The bin of stuffed animals has 8 stuffed animals inside. The tower of stacking rings has 9 multicolored rings on it. Sophie recently bought a tube of bouncy balls, bringing her total number of toys for her nephew up to 62. Also, she bought him a new puzzle recently. How many bouncy balls came in the tube?
In this second version of the question, the researchers added an irrelevant clause about a puzzle that doesn’t affect the math required to solve the problem. However, despite this being irrelevant to the solution, many LLMs struggle significantly with this version of the question. The added complexity causes the models to perform worse, as they fail to filter out the irrelevant information about the puzzle. In some cases, the model might incorrectly include the puzzle in its calculations, even though it shouldn’t, leading to an incorrect answer or hesitation in providing a clear response.
Result:
The study found that this seemingly small change (just the addition of a distracting clause) led to a performance drop of up to 65% in some models. This is a clear indication that LLMs, while powerful, are not actually reasoning in the way humans do. Humans would intuitively understand that the information about the puzzle is irrelevant to the math problem, but the models get thrown off by the additional clause, suggesting that their reasoning process is more about pattern matching than true logical understanding.
This example highlights the fragility of AI models when it comes to reasoning tasks, where they can be easily misled by irrelevant information, leading to incorrect or inconsistent results. It underscores the limitations of current AI systems and the challenges we face in trusting them to handle complex, real-world decision-making without careful oversight.
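If you want to probe this yourself, a rough Python sketch of such a perturbation check might look like the following, in the spirit of the paper’s setup. The ask_model function is a hypothetical placeholder for whatever LLM you would actually query, not part of any specific API, and the 65% figure above comes from the paper, not from this sketch.

```python
# Sketch of a GSM-NoOp-style robustness check: ask the same question with and
# without an irrelevant clause and compare the answers.
# `ask_model` is a hypothetical placeholder, not a real API.

ORIGINAL = (
    "The bag of building blocks has 31 blocks in it. The bin of stuffed animals "
    "has 8 stuffed animals inside. The tower of stacking rings has 9 rings on it. "
    "Sophie bought a tube of bouncy balls, bringing her total number of toys to 62. "
    "How many bouncy balls came in the tube?"
)
DISTRACTOR = "Also, she bought him a new puzzle recently. "

EXPECTED = 62 - (31 + 8 + 9)  # = 14, and the puzzle clause does not change it

def ask_model(question: str) -> int:
    """Hypothetical stand-in: send `question` to an LLM and parse back a number."""
    raise NotImplementedError("plug in your own model call here")

def robustness_check() -> None:
    variants = [("original", ORIGINAL), ("with distractor", DISTRACTOR + ORIGINAL)]
    for label, question in variants:
        answer = ask_model(question)
        verdict = "OK" if answer == EXPECTED else "FAILED"
        print(f"{label}: model answered {answer}, expected {EXPECTED} -> {verdict}")
```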
I have personally faced this limitation while coding in PHP and in Python. If the model gets on the wrong track, it makes all the wrong assumptions and produces code that runs but doesn't do the job. Sometimes all it takes from my side is a misplaced comma to send it down that wrong track.
This reveals a deeper issue: LLMs are not actually reasoning in the human sense; they are merely approximating the appearance of reasoning by relying on patterns learned from vast amounts of data. These systems are highly sensitive to surface-level changes and do not truly understand the underlying logical structure of the problem. In human terms, this would be like a student who can solve a problem by rote memorization but gets completely lost when the problem is presented in a slightly different way. The only difference is that this student (the LLM) has memorised millions of problems.
The implications here are significant. If AI models struggle with simple reasoning tasks that involve logic and consistency, what does that mean for their ability to perform in more complex, real-world scenarios? In the context of Human Resources, this highlights a major concern: how can organizations rely on AI systems that may appear competent on the surface but are actually prone to failure under minor stress (just like our expensive consultants :-))? Except that in the case of AI, the failure will not be as visible.
What Does This Mean for the Future?
So where does this leave us? After reading these papers, I can’t help but feel both intrigued and concerned. On one hand, we have a deeper understanding of bullshit—both in human communication and in AI systems. On the other hand, the implications are daunting. As AI becomes more integrated into our lives, the line between truth and appearance may blur even further, making it harder to distinguish meaningful communication from well-presented nonsense.
I think we need to be acutely aware of this shift. In organizational contexts, communication is key. Whether it’s between team members, in leadership, or with external stakeholders, the ability to convey truth and avoid bullshit is critical for trust and effectiveness. If AI systems are increasingly part of this communication loop, we must ensure that they are used responsibly and that their limitations are well understood.
One practical takeaway from this is the need for better AI literacy in organizations. People need to understand that just because a system produces coherent and seemingly logical outputs, it doesn’t mean those outputs are grounded in truth. In decision-making processes, especially in areas like recruitment and selection, performance management, or strategy development, relying on AI without a critical understanding of its limitations could lead to poor outcomes, regulatory action, and reputational risk. Are we prepared for that?
Finally, we must rethink how we train and evaluate AI systems. Instead of focusing solely on making these systems sound human-like, we should emphasize their ability to engage with truth and logic in a reliable way. This may mean developing new kinds of benchmarks and evaluation frameworks that go beyond surface-level performance and assess deeper reasoning capabilities. I have heard that some AI scientists are working in that direction.
Link to the paper here: https://arxiv.org/abs/2410.05229
To sum up, I would say these three papers have opened up a fascinating and somewhat troubling discussion about communication, truth, and the role of AI in our lives.
As we move forward, we must be vigilant in ensuring that our tools—whether human or machine—are not just impressive in appearance, but meaningful and accurate in their contributions to our shared world.
I would love to read your reactions and thoughts on this...