The Human Problem With LLMs

Traditionally, we view technology as a boolean: correct or incorrect. If something doesn't work, we try a different tool or service. If a significant investment has been made, we may consider adjustments, fixes, or shifts in how we use the technology. Changing what we do to accommodate better outcomes can be annoying, but we do it all the time.

Human failures are learning opportunities. Taking ownership of shortcomings or poor decision-making, rather than shifting blame onto others, is seen as the adult thing to do. In a business context, there are costs and risks associated with repeated failure, so accountability is key. Repeated mistakes signal bad investments and/or an inability to learn.

That said, how do we address situations where intelligent machines fail us?

Large Language Models (LLMs) don't perform like other technology. They aren't correct 100% of the time. An LLM can deliver a wrong answer, or a series of them, just like the humans it was designed to emulate. The issue is that the incorrect answers are offered with the same level of confidence as the correct answers.

The technical term for incorrect LLM answers is 'hallucination,' partially because if we say that a machine is lying to us, it sounds really bad.

Yet if someone gives you the wrong answer to a question, knowing it's incorrect or otherwise made up...it's a lie. Trust is earned over time, and lying erodes trust. It's that simple--for humans. Consider the following:

The scientific method dictates learning from mistakes.

Mistakes are human; technology is designed to avoid them.

To learn from mistakes, they must be identified.

Accountability for making mistakes is key to building trust.

Reduction of errors drives trust.

The more we use an LLM, the better it performs. Like humans, it learns by making mistakes. But an LLM doesn't 'own up' to them, and it isn't held accountable, since (today) it's a super bright question-and-answer machine that can solve general-purpose problems and do a stellar yet inconsistent job.

If we don't see performance improvement, the level of trust decreases. How does a machine take accountability for its errors and show improvement? Humans demonstrate personal and professional growth through sincerity, clarity, and behavioral change. These are difficult for machines to emulate.

They try. Machines can be programmed to be artificially sincere--just thank ChatGPT/Claude/Copilot/Gemini for giving you an answer and see how it responds.

But does an LLM apologize? Kind of. Does it tell you when its confidence in an answer is high rather than low? Sometimes. LLMs can tell you they can't answer your direct question but can still provide adjacent information, which is remarkable, since context is challenging to establish. Since trust is subjective, it's hard to tell whether we trust LLMs more with each product release.

Synthetic data and world models are where progress is being made to establish broader and better context, but that progress will never arrive fast enough for the market. Few things do. Either humans' perceptions of technology's value shift, and we continue changing how we do things to suit technology, or technology improves. Both will happen quickly.

Agents are an example of improving technology, as narrow, task-centric capabilities drive more business value than general knowledge queries. The LLM provides general information and capabilities while working in concert with other tools. The next stage of AI is achieving outcomes using multiple types of automation.

Agentive technology, used with LLMs and automation, will result in more ROI from AI.
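As a loose illustration of that division of labor, here is a minimal sketch in Python of an agent loop that lets an LLM route narrow tasks to deterministic tools. The model call is stubbed out, and every name in it (call_llm, TOOLS, run_agent) is a hypothetical placeholder rather than any vendor's actual API.

```python
# Minimal agent sketch: the LLM handles general reasoning, while narrow,
# task-centric work is routed to deterministic tools. All names here are
# hypothetical placeholders, not a real vendor API.

def call_llm(prompt: str) -> dict:
    """Stand-in for a real model call. A real LLM would decide whether to
    answer directly or to request a tool."""
    if "invoice total" in prompt:
        return {"action": "use_tool", "tool": "sum", "args": [1200.50, 89.99, 43.10]}
    return {"action": "answer", "text": "General answer: an agent pairs a model with tools."}

# Narrow, deterministic capabilities the agent can rely on completely.
TOOLS = {
    "sum": lambda args: sum(args),
}

def run_agent(user_request: str) -> str:
    decision = call_llm(user_request)
    if decision["action"] == "use_tool":
        result = TOOLS[decision["tool"]](decision["args"])
        return f"{decision['tool']} result: {result}"
    return decision["text"]

print(run_agent("What is the invoice total for these line items?"))
print(run_agent("What is an agent?"))
```

The point is the shape, not the code: the LLM proposes, narrow automation executes, and the outcome comes from the combination rather than from the model alone.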

Humans tend to look for simple, silver-bullet solutions to complex problems. Instead of looking for answers and shortcuts, we could admit that sometimes, we just need a little help.

That feels more human.


Alan Tan

User Experience Manager for Design and Research

1 month ago

You cannot rely on AI systems to perform the work for you absolutely. Why? Because you need to be able to judge the quality of the work being produced. If you don't understand the output being generated, then how can you know when a mistake is made, or even if the content is any good? When you try to sell or use that output in a professional setting, your work will be marked and confidence in it will be lost.

Nick Gall

Highly accomplished creative pragmatist who excels at innovation and execution.

2 months ago

"The technical term for incorrect LLM answers is 'hallucination,' partially because if we say that a machine is lying to us, it sounds really bad." That's why I like the term 'confabulation' over 'hallucination'.

Ruth Kaufman

Systems Thinker / Experience Architect / Metadata Modeler / Knowledge Engineer // I turn knowledge, expertise, and skills into systems, frameworks, and workflows // Open to project-based opportunities

2 months ago

We don't expect 100% correctness when we talk to other humans, but that doesn't prevent us from learning from each other. Our idea of what a computer is and its role in our lives and systems needs to broaden, and we need to learn which recipes combining precise and fuzzy computing are useful.
