A bit about hallucinations

LLMs are hot right now, but their hallucinations are stark. To a casual user, these may look like minor mistakes, as pardonable as a human being's slips, all the more so because LLMs are now so polite and human-like in conversation. But those small slips can become dangerous as we depend more and more on LLMs in our lives and work, and especially when we automate our work on LLM outputs without human oversight.

A while back, when the Baltimore bridge collapsed after a ship collided with one of its pillars, I was tracking the incident online that evening (in my time zone), and I asked one of the LLMs for more information about the bridge. That LLM is connected to the Internet in real time (unlike ChatGPT 3.5, whose information is not real-time). Initially the LLM seemed to have no clue about the incident and said nothing about the collision. Eventually, two hours later, it did mention that the bridge had collapsed, but it gave a date for the collapse that was three months in the past! I was surprised and asked it, 'Are you sure?' It then apologized and gave the correct date and time of the collapse.

It may not sound like a big issue, but imagine that no human had been overseeing the output, and an automated script had taken that date and time as input and acted on it; say, out of my fertile imagination, a shipping company's insurance processing. You can appreciate the financial and reputational distress that company would have been put into!
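To make that scenario concrete, here is a minimal sketch of the kind of guard such a script could have. Everything in it is an assumption for illustration: the function names, the two-day tolerance, and the idea that the script also knows the date on which it first saw the incident reported by a trusted source. The dates are made up. It is not how any real insurance system works, just one way to stop an implausible LLM-supplied date from being acted on automatically.

```python
from datetime import date


def is_plausible_event_date(llm_date: date, first_reported: date,
                            tolerance_days: int = 2) -> bool:
    """Hypothetical check: accept the LLM's date only if it lies within
    a few days of the date a trusted source first reported the event."""
    return abs((llm_date - first_reported).days) <= tolerance_days


def route_claim(llm_date: date, first_reported: date) -> str:
    """Hypothetical routing step: implausible dates go to a human
    instead of being processed automatically."""
    if is_plausible_event_date(llm_date, first_reported):
        return "auto_process"
    return "needs_human_review"


# Illustrative dates only: the LLM's answer is about three months too early.
print(route_claim(date(2024, 1, 10), date(2024, 4, 10)))
# The LLM's answer matches the trusted report.
print(route_claim(date(2024, 4, 10), date(2024, 4, 10)))
```

A check like this does not make the LLM's answer correct; it only narrows the cases in which a wrong answer slips through unnoticed.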

This is an example of an LLM hallucination. It is one type, and there are many others. I am not kidding when I say that organisations are relying more and more on automation based on LLM outputs. We should be really worried about the quality of these outputs. As a software tester and someone who cares about quality, I am.

Language processing is not simple. I was casually going through various Python libraries for text processing, and I was not at all impressed by the quality of their outputs. I appreciate and deeply respect the effort that has gone into these open source projects, many of them built by top-notch university departments and their students, but the results are very disappointing.

I am actively working on how to expose these kinds of problems and would be glad to join forces with others who are doing the same. If you are interested, give me a buzz and let's talk about it.

