A bit about hallucinations

While LLMs are hot, their hallucinations are stark. To a casual user, they might seem like minor mistakes, as pardonable as a human being's slips, all the more so since LLMs are so polite and human-like in conversation these days. But those small slips could become dangerous as we depend more and more on LLMs in our lives and work, especially when we automate our work on the basis of LLM outputs without human oversight.

A while back, when the Baltimore bridge collapsed after a ship collided with one of its pillars, I was tracking the incident online that evening in my time zone, and I asked one of the LLMs for more information about the bridge. That LLM is connected to the Internet in real time (unlike ChatGPT 3.5, whose information is not real-time). Initially the LLM didn't seem to have any clue about the incident and said nothing about the collision. Eventually, two hours later, it mentioned that the bridge had collapsed, but it gave a date of collapse three months in the past! I was surprised and asked it, 'Are you sure?' It then apologized and gave the correct date and time of the collapse.

It may not sound like a big issue, but imagine if no human had been overseeing the output and an automated script had taken that date and time as input to trigger some action; say, out of my fertile imagination, a shipping company's insurance processing. You can appreciate the financial and reputational distress that company would have been put in!
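To make the risk concrete, here is a minimal sketch of the kind of guard an automated script could place between an LLM's answer and any action taken on it. Everything in it is an assumption for illustration: the function names, the two-day tolerance, and the timestamps are made up, and it assumes the LLM's date arrives as an ISO 8601 string.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical tolerance: how far before the report time an event may plausibly lie.
MAX_LAG = timedelta(days=2)


def is_plausible_event_time(claimed_iso: str, reported_at: datetime,
                            max_lag: timedelta = MAX_LAG) -> bool:
    """Return True only if the claimed timestamp parses, carries a time zone,
    and falls within [reported_at - max_lag, reported_at]."""
    try:
        claimed = datetime.fromisoformat(claimed_iso)
    except ValueError:
        return False  # unparseable output is treated as untrusted
    if claimed.tzinfo is None:
        return False  # refuse naive timestamps rather than guess a time zone
    return reported_at - max_lag <= claimed <= reported_at


def handle_llm_date(claimed_iso: str, reported_at: datetime) -> str:
    """Decide whether to act on the LLM's date or hand it to a person."""
    if is_plausible_event_time(claimed_iso, reported_at):
        return "proceed"
    return "escalate_to_human"


# Made-up timestamps, not the real incident data.
reported_at = datetime(2030, 1, 10, 18, 0, tzinfo=timezone.utc)
print(handle_llm_date("2030-01-10T06:00:00+00:00", reported_at))  # proceed
print(handle_llm_date("2029-10-12T06:00:00+00:00", reported_at))  # escalate_to_human
```

A check like this only catches implausible dates, such as one three months in the past. A wrong date that still falls inside the window would pass, so it is a complement to human oversight and not a substitute for it.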

This is an example of an LLM hallucination. It is one type, and there are many others. I am not kidding when I say that organisations are relying more and more on automation based on LLM outputs. We should be really worried about the quality of these outputs. As a software tester and someone who cares about quality, I am.

Language processing is not simple. I was casually going through various Python libraries for text processing, and I was not at all impressed by the quality of their outputs. I appreciate and deeply respect the effort that has gone into these open source projects, many of them built by top-notch university departments and their students, but the results are very disappointing.

I am actively working on how to expose these kinds of problems and would be glad to join forces with others who are doing the same. If you are interested, give me a buzz and let's talk about it.

