Artificial Hallucination
Created with BlueWillow AI | Prompt: Photorealistic image of an artificial humanoid looking up in the sky and hallucinating in the year 2080, vivid


Have you ever wondered whether the outputs generated by ChatGPT are a product of hallucination or imagination? Well, the answer is up to you. One thing is for sure: ChatGPT's performance is undeniably impressive. It has showcased remarkable abilities in natural language understanding, question answering, summarization, translation, and more. However, it does have one significant limitation: sometimes it generates outputs that are simply not real. This stems from its underlying components, i.e. Large Language Models (LLMs).

Let us now delve into what LLM hallucination is, why these models do it, and how you can potentially avoid it to get better outcomes.

What is hallucination?

Hallucination, in the context of LLMs, refers to the generation of text or responses that sound fluent and natural but are factually incorrect, nonsensical, or unfaithful to the provided source input. For example, an LLM may generate a false statement about a historical event, a nonsensical answer to a factual question, or a summary that does not reflect the main points of the original text.

Hallucination can have serious consequences for the quality and trustworthiness of these models and their applications. It can lead to the spread of misinformation, expose confidential information, and create unrealistic expectations about what they can do. It can also pose safety risks for users who rely on such technology for critical decisions or tasks (Hope no one does this).

You may wonder why this phenomenon happens in the first place. Is it because they are too smart or too dumb? Is it because they are bored or curious? Is it because they are trying to impress us or prank us? Well, the answer is not so simple. There are many factors that contribute to the hallucination problem of LLMs. Time for a deep dive.

How do LLMs hallucinate and what are the causes?

LLMs, to put it simply, work on probability. Given everything the user has provided so far (every word, character, and the overall context), the model predicts the most probable next piece of text and returns that as its response. Often, the most probable answer is not the right one. This is where the issues begin. LLMs inherently hallucinate, and some of the main causes are:

  • Source-reference divergence: When a model is trained on data with source-reference (target) divergence, it may learn to generate text that is not necessarily grounded in or faithful to the given source. For example, if a model is trained on summaries that omit or add information relative to the original text, it may learn to do the same when generating summaries for new texts. This can result in summaries that are misleading, incomplete, or inaccurate. For instance, a model may summarize a news article about a plane crash by saying that “the pilot was drunk” or “the passengers were happy”, even if these statements are not mentioned in the article.
  • Memorization: When a model is trained on large amounts of data, it may memorize some facts or patterns from the training data and use them to generate text that is not relevant or accurate for the given input. For example, a model may falsely label an inference sample as entailment whenever the hypothesis is attested in the training text, regardless of the premise. A model may also use named entity IDs as “indices” to access the memorized data. This can result in text that is inconsistent, contradictory, or out of context. For instance, a model may answer the question “Who is the president of France?” with “Emmanuel Macron” or “Donald Trump”, depending on which ID it uses to retrieve the answer from its memory. Another example of this, from my personal experience with ChatGPT, was the time I typed "yo" and the next predicted word was "mama". That raised a few concerns.
  • Heuristics: When a model is trained on data that has some statistical biases or regularities, it may learn to exploit them as heuristics to generate text that is not based on logic or reasoning. For example, a model may use the relative frequencies of words as a cue to generate text that is more likely but not necessarily true. A model may also use syntactic cues such as punctuation or capitalization to generate text that is more plausible but not necessarily correct. This can result in text that is superficial, generic, or stereotypical. For instance, a model may generate a sentence about “a woman who loves cooking” by using words like “she”, “her”, “kitchen”, or “delicious”, even if these words are not relevant or appropriate for the given context.
  • Temperature: When a model generates text using a sampling method that involves a temperature parameter, it may produce text that is more diverse but also more prone to hallucination. The temperature parameter controls the randomness of the sampling process: a higher temperature means more randomness and diversity, while a lower temperature means more determinism and repetition. A higher temperature value may induce hallucination by making the model choose less probable but more creative words or phrases. This can result in text that is novel, surprising, or funny. For instance, a model may generate a sentence about “a man who loves fishing” using words like “piranha”, “dynamite”, or “aquarium”, even if these words are not realistic or sensible for the given scenario. A minimal sketch of how temperature reshapes the sampling distribution follows right after this list.
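
To make the temperature point concrete, here is a minimal Python sketch of temperature-scaled sampling over a made-up four-word vocabulary with invented scores. This is not ChatGPT's actual sampling code; it only illustrates the mechanism: dividing the model's scores by the temperature before the softmax sharpens or flattens the distribution the next token is drawn from.

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, rng=None):
    """Sample a next-token index from raw model scores (logits).

    Lower temperature -> sharper distribution (more deterministic);
    higher temperature -> flatter distribution (more diverse, and more
    likely to pick improbable, possibly hallucinated continuations).
    """
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
    scaled -= scaled.max()                      # numerical stability
    probs = np.exp(scaled) / np.exp(scaled).sum()
    return rng.choice(len(probs), p=probs)

# Toy vocabulary and made-up scores: "lake" is the most probable continuation
vocab = ["lake", "river", "piranha", "dynamite"]
logits = [3.0, 2.5, 0.5, 0.1]

for t in (0.2, 1.0, 2.0):
    picks = [vocab[sample_next_token(logits, t)] for _ in range(10)]
    print(f"temperature={t}: {picks}")
```

At a temperature of 0.2 the samples are almost always “lake”; at 2.0, tail words like “piranha” start appearing, which is exactly the kind of creative-but-implausible choice that reads as hallucination.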

As you can see, there are many ways that LLMs can hallucinate, and each of them has its own pros and cons. Some of them may be desirable or acceptable in some situations, while others may be undesirable or unacceptable in others. For example, source-reference divergence may be useful for generating summaries that are concise or informative, but not for generating summaries that are faithful or accurate. Memorization may be useful for generating text that is factual or consistent, but not for generating text that is relevant or diverse. Heuristics may be useful for generating text that is fluent or natural, but not for generating text that is logical or reasonable. Temperature may be useful for generating text that is diverse or creative, but not for generating text that is probable or coherent.

Therefore, it is important to understand the trade-offs and implications of each cause of hallucination and to choose the appropriate one for the desired task or application. It is also important to be aware of the potential downsides and risks of this phenomenon and to take measures to prevent and/or reduce them.

Are there downsides?

Hallucination can have negative impacts on both the users and the developers of LLMs. Looking at some of the downsides, we have:

  • Loss of credibility: Hallucination can damage the reputation and trustworthiness of LLMs and their applications. Users may lose confidence in models if they encounter false or misleading information generated by them. Developers may face legal or ethical issues if their models produce harmful or offensive content. For example, an LLM that generates fake news or reviews may deceive or misinform readers and affect their opinions or decisions. A model that generates abusive or hateful language may offend or hurt the recipients and violate their rights or dignity.
  • Loss of utility: Hallucination can reduce the usefulness and effectiveness of these models and their applications. Users may not be able to achieve their goals or complete their tasks if they receive incorrect or irrelevant information. Developers may not be able to evaluate or improve their applications if they cannot measure or mitigate hallucinations. For example, an LLM that generates wrong answers or summaries may confuse or mislead users and prevent them from learning or understanding something. An LLM that generates inconsistent or out-of-context text may frustrate or annoy users and disrupt their communication or interaction.
  • Loss of safety: Hallucination can pose risks and threats to users and to society at large. Users may suffer physical, emotional, or financial harm if they act upon false or misleading information generated by such models. Society may face social or political instability if LLMs are used to manipulate public opinion or spread misinformation. For example, fake medical advice or diagnoses generated by an LLM may endanger patients and affect their health or well-being. Fake political statements or propaganda generated by LLMs may influence or polarize voters and undermine democracy or governance.

As you can see, hallucination can have serious consequences for the quality and trustworthiness of such models and their applications. It can also pose safety risks for users who rely on these for critical decisions or tasks. Therefore, it is crucial to avoid or minimize hallucination as much as possible. But how can we do that? Is there a magic bullet that can solve this problem? Well, not really. There is no one-size-fits-all solution that can eliminate hallucination completely. However, there are some methods and techniques that can help us address this problem to some extent.

How can you avoid LLM hallucination?

There are several methods and techniques that have been proposed or developed to address the hallucination problem of LLMs. Some of them are:

  • Data quality control: One way to prevent hallucination is to ensure that the data used to train LLMs is high-quality, reliable, and consistent. This can be done by using data sources that are verified, curated, and annotated by experts or crowdsourcing platforms. It can also be done by applying data cleaning, filtering, or augmentation techniques to remove or reduce noise, errors, or biases in the data. This helps the model learn from data that is relevant, accurate, and faithful to the source input. For example, a model trained on high-quality summaries that reflect the main points of the original text is likely to generate summaries that are more faithful and accurate than a model trained on low-quality summaries that omit or add information.
  • Model regularization: Another way to prevent hallucination is to modify the model architecture or training objective to make LLMs more robust, stable, and faithful. This can be done by using regularization techniques such as dropout, weight decay, or adversarial training to reduce overfitting or memorization. It can also be done by using auxiliary tasks or losses such as entailment, consistency, or contrastive learning to encourage LLMs to generate text that is grounded in or aligned with the input. This can help LLMs generate text that is more relevant, diverse, and consistent than models that are not regularized. For example, an LLM that is trained with an entailment loss may generate text that is more logical and reasonable than one that is trained without it. A tiny illustration of the most common regularization knobs appears after this list.
  • Output verification: A third way to prevent hallucination is to check the validity, accuracy, and relevance of the text generated by the model before presenting it to the users (highly recommended). This can be done by using verification techniques such as fact-checking, logic-checking, or source-checking to detect and correct false or misleading information. It can also be done by using feedback techniques such as rating, ranking, or editing to collect and incorporate user preferences or corrections. This can help models generate text that is more trustworthy, useful, and satisfactory than unverified output. For example, an LLM whose outputs are reviewed by a fact-checker will produce text that is more factual and credible than one whose outputs are not verified. A rough sketch of a simple source-checking pass also appears after this list.
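
For the regularization point, here is a tiny, hypothetical PyTorch snippet showing the two most common knobs mentioned above, dropout and weight decay. The toy model and all of the numbers are invented for illustration; real LLM training uses far larger architectures, more data, and often additional auxiliary losses.

```python
import torch
import torch.nn as nn

# Toy next-token model (sizes are illustrative, not from any real LLM)
model = nn.Sequential(
    nn.Embedding(50_000, 256),   # map token ids to vectors
    nn.Dropout(p=0.1),           # dropout discourages memorizing exact training strings
    nn.Linear(256, 50_000),      # project back to vocabulary logits
)

# weight_decay adds an L2 penalty that keeps weights small,
# reducing overfitting (and with it, some forms of memorization)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)
```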
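And for output verification, here is a deliberately crude, self-contained sketch of a source-checking pass: it flags generated sentences whose content words barely overlap the source document. The function name and threshold are made up for this example; real systems would use retrieval plus an entailment (NLI) model or a human fact-checker rather than simple word overlap.

```python
import re

def unsupported_sentences(source: str, generated: str, threshold: float = 0.5):
    """Flag generated sentences whose content words barely overlap the source.

    A crude proxy for faithfulness checking, intended only as an illustration.
    """
    source_words = set(re.findall(r"[a-z']+", source.lower()))
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", generated.strip()):
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        if not words:
            continue
        overlap = len(words & source_words) / len(words)
        if overlap < threshold:
            flagged.append((sentence, round(overlap, 2)))
    return flagged

article = "The flight landed safely after a minor engine warning. No injuries were reported."
summary = "The plane crashed because the pilot was drunk. No injuries were reported."
print(unsupported_sentences(article, summary))
```

On the plane-crash example from earlier, this check flags “The plane crashed because the pilot was drunk.” as unsupported while letting the faithful second sentence through.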

These are some of the methods and techniques that can help us avoid or reduce hallucination in LLMs. However, they are not perfect or complete. They may have their own limitations or challenges, such as data availability, scalability, or interpretability. They may also introduce new problems or trade-offs, such as complexity, latency, or privacy. Therefore, it is important to evaluate and compare them carefully and to choose the best one for the specific task or application.


Conclusion

Large language models are amazing artificial intelligence systems that can generate fluent and coherent text on a variety of topics and tasks. However, they also have a major drawback: they tend to hallucinate. Hallucination is a phenomenon where LLMs generate text that is incorrect, nonsensical, or unfaithful to the input. This can have serious consequences for the quality and trustworthiness of these models and their applications. It can also pose safety risks for users who rely on this technology for critical decisions or tasks. Therefore, it is important to understand the causes and effects of hallucination and to apply methods and techniques to prevent or reduce it. By doing so, we can make LLMs more reliable, useful, and safe for everyone.

I hope you enjoyed reading this article and learned something new about the hallucination problem of large language models. If you have any questions or comments, please feel free to share them with me.


Thank you for your time and interest, enjoy this funny image below from my trial and error sessions :)

Generated using BlueWillow AI | Prompt: Thank you on an airplane



