Understanding Hallucinations and Bias

Hallucinations in LLMs

In Large Language Models, hallucinations refer to cases where the model produces text that's incorrect and not based on reality. An AI hallucination is a confident response by the model that cannot be grounded in any of its training data.

There are several possible reasons for these types of hallucinations:

  • An LLM could be trained on a dataset that doesn’t have the knowledge required to answer a question.
  • An LLM does not have a reliable way to check the factual accuracy of its responses. It just outputs a response that may be wholly incorrect yet convincing.
  • The dataset used to train the LLM may include fictional content, as well as subjective content like opinions and beliefs.
  • LLMs are not generally optimized to say “I don’t know” when they lack the information to answer. Instead, they generate the most probable sequence of text as a response, and “most probable” does not necessarily mean “true.” Hallucination is the result of this misalignment: the goal of language modeling is to learn the underlying distribution of words in a language, not to distinguish what is true from what isn’t (the short sketch after this list illustrates this).
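
To make the last point concrete, here is a minimal sketch (assuming the Hugging Face transformers library and the small GPT-2 checkpoint, neither of which is mentioned in the text above) that inspects the probabilities a language model assigns to candidate next tokens. The model simply ranks continuations by likelihood; nothing in this objective checks which continuation is factually correct.

```python
# Minimal sketch: a language model ranks next tokens by probability, not truth.
# Assumes the Hugging Face `transformers` library and the "gpt2" checkpoint,
# chosen purely for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of Australia is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# Probability distribution over the vocabulary for the next token only.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id)!r:>12}  p={prob.item():.3f}")
# A plausible-sounding but wrong continuation can easily outrank the correct
# one; the training objective never checks factual accuracy.
```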

Let’s see a clear example of a hallucination.

On April 12th, 2023, Databricks announced their “Dolly 2.0” model, an open-source 12B parameter language model fine-tuned on a human-generated instruction-following dataset. GPT-3.5 has been trained on a dataset with a cutoff year of 2021. Therefore, it can’t know about Dolly 2.0.

When GPT-3.5 is asked: “Does Databricks Dolly provide value to the open source community?” it responds with something like this:

GPT-3.5: Databricks Dolly is a tool developed by Databricks, which is designed to help users migrate their data from one platform to another, particularly from on-premise data centers to cloud-based platforms such as AWS or Azure. While Databricks Dolly is not an open source tool, it can still provide value to the open source community in a few ways. […]

Except for “Databricks Dolly is a tool developed by Databricks,” the rest of the text is entirely incorrect! Dolly is not a data migration tool, and it is open-source, contrary to the response. Still, GPT-3.5 knows about Databricks as a company, so it generates a plausible description of Dolly 2.0 as a generic Databricks product.

While this example shows OpenAI’s GPT-3.5 hallucinating, it’s important to note that this phenomenon applies to other similar LLMs, like Bard or LLaMA.

Strategies to mitigate hallucinations include tuning the text generation parameters, cleaning up the training data, precisely defining prompts (prompt engineering), and using retriever architectures to ground responses in specific retrieved documents.

Misinformation Spreading

One significant risk associated with hallucinations in LLMs is their potential to generate content that, while appearing credible, is factually incorrect. Due to their limited capacity to understand the context and verify facts, LLMs can unintentionally spread misinformation.

There is also the potential for individuals with malicious intent to exploit LLMs to deliberately spread disinformation, creating and promoting false narratives. A survey by BlackBerry found that nearly half of respondents (49%) believed that GPT-4 could be used to spread misinformation. The unrestricted spread of such false information via LLMs can have widespread negative impacts across societal, cultural, economic, and political landscapes. Addressing LLM hallucinations is therefore crucial to ensuring the ethical use of these models.

Tuning the Text Generation Parameters

The generated output of LLMs is greatly influenced by several text generation parameters, including temperature, frequency penalty, presence penalty, and top-p.

Higher temperature values promote randomness and creativity, while lower values make the output more deterministic. Increasing the frequency penalty value encourages the model to use repeated tokens more conservatively. Similarly, a higher presence penalty value increases the likelihood of generating tokens not yet included in the generated text. The “top-p” parameter controls response diversity by setting a cumulative probability threshold for word selection.
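
As an illustration, the sketch below shows where these parameters are set when requesting a completion. It assumes the OpenAI Python client (v1.x) and an API key in the environment; the specific values are arbitrary examples, not recommendations.

```python
# Minimal sketch of tuning text generation parameters with the OpenAI
# Python client; the values are illustrative, not recommended defaults.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Describe Databricks Dolly 2.0 in two sentences."}],
    temperature=0.2,        # lower -> more deterministic output
    top_p=0.9,              # cumulative probability threshold for sampling
    frequency_penalty=0.5,  # discourage reusing tokens already generated
    presence_penalty=0.3,   # encourage introducing new tokens/topics
)

print(response.choices[0].message.content)
```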

Leveraging External Documents with Retriever Architectures

Response accuracy can be improved by providing domain-specific knowledge to the LLM in the form of external documents. Augmenting the knowledge base with domain-specific information allows the model to ground its responses in it. When a user asks a question, we can retrieve documents relevant to it (using a module called a “retriever”) and include them in the prompt used to produce the answer. This kind of process is implemented in architectures typically called “retriever architectures.”

In these architectures:

  1. When a user poses a question, the system computes an embedding representation of it.
  2. The embedding of the question is then used for executing a semantic search in the database of documents (by comparing their embeddings and computing similarity scores).
  3. The top-ranked documents are used by the LLM as context to give the final answer. Usually, the LLM is instructed to extract the answer only from those context passages and not to write anything that can’t be inferred from them. (A minimal sketch of this pipeline follows below.)
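
Below is a minimal sketch of steps 1 and 2, assuming the sentence-transformers library and a small in-memory list of documents (illustrative choices; in practice the document embeddings would typically live in a vector database).

```python
# Minimal sketch of retrieval (steps 1-2): embed the question, then rank
# documents by cosine similarity. Assumes the `sentence-transformers`
# library and a tiny in-memory corpus for illustration.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

documents = [
    "Dolly 2.0 is an open-source 12B parameter instruction-following LLM "
    "released by Databricks in April 2023.",
    "Databricks provides a cloud platform for data engineering and machine learning.",
    "Temperature controls the randomness of LLM text generation.",
]
question = "Does Databricks Dolly provide value to the open source community?"

# Step 1: embed the question. Step 2: semantic search over document embeddings.
doc_vectors = embedder.encode(documents, normalize_embeddings=True)
question_vector = embedder.encode(question, normalize_embeddings=True)

scores = doc_vectors @ question_vector      # cosine similarity (vectors are unit-norm)
top_indices = np.argsort(scores)[::-1][:2]  # indices of the top-ranked documents
retrieved = [documents[i] for i in top_indices]
```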

Retrieval-augmented generation (RAG) is a technique that enhances language model capabilities by sourcing data from external resources and integrating it with the context provided in the model's prompt.

Providing access to external data sources during the prediction process enriches the model’s knowledge and grounding. By leveraging external knowledge, the model can generate more accurate, contextually appropriate responses and be less prone to hallucination.
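
Continuing the sketch above, step 3 simply places the retrieved passages in the prompt and instructs the model to answer only from them. The prompt wording and the use of the OpenAI client here are illustrative assumptions, not a fixed recipe.

```python
# Minimal sketch of step 3: ground the answer in the retrieved passages.
# `retrieved` and `question` come from the previous snippet; the OpenAI
# client is again used purely for illustration.
from openai import OpenAI

client = OpenAI()

context = "\n\n".join(retrieved)
prompt = (
    "Answer the question using ONLY the context below. "
    "If the context is not sufficient, say \"I don't know\".\n\n"
    f"Context:\n{context}\n\n"
    f"Question: {question}\nAnswer:"
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0.0,  # keep the grounded answer as deterministic as possible
)
print(response.choices[0].message.content)
```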

Bias in LLMs

Large language models like GPT-3.5 and GPT-4 have raised serious privacy and ethical concerns. Research has shown that these models are prone to inherent bias, leading to the generation of prejudiced or hateful language, intensifying the concerns regarding their use and governance.

Biases in LLMs arise from various sources: the data, the annotation process, the input representations, the models, and the research design.

For instance, training data that don't represent the diversity of language can lead to demographic biases, resulting in a model's inability to understand and accurately represent certain user groups. Misrepresentation can vary from mild inconveniences to more covert, gradual declines in performance, which can unfairly impact certain demographic groups.

LLMs can unintentionally intensify harmful biases through their hallucinations, creating prejudiced and offensive content.

The data used to train LLMs frequently includes stereotypes, which the models may unknowingly reinforce. This imbalance can lead the models to generate prejudiced content that discriminates against underrepresented groups, potentially targeting them based on factors like race, gender, religion, and ethnicity.

For example, an LLM might produce content that portrays certain ethnicities as intrinsically violent or unreliable. Similarly, if a model is trained on data biased towards a younger, technologically savvy demographic, it may generate outputs that overlook older individuals or those from less technologically equipped regions. And if the model is trained on data from sources promoting hate speech or toxic content, it might produce damaging and prejudiced outputs, amplifying the spread of harmful stereotypes and biases.

These examples underscore the urgent need for constant monitoring and ethical management in the use of these models.

Constitutional AI

Constitutional AI is a conceptual framework crafted by researchers at Anthropic. It aims to align AI systems with human values, ensuring that they become beneficial, safe, and trustworthy.

In the first phase, the model is trained to critique and revise its own responses based on a set of predetermined principles and a small number of process examples. The next phase involves reinforcement learning: here, the model relies on AI-generated feedback grounded in those principles, rather than human feedback, to choose the least harmful response.
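
The self-critique-and-revision idea can be approximated with plain prompting, as in the sketch below. This is only an illustration of the concept, not Anthropic's actual training procedure; the principle text, prompts, and choice of model API are all assumptions.

```python
# Rough sketch of the critique-and-revise idea behind Constitutional AI.
# NOT Anthropic's implementation: the principle and prompts are illustrative,
# and any chat-completion API could stand in for the one used here.
from openai import OpenAI

client = OpenAI()
PRINCIPLE = ("Choose the response that is most helpful while avoiding "
             "harmful, biased, or unethical content.")

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

user_request = "Explain the risks of LLM hallucinations."
draft = ask(user_request)

# Self-critique: judge the draft against the principle.
critique = ask(
    f"Principle: {PRINCIPLE}\n\nResponse:\n{draft}\n\n"
    "Critique this response: does it violate the principle? Be specific."
)

# Revision: rewrite the draft so that it satisfies the principle.
revised = ask(
    f"Principle: {PRINCIPLE}\n\nOriginal response:\n{draft}\n\n"
    f"Critique:\n{critique}\n\n"
    "Rewrite the response so it fully satisfies the principle."
)
print(revised)
```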

Constitutional AI employs methodologies like self-supervision training. These techniques allow the AI to learn to conform to its constitution, without the need for explicit human labeling or supervision.

The approach also includes developing constrained optimization techniques. These ensure that the AI pursues helpfulness within the boundaries set by its constitution rather than pursuing unbounded optimization, potentially forgetting helpful knowledge.

Credit: Activeloop.ai

