Understanding Hallucinations and Bias
Vishal Jindal
Lead Data Scientist at UHG | Project orchestrator and production implementation | Delivering Data-driven Solutions for Business Growth | PySpark, Azure Databricks, Machine Learning, Big Data, Python, SQL, MLOps
Hallucinations in LLMs
In Large Language Models, hallucinations refer to cases where the model produces text that's incorrect and not based on reality. An AI hallucination is a confident response by the model that cannot be grounded in any of its training data.
These hallucinations can have several causes, including gaps or inaccuracies in the training data, knowledge cutoffs that leave the model unaware of recent events, and the model's tendency to produce fluent, plausible-sounding text even when it lacks the relevant facts.
Let’s see a clear example of a hallucination.
On April 12th, 2023, Databricks announced their “Dolly 2.0” model, an open-source 12B-parameter language model fine-tuned on a human-generated instruction-following dataset. GPT-3.5 has been trained on a dataset with a cutoff year of 2021. Therefore, it can’t know about Dolly 2.0.
When GPT-3.5 is asked: “Does Databricks Dolly provide value to the open source community?” it responds with something like this:
GPT-3.5: Databricks Dolly is a tool developed by Databricks, which is designed to help users migrate their data from one platform to another, particularly from on-premise data centers to cloud-based platforms such as AWS or Azure. While Databricks Dolly is not an open source tool, it can still provide value to the open source community in a few ways. […]
Except for “Databricks Dolly is a tool developed by Databricks,” the rest of the text is entirely incorrect! Dolly is not a tool for migrating data, and it is open-source, contrary to the response. Still, GPT-3.5 knows about Databricks as a company, and therefore it generates a plausible-sounding description of Dolly 2.0 as if it were a generic Databricks product.
While this example shows OpenAI’s GPT-3.5 hallucinating, it’s important to note that this phenomenon applies to other similar LLMs, such as Bard or LLaMA.
Strategies to mitigate hallucinations include tuning the text generation parameters, cleaning up the training data, precisely defining prompts (prompt engineering), and using retriever architectures to ground responses in specific retrieved documents.
Misinformation Spreading
One significant risk associated with hallucinations in LLMs is their potential to generate content that, while appearing credible, is factually incorrect. Due to their limited capacity to understand the context and verify facts, LLMs can unintentionally spread misinformation.
There's also the potential for individuals with malicious intent to exploit LLMs to spread disinformation deliberately, creating and promoting false narratives. A study by BlackBerry found that nearly half of the respondents (49%) believed that GPT-4 could be used to spread misinformation. The unrestricted spread of such false information via LLMs can lead to widespread negative impacts across societal, cultural, economic, and political landscapes. It's crucial to address these issues related to LLM hallucinations to ensure the ethical use of these models.
Tuning the Text Generation Parameters
The generated output of LLMs is greatly influenced by various model parameters, including temperature, frequency penalty, presence penalty, and top-p.
Higher temperature values promote randomness and creativity, while lower values make the output more deterministic. A higher frequency penalty discourages the model from repeating tokens it has already generated, and a higher presence penalty increases the likelihood of introducing tokens that have not yet appeared in the generated text. The “top-p” parameter controls response diversity by setting a cumulative probability threshold for token selection.
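As a minimal sketch, here is how these parameters might be set when calling a chat model through the OpenAI Python client; the model name and parameter values are illustrative, not recommendations.

from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY is available in the environment

response = client.chat.completions.create(
    model="gpt-3.5-turbo",   # illustrative model name
    messages=[{"role": "user", "content": "Does Databricks Dolly provide value to the open source community?"}],
    temperature=0.2,         # lower temperature -> more deterministic output
    top_p=0.9,               # cumulative-probability cutoff for token sampling
    frequency_penalty=0.5,   # discourages repeating tokens already generated
    presence_penalty=0.3,    # encourages introducing tokens not yet in the text
)
print(response.choices[0].message.content)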
Leveraging External Documents with Retriever Architectures
Response accuracy can be improved by providing domain-specific knowledge to the LLM in the form of external documents. Augmenting the knowledge base with domain-specific information allows the model to ground its responses in that knowledge. When a user asks a question, we can retrieve the documents relevant to it (leveraging a module called a “retriever”) and include them in the prompt used to produce the answer. This type of process is implemented in architectures typically called “retriever architectures”.
In these architectures, documents are split into chunks, embedded, and stored in a vector database; at query time, the retriever returns the chunks most relevant to the user's question, and these are inserted into the prompt that is sent to the LLM.
Retrieval-augmented generation (RAG) is a technique that enhances language model capabilities by sourcing data from external resources and integrating it with the context provided in the model's prompt.
Providing access to external data sources during the prediction process enriches the model’s knowledge and grounding. By leveraging external knowledge, the model can generate more accurate, contextually appropriate responses and be less prone to hallucination.
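The toy sketch below illustrates the retrieval step with a simple TF-IDF similarity search; production RAG systems typically rely on dense embeddings and a vector database instead, and the documents and prompt template here are purely illustrative.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical domain-specific documents to ground the answer in.
documents = [
    "Dolly 2.0 is an open-source 12B-parameter instruction-following model released by Databricks in April 2023.",
    "Databricks provides a lakehouse platform built on Apache Spark.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)

def retrieve(question, k=1):
    """Return the k documents most similar to the question."""
    query_vector = vectorizer.transform([question])
    scores = cosine_similarity(query_vector, doc_vectors)[0]
    top_indices = scores.argsort()[::-1][:k]
    return [documents[i] for i in top_indices]

question = "Does Databricks Dolly provide value to the open source community?"
context = "\n".join(retrieve(question))

# The retrieved context is placed in the prompt so the LLM can ground its answer.
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
print(prompt)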
Bias in LLMs
Large language models like GPT-3.5 and GPT-4 have raised serious privacy and ethical concerns. Research has shown that these models are prone to inherent bias, leading to the generation of prejudiced or hateful language, intensifying the concerns regarding their use and governance.
Biases in LLMs arise from various sources: the data, the annotation process, the input representations, the models, and the research design.
For instance, training data that don't represent the diversity of language can lead to demographic biases, resulting in a model's inability to understand and accurately represent certain user groups. Misrepresentation can vary from mild inconveniences to more covert, gradual declines in performance, which can unfairly impact certain demographic groups.
LLMs can unintentionally intensify harmful biases through their hallucinations, creating prejudiced and offensive content.
The data used to train LLMs frequently includes stereotypes, which the models may unknowingly reinforce. This imbalance can lead the models to generate prejudiced content that discriminates against underrepresented groups, potentially targeting them based on factors like race, gender, religion, and ethnicity.
This can be exemplified when an LLM produces content that portrays certain ethnicities as intrinsically violent or unreliable. Also, if a model is trained on data biased towards a younger, technologically savvy demographic, it may generate outputs that overlook older individuals or those from less technologically equipped regions. If the model is steeped in data from sources promoting hate speech or toxic content, it might produce damaging and prejudiced outputs, amplifying the diffusion of harmful stereotypes and biases.
These examples underscore the urgent need for constant monitoring and ethical management in the use of these models.
Constitutional AI
Constitutional AI is a conceptual framework crafted by researchers at Anthropic. It aims to align AI systems with human values, ensuring that they become beneficial, safe, and trustworthy.
In the beginning, the model is trained to self-review and modify its responses based on a set of predetermined principles and a small set of process examples. The next phase involves reinforcement learning training. At this point, the model leans on AI-generated feedback, grounded in the given principles, as opposed to human feedback, to choose the least harmful response.
Constitutional AI employs methodologies like self-supervised training. These techniques allow the AI to learn to conform to its constitution without the need for explicit human labeling or supervision.
The approach also includes developing constrained optimization techniques. These ensure that the AI pursues helpfulness within the boundaries set by its constitution rather than pursuing unbounded optimization, potentially forgetting helpful knowledge.
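A minimal sketch of the self-critique-and-revision step is shown below, assuming a generic chat-completion client; the principle text, prompts, and helper names are illustrative and do not reflect Anthropic's actual implementation.

from openai import OpenAI

client = OpenAI()  # assumes an OPENAI_API_KEY is available in the environment

# Illustrative principle standing in for an entry in a "constitution".
PRINCIPLE = "Choose the response that is most helpful while avoiding harmful, biased, or misleading content."

def ask(prompt):
    """Single-turn helper around a chat-completion endpoint."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def critique_and_revise(question):
    # 1. Draft an initial answer.
    draft = ask(question)
    # 2. Ask the model to critique its own draft against the principle.
    critique = ask(f"Principle: {PRINCIPLE}\n\nResponse: {draft}\n\nCritique this response against the principle.")
    # 3. Ask the model to revise the draft based on its critique.
    return ask(f"Principle: {PRINCIPLE}\n\nResponse: {draft}\n\nCritique: {critique}\n\nRewrite the response so it better follows the principle.")

print(critique_and_revise("Describe the typical personality of people from a given country."))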
Credit: Activeloop.ai