Hallucination In AI Models

Hallucination in Natural Language Generation:

Hallucination occurs when a language model generates text that fits a given context or prompt but introduces details or nuances that are not supported by its input or training data. The model extrapolates beyond what it has seen, producing responses that read as plausible and contextually coherent even though they are not grounded in any source.

  • Example: Given the input "The cat sat on the mat," the model might continue with "and started playing with a ball nearby," even though it has never seen that exact continuation during training. The continuation is plausible given its understanding of language and context, but it is invented rather than grounded in the input.

Challenges: Evaluating hallucination is difficult because it requires assessing whether generated outputs are not only grammatically correct but also semantically coherent and contextually appropriate. Models must strike a balance between generating novel responses and staying faithful to the input context.

Solution using Cross-Encoders

Contextual Understanding: Cross-encoders can aid in evaluating hallucination by providing a more holistic view of the relationship between input and output sequences. By jointly encoding both the input context and the generated output (or response), cross-encoders can better capture whether the generated text fits the given context.
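Before turning to the dedicated HHEM model below, here is a minimal sketch of the general idea: the source text and the generated text are fed to a cross-encoder as a single pair, and the joint encoding yields a judgment about whether the output is supported by the input. The cross-encoder/nli-deberta-v3-base checkpoint and its label ordering are assumptions taken from its public model card; any NLI-style cross-encoder would work the same way.

from sentence_transformers.cross_encoder import CrossEncoder

# Assumed checkpoint: any NLI-style cross-encoder can be substituted here.
nli_model = CrossEncoder('cross-encoder/nli-deberta-v3-base')

source = "The cat sat on the mat."
generated = "The cat started playing with a ball nearby."

# The pair is encoded jointly, so the model sees input and output together.
logits = nli_model.predict([(source, generated)])

# Label order per the model card: contradiction, entailment, neutral.
label_mapping = ['contradiction', 'entailment', 'neutral']
print(label_mapping[logits[0].argmax()])

A high entailment score suggests the generated text is grounded in the source; a high neutral or contradiction score is a signal of possible hallucination.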

The HHEM model, developed by Vectara, is an open-source tool designed to identify hallucinations in the output of large language models (LLMs). It is especially useful in retrieval-augmented generation (RAG) applications, where an LLM condenses a collection of retrieved facts into a summary, but it is versatile and applicable beyond RAG scenarios.

The model was trained using the CrossEncoder class from SentenceTransformers. It produces a probability score between 0 and 1, where 0 indicates hallucination and 1 indicates factual consistency. Applying a threshold of 0.5 to this score turns the prediction into a binary judgment of whether a generated document is consistent with its source.


from sentence_transformers.cross_encoder import CrossEncoder

# Load Vectara's hallucination evaluation model (HHEM) as a cross-encoder.
model = CrossEncoder('vectara/hallucination_evaluation_model')

# Each pair is (source text, generated claim); the model returns a
# factual-consistency score between 0 (hallucination) and 1 (consistent).
scores = model.predict([
    ["A man walks into a bar and buys a drink", "A bloke swigs alcohol at a pub"],
    ["A person on a horse jumps over a broken down airplane.", "A person is at a diner, ordering an omelette."],
    ["A person on a horse jumps over a broken down airplane.", "A person is outdoors, on a horse."],
    ["A boy is jumping on skateboard in the middle of a red bridge.",
     "The boy skates down the sidewalk on a blue bridge"],
    ["A man with blond-hair, and a brown shirt drinking out of a public water fountain.",
     "A blond drinking water in public."],
    ["A man with blond-hair, and a brown shirt drinking out of a public water fountain.",
     "A blond man wearing a brown shirt is reading a book."],
    ["Mark Wahlberg was a fan of Manny.", "Manny was a fan of Mark Wahlberg."],
])

# One score per pair, in the same order as the input.
print(scores)
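
As a small usage note (not part of the original snippet), the 0.5 threshold described above can be applied directly to the returned scores array to obtain a binary consistent-versus-hallucinated decision for each pair:

# scores is the NumPy array returned by model.predict() above.
threshold = 0.5
is_consistent = scores >= threshold  # True = factually consistent, False = likely hallucination
print(is_consistent)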
