Math Hallucinations with OpenAI, But Also Some Great Results
This is not just another rant about OpenAI. I actually have something very positive to say, even if in the end, the answer to my prompt was wrong. Many of the hallucinations would not be a real issue if OpenAI provided references and links to any piece of information returned to the user. Indeed, this was the main reason why I created xLLM (see details here).
Perhaps in the future, someone -- maybe me -- will create a meta-LLM that parses prompt results from OpenAI, Mistral, Perplexity, and other platforms to get the best out of the mix, blending them with internal embeddings as augmented data. The first step consists of generating billions of synthetic prompts, then running them through the various apps, maybe even including old-fashioned Google search.
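To make the idea concrete, here is a minimal sketch, assuming hypothetical query_openai, query_mistral, and query_perplexity wrappers that you would write around each vendor's own client library; the platform list and the voting rule are illustrative assumptions, not an existing implementation:

    # Minimal sketch of a meta-LLM front end: send the same prompt to
    # several platforms and flag disagreement as a hallucination signal.
    from collections import Counter

    def query_openai(prompt: str) -> str:
        raise NotImplementedError("wrap the OpenAI client here")  # placeholder

    def query_mistral(prompt: str) -> str:
        raise NotImplementedError("wrap the Mistral client here")  # placeholder

    def query_perplexity(prompt: str) -> str:
        raise NotImplementedError("wrap the Perplexity client here")  # placeholder

    PLATFORMS = {
        "openai": query_openai,
        "mistral": query_mistral,
        "perplexity": query_perplexity,
    }

    def meta_answer(prompt: str) -> dict:
        # Collect one answer per platform, then take a majority vote.
        answers = {name: fn(prompt) for name, fn in PLATFORMS.items()}
        votes = Counter(answers.values())
        best, count = votes.most_common(1)[0]
        return {
            "answers": answers,
            "consensus": best if count > 1 else None,
            "needs_review": count == 1,  # no two platforms agree: suspect answer
        }

In practice, exact-string voting is too crude for free-form text; comparing embeddings of the answers, or blending them with internal embeddings as described above, would be the natural refinement.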
For now, we have to deal with a single platform at a time. This article focuses on OpenAI and my most recent math query. I tried Gemini too, but the results were a lot worse.
To summarize, OpenAI gave a wrong answer to my question. Thankfully, I knew it was wrong. But what if you don't know, and you publish an article or provide paid advice based on that answer? Anyway, I tried to get more details with a different prompt. Then some magic happened: OpenAI launched a Python script out of nowhere, ran it in real time, and essentially told me that the answer to my question was in the output produced by that script. A simple analysis of that output would yield the answer. This was great, because it helped me discover a new Python library that is very useful for what I do.
Unfortunately, OpenAI decided to add one concluding paragraph after that, telling me that the wrong answer (the one from the first prompt) matched the correct answer obtained in the second prompt.
Read the full article here.
Comments

Co-Founder, BondingAI.io:
I also tried another prompt: count the number of occurrences of “000” in all binary strings of length 5. Then, the same prompt with “000” replaced by “010”. OpenAI claims the answer is 8 in both cases; Gemini claims it is 3. Both justify the wrong answer using incorrect logic. Even Python gets it wrong, counting non-overlapping occurrences only, coming up with 8 for “000” (wrong) and 11 for “010” (correct).
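For reference, a minimal Python sketch that enumerates all 32 binary strings of length 5 and counts each pattern under three conventions: Python's built-in str.count (non-overlapping), overlapping occurrences, and the number of strings containing the pattern at least once:

    # Count "000" and "010" in all 2^5 = 32 binary strings of length 5,
    # under three different counting conventions.
    from itertools import product

    def count_overlapping(s: str, pattern: str) -> int:
        # Slide a window one position at a time, so overlaps are counted.
        return sum(s[i:i + len(pattern)] == pattern
                   for i in range(len(s) - len(pattern) + 1))

    strings = ["".join(bits) for bits in product("01", repeat=5)]

    for pattern in ("000", "010"):
        non_overlapping = sum(s.count(pattern) for s in strings)  # str.count skips overlaps
        overlapping = sum(count_overlapping(s, pattern) for s in strings)
        containing = sum(pattern in s for s in strings)
        print(pattern, non_overlapping, overlapping, containing)
    # Output: 000 8 12 8
    #         010 11 12 11

So the answer depends on the convention: 12 overlapping occurrences for both patterns, 8 versus 11 non-overlapping occurrences, and 8 versus 11 strings containing at least one match. Part of the difficulty is that the prompt itself does not pin down which convention is meant.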
Glad you are sharing this content; it is on the cutting edge and very useful. Thanks.
AI/ML in Fin Crimes and Compliance, Automation, genAI, data analytics translation, innovation, solution architecture - connecting business with data analytics, data science and engineering:
I would like to know your opinion on hallucinations in general. How can we control them in production? Will you be writing about this in the future?