LLM Hallucinations or Much Worse?
Copilot "Solving" Sudoku


On Friday, Sep 20, during the last day of DecisionCAMP (https://dmcommunity.org/2024/09/20/decisioncamp-day-3-presentations-and-recordings/), we had an interesting presentation by Bruce Silver, "DMN Solver/Helper for Sudoku". Before submitting his presentation, Bruce told me: "I know, Jacob, that your constraint solvers can easily handle Sudoku problems, but I wanted to check how DMN FEEL will be able to support different complex patterns". I knew that Bruce is not only a big enthusiast of the DMN standard but also a great presenter and trainer, and I encouraged him to submit his presentation. His presentation was a success (you may listen to it at https://www.youtube.com/watch?v=9NUAMqGeMZ0).

During the Q&A, the question of LLMs solving Sudoku came up. So, I simply copied a Sudoku image from Bruce's presentation and decided to give it to Copilot while we talked about it. I had already had a positive experience with Copilot easily handling table images (see, for instance, https://dmcommunity.org/wp-content/uploads/2024/09/challenge2024sep.copilot.pdf).

So, I had a brief dialog with Copilot, shown in the top image. It was done almost immediately. Then I asked, "Can you solve Sudoku of any complexity?" The answer was: "Yes, I can help you solve Sudoku puzzles of any complexity! Whether it’s an easy, medium, hard, or even an expert-level puzzle, I’m here to assist. If you have a specific Sudoku puzzle you’d like help with, feel free to share the details or an image of the puzzle, and I’ll guide you through the solution."

It was impressive, and I shared these results with Bruce and the DecisionCAMP audience as proof of the power of LLMs. It confirmed my experience that LLMs can handle certain logical problems which are usually not difficult for more traditional symbolic AI technologies such as constraint solvers.

After a few minutes, I started to receive these messages in the chat:

====================================================

Gary Hallmark: Jacob, that sudoku "solution" is quite an hallucination!

Me: From a quick glance I don't see any errors. Do you?

Rimantas Zukaitis: The numbers which were in original sudoku are not the same

Me: WOW! What a cheater...

Rimantas Zukaitis: it just gave a random solution, not related to the task at hand. well, its just a prediction - chances are this is most common sudoku solution it found

====================================================

After that, I got back to Copilot and insisted on this particular problem, but instead of a solution or an explanation, Copilot started to produce evasively formulated sentences and switched the topic.

The initial enthusiasm ended in serious frustration. Was it a hallucination caused by the statistical nature of LLMs? Or was it an intentional problem substitution? Note that the first number in the top-left corner (where a regular person would immediately look) was "3" in both cases.

In my DecisionCAMP Closing Remarks (https://www.youtube.com/watch?v=F4spCC9Bh7c) I spoke about the necessity of always including rules-based Sanity Checkers in our end-to-end decisioning processes. Experienced people did this for mission-critical applications long before LLMs were born. However, we could never have imagined that a decision engine might replace our input data, as Copilot did in this case.

Sanity Checkers at the end of the Decisioning Loop
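
To make the idea concrete, below is a minimal sketch of such a sanity check in Python (a hypothetical example of mine, not part of OpenRules or any other product). It verifies the two things that went wrong in the Copilot episode: that the original givens were preserved, and that every row, column, and 3x3 box contains the digits 1-9 exactly once.

====================================================

# A minimal, hypothetical sanity checker for an LLM-produced Sudoku "solution".
# puzzle: 9x9 list of ints, 0 for empty cells; solution: 9x9 list of ints 1..9.

def check_sudoku_solution(puzzle, solution):
    # (1) The solver must not replace the input data (the givens).
    for r in range(9):
        for c in range(9):
            if puzzle[r][c] != 0 and solution[r][c] != puzzle[r][c]:
                return f"Given at row {r+1}, col {c+1} was changed: {puzzle[r][c]} -> {solution[r][c]}"
    # (2) Standard Sudoku constraints: each row, column, and 3x3 box holds 1..9 exactly once.
    digits = set(range(1, 10))
    for i in range(9):
        if set(solution[i]) != digits:
            return f"Row {i+1} is not a permutation of 1..9"
        if {solution[r][i] for r in range(9)} != digits:
            return f"Column {i+1} is not a permutation of 1..9"
    for br in range(0, 9, 3):
        for bc in range(0, 9, 3):
            box = {solution[br+dr][bc+dc] for dr in range(3) for dc in range(3)}
            if box != digits:
                return f"Box starting at row {br+1}, col {bc+1} is not a permutation of 1..9"
    return "OK"

====================================================

In the episode above, check (1) is exactly what Rimantas spotted by eye: the numbers from the original Sudoku were not the same in Copilot's "solution".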

I wonder what other people think: was it an unintentional random hallucination or something much worse?



