LLM Hallucinations or much worse?
On Friday, Sep 20, during the last day of DecisionCAMP (https://dmcommunity.org/2024/09/20/decisioncamp-day-3-presentations-and-recordings/), we had an interesting presentation by Bruce Silver, "DMN Solver/Helper for Sudoku". Before submitting his presentation, Bruce told me: "I know, Jacob, that your constraint solvers can easily handle Sudoku problems, but I wanted to check how DMN FEEL will be able to support different complex patterns". I knew that Bruce is not only a big enthusiast of the DMN standard but also a great presenter and trainer, and I encouraged him to submit his presentation. His presentation was a success (you can watch it at https://www.youtube.com/watch?v=9NUAMqGeMZ0).
During the Q&A, the question of LLMs solving Sudoku came up. So I simply copied a Sudoku image from Bruce's presentation and decided to give it to Copilot while we talked about it. I had already had a positive experience with Copilot easily handling table images; see, for instance, https://dmcommunity.org/wp-content/uploads/2024/09/challenge2024sep.copilot.pdf.
So I had a brief dialog with Copilot, presented in the top image. The solution came back almost immediately. Then I asked, "Can you solve Sudoku of any complexity?" The answer was: "Yes, I can help you solve Sudoku puzzles of any complexity! Whether it’s an easy, medium, hard, or even an expert-level puzzle, I’m here to assist. If you have a specific Sudoku puzzle you’d like help with, feel free to share the details or an image of the puzzle, and I’ll guide you through the solution."
It was impressive, and I shared these results with Bruce and the DecisionCAMP audience as proof of the power of LLMs. It confirmed my experience that LLMs can handle certain logical problems that are usually not difficult for more traditional symbolic AI technologies such as constraint solvers.
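For readers who have not worked with constraint solvers, here is why Sudoku is an easy problem for them: the whole puzzle is just nine rows, nine columns, and nine 3x3 boxes that must each contain distinct digits, plus the given clues. The sketch below uses Google OR-Tools CP-SAT purely for illustration; the solver choice, function name, and the 0-for-blank convention are my assumptions, not something prescribed in this post.

```python
# A rough sketch of Sudoku as a constraint model, using Google OR-Tools CP-SAT
# (illustrative choice of solver; the post does not name a specific one).
from ortools.sat.python import cp_model

def solve_sudoku(puzzle):
    """puzzle: 9x9 list of ints, 0 for blanks. Returns a solved grid or None."""
    model = cp_model.CpModel()

    # One integer variable per cell, each in 1..9.
    cells = [[model.NewIntVar(1, 9, f"c{r}{c}") for c in range(9)] for r in range(9)]

    # Fix the given clues.
    for r in range(9):
        for c in range(9):
            if puzzle[r][c] != 0:
                model.Add(cells[r][c] == puzzle[r][c])

    # Rows, columns, and 3x3 boxes must contain distinct digits.
    for i in range(9):
        model.AddAllDifferent(cells[i])                          # row i
        model.AddAllDifferent([cells[r][i] for r in range(9)])   # column i
    for br in range(0, 9, 3):
        for bc in range(0, 9, 3):
            model.AddAllDifferent(
                [cells[br + dr][bc + dc] for dr in range(3) for dc in range(3)]
            )

    solver = cp_model.CpSolver()
    status = solver.Solve(model)
    if status in (cp_model.OPTIMAL, cp_model.FEASIBLE):
        return [[solver.Value(cells[r][c]) for c in range(9)] for r in range(9)]
    return None
```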
After a few minutes, I started to receive these messages in the chat:
====================================================
Gary Hallmark: Jacob, that sudoku "solution" is quite an hallucination!
Me: From a quick glance I don't see any errors. Do you?
Rimantas Zukaitis: The numbers which were in original sudoku are not the same
Me: WOW! What a cheater...
Rimantas Zukaitis: it just gave a random solution, not related to the task at hand. well, its just a prediction - chances are this is most common sudoku solution it found
====================================================
After that, I went back to Copilot and insisted on this particular problem, but instead of a solution or an explanation, Copilot started to produce carefully evasive sentences and switched the topic.
My initial enthusiasm ended in serious frustration. Was it a hallucination caused by the statistical nature of LLMs? Or was it an intentional problem substitution? Note that the first number in the top-left corner (where a regular person would look first) was "3" in both cases.
In my DecisionCAMP Closing Remarks (https://www.youtube.com/watch?v=F4spCC9Bh7c) I spoke about the necessity of always including rules-based Sanity Checkers in our end-to-end decisioning processes. Experienced people did this for mission-critical applications long before LLMs were born. However, we could never have imagined that a decision engine might replace our input data, as Copilot did in this case.
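Such a sanity checker does not need to be sophisticated. Here is a minimal sketch of what one might look like for this Sudoku case (the function name, grid representation, and 0-for-blank convention are my own illustrative assumptions): it first verifies that the proposed solution preserves every given clue, exactly the check that caught Copilot here, and then verifies that every row, column, and 3x3 box is a permutation of 1-9.

```python
# Minimal sanity checker for an LLM-proposed Sudoku solution (illustrative sketch).
# Assumes puzzle and solution are 9x9 lists of ints, with 0 marking empty puzzle cells.

def check_sudoku_solution(puzzle, solution):
    """Return a list of problems; an empty list means the solution passes."""
    problems = []

    # 1. The solver must not alter any of the given clues.
    for r in range(9):
        for c in range(9):
            given = puzzle[r][c]
            if given != 0 and solution[r][c] != given:
                problems.append(
                    f"Clue changed at row {r + 1}, col {c + 1}: "
                    f"{given} -> {solution[r][c]}"
                )

    # 2. Every row, column, and 3x3 box must contain the digits 1-9 exactly once.
    def check_group(values, label):
        if sorted(values) != list(range(1, 10)):
            problems.append(f"{label} is not a permutation of 1-9: {values}")

    for i in range(9):
        check_group([solution[i][c] for c in range(9)], f"Row {i + 1}")
        check_group([solution[r][i] for r in range(9)], f"Column {i + 1}")
    for br in range(0, 9, 3):
        for bc in range(0, 9, 3):
            box = [solution[br + dr][bc + dc] for dr in range(3) for dc in range(3)]
            check_group(box, f"Box at rows {br + 1}-{br + 3}, cols {bc + 1}-{bc + 3}")

    return problems
```

The first check alone would have flagged Copilot's answer immediately, because the "solution" no longer matched the original givens.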
I wonder what other people think: was it an unintentional random hallucination or something much worse?