Revisiting a Logic Puzzle with "o1"
A few months ago, I tested the reasoning capabilities of the then-latest 'gpt' model. Here is a link to that article: An Investigation into LLM Reasoning Capabilities.
Briefly, the logic puzzle is a variation of the famous Einstein 'Zebra' puzzle. You are given a set of assertions, or clues. To solve the puzzle, you must deduce the missing information by reasoning through those clues.
Previously, I tested two solution modes: a) direct and b) with code generation. The results are summarized below:
a) Direct Mode: The gpt model available at the time was given the puzzle in English and asked to provide the solution. It was unable to solve the puzzle. The text it generated seemed plausible and convincing, but a quick inspection revealed that the model had hallucinated the answer.
b) With Code Generation: Instead of solving the problem directly, the model was asked to translate the English description into SMTLib code. SMTLib (SMT-LIB) is a standard input language for SMT (Satisfiability Modulo Theories) solvers, and a problem described in it can be solved with a solver such as Z3. The model generated syntactically correct code but made a tiny error in translating the English into code. Although tiny, this is a more dangerous error if a human relies on the model's results without any verification.
The new OpenAI "o1" model is said to have enhanced reasoning capabilities, so it was natural for me to test it with the same puzzle. I tried the direct mode first and, to my surprise, o1 nailed it on the first attempt!
Here is the 'prompt' or input to the model:
**context**
Four friends are cycling around the city on their bikes. Each one has a different name, a different age, is riding a different bike, and have brought a different sandwich. Use the clues to find out their characteristics.
Friends: David, Eric, Gregory, Ivan
Bike Colors: Blue, Pink, Red, Yellow
Ages: 10, 11, 12, 14
Sandwiches: Cheese, Peanut Butter, Roast Beef, Turkey
**clues**
The 12-year-old cyclist is in the third position.
The cyclist with the Yellow bike is immediately after Ivan.
The 14-year-old cyclist has a Cheese sandwich.
The boy with the Peanut Butter sandwich is directly before the boy with the Turkey sandwich.
Eric is the boy who has a Peanut Butter sandwich.
The 11-year-old boy is in the last position.
Cyclist Gregory is 14 years old.
The cyclist with the Red bike is next to the cyclist named David.
Ivan is the one riding the Blue bike.
The 14-year-old boy is in the second position.
Find the postion, bike color, age and sandwith of each friend, given the context and the clues
The model responded with:
To solve this puzzle, we'll use the provided clues to deduce the positions, names, ages, bike colors, and sandwiches of each friend. Step 1: Assign Positions Based on Ages:
The final answer is correct! This is quite amazing!
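Since a model's answer should not be trusted without verification, here is an independent check of the puzzle. It is a minimal brute-force sketch in Python, not the method o1 used and not the SMTLib approach from my earlier experiment. It tries every assignment of names, bikes, ages, and sandwiches to positions 1 to 4 and keeps only those that satisfy all ten clues. The function name `solve` and the tuple layout are my own illustrative choices. It assumes "next to" means adjacent positions, and that "immediately after" and "directly before" mean the very next and very previous position.

```python
from itertools import permutations

NAMES = ("David", "Eric", "Gregory", "Ivan")
BIKES = ("Blue", "Pink", "Red", "Yellow")
AGES = (10, 11, 12, 14)
SANDWICHES = ("Cheese", "Peanut Butter", "Roast Beef", "Turkey")


def solve():
    """Return every assignment consistent with all ten clues.

    Each permutation is read as "item at position 1, 2, 3, 4",
    so index i of a tuple corresponds to position i + 1.
    """
    solutions = []
    for ages in permutations(AGES):
        # Clues 1, 6, 10: 12 is third, 11 is last, 14 is second.
        if ages[2] != 12 or ages[3] != 11 or ages[1] != 14:
            continue
        for names in permutations(NAMES):
            # Clue 7: Gregory is 14 years old.
            if ages[names.index("Gregory")] != 14:
                continue
            ivan = names.index("Ivan")
            for sandwiches in permutations(SANDWICHES):
                # Clue 3: the 14-year-old has the Cheese sandwich.
                if sandwiches[ages.index(14)] != "Cheese":
                    continue
                # Clue 4: Peanut Butter is directly before Turkey.
                if sandwiches.index("Peanut Butter") + 1 != sandwiches.index("Turkey"):
                    continue
                # Clue 5: Eric has the Peanut Butter sandwich.
                if sandwiches[names.index("Eric")] != "Peanut Butter":
                    continue
                for bikes in permutations(BIKES):
                    # Clue 9: Ivan rides the Blue bike.
                    if bikes[ivan] != "Blue":
                        continue
                    # Clue 2: Yellow is immediately after Ivan
                    # (impossible when Ivan is last, since no index equals 4).
                    if bikes.index("Yellow") != ivan + 1:
                        continue
                    # Clue 8: Red is adjacent to David.
                    if abs(bikes.index("Red") - names.index("David")) != 1:
                        continue
                    solutions.append(
                        [(pos + 1, names[pos], bikes[pos], ages[pos], sandwiches[pos])
                         for pos in range(4)]
                    )
    return solutions


if __name__ == "__main__":
    for solution in solve():
        for row in solution:
            print(row)
```

The search finds exactly one assignment: Ivan (Blue, 10, Roast Beef) in first position, Gregory (Yellow, 14, Cheese) second, Eric (Red, 12, Peanut Butter) third, and David (Pink, 11, Turkey) last. A unique solution is what makes the puzzle well-posed, and it gives a fixed target against which any model's answer can be compared.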
Indeed, the model can now 'reason' by some definition of that word.
I think we are now entering a new era, beyond that of the original ChatGPT. o1 is clearly a step above its predecessors and should open the door to many new types of applications.