
Revisiting Logic Puzzle with "o1"

Faisal Waris

Data Scientist/Gen. AI Strategist in the telecom industry

Published: September 29, 2024

A few months ago, I tested the reasoning capabilities of the then-latest 'gpt' model. Here is a link to that article: An Investigation into LLM Reasoning Capabilities.

Briefly, the logic puzzle is a variation of the famous Einstein 'Zebra' puzzle. You are given a set of assertions or clues. To solve the puzzle, you are to find the missing information, which can be discovered by reasoning through the given clues.

Previously, I tested two solution modes: a) direct and b) with code generation. The results are summarized below:

a) Direct Mode: The gpt model (available then) was given the puzzle in English and was asked to provide the solution. The model was unable to solve the puzzle. The text it generated seemed plausible and convincing, but a quick inspection revealed that the model was hallucinating the answer.

b) With Code Generation: Instead of solving the problem directly, the model was asked to transform the English description into SMT-LIB code. SMT-LIB is a standard input language for SMT (satisfiability modulo theories) solvers, so a problem described in this language can be solved with an SMT solver like Z3. The model generated syntactically correct code but made a tiny error in translating from English to code. While tiny, this is a more dangerous kind of error if a human relies on the model's results without any verification.
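To make the danger of a small translation error concrete, here is a minimal Python sketch that uses the cycling puzzle quoted later in this article. It is a plain brute-force search, not the SMT-LIB/Z3 pipeline from the earlier experiment, and the `strict_clue_2` flag is a hypothetical illustration: the article does not say which clue the model mistranslated. The flag switches clue 2 between "immediately after Ivan" (as written) and the looser "somewhere after Ivan".

```python
from itertools import permutations

NAMES = ("David", "Eric", "Gregory", "Ivan")
COLORS = ("Blue", "Pink", "Red", "Yellow")
AGES = (10, 11, 12, 14)
SANDWICHES = ("Cheese", "Peanut Butter", "Roast Beef", "Turkey")


def solve(strict_clue_2=True):
    """Return every assignment that satisfies the ten clues.

    Each permutation lists values by position (index 0 is the first cyclist).
    strict_clue_2=False is a hypothetical mistranslation of clue 2.
    """
    solutions = []
    for ages in permutations(AGES):
        # Clues 1, 6, 10: ages 12, 11 and 14 sit in positions 3, 4 and 2.
        if (ages[2], ages[3], ages[1]) != (12, 11, 14):
            continue
        for names in permutations(NAMES):
            # Clue 7: Gregory is 14 years old.
            if ages[names.index("Gregory")] != 14:
                continue
            ivan, david = names.index("Ivan"), names.index("David")
            for food in permutations(SANDWICHES):
                # Clue 3: the 14-year-old has the Cheese sandwich.
                if food[ages.index(14)] != "Cheese":
                    continue
                # Clue 5: Eric has the Peanut Butter sandwich.
                if food[names.index("Eric")] != "Peanut Butter":
                    continue
                # Clue 4: Peanut Butter is directly before Turkey.
                if food.index("Turkey") - food.index("Peanut Butter") != 1:
                    continue
                for bikes in permutations(COLORS):
                    # Clue 9: Ivan rides the Blue bike.
                    if bikes[ivan] != "Blue":
                        continue
                    # Clue 2: Yellow is immediately after Ivan
                    # (or merely somewhere after him in the loose reading).
                    gap = bikes.index("Yellow") - ivan
                    if (gap != 1) if strict_clue_2 else (gap < 1):
                        continue
                    # Clue 8: the Red bike is next to David.
                    if abs(bikes.index("Red") - david) != 1:
                        continue
                    solutions.append(list(zip(names, ages, bikes, food)))
    return solutions


print(len(solve(strict_clue_2=True)))   # 1
print(len(solve(strict_clue_2=False)))  # 2
```

With the clue encoded as written, the puzzle has exactly one solution. With the looser reading, a second assignment (Pink and Yellow swapped between positions 2 and 4) also passes, so a solver could return a wrong answer without any visible sign of trouble.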

The new OpenAI "o1" model is supposed to have enhanced reasoning capabilities, so it was natural for me to test it with the same puzzle. I tried the direct mode first and, to my surprise, o1 nailed it on the first attempt!

Here is the 'prompt' or input to the model:

**context**  
Four friends are cycling around the city on their bikes. Each one has a different name, a different age, is riding a different bike, and have brought a different sandwich. Use the clues to find out their characteristics.  
&nbsp;  
Friends: David, Eric, Gregory, Ivan  
Bike Colors: Blue, Pink, Red, Yellow  
Ages: 10, 11, 12, 14  
Sandwiches: Cheese, Peanut Butter, Roast Beef, Turkey  
&nbsp;  
**clues**  
The 12-year-old cyclist is in the third position.  
The cyclist with the Yellow bike is immediately after Ivan.  
The 14-year-old cyclist has a Cheese sandwich.  
The boy with the Peanut Butter sandwich is directly before the boy with the Turkey sandwich.  
Eric is the boy who has a Peanut Butter sandwich.  
The 11-year-old boy is in the last position.  
Cyclist Gregory is 14 years old.  
The cyclist with the Red bike is next to the cyclist named David.  
Ivan is the one riding the Blue bike.  
The 14-year-old boy is in the second position.  
&nbsp;  
Find the postion, bike color, age and sandwith of each friend, given the context and the clues

The model responded with:

To solve this puzzle, we'll use the provided clues to deduce the positions, names, ages, bike colors, and sandwiches of each friend.

Step 1: Assign Positions Based on Ages:

From clue (1): The 12-year-old is in the third position.
From clue (6): The 11-year-old is in the last position (fourth).
From clue (10): The 14-year-old is in the second position.
Since the ages are 10, 11, 12, and 14, the first position must be the 10-year-old.

Step 2: Assign Names and Sandwiches:

From clue (7): Gregory is 14 years old, so he's in the second position.
From clue (3): The 14-year-old (Gregory) has a Cheese sandwich.
From clue (5): Eric has a Peanut Butter sandwich.
From clue (4): The boy with the Peanut Butter sandwich (Eric) is directly before the boy with the Turkey sandwich.
Since Eric can't be in the first position (as there's no one before him), and the second position is occupied by Gregory, Eric must be in the third position with the Peanut Butter sandwich, and the Turkey sandwich is in the fourth position.
The remaining sandwich, Roast Beef, goes to the first position.

Step 3: Assign Remaining Names:

The names left are Ivan and David.
From clue (2): The cyclist with the Yellow bike is immediately after Ivan.
Since Gregory is in the second position, the only position before that is the first, so Ivan is in the first position.
From clue (9): Ivan rides the Blue bike.
That means David is in the fourth position.

Step 4: Assign Bike Colors:

From clue (2): The Yellow bike is immediately after Ivan (first position), so the second position has the Yellow bike.
From clue (8): The cyclist with the Red bike is next to David.
Since David is in the fourth position, the Red bike must be in the third position with Eric.
The remaining bike color, Pink, goes to David in the fourth position.

Final Assignment:

Position 1: Name: Ivan, Age: 10, Bike Color: Blue, Sandwich: Roast Beef
Position 2: Name: Gregory, Age: 14, Bike Color: Yellow, Sandwich: Cheese
Position 3: Name: Eric, Age: 12, Bike Color: Red, Sandwich: Peanut Butter
Position 4: Name: David, Age: 11, Bike Color: Pink, Sandwich: Turkey

Answer: The characteristics of each friend are:

Position 1: Ivan, age 10, rides a Blue bike, and has a Roast Beef sandwich.
Position 2: Gregory, age 14, rides a Yellow bike, and has a Cheese sandwich.
Position 3: Eric, age 12, rides a Red bike, and has a Peanut Butter sandwich.
Position 4: David, age 11, rides a Pink bike, and has a Turkey sandwich.

The final answer is correct! This is quite amazing!
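Checking an answer like this is much cheaper than finding it, so the claim is easy to confirm mechanically. The sketch below is one possible check, not part of the original experiment: it copies o1's final table into a list and evaluates each of the ten clues against it. The names `ANSWER`, `position_of`, and `check_clues` are made up for this illustration.

```python
# o1's final table from the response above; list index 0 is position 1.
ANSWER = [
    {"name": "Ivan", "age": 10, "bike": "Blue", "sandwich": "Roast Beef"},
    {"name": "Gregory", "age": 14, "bike": "Yellow", "sandwich": "Cheese"},
    {"name": "Eric", "age": 12, "bike": "Red", "sandwich": "Peanut Butter"},
    {"name": "David", "age": 11, "bike": "Pink", "sandwich": "Turkey"},
]


def position_of(rows, key, value):
    """Return the 1-based position of the cyclist whose `key` equals `value`."""
    return next(i for i, row in enumerate(rows, start=1) if row[key] == value)


def check_clues(rows):
    """Map each clue number (1-10, in the order given) to True or False."""
    by_name = {row["name"]: row for row in rows}

    def pos(key, value):
        return position_of(rows, key, value)

    return {
        1: pos("age", 12) == 3,
        2: pos("bike", "Yellow") == pos("name", "Ivan") + 1,
        3: rows[pos("age", 14) - 1]["sandwich"] == "Cheese",
        4: pos("sandwich", "Turkey") == pos("sandwich", "Peanut Butter") + 1,
        5: by_name["Eric"]["sandwich"] == "Peanut Butter",
        6: pos("age", 11) == 4,
        7: by_name["Gregory"]["age"] == 14,
        8: abs(pos("bike", "Red") - pos("name", "David")) == 1,
        9: by_name["Ivan"]["bike"] == "Blue",
        10: pos("age", 14) == 2,
    }


results = check_clues(ANSWER)
failed = [number for number, ok in results.items() if not ok]
print("all clues satisfied" if not failed else f"failed clues: {failed}")
```

This only shows that the answer is consistent with every clue. It does not show that the answer is the only one possible, which would need an exhaustive search or a solver.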

Indeed, the model can now 'reason' by some definition of that word.

I think we are now entering a new era, beyond that of the original ChatGPT. o1 is clearly a step above and should open the door to many new types of applications.
