Constrained and Provable LLM Code Generation
LLMs are now good at generating code, but human intervention is still required: we can't accept the generated code blindly, except perhaps for well-studied academic tasks or trivial ones.
Programming languages are powerful, their scope is large, and it's hard for the LLM to get everything right. What if, instead of generating programs in a full-featured language, we constrain the LLM to generate first-order logic formulas from human-readable text? Such formulas can be fed to an SMT solver to find provably correct solutions.
The main idea is to reduce the burden on the LLM and, as a trade-off, gain more certainty and confidence in the generated output.
I was surprised to find that the GPT models are well versed in the SMTLIB format for expressing logic formulas. As an experiment, I asked GPT to translate a simple 'Zebra'-type puzzle into an SMTLIB formula, to see if it could be effectively solved with the Z3 SMT solver.
Puzzle
Here is the text description of the puzzle (source: AHAPuzzles.com):
**Context**
Four friends are cycling around the city on their bikes. Each one has a different name, a different age, is riding a different bike, and has brought a different sandwich. Use the clues to find out their characteristics.
Friends: David, Eric, Gregory, Ivan
Bike Colors: Blue, Pink, Red, Yellow
Ages: 10, 11, 12, 14
Sandwiches: Cheese, Peanut Butter, Roast Beef, Turkey
**Clues**
The 12-year-old cyclist is in the third position.
The cyclist with the Yellow bike is immediately after Ivan.
The 14-year-old cyclist has a Cheese sandwich.
The boy with the Peanut Butter sandwich is directly before the boy with the Turkey sandwich.
Eric is the boy who has a Peanut Butter sandwich.
The 11-year-old boy is in the last position.
Cyclist Gregory is 14 years old.
The cyclist with the Red bike is next to the cyclist named David.
Ivan is the one riding the Blue bike.
The 14-year-old boy is in the second position.
Find the position, bike color, age and sandwich of each friend, given the context and the clues.
SMTLIB Formula
After some simple prompt engineering, I got GPT-4 to generate the following representation of the above puzzle in SMTLIB format, which at first appeared to be correct (but see the update below):
; Declare the constants
(declare-const David Int)
(declare-const Eric Int)
(declare-const Gregory Int)
(declare-const Ivan Int)
(declare-const Blue Int)
(declare-const Pink Int)
(declare-const Red Int)
(declare-const Yellow Int)
(declare-const Age10 Int)
(declare-const Age11 Int)
(declare-const Age12 Int)
(declare-const Age14 Int)
(declare-const Cheese Int)
(declare-const PeanutButter Int)
(declare-const RoastBeef Int)
(declare-const Turkey Int)
; Declare the constraints
(assert (distinct David Eric Gregory Ivan))
(assert (distinct Blue Pink Red Yellow))
(assert (distinct Age10 Age11 Age12 Age14))
(assert (distinct Cheese PeanutButter RoastBeef Turkey))
(assert (and (>= David 1) (<= David 4)))
(assert (and (>= Eric 1) (<= Eric 4)))
(assert (and (>= Gregory 1) (<= Gregory 4)))
(assert (and (>= Ivan 1) (<= Ivan 4)))
(assert (and (>= Blue 1) (<= Blue 4)))
(assert (and (>= Pink 1) (<= Pink 4)))
(assert (and (>= Red 1) (<= Red 4)))
(assert (and (>= Yellow 1) (<= Yellow 4)))
(assert (and (>= Age10 1) (<= Age10 4)))
(assert (and (>= Age11 1) (<= Age11 4)))
(assert (and (>= Age12 1) (<= Age12 4)))
(assert (and (>= Age14 1) (<= Age14 4)))
(assert (and (>= Cheese 1) (<= Cheese 4)))
(assert (and (>= PeanutButter 1) (<= PeanutButter 4)))
(assert (and (>= RoastBeef 1) (<= RoastBeef 4)))
(assert (and (>= Turkey 1) (<= Turkey 4)))
; Clues
(assert (= Age12 3))
(assert (= Yellow (+ Ivan 1)))
(assert (= Age14 Cheese))
(assert (= (- PeanutButter Turkey) 1))
(assert (= Eric PeanutButter))
(assert (= Age11 4))
(assert (= Gregory Age14))
(assert (or (= Red (+ David 1)) (= Red (- David 1))))
(assert (= Ivan Blue))
(assert (= Age14 2))
(check-sat)
(get-model)
I was able to run this program in Z3 without any modification and obtain what appeared to be the correct solution. Incidentally, GPT-3.5 Turbo was not able to generate consistently correct SMTLIB formulas from the same text description.
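For reference, here is a minimal sketch of how the generated formula can be run against Z3 programmatically. It assumes the formula above has been saved as puzzle.smt2 and that the z3-solver Python package is installed; the file name and the stripping of the script-level commands are my own additions, not part of the GPT output.

```python
# Minimal sketch: load the GPT-generated SMTLIB formula into Z3 via its Python API.
# Assumes the formula above is saved as "puzzle.smt2" and `pip install z3-solver`.
from z3 import Solver, sat

with open("puzzle.smt2") as f:
    smt_text = f.read()

# Drop the script-level commands; the solver API issues its own check/model calls.
smt_text = smt_text.replace("(check-sat)", "").replace("(get-model)", "")

solver = Solver()
solver.from_string(smt_text)  # parse the SMT-LIB 2 declarations and assertions

if solver.check() == sat:
    print(solver.model())  # each constant is assigned a position from 1 to 4
else:
    print("unsat")
```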
Update: Apparently, I was wrong in assessing that GPT-4 encoded the problem correctly. Thanks to Jason Wells for pointing this out (see this issue): there is a very subtle error in the encoding of the problem, so subtle that it is easy to miss if one is not paying close attention. The larger point about constrained LLM code generation still holds, however. If anything, the assigned tasks should be even simpler than the one showcased here.
Remarks
To make LLMs more useful to us, I am of the opinion that we should NOT have them generate programs in full-featured programming languages, despite what the current hype suggests.
For the time being, we should structure complex tasks so that they can be split into smaller pieces, which can then be solved by generating code in more constrained (and ideally provably correct) languages such as SMTLIB.
SMTLIB (or an equivalent) may also be used as the intermediate representation of a problem. If a solution is found in the intermediate space, the final solution can be deterministically derived from it (e.g. by transforming it into program code or function invocation chains). The intermediate representation supports the core reasoning, unencumbered by incidental concerns such as programming-language syntax correctness.
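To illustrate that derivation step concretely, here is a rough sketch (again assuming a puzzle.smt2 file and the z3-solver Python package; the constant names match the formula above) that turns the Z3 model into a per-position summary. Note that, per the update above, the formula as generated contains a subtle error, so the model it yields is not the intended answer; the sketch only shows the mechanics of going from the intermediate representation to a final, structured solution.

```python
# Sketch: deterministically derive a structured answer from the intermediate (SMTLIB) solution.
# Assumes "puzzle.smt2" contains the formula above and `pip install z3-solver`.
from z3 import Solver, sat

NAMES = ["David", "Eric", "Gregory", "Ivan"]
BIKES = ["Blue", "Pink", "Red", "Yellow"]
AGES = ["Age10", "Age11", "Age12", "Age14"]
SANDWICHES = ["Cheese", "PeanutButter", "RoastBeef", "Turkey"]

with open("puzzle.smt2") as f:
    text = f.read().replace("(check-sat)", "").replace("(get-model)", "")

solver = Solver()
solver.from_string(text)
assert solver.check() == sat, "the formula should be satisfiable"
model = solver.model()

# Map each declared constant to the position (1..4) that Z3 assigned to it.
position = {d.name(): model[d].as_long() for d in model.decls()}

# Assemble the human-readable solution: for each position, find the name,
# bike color, age and sandwich that the model placed there.
for pos in range(1, 5):
    print(
        pos,
        next(n for n in NAMES if position[n] == pos),
        next(b for b in BIKES if position[b] == pos),
        next(a for a in AGES if position[a] == pos),
        next(s for s in SANDWICHES if position[s] == pos),
    )
```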
See this GitHub page for the related code.