ChatGPT and the wolf, goat, and cabbage problem
Mircea Grecu
Associate Research Scientist at Goddard Earth Sciences Technology & Research
Problem Statement
The wolf, goat, and cabbage problem is a logic puzzle dating back to at least the 9th century. Being an aficionado of Good Old-Fashioned AI (GOFAI), I was curious how ChatGPT handles such problems. I am an aficionado of ChatGPT as well, but I don’t see it as a replacement for GOFAI.
The problem states that: “A man went to a market and purchased a wolf, a goat, and a cabbage. On his way home, he arrived at a river, which he had to cross over a narrow bridge. But crossing the river over the bridge, the farmer could carry only himself and a single one of his purchases: the wolf, the goat, or the cabbage. If left unattended together, the wolf would eat the goat, or the goat would eat the cabbage. The man’s challenge was to carry himself and his purchases to the far bank of the river, leaving each purchase intact. How would he proceed?”
First attempt. No prompt crafting
The first answer that I got from ChatGPT is the following:
The farmer would proceed as follows:
1. The man starts by taking the goat across the bridge, leaving the wolf and the cabbage on the original side.
2. After crossing the bridge with the goat, the man leaves the goat on the far side and goes back across the bridge alone.
3. The man then takes the cabbage across the bridge, but he can’t leave it alone with the goat on the far side. So he brings the cabbage back to the original side.
4. Next, the man takes the wolf across the bridge, leaving it on the far side with the goat.
5. He goes back across the bridge alone, leaving the wolf and the goat on the far side.
6. Finally, the man goes back across the bridge one last time to get the cabbage, bringing it to the far side to join the wolf and the goat.
Now all three purchases (the wolf, goat, and cabbage) are on the far side of the bridge, and the man has successfully solved the problem while ensuring that none of the purchases were eaten by another.
This solution was obviously wrong, so I tried to split the problem into smaller and simpler sub-problems that ChatGPT could handle correctly. My hope was that ChatGPT could generate, from “verbal” instructions alone, pieces of code that could be assembled into a complete solution. But I could not achieve that. ChatGPT kept generating incorrect code (easily correctable by me, but not by ChatGPT itself with “verbal” input from me). Here is how it went.
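To see concretely where the answer goes wrong, here is a minimal checking sketch of my own (the set-based state encoding and the safe helper below are my assumptions, not anything ChatGPT produced) that replays the proposed crossings and validates each intermediate state:
# Replay ChatGPT's crossings and flag any bank where something would get eaten
# (a state here is the set of items on each bank plus the man's position)
def safe(items, man_here):
    if man_here:
        return True
    return not ({'wolf', 'goat'} <= items or {'goat', 'cabbage'} <= items)
near, far, man = {'wolf', 'goat', 'cabbage'}, set(), 'near'
crossings = [('->', 'goat'), ('<-', None), ('->', 'cabbage'), ('<-', 'cabbage'),
             ('->', 'wolf'), ('<-', None), ('->', 'cabbage')]
for i, (direction, item) in enumerate(crossings, 1):
    src, dst = (near, far) if direction == '->' else (far, near)
    if item:
        src.remove(item)
        dst.add(item)
    man = 'far' if direction == '->' else 'near'
    for bank, name in ((near, 'near'), (far, 'far')):
        if not safe(bank, man == name):
            print(f'crossing {i}: {sorted(bank)} left alone on the {name} bank')
Running this flags the sixth crossing (ChatGPT’s step 5): when the man walks back alone, the wolf and the goat are left together on the far bank.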
Additional attempts with prompt crafting
Given that prompt engineering is generally associated with very specific techniques that I haven’t used in my interactions with ChatGPT, I am calling my attempts to guide ChatGPT through prompting “prompt crafting.” In my second attempt, I asked ChatGPT to come up with a Python variable describing where the man, wolf, goat, and cabbage are at various steps of the solution. ChatGPT came up with the following:
Sure, we can define a list of tuples to keep track of the state of the man and his purchases at every step of the solution. Each tuple will contain four elements: the location of the man (0 for the original side, 1 for the far side), the location of the wolf (0 or 1), the location of the goat (0 or 1), and the location of the cabbage (0 or 1).
Here’s how we can define the list variable solution to keep track of the state of the problem:
solution = [(0, 0, 0, 0)] # Start with all items on the original side
# Take the goat across the bridge
solution.append((1, 0, 1, 0))
# Return alone to the original side
solution.append((0, 0, 1, 0))
# Take the cabbage across the bridge
solution.append((1, 0, 1, 1))
# Return with the goat to the original side
solution.append((0, 0, 1, 1))
# Take the wolf across the bridge
solution.append((1, 1, 1, 1))
# Return alone to the original side
solution.append((0, 1, 1, 1))
# Take the cabbage across the bridge
solution.append((1, 1, 1, 0))
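The state list above can be checked mechanically. Here is a minimal sketch of mine (not ChatGPT output) that reuses the (man, wolf, goat, cabbage) encoding and the solution list just defined:
# Flag any tuple in which something would get eaten: a pairing is unsafe
# when predator and food share a side and the man is on the other side
def is_safe(man, wolf, goat, cabbage):
    if wolf == goat and man != goat:        # wolf and goat left alone together
        return False
    if goat == cabbage and man != cabbage:  # goat and cabbage left alone together
        return False
    return True
for step, s in enumerate(solution):
    if not is_safe(*s):
        print(f'step {step}: state {s} is unsafe')
This flags steps 4 and 6, i.e. (0, 0, 1, 1) and (0, 1, 1, 1); note also that the final state, (1, 1, 1, 0), leaves the cabbage on the original side.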
I can now see why prompt engineers are paid $175,000 to $335,000 per year. ChatGPT tends to jump the gun and do more than it is asked to, which makes progress very slow and requires significant mental energy from the prompter.
In the next exchange, I made the observation that (0,0,1,1) is not an acceptable solution because the goat would eat the cabbage. That was probably a mistake (from the prompt engineering perspective) because it triggered ChatGPT into formulating a more complex, but still wrong, solution, i.e.:
# Initialize the starting state
state = {'man': 'left', 'wolf': 'left', 'goat': 'left', 'cabbage': 'left'}
# Define the list of valid moves
moves = [{'man': 'right'}, {'man': 'left', 'goat': 'right'},
         {'man': 'right', 'goat': 'right'}, {'man': 'left', 'cabbage': 'right'},
         {'man': 'right', 'cabbage': 'right'}, {'man': 'left', 'wolf': 'right'},
         {'man': 'right', 'wolf': 'right'}, {'man': 'left'}]
# Define a function to check if the state is valid
def is_valid(state):
    if state['goat'] == state['wolf'] and state['man'] != state['wolf']:
        return False
    if state['goat'] == state['cabbage'] and state['man'] != state['cabbage']:
        return False
    return True
# Define a list to store the solution
solution = []
# Start with the initial state
solution.append(state)
# Move the man and the goat to the right bank
state = {**state, **moves[1]}
solution.append(state)
# Move the man back to the left bank
state = {**state, **moves[0]}
solution.append(state)
# Move the man and the cabbage to the right bank
state = {**state, **moves[3]}
solution.append(state)
# Move the man back to the left bank, along with the goat
state = {**state, **moves[7], **moves[0]}
solution.append(state)
# Move the man and the wolf to the right bank
state = {**state, **moves[6]}
solution.append(state)
# Move the man back to the left bank, along with the goat
state = {**state, **moves[0], **moves[1]}
solution.append(state)
# Move the man and the cabbage to the right bank
state = {**state, **moves[4]}
solution.append(state)
# Verify that the final state is valid
if not is_valid(state):
    print('Invalid final state')
else:
    # Print the solution
    print('Solution:')
    for i, s in enumerate(solution):
        print(f'Step {i}: {s}')
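Rather than debugging this line by line, it is worth seeing why the scheme cannot work: the entries in moves are absolute bank assignments, not transitions, so merging one into the state overwrites positions regardless of where anything actually is. A tiny experiment of my own makes this visible:
state = {'man': 'left', 'wolf': 'left', 'goat': 'left', 'cabbage': 'left'}
# Apply moves[1], which is supposed to mean "take the goat across"
state = {**state, **{'man': 'left', 'goat': 'right'}}
print(state)  # the goat has crossed, but the man is still on the left bank
In other words, the goat teleports across the river unaccompanied; several of the listed moves do not correspond to any physical crossing.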
Given that I was going nowhere (I tried other, similar things in different sessions), either due to my inability to find smart prompts or ChatGPT’s inability to act upon them, I switched to GitHub Copilot. Specifically, I wrote a function that generates the moves based on ChatGPT’s “notations” and another one to update the state given a move (Copilot helped with that), and then asked Copilot to solve the problem using a backtracking method. The outcome was:
GitHub Copilot Solution
state = {'man': 'left', 'wolf': 'left', 'goat': 'left', 'cabbage': 'left'}
final_state = {'man': 'right', 'wolf': 'right', 'goat': 'right', 'cabbage': 'right'}
import copy
# Define a function to check if the state is valid
def is_valid(state):
    if state['goat'] == state['wolf'] and state['man'] != state['wolf']:
        return False
    if state['goat'] == state['cabbage'] and state['man'] != state['cabbage']:
        return False
    return True
# Generate the moves available in a state: the man crosses alone,
# or together with any purchase currently on his side
def find_moves(state):
    moves = [['man']]
    for objs in ['wolf', 'goat', 'cabbage']:
        if state[objs] == state['man']:
            moves.append(['man', objs])
    return moves
# Apply a move by toggling the side of everything it names;
# return the new state, or False if something would get eaten
def update_state(state, move):
    new_state = copy.deepcopy(state)
    for m1 in move:
        if state[m1] == 'left':
            new_state[m1] = 'right'
        else:
            new_state[m1] = 'left'
    if is_valid(new_state):
        return new_state
    else:
        return False
# use backtracking to find the solution to get from state to final_state using the moves
def solve(state, final_state, moves):
    if state == final_state:
        return True
    else:
        for move in moves:
            new_state = update_state(state, move)
            if new_state and new_state not in stateL:  # avoid loops (not in the original Copilot solution)
                stateL.append(new_state)
                new_moves = find_moves(new_state)  # find the moves from the new state (not in the original Copilot solution)
                if solve(new_state, final_state, new_moves):
                    return True
        return False
stateL = [state]
solve(state, final_state, find_moves(state))
for state1 in stateL:
    print(state1)
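For reference, with new_moves passed into the recursive call as above, running the script should print the classic seven-crossing path; in this run, every state appended to stateL happens to lie on the successful path:
{'man': 'left', 'wolf': 'left', 'goat': 'left', 'cabbage': 'left'}
{'man': 'right', 'wolf': 'left', 'goat': 'right', 'cabbage': 'left'}
{'man': 'left', 'wolf': 'left', 'goat': 'right', 'cabbage': 'left'}
{'man': 'right', 'wolf': 'right', 'goat': 'right', 'cabbage': 'left'}
{'man': 'left', 'wolf': 'right', 'goat': 'left', 'cabbage': 'left'}
{'man': 'right', 'wolf': 'right', 'goat': 'left', 'cabbage': 'right'}
{'man': 'left', 'wolf': 'right', 'goat': 'left', 'cabbage': 'right'}
{'man': 'right', 'wolf': 'right', 'goat': 'right', 'cabbage': 'right'}
That is: take the goat over, return, take the wolf over, bring the goat back, take the cabbage over, return, and finally take the goat over again.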
This solution works. The Copilot-generated code needed some tweaking (namely, a check to avoid revisiting states and a statement to generate the moves available from each new state), but it was orders of magnitude easier and faster to get a solution this way than by crafting prompts for ChatGPT. Under the hood, ChatGPT and GitHub Copilot are obviously powered by the same model. It is the interaction between the user and the model, and possibly some fine-tuning, that makes all the difference.
Conclusions
In conclusion, I don’t think ChatGPT has great logical abilities (at least not version 3.5). A human is still required in the loop (or a symbolic logic tool, assuming the problem can be easily formulated in a symbolic language). The GitHub Copilot application is significantly more useful than ChatGPT for this type of problem because it interacts with humans more seamlessly and does not break the flow by offering corrections that contain errors similar in nature (though different in notation) to the original ones, or entirely new ones.
I find GitHub Copilot an amazing tool, not only because it effectively and painlessly provided a critical part of this problem’s solution, but also because it has proved useful to me in a large variety of tasks: I/O, visualization, linear algebra, ML problems, etc. In short, less is more.