I got ChatGPT to Simulate the Game 'Risk' by itself (sort of)
Alex Foster
Partner @Invisible. Foundation models lead. Name a large language model: we're probably helping to train it. Pursuing industry-leading data quality by (I know it sounds crazy) treating smart humans like smart humans.
TLDR: From one prompt and one "keep going" prompt, I managed to get ChatGPT to build and run thousands of simulations of a (very simplified) version of the game 'Risk' to explore various strategies and compare them against each other. This was a quick demonstrative experiment only.
Intro
OK, "simulating the game Risk", is generous at best.
But it did get quite far building a simulation of the full game in math and conditional logic, optimisation etc and I want to demo this as one thing you can achieve via "letting LLMs run" and taking advantage of ChatGPTs code executor environment.
I did this entirely in ChatGPT, not with the API. I still believe early prompt engineering experimentation is often best done in the chat interface or using LLM plugins for Google Sheets, you can iterate so, so fast. And you can take your prompt chains and learnings and move them to the API easily.
The code executor means not using GPT4 Turbo, which means 8000k tokens context limit, which is too low for keeping track of something like this.
My goal was to get ChatGPT to build a full Risk simulator, along with strategies, entirely in Code Executor (its built-in 'run Python itself' function), all from one prompt.
Whilst this 2-hour demo might highlight the potential of approaches like these, it also shows how crucial feedback is for these models to progress sensibly. One dumb decision can propagate if unchecked.
Fun: because the code executor drops its state after an hour, ChatGPT had a hard limit on how long it could take.
ONE PROMPT
This is really interesting because, if you can set one prompt and then leave something running and iterating by itself, techniques like this can be applied to far more complex challenges, with multiple agents working on the same problem in parallel.
1: Explore
Before trying "OK, go and iterate by yourself" and just typing some simple instruction like "continue" each time, it's best to explore first. Experiment.
With minimal guidance over 30 iterations, we managed to get to this.
Each probability is based on 100 simulated battles. You can see the slight defender advantage for smaller battles, and the general attacker advantage in larger battles settling into a near-linear trend.
The rules for battles are:
- The attacker rolls up to 3 dice, the defender up to 2.
- The highest dice on each side are compared, then the next-highest; the higher roll wins each comparison.
- Ties go to the defender, and each comparison lost costs one army.
ChatGPT just knew all that and built the above table by itself. As it progressed I made a few learnings, which I folded into the prompt in the next stage.
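ChatGPT's actual code isn't reproduced here, but the core of a battle simulator like this is only a few lines. Here's my own minimal sketch under the standard dice rules above (treating the army counts as the armies committed to the fight):

```python
import random

def battle_round(attackers, defenders):
    """One exchange of dice: attacker rolls up to 3 dice, defender up to 2."""
    a_dice = sorted((random.randint(1, 6) for _ in range(min(3, attackers))), reverse=True)
    d_dice = sorted((random.randint(1, 6) for _ in range(min(2, defenders))), reverse=True)
    for a, d in zip(a_dice, d_dice):  # highest vs highest, then second-highest
        if a > d:
            defenders -= 1
        else:
            attackers -= 1  # ties go to the defender
    return attackers, defenders

def attacker_win_prob(attackers, defenders, sims=100):
    """Monte Carlo estimate of the attacker winning a fight to the death."""
    wins = 0
    for _ in range(sims):
        a, d = attackers, defenders
        while a > 0 and d > 0:
            a, d = battle_round(a, d)
        wins += (d == 0)
    return wins / sims
```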
Full game simulation
Experimentation here went quickly; I'd already run the battle simulator with no supervision and it worked. It came up with reasonable strategy definitions for Aggressive, Defensive, Balanced, Continent-focused and Random. It simplified the game map a fair bit, and the strategy definitions weren't going to win it any competitions, but that seemed fine.
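ChatGPT's strategy code isn't shown in this article, but at this level of simplification a programmatic "strategy" usually collapses to a handful of weights the bot consults each turn. A hypothetical sketch (every name and number here is my own illustration, not what ChatGPT produced):

```python
# Hypothetical illustration only; names and weights are mine, not ChatGPT's.
STRATEGIES = {
    "aggressive":        {"attack_threshold": 0.4, "reinforce_border": 0.9, "prefer_continents": 0.2},
    "defensive":         {"attack_threshold": 0.8, "reinforce_border": 0.3, "prefer_continents": 0.3},
    "balanced":          {"attack_threshold": 0.6, "reinforce_border": 0.6, "prefer_continents": 0.5},
    "continent_focused": {"attack_threshold": 0.6, "reinforce_border": 0.5, "prefer_continents": 1.0},
    "random":            None,  # pick any legal move uniformly at random
}
```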
2: Battle sims in one prompt
Here's the prompt that worked, implementing learnings from the exploration stage:
## INTRO
How would you go about writing Python to provide a model that analyses the probability of winning battles in Risk? Work out what the parameters are and how to do the maths.
I want you to iteratively attempt to solve this problem in the Python code executor: outline a plan, write the code, execute it, and on each interaction improve your approach and attempt to get closer to a workable approximation.
Potentially start by seeing how far you can get running loops of your simulator in CE; find its limits.
## GOAL
Your goal is to produce a heatmap of win probabilities via simulations (minimum 100 complete battle simulations per probability, fought until victory/failure, assuming no retreats). No need to go above 20 armies on either side. You will solve this over numerous (very many) interactions with me where I ask you to continue the course.
## GUIDANCE
- Start by running a series of experiments in code executor to get into the flow of iterating in code executor (this is very important; I'm not going to execute any code for you).
- You'll need to keep track of your own strategy over time, so repeat any information you need to recall every few interactions. I strongly recommend repeating the end goal every few interactions, as well as this specific instruction, to keep reminding yourself.
- Heavily use the CE environment and the fact that you can re-use functions and abstracted logic without re-writing them each time.
The prompt to "continue" was:
Continue, use code executor and CoT to iterate as fast as possible towards a complete heatmap for up to 20 armies on either side. Continuously abstract as you go to minimise re-writing code in CE.
This worked way better. We got to a reasonable heatmap and sim code in just five iterations (<10 mins).
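For reference, assembling that heatmap from a battle simulator is just a nested loop over army counts. A sketch of my own (reusing attacker_win_prob from the earlier battle-simulator sketch; the filename is arbitrary):

```python
import numpy as np

MAX_ARMIES = 20

# attacker_win_prob(a, d, sims) is the function from the battle-simulator sketch above.
heatmap = np.zeros((MAX_ARMIES, MAX_ARMIES))
for a in range(1, MAX_ARMIES + 1):
    for d in range(1, MAX_ARMIES + 1):
        heatmap[a - 1, d - 1] = attacker_win_prob(a, d, sims=100)

np.save("risk_win_probs.npy", heatmap)  # persist as a lookup table for later steps
```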
Note: "as fast as possible" made a notable difference.
Check this out:
3: "Full" Game simulation
Prompt: (getting lazy at this point)
OK great - let's save that lookup table down in CE so we can use it for reference in the next step.
The NEXT STEP:
You're going to iteratively try to build a very simple maths simulation of the game Risk (don't include the victory conditions or cards; let's just say you win the game if you have more income after 20 turns in a 2-player game). Simplify everything in the game so we can run lots of simulations. Keep continent bonuses.
Then we're going to devise a way of programmatically defining various "strategies" for playing the game and then simulating those "strategies" playing the game.
We'll use the lookup table to define casualties rather than rolling dice.
## GOAL
Your goal is to produce a matrix of full-game win probabilities via simulations (minimum 10 game simulations per probability). The rows and columns are labelled with five different "strategies". You will solve this over numerous (very many) interactions with me where I ask you to continue the course.
And the continue prompt:
Continue, use code executor and CoT to iterate as fast as possible towards a complete heatmap comparing 5 "strategies". Continuously abstract as you go to minimise re-writing code in CE.
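To make "use the lookup table to define casualties" concrete, here's one hypothetical way to do it: sample the battle's winner from the precomputed probability and apply a rough survivor heuristic (the heuristic is my invention, not ChatGPT's):

```python
import random
import numpy as np

lookup = np.load("risk_win_probs.npy")  # the table saved in the previous step

def resolve_battle(attackers, defenders):
    """Sample the winner from the precomputed win probability instead of
    rolling dice. The survivor counts are a crude heuristic of my own,
    not the true conditional distribution."""
    p_win = lookup[min(attackers, 20) - 1, min(defenders, 20) - 1]
    if random.random() < p_win:
        return max(1, attackers - defenders // 2), 0  # attacker takes the territory
    return 0, max(1, defenders - attackers // 2)
```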
I ran this in two threads at the same time to keep an eye on things (whilst writing this article). One got a decent sim up fast; the other made the sim way too complicated and had to double back and redo it once it realised it was taking too long. I told it to hurry up and offered a $1,000 tip to whichever thread completed first.
You can tweak against this by adjusting the continuous reinforcement to 'keep abstracting', which puts too much pressure on the model. In this case it resulted in the following totally nonsense heatmap:
One really cool thing about code executor is that when the code doesn't work, it reasons it through and then fixes it, often in the same interaction:
OUTCOME
Here's the strategy match-up heatmap from the thread that didn't implode. Each cell is based on 100 games, each capped at 20 turns.
This still looks completely wrong.
The thread also decided that strategies shouldn't fight against themselves, which was a cute enough idea that I left it in.
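Mechanically, skipping self-matchups just means iterating over ordered pairs rather than the full grid; a minimal sketch, where play_game is a hypothetical stand-in for the full game simulation:

```python
import itertools

def matchup_matrix(strategies, play_game, games=100):
    """Win rate of strategy a against strategy b over `games` simulated games.
    `play_game(a, b)` is a stand-in for the full game sim; assumed to return
    True when strategy a wins."""
    return {
        (a, b): sum(play_game(a, b) for _ in range(games)) / games
        for a, b in itertools.permutations(strategies, 2)  # a == b never occurs
    }
```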
Conclusion
This is a demo for a LinkedIn article. The thread massively over-simplified the game and the 'strategies' were not intelligent. I'm pretty sure it could have made a working model of the full game, but I don't think it could have programmed any kind of reasonable basic strategy for the bots playing it. It realised that from the start, and that's pretty cool.
This would be far more interesting with a larger context limit, as my ChatGPT business account is limited to 8,000 tokens.
Closing
Just to check, I ran a separate thread to just try to fully simulate the game.
The code did seem to be an accurate representation and this was a great moment:
Just like the real game, across 100 simulations of the now-full game, the (very) basic AIs playing it would very rarely actually win. One game even hit the cap of 1,000 turns.
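A turn cap like that is just a guard on the main game loop. A minimal sketch under the same assumptions (state, take_turn and winner are hypothetical stand-ins for the real simulation's structures):

```python
def play_full_game(state, take_turn, winner, max_turns=1000):
    """Run turns until someone wins or the cap is hit (treated as a stalemate)."""
    for turn in range(1, max_turns + 1):
        state = take_turn(state)
        w = winner(state)  # returns the winning player, or None
        if w is not None:
            return w, turn
    return None, max_turns  # nobody won within the cap
```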