I got ChatGPT to Simulate the Game 'Risk' by itself (sort of)

TLDR: From one setup prompt and one repeated "keep going" prompt, I managed to get ChatGPT to build and run thousands of simulations of a (very simplified) version of the game 'Risk' to explore various strategies and compare them against each other. This was a quick, demonstrative experiment only.

Intro

OK, "simulating the game Risk", is generous at best.

But it did get quite far in building a simulation of the full game out of maths, conditional logic, optimisation and so on, and I want to demo this as one thing you can achieve by "letting LLMs run" and taking advantage of ChatGPT's code executor environment.

I did this entirely in ChatGPT, not with the API. I still believe early prompt engineering experimentation is often best done in the chat interface or with LLM plugins for Google Sheets: you can iterate so, so fast, and you can take your prompt chains and learnings across to the API easily.

Using the code executor means not using GPT-4 Turbo, which means an 8,000-token context limit, which is too low for keeping track of something like this.

My goal was to get ChatGPT to build a full Risk simulator, plus strategies to play it, entirely in Code Executor (its built-in 'run Python itself' function) from a single prompt.

Whilst this two-hour demo might highlight the potential of approaches like these, it also shows how crucial feedback is for these models to progress sensibly. One dumb decision can propagate if left unchecked.

Fun: because the code executor drops its state after an hour, ChatGPT had a hard limit on how long it could take.

ONE PROMPT

This is really interesting because if you can set one prompt and then leave something running, iterating by itself, techniques like this can be applied to far more complex challenges, including with multiple agents working on the same problem in parallel.

1: Explore

Before trying "ok go and iterate by yourself" and just typing some simple instruction like "continue" each time, it's best to explore first. Experiment.

With minimal guidance over 30 iterations, we managed to get to this.

Each of these probabilities is based on 100 simulations of a battle. You can see the slight defender advantage for smaller battles in there, and the general attacker advantage becoming linear in larger battles.

The rules for battles are (see the code sketch after this list):

  1. Dice Rolling: Both attackers and defenders roll a set of dice based on the number of armies they have, with up to 3 dice for the attacker and up to 2 for the defender.
  2. Comparing Rolls: The highest dice from each side are compared, and the higher roll wins. In case of a tie, the defender wins.
  3. Calculating Losses: For each comparison, the losing side loses one army. If there are enough armies and dice, the second highest rolls are compared next.
  4. Defender's Advantage: Specifically, in the event of tie rolls, the defender is considered the winner of that comparison, which is a critical aspect of Risk battle dynamics.

ChatGPT just knew all of that and built the above table by itself. As it progressed, a few learnings emerged:

  1. Whilst initially it did explore the limits of the CE environment (it's not a big, powerful computer) and did some optimisation, it didn't keep doing so.
  2. It forgot its main goal once or twice and started veering off course.
  3. It did continuously abstract its functions, but wasn't optimising for its context window enough, which wasted my time.
  4. At one point it was doing 10 scenarios per interaction (out of 400), and it never developed any urgency to go faster even though it could.

Full game simulation

Experimentation here went quickly; I'd already run the battle simulator with no supervision and it worked. It came up with reasonable strategy definitions for Aggressive, Defensive, Balanced, Continent-focused and Random. It simplified the game map a fair bit, and the strategy definitions weren't going to win it any competitions, but that seemed fine.

2: Battle sims in one prompt

Here's the prompt that worked, implementing learnings from exploration stage:

##INTRO
How would you go about writing python to provide a model that analyses the probability of winning battles in Risk. Work out what the parameters are and how to do the maths.

I want you to iteratively attempt to solve this problem in python code executor outline a plan, write the code, execute it, and on each interaction you can improve your approach and attempt to get closer to a workable approximation.

Potentially start by seeing how far you can get running loops of your simulator in CE, find it's limits.

## GOAL
Your goal is to produce a heatmap of win probabilities via simulations (minimum 100 complete (until victory/failure, assume no retreats)) battle simulations per probability. No need to go above 20 armies either side. You wil solve this over numerous (very many) interactions with me where I ask you to continue the course.

## GUIDANCE
- start by running a series of experiments in code executor to get into the flow of iterating in code executor (this is very important, I'm not going to execute any code for you)
- You'll need to keep track of your own strategy over time so repeat any information you need to recall every few interactions. Strongly reccommend repeating the end goal every few interactions as well as this specific instruction to keep reminding yourself.
- heavily use the CE environment and that you can re-use functions and abstracted logic without re-writing it each time.        

The prompt to "continue" was:

Continue, use code executor and CoT to iterate as fast as possible towards a complete heatmap for up to 20 armies on either side. Continuously abstract as you go to minimise re-writing code in CE.        

This worked way better. We got to a reasonable heatmap and sim code in just five iterations (<10 mins).

Note: "as fast as possible" made a notable difference.

Check this out:

3: "Full" Game simulation

Prompt: (getting lazy at this point)

OK great - let's save that lookup table down in CE so we can use it for reference in the next step.

The NEXT STEP:

You're going to iteratively try to build a very simple maths simulation of the game Risk (don't include the victory conditions or cards, let's just say you win the game if you have more income after 20 turns in a 2 player game. Simplify everything in the game so we can run lots of simulations. Keep continent bonuses.

Then we're going to devise a way of programmatically defining various "strategies" for playing the game and then simulating those "strategies" playing the game.

We'll use the lookup table to define casualties rather than rolling dice.

## GOAL
Your goal is to produce a matrix of full game win probabilities via simulations (minimum 10 game simulations per probability. The rows and columns are labelled based on five different "strategies". You wil solve this over numerous (very many) interactions with me where I ask you to continue the course.        

and continue prompt:

Continue, use code executor and CoT to iterate as fast as possible towards a complete heatmap comparing 5 "strategies". Continuously abstract as you go to minimise re-writing code in CE.        
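
To give a flavour of what "programmatically defining strategies" can look like in practice, here's a hypothetical sketch of decision weights for the five strategies, plus a scoring function that reads the saved win-probability lookup table instead of rolling dice. The strategy names are the ones ChatGPT came up with; the weights, structure and function names are my own illustration, not its actual code:

# Hypothetical decision weights: how strongly each strategy values attacking,
# holding borders, and completing continents when choosing its next move.
STRATEGY_WEIGHTS = {
    "Aggressive":        {"attack": 0.7, "defend": 0.1, "continent": 0.2},
    "Defensive":         {"attack": 0.1, "defend": 0.7, "continent": 0.2},
    "Balanced":          {"attack": 0.34, "defend": 0.33, "continent": 0.33},
    "Continent-focused": {"attack": 0.2, "defend": 0.2, "continent": 0.6},
    "Random":            {"attack": 1 / 3, "defend": 1 / 3, "continent": 1 / 3},
}

def score_attack(strategy, win_prob, completes_continent):
    """Score a candidate attack: win_prob comes from the precomputed lookup table
    (so no dice are rolled), and the continent term reflects the strategy's bonus focus."""
    w = STRATEGY_WEIGHTS[strategy]
    return w["attack"] * win_prob + w["continent"] * (1.0 if completes_continent else 0.0)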

I ran this in two threads at the same time to keep an eye on progress (whilst writing this article in parallel). One got a decent sim up fast; the other made the sim way too complicated and had to double back and redo it once it realised it was taking too long. I told it to hurry up and offered a $1000 tip to whichever thread completed first.

  1. Both threads realised that whilst a sim of the full game would be totally do-able, designing agent logic for the full game would be impossible given their constraints.
  2. Both threads over-simplified the board map (think two continents of 3 countries), then 'regretted it' and made it more complex again later.
  3. Both threads spent almost all of their time building a useful simulation of the game and agents to play within it; some tweaking of the initial prompt could help here.
  4. Both used decision weights for the strategies and factored in what the opponent's armies looked like.
  5. The winning thread used OOP and the one that failed attempted to do the whole thing with only functions and no state.
  6. The winning thread used a graph for the countries.
  7. Both threads made some questionable abstraction decisions, but one made the decision below, totally ruining its simulation and demonstrating the need for evaluation in a thread left to its own devices:

Its overzealous approach to abstraction led it to simplify the outcomes of games down to a single probability.

This can be mitigated by toning down the continuous reinforcement to 'keep abstracting'; it's too much pressure. The result was the following, totally nonsense heatmap:

One really cool thing about the code executor is that when the code doesn't work, the model reasons through the error and then fixes it, often within the same interaction:

OUTCOME

Here's the strategy match-up heatmap from the thread that didn't implode. Each cell is based on 100 games, each with a maximum of 20 turns.

This still looks completely wrong.

The thread also decided that strategies shouldn't fight against themselves, which was a cute enough idea that I left it in.

Conclusion

This is a demo for a LinkedIn article. The thread massively over-simplified the game, and the 'strategies' were not intelligent. I'm pretty sure it could have made a working model of the full game, but I don't think it could have programmed any kind of reasonable basic strategy for the bots playing it. It realised that from the start, and that's pretty cool.

This would be far more interesting with a larger context limit, as my ChatGPT business account is limited to 8,000 tokens.

Closing

As a final check, I ran a separate thread to try to fully simulate the game.

The code did seem to be an accurate representation and this was a great moment:

Just like the real game, across 100 simulations of the now-full game, the (very) basic AIs playing it would very rarely actually win. One game even hit the cap of 1,000 turns.

Player 1 needs to work on their mid-game.


