qdive goes to space

We’re participating in this year’s Kaggle Kore competition! Competitors write algorithms that play the Kore game against each other. While conceptually simple (steer your spaceships to mine the most resources and/or defeat your opponent), the game features a few challenging aspects that any agent must handle in order to win. Especially if you want to tackle the problem with Reinforcement Learning, which we of course do!

Reinforcement Learning (RL) is hard. Even though some big names have “solved” complex games before (Chess, Go, StarCraft, etc.), the leaderboards of most of these smaller competitions are dominated by cleverly written rule-based algorithms. To train an RL agent, one needs patience, computing power, and a lot of trial and error refining the agent’s reward function as well as its state and action spaces. In this particular challenge, the latter two are further complicated by the fact that flight plans (an essential component of the game) have variable lengths: they can’t be mapped statically to a point in a fixed N-dimensional space. We sketched different approaches to this problem in the discussion forums (one possible encoding is shown below), but the choice is ultimately down to the algorithm designer.
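As an illustration, here is a minimal sketch of one way around the variable-length problem: pad or truncate every plan to a fixed number of tokens. The vocabulary, the padding token, and the MAX_LEN cap below are our illustrative assumptions, not part of the game’s API, and padding is only one of the approaches discussed in the forums.

```python
import numpy as np

VOCAB = list("NESW0123456789C")  # directions, step counts, convert command (assumed token set)
PAD = len(VOCAB)                 # extra id reserved as the padding token
MAX_LEN = 8                      # assumed cap on plan length for this sketch

def encode_plan(plan: str) -> np.ndarray:
    """Turn a flight-plan string like 'N5E3S' into a fixed-length int vector."""
    ids = [VOCAB.index(c) for c in plan[:MAX_LEN]]
    ids += [PAD] * (MAX_LEN - len(ids))
    return np.array(ids, dtype=np.int64)

def decode_plan(ids: np.ndarray) -> str:
    """Inverse mapping: drop the padding and rebuild the plan string."""
    return "".join(VOCAB[i] for i in ids if i != PAD)

assert decode_plan(encode_plan("N5E3S")) == "N5E3S"
```

The trade-off is that a hard cap silently truncates longer plans, so the length limit itself becomes a design decision for the agent.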

Another key element in training an RL agent is a solid implementation of the chosen RL algorithm. This can be quite tricky, too, especially for newcomers to the field. That’s why, following Kaggle’s philosophy of competing hard but supporting each other along the way, we built and published an OpenAI Gym wrapper that makes the Kore environment compatible with the powerful RL library stable-baselines3.
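For the curious, here is a minimal sketch of what such a wrapper can look like, assuming the kaggle_environments trainer interface. The flat observation and action spaces and the placeholder _encode_obs/_decode_action methods are illustrative stand-ins, not the published wrapper’s actual design:

```python
import gym
import numpy as np
from gym import spaces
from kaggle_environments import make

class KoreGymEnv(gym.Env):
    """Gym-style interface so stable-baselines3 can drive the Kore environment."""

    def __init__(self):
        super().__init__()
        self.kaggle_env = make("kore_fleets")
        # Hypothetical flat encodings; a real wrapper derives richer ones
        # from the 21x21 board.
        self.observation_space = spaces.Box(-1.0, 1.0, shape=(441,), dtype=np.float32)
        self.action_space = spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)
        self.trainer = None

    def reset(self):
        # Play against a built-in opponent; 'balanced' is our assumed choice,
        # any agent callable accepted by kaggle_environments works here.
        self.trainer = self.kaggle_env.train([None, "balanced"])
        return self._encode_obs(self.trainer.reset())

    def step(self, action):
        raw_obs, reward, done, info = self.trainer.step(self._decode_action(action))
        return self._encode_obs(raw_obs), float(reward or 0.0), done, info

    def _encode_obs(self, raw_obs):
        # Placeholder: a real wrapper maps the raw board state into the Box above.
        return np.zeros(self.observation_space.shape, dtype=np.float32)

    def _decode_action(self, action):
        # Placeholder: a real wrapper turns the network output into shipyard
        # commands and flight plans; an empty dict is a legal no-op action.
        return {}
```

The point of the wrapper is that once reset() and step() speak Gym, every algorithm in stable-baselines3 works against Kore out of the box.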

Check it out! Reinforcement Learning baseline in Python

This baseline radically simplifies training an RL algo on this competition and helps the community experiment with different approaches without having to start from scratch. Additionally, we demonstrate how to upload a trained agent, a step that is considerably more involved than in standard supervised competitions (see the sketch below).
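To make the workflow concrete, a training run could look like the following sketch, assuming the KoreGymEnv wrapper above. The choice of PPO and the hyperparameters are placeholders; the published baseline documents its own setup.

```python
from stable_baselines3 import PPO

# Train a PPO agent on the wrapped environment (illustrative settings only).
env = KoreGymEnv()
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
model.save("kore_ppo")  # saved weights get bundled into the submission
```

At submission time, the agent’s entry script then loads the saved weights and translates each turn’s observation into shipyard commands, which is exactly the packaging step that makes uploading a trained agent more involved than a standard supervised submission.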

We’re not finished yet, though. There are still plenty of rule-based algos to beat before the competition closes! See you in Kore!
