How do you incorporate exploration or curiosity in PPO?
Proximal policy optimization (PPO) is a popular reinforcement learning (RL) algorithm that can learn complex policies from high-dimensional observations and actions. However, one of the challenges of PPO is to balance exploration and exploitation, that is, to find new and potentially rewarding states without losing the performance of the current policy. In this article, you will learn how to incorporate exploration or curiosity in PPO using different methods and techniques.