Dancing With the Skulls

Working in the field of Artificial Intelligence (AI) is full of exciting moments, and one of the most exciting for me was watching an AI agent dance with skulls in the challenging game of Montezuma's Revenge. This was part of OpenAI's breakthrough combining Random Network Distillation (RND) with Reinforcement Learning (RL), published back in 2018 in the paper “Exploration by Random Network Distillation”.

Let me first walk through RL and the challenges that led OpenAI to develop RND. In simple words, RL is a computational approach to goal-directed learning from interaction, where an agent learns to make decisions by taking actions in an environment to maximize cumulative reward over time. The process involves the agent observing the state of the environment, taking an action that changes that state, and receiving a reward that guides future actions toward long-term goals.
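To make that loop concrete, here is a minimal sketch of the agent-environment interaction using the Gymnasium API; the environment name is just a placeholder, and a real agent would replace the random action with a learned policy:

```python
# Minimal agent-environment loop, assuming the Gymnasium API.
import gymnasium as gym

env = gym.make("CartPole-v1")            # placeholder environment
obs, info = env.reset(seed=0)

total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()   # a trained policy would choose this
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward               # the agent tries to maximize this sum
    done = terminated or truncated

print(f"Episode return: {total_reward}")
env.close()
```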


RL can solve a variety of complex, goal-directed problems where an agent must learn to make a sequence of decisions through interaction with its environment. Key examples include DeepMind's AlphaGo Zero, which learned to play Go and outperform the human world champion (I really recommend the documentary AlphaGo), and TD-Gammon, which achieved superhuman performance in backgammon. RL has also been used to play Atari arcade games from raw pixel inputs and to train robotic agents for competitions like RoboCup. In general, RL methods are useful for any problem that requires sequential decision-making to achieve a goal.

Many algorithms have been used for RL, such as Monte Carlo methods, Dynamic Programming, and Temporal-Difference learning (Sarsa, Q-Learning, and Dyna-Q) for tabular environments, and deep RL methods such as DQN, REINFORCE, and DDPG for environments with large or continuous state spaces. Since most Atari arcade games fall into the latter category, DQN was widely used to train agents, feeding raw pixel frames into a Convolutional Neural Network (CNN). DQN achieved superhuman performance in many games and below-human performance in others, but it scored zero points in Montezuma's Revenge!
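For intuition, here is a sketch of a DQN-style Q-network in PyTorch, roughly following the convolutional architecture described by Mnih et al. (2015); the layer sizes are illustrative rather than a faithful reproduction of the original training setup:

```python
# Sketch of a DQN-style Q-network for stacked Atari frames (PyTorch).
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, n_actions: int, in_channels: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),   # one Q-value per action
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, 4, 84, 84) stacked grayscale screens scaled to [0, 1]
        return self.head(self.features(frames))

q_net = QNetwork(n_actions=18)                  # Atari exposes up to 18 actions
q_values = q_net(torch.zeros(1, 4, 84, 84))
greedy_action = q_values.argmax(dim=1)          # the agent acts greedily (plus exploration)
```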

DQN performed very poorly in Montezuma's Revenge because of the game's sparse rewards and the complexity of the required action sequences. From the initial state, the agent must execute a precise series of actions to reach the key, the first reward in the game. Since rewards are given only when the key is collected and when a door is unlocked, the agent has to rely purely on random exploration to find the key, and the probability of randomly executing the correct sequence of actions from the starting state is extremely low. This sparse-reward problem makes it difficult for traditional RL methods like DQN, which depend on frequent rewards to learn effectively. The credit-assignment problem complicates learning further: the agent struggles to determine which of its many exploratory actions actually contributed to the reward. Montezuma's Revenge is therefore a hard exploration problem, where finding effective strategies requires overcoming sparse rewards and propagating reward information back through long action sequences. As a result, DQN and similar agents often fail to progress beyond the initial stages of the game.
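A back-of-the-envelope calculation shows how hopeless pure random exploration is here; the step count below is a made-up illustration, not the game's exact requirement:

```python
# Hypothetical numbers for illustration: suppose reaching the key takes about
# 100 correct steps, and the agent picks uniformly among Atari's 18 actions.
steps, n_actions = 100, 18
p = (1.0 / n_actions) ** steps
print(p)   # roughly 3e-126: random exploration essentially never reaches the first reward
```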

To address this challenge, OpenAI introduced RND as a form of intrinsic motivation: curiosity. The main idea is to reward the agent not just for achieving explicit goals but also for visiting new, unfamiliar states. RND uses a fixed, randomly initialized neural network and a predictor network. The fixed network maps each state to a random embedding, while the predictor network tries to predict that embedding. The difference between the predicted and actual outputs (the prediction error) serves as an intrinsic reward: when the agent encounters a novel state, the prediction error is high, encouraging further exploration. A significant advantage of RND is its robustness to deceptive sources of novelty, such as the noisy-TV problem. Because the prediction target is a deterministic function of the state, the predictor can eventually learn it, so the agent is not endlessly rewarded for staring at random, high-variance distractions and stays focused on genuine exploration. RND has shown remarkable improvements in exploration-heavy tasks: in Montezuma's Revenge, the RND agent outperformed traditional RL agents by a large margin, showing the potential of curiosity-driven learning in complex, sparse-reward scenarios.
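Here is a minimal sketch of the RND intrinsic-reward idea in PyTorch; the network sizes, learning rate, and update schedule are simplified assumptions, not the paper's exact configuration:

```python
# Sketch of an RND intrinsic reward: prediction error of a fixed random target.
import torch
import torch.nn as nn

def make_net(obs_dim: int, embed_dim: int) -> nn.Module:
    return nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, embed_dim))

obs_dim, embed_dim = 64, 32                # illustrative sizes
target = make_net(obs_dim, embed_dim)      # fixed, randomly initialized network
predictor = make_net(obs_dim, embed_dim)   # trained to imitate the target
for p in target.parameters():
    p.requires_grad_(False)                # the target is never updated

optimizer = torch.optim.Adam(predictor.parameters(), lr=1e-4)

def intrinsic_reward(obs: torch.Tensor) -> torch.Tensor:
    # Prediction error is large for novel states and shrinks as states become familiar.
    error = (predictor(obs) - target(obs)).pow(2).mean(dim=-1)
    # Train the predictor on the same observations so familiar states stop paying out.
    optimizer.zero_grad()
    error.mean().backward()
    optimizer.step()
    return error.detach()

bonus = intrinsic_reward(torch.randn(8, obs_dim))   # batch of 8 observations
```

In practice this bonus is combined with the game's extrinsic reward during training, and observations are normalized so the prediction error stays on a stable scale.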

Now, picture an AI agent stepping into the chambers of Montezuma's Revenge, only to start an unexpected dance with the skulls scattered around. The agent, driven by its curiosity-based RND algorithm, wasn't just avoiding traps or collecting keys: because the novelty of these unsettling encounters produced large prediction errors in its predictor network, it kept circling the skeletons, seemingly dancing with joy.

You can watch the dance, and the agent's full performance, in the YouTube video linked below:

https://youtu.be/40VZeFppDEM

RND represented a meaningful addition to RL, highlighting the importance of intrinsic rewards in enhancing exploration. By encouraging curiosity, RND enabled the AI agent to "dance with the skulls" of challenging environments, unlocking new levels of performance and understanding.


References:

University of Bath

https://openai.com/blog/reinforcement-learning-with-prediction-based-rewards/

Sutton, R.S. and Barto, A.G., 2018. Reinforcement Learning: An Introduction. MIT Press.

https://arxiv.org/abs/1810.12894

https://www.semanticscholar.org/paper/Human-level-control-through-deep-reinforcement-Mnih-Kavukcuoglu/340f48901f72278f6bf78a04ee5b01df208cc508

https://youtu.be/40VZeFppDEM

