Decentralized and Scalable Multi-Agent Reinforcement Learning
Jesus Rodriguez
CEO of IntoTheBlock, Co-Founder of LayerLens, Faktory, and NeuralFabric, Founder of The Sequence AI Newsletter, Guest Lecturer at Columbia, Guest Lecturer at Wharton Business School, Investor, Author.
When we think about training or learning processes in deep learning solutions, we typically visualize centralized models. In those architectures, a handful of central nodes collect and curate the datasets used to train the models, which are then deployed across different nodes in a network. Even in distributed scenarios such as multi-agent reinforcement learning (MARL), which can include tens of thousands of nodes running a model, the learning process still relies on a handful of centralized nodes.
Centralized learning is conceptually simple to implement but incredibly hard to scale. Imagine an internet of things (IoT) scenario with hundreds of thousands of devices collecting data and executing a reinforcement learning model. If each agent needs to collect its data, send it to a central server, and interact with that server to optimize its learning policy, the complexity of the architecture increases linearly with the number of agents. Furthermore, in many distributed scenarios we would like agents to learn and optimize their policies in real time, which is almost impossible to achieve with centralized models. Recently, researchers from artificial intelligence (AI) powerhouse Prowler.io published a paper in which they introduced a method they call “Distributed Actor-Critic Reinforcement Learning”. The proposed learning method is called Diff-DAC, and I prefer to refer to it as decentralized learning, since it targets MARL topologies that are not only distributed but also lack central coordinators.
The Task Similarity Learning Principle
Multi-agent reinforcement learning (MARL) scenarios are, practically speaking, among the most complex deep learning architectures to implement. Game theory, distributed programming, and unsupervised learning all collide in MARL scenarios to create an incredibly challenging environment for data scientists and developers. Consider a MARL model with hundreds of thousands of nodes that can learn several tasks. In a typical centralized MARL topology, the complexity of the architecture is dictated by two disjoint factors: the number of nodes and the number of tasks. As more nodes are added to the network, communication with the centralized coordinator becomes more complex. As the agents need to learn new tasks, the central coordinator is forced to coordinate learning policies across an arbitrary number of nodes in the network.
Diff-DAC is based on a very simple but incredibly powerful observation: similar tasks in MARL scenarios tend to have similar learning policies. When adjusting temperatures in a wireless network of thermostats, for instance, or setting meeting agendas via virtual assistants, the tasks can be alike enough that they can be performed using similar policies. I like to call this insight the Task Similarity Learning Principle, and it can lead to powerful optimization models in MARL scenarios.
Diff-DAC
The Task Similarity Learning Principle basically means that, if an RL agent learns a policy for a specific task, other agents in the network performing similar tasks can leverage that policy. Building on that idea, Diff-DAC structures a MARL topology as a connected graph in which there are paths between nodes performing similar tasks. In that model, each agent learns from data gathered and processed for its own task, and then exchanges learned parameters with only its closest neighbors, so that all agents benefit from their neighbors’ learning processes. In the paper’s illustration of the approach, colors represent the local consensus on learned parameters spreading through the network; eventually, the network converges to a single solution (and color) for all the tasks.
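To make the topology concrete, here is a minimal sketch of how such an agent graph could be set up. The small-world graph from networkx and the Metropolis-Hastings mixing weights are my own illustrative choices, not the paper’s exact construction; Diff-DAC only requires a connected graph and suitable diffusion weights.

```python
import numpy as np
import networkx as nx

def build_agent_graph(num_agents, degree=4, seed=0):
    """Sparse, connected communication graph: each agent only talks to
    its immediate neighbors, with no central coordinator."""
    return nx.connected_watts_strogatz_graph(num_agents, degree, p=0.3, seed=seed)

def metropolis_weights(graph):
    """Doubly stochastic mixing matrix (Metropolis-Hastings rule), a
    common choice for diffusion/consensus updates over a graph.
    W[i, j] is nonzero only if i and j are neighbors (or i == j)."""
    n = graph.number_of_nodes()
    W = np.zeros((n, n))
    for i, j in graph.edges():
        w = 1.0 / (1 + max(graph.degree(i), graph.degree(j)))
        W[i, j] = W[j, i] = w
    W[np.diag_indices(n)] = 1.0 - W.sum(axis=1)
    return W
```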
The Diff-DAC architecture is completely decentralized. The model replaces a central coordinator with a connected graph in which the agents learn independently and then share some intermediate parameters with their neighbors. By communicating with each other, nearby agents tend towards consensus. As information is diffused across the network, every agent benefits from every other agent’s learning process. Since agents can only communicate with their neighbors, the computational complexity and communication overhead per agent grow linearly with the number of neighbors instead of the total number of agents.
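A single learning round under this scheme might look like the sketch below: each agent first takes a gradient step computed only from its own task’s data, and then combines its parameters with those of its neighbors through the mixing matrix. The adapt-then-combine structure and the plain gradient step are simplifications of mine; the actual Diff-DAC algorithm applies this diffusion pattern to actor and critic parameters inside an actor-critic update.

```python
def diffusion_update(params, local_grads, W, lr=1e-3):
    """One decentralized round in the adapt-then-combine style.

    params      -- (num_agents, num_params) array, one row per agent
    local_grads -- gradients each agent computed on its OWN task data
    W           -- mixing matrix from the previous sketch (zero for
                   non-neighbors), so every row mixes neighbors only
    """
    adapted = params - lr * local_grads   # step 1: local adaptation
    combined = W @ adapted                # step 2: neighbor-only averaging
    return combined
```

Because W has nonzero entries only for neighboring agents, the cost of the combine step for each agent grows with the size of its neighborhood rather than with the total number of agents, which is exactly the scaling property described above.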
The Results
The Prowler team used OpenAI Gym to benchmark Diff-DAC against a group of state-of-the-art multi-agent reinforcement learning (MARL) algorithms. The experiments were based on classic scenarios such as Cart-Pole balancing and the Inverted Pendulum. In most cases, Diff-DAC was able to match, and usually outperform, the results obtained with centralized architectures. Even when the centralized models learned the policies faster than Diff-DAC, the latter exhibited better final performance and less variance.
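For reference, the benchmark environments mentioned above are available through OpenAI Gym. The snippet below simply rolls out a random policy on CartPole as a sanity check of the setup; it assumes the classic (pre-0.26) Gym step/reset API and says nothing about the paper’s exact task variants.

```python
import gym

# Illustrative only: a random-policy rollout on one of the classic
# control tasks used in the benchmarks (classic Gym API, pre-0.26).
env = gym.make("CartPole-v1")
obs = env.reset()
done, episode_return = False, 0.0
while not done:
    action = env.action_space.sample()          # placeholder policy
    obs, reward, done, info = env.step(action)  # 4-tuple in classic Gym
    episode_return += reward
print("episode return:", episode_return)
```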
Decentralized learning models such as Diff-DAC are going to be key to implementing reinforcement learning at scale. The emergence of technologies such as blockchains and distributed ledgers, as well as improvements in security models such as homomorphic encryption, are bringing decentralized deep learning closer to reality. MARL scenarios seem like an obvious place to start.