Exclusive: My Interview with Rich Sutton, the Father of Reinforcement Learning

I met Rich Sutton back in the 1980s, when he and I, both fresh PhDs, joined GTE Laboratories in the Boston area. I was doing research on Intelligent Databases and he was working on Reinforcement Learning, but our GTE Labs projects were far from real-world deployment. We frequently played chess, where we were about equal, but Rich was far ahead of me in Machine Learning. Rich is both a brilliant researcher and a very nice and modest person. He says in the interview below that the idea of "Reinforcement Learning" was obvious, but there is a huge distance between having an idea and developing it into a working, mathematically grounded theory, which is what Rich and Andrew Barto - his PhD thesis adviser - did for Reinforcement Learning. RL was a major part of the recent success of AlphaGo Zero, and if Artificial General Intelligence (AGI) is developed at some point, RL is likely to play a major role in it.

Rich Sutton, Ph.D. is currently Professor of Computing Science and iCORE Chair at the University of Alberta, and a Distinguished Research Scientist at DeepMind. He is one of the founding fathers of Reinforcement Learning (RL), an increasingly important part of Machine Learning and AI. His significant contributions to RL include temporal-difference learning and policy gradient methods. He is the co-author, with Andrew Barto, of the widely acclaimed book "Reinforcement Learning: An Introduction" - cited over 25,000 times, with a second edition coming soon.

He received a BA in Psychology from Stanford (1978), and an MS (1980) and PhD (1984) in Computer Science from the University of Massachusetts at Amherst. His doctoral dissertation, entitled "Temporal Credit Assignment in Reinforcement Learning", introduced actor-critic architectures and "temporal credit assignment".

From 1985 to 1994 Sutton was a Principal Member of Technical Staff at GTE Laboratories. He then spent three years at UMass Amherst as a Senior Research Scientist, followed by five years at the AT&T Shannon Laboratory as a Principal Technical Staff Member. Since 2003 he has been Professor and iCORE Chair in the Dept. of Computing Science at the University of Alberta, where he leads the Reinforcement Learning and Artificial Intelligence Laboratory (RLAI). Since June 2017, Sutton has also co-led a new Alberta office of DeepMind.

Rich also keeps a blog/personal page at incompleteideas.net.


Gregory Piatetsky: What are the main ideas in Reinforcement Learning (RL), and how is it different from Supervised Learning?

Rich Sutton: Reinforcement learning is learning from rewards, by trial and error, during normal interaction with the world. This makes it very much like natural learning processes and unlike supervised learning, in which learning happens only during a special training phase, using a supervisory or teaching signal that will not be available during normal use.

The typical RL scenario: an agent takes actions in an environment, which is interpreted into a reward and a representation of the state, which are fed back into the agent. Source: Wikipedia


For example, speech recognition is currently done by supervised learning, using large datasets of speech sounds and their correct transcriptions into words. The transcriptions are the supervisory signals that will not be available when new speech sounds come in to be recognized. Game playing, on the other hand, is often done by reinforcement learning, using the outcome of the game as a reward. Even when you play a new game you will see whether you win or lose, and can use this with reinforcement learning algorithms to improve your play. A supervised learning approach to game playing would instead require examples of "correct" moves, say from a human expert. This would be handy to have, but it is not available during normal play, and would limit the skill of the learned system to that of the human expert. In reinforcement learning you make do with less informative training information, with the advantage that that information is more plentiful and is not limited by the skill of the supervisor. 
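As a concrete illustration of the trial-and-error loop Sutton describes, here is a minimal sketch of tabular Q-learning driven only by the game's outcome. The `env` object, with its `reset()`, `step()`, and `actions()` methods, is a hypothetical stand-in for a game environment, not anything from the interview:

```python
import random
from collections import defaultdict

def play_episode(env, q, epsilon=0.1, alpha=0.1, gamma=1.0):
    """One episode of tabular Q-learning; only the reward signal guides learning."""
    state = env.reset()
    done = False
    while not done:
        # Trial and error: mostly exploit current value estimates, sometimes explore.
        if random.random() < epsilon:
            action = random.choice(env.actions(state))
        else:
            action = max(env.actions(state), key=lambda a: q[(state, a)])
        # The environment returns only a reward (e.g. +1 win, -1 loss, 0 otherwise),
        # never an example of the "correct" move.
        next_state, reward, done = env.step(action)
        # Move the action-value estimate toward reward plus bootstrapped future value.
        best_next = 0.0 if done else max(q[(next_state, a)] for a in env.actions(next_state))
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state

q = defaultdict(float)  # action-value table, initialized to zero
```

Note that nothing in this loop bounds the learned policy by a supervisor's skill, which is exactly the advantage Sutton points to.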

GP: The second edition of your classic book with Andrew Barto, "Reinforcement Learning: An Introduction", is coming soon (when?). What are the main advances covered in the second edition, and can you tell us about the new chapters on the intriguing connections between reinforcement learning and psychology (Ch. 14) and neuroscience (Ch. 15)?

RS: The complete draft of the second edition is currently available on the web at richsutton.com. Andy Barto and I are putting some final finishing touches on it: validating all the references, things like that. It will be printed in a physical form early next year. 

A lot has happened in reinforcement learning in the twenty years since the first edition. Perhaps the most important of these is the huge impact reinforcement learning ideas have had on neuroscience, where the now-standard theory of brain reward systems is that they are an instance of temporal-difference learning (one of the fundamental learning methods of reinforcement learning). 

In particular, the theory now is that a primary role of the neurotransmitter Dopamine is to carry the temporal-difference error, also called the reward-prediction error. This has been a huge development with many sources, ramifications, and tests, and our treatment in the book can only summarize them. This and other developments are covered in Chapter 15, and Chapter 14 summarizes their important precursors in psychology. 
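For readers who want the mechanics, here is a minimal sketch of a TD(0) value update; the scalar `delta` below is the temporal-difference (reward-prediction) error just described. The function name and the dict-based value table are illustrative choices, not from the book:

```python
def td0_update(v, state, reward, next_state, done, alpha=0.1, gamma=0.99):
    """One TD(0) update to a dict `v` of state-value estimates; returns the TD error."""
    target = reward + (0.0 if done else gamma * v.get(next_state, 0.0))
    delta = target - v.get(state, 0.0)   # reward-prediction error: outcome minus prediction
    v[state] = v.get(state, 0.0) + alpha * delta
    return delta  # positive: better than predicted; negative: worse
```

On the neuroscience side, phasic dopamine activity is held to rise with positive prediction errors and dip with negative ones, which is the mapping the theory rests on.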

Overall the second edition is about two-thirds larger than the first. There are now five chapters on function approximation instead of one. There are the two new chapters on psychology and neuroscience. There is also a new chapter on the frontiers of reinforcement learning, including a section on its societal implications. And everything has been updated and extended throughout the book. For example, the new applications chapter covers Atari game playing and AlphaGo Zero.

GP: What is Deep Reinforcement Learning, and how is it different from RL?

RS: Deep reinforcement learning is the combination of deep learning and reinforcement learning. These two kinds of learning address largely orthogonal issues and combine nicely. In short, reinforcement learning needs methods for approximating functions from data to implement all of its components - value functions, policies, world models, state updaters - and deep learning is the latest and most successful of recently developed function approximators. Our textbook covers mainly linear function approximators, while giving the equations for the general case. We cover neural networks in the applications chapter and in one section, but to learn fully about deep reinforcement learning one would have to complement our book with, say, the Deep Learning book by Goodfellow, Bengio, and Courville. 
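To make the division of labor concrete, here is a minimal sketch of the linear case the book emphasizes: semi-gradient TD(0) with a value function v(s) = w·x(s). Deep RL keeps the same update shape but replaces the hand-built feature vector and linear weights with a neural network. The names here are illustrative, not from the book:

```python
import numpy as np

def linear_td0_update(w, x_s, reward, x_next, done, alpha=0.01, gamma=0.99):
    """Semi-gradient TD(0) step for a linear value function v(s) = w @ x(s)."""
    v_s = w @ x_s                               # current prediction
    v_next = 0.0 if done else w @ x_next        # bootstrapped next-state prediction
    delta = reward + gamma * v_next - v_s       # TD error
    w += alpha * delta * x_s                    # gradient of v_s w.r.t. w is just x_s
    return w
```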

GP: RL has had great success in games, for example with AlphaGo Zero. In what other areas do you expect RL to do well?

RS: Well, of course I believe that in some sense reinforcement learning is the future of AI. Reinforcement learning is the best representative of the idea that an intelligent system must be able to learn on its own, without constant supervision. An AI has to be able to tell for itself if it is right or wrong. Only in this way can it scale to really large amounts of knowledge and general skill. 

Read the rest of the interview on KDnuggets - including where Yann LeCun is wrong, why Prediction Learning will soon be a hot technology trend, Rich's thoughts on AGI, and more.

Exclusive: Interview with Rich Sutton, the Father of Reinforcement Learning - Dec 5, 2017.

https://www.kdnuggets.com/2017/12/interview-rich-sutton-reinforcement-learning.html


Pradeep Narasimha

Observability | Kubernetes | Cloud Native | IaC | SaaS | OSS

6 years ago

RL is the way to achieve singularity.

Luis Fernando Fuentes

Data Engineer Multicloud | Data Architect | Data Governance | Snowflake | AI Enthusiast

6 years ago

The master...

Domy C.

A.I. Deep Learning Architect & I have developed my own 3D Procedural Game Engine for over 25 years. Currently developing my own 3D game using my 3D game engine.

6 years ago

Great interview!!!
