Exclusive: My Interview with Rich Sutton, the Father of Reinforcement Learning

I met Rich Sutton back in the 1980s, when he and I, both fresh PhDs, joined GTE Laboratories in the Boston area. I was doing research on Intelligent Databases and he was working on Reinforcement Learning, but our GTE Labs projects were far from real-world deployment. We frequently played chess, where we were about equal, but Rich was far ahead of me in Machine Learning. Rich is both a brilliant researcher and a very nice and modest person. He says in the interview below that the idea of "Reinforcement Learning" was obvious, but there is a huge distance between having an idea and developing it into a working, mathematically grounded theory, which is what Rich and Andrew Barto - his PhD thesis adviser - did for Reinforcement Learning. RL was a major part of the recent success of AlphaGo Zero, and if Artificial General Intelligence (AGI) is developed at some point, RL is likely to play a major role in it.

Rich Sutton, Ph.D. is currently Professor of Computing Science and iCORE Chair at the University of Alberta, and a Distinguished Research Scientist at DeepMind. He is one of the founding fathers of Reinforcement Learning (RL), an increasingly important part of Machine Learning and AI. His significant contributions to RL include temporal-difference learning and policy gradient methods. He is the co-author, with Andrew Barto, of the widely acclaimed book "Reinforcement Learning: An Introduction" - cited over 25,000 times, with a second edition coming soon.

He received a BA in Psychology from Stanford (1978), and an MS (1980) and PhD (1984) in Computer Science from the University of Massachusetts at Amherst. His doctoral dissertation, entitled "Temporal Credit Assignment in Reinforcement Learning", introduced actor-critic architectures and "temporal credit assignment".

From 1985 to 1994 Sutton was a Principal Member of Technical Staff at GTE Laboratories. He then spent three years at UMass Amherst as a Senior Research Scientist, followed by five years at the AT&T Shannon Laboratory as a Principal Technical Staff Member. Since 2003 he has been Professor and iCORE Chair in the Dept. of Computing Science at the University of Alberta, where he leads the Reinforcement Learning and Artificial Intelligence Laboratory (RLAI). Since June 2017, Sutton has also co-led a new Alberta office of DeepMind.

Rich also keeps a blog/personal page at incompleteideas.net.


Gregory Piatetsky: What are the main ideas in Reinforcement Learning (RL), and how is it different from Supervised Learning?

Rich Sutton: Reinforcement learning is learning from rewards, by trial and error, during normal interaction with the world. This makes it very much like natural learning processes and unlike supervised learning, in which learning happens only during a special training phase, using a supervisory or teaching signal that will not be available during normal use.

The typical RL scenario: an agent takes actions in an environment, which is interpreted into a reward and a representation of the state, which are fed back into the agent. Source: Wikipedia


For example, speech recognition is currently done by supervised learning, using large datasets of speech sounds and their correct transcriptions into words. The transcriptions are the supervisory signals that will not be available when new speech sounds come in to be recognized. Game playing, on the other hand, is often done by reinforcement learning, using the outcome of the game as a reward. Even when you play a new game you will see whether you win or lose, and can use this with reinforcement learning algorithms to improve your play. A supervised learning approach to game playing would instead require examples of "correct" moves, say from a human expert. This would be handy to have, but it is not available during normal play, and would limit the skill of the learned system to that of the human expert. In reinforcement learning you make do with less informative training information, with the advantage that that information is more plentiful and is not limited by the skill of the supervisor. 
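As a concrete illustration of the trial-and-error loop Sutton describes, here is a minimal sketch of tabular Q-learning driven only by the game's outcome. The `env` object, with its `reset()`, `step()`, and `actions()` methods, is a hypothetical stand-in for a game environment, not anything from the interview:

```python
import random
from collections import defaultdict

def play_episode(env, q, epsilon=0.1, alpha=0.1, gamma=1.0):
    """One episode of tabular Q-learning; only the reward signal guides learning."""
    state = env.reset()
    done = False
    while not done:
        # Trial and error: mostly exploit current value estimates, sometimes explore.
        if random.random() < epsilon:
            action = random.choice(env.actions(state))
        else:
            action = max(env.actions(state), key=lambda a: q[(state, a)])
        # The environment returns only a reward (e.g. +1 win, -1 loss, 0 otherwise),
        # never an example of the "correct" move.
        next_state, reward, done = env.step(action)
        # Move the action-value estimate toward reward plus bootstrapped future value.
        best_next = 0.0 if done else max(q[(next_state, a)] for a in env.actions(next_state))
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state

q = defaultdict(float)  # action-value table, initialized to zero
```

Note that nothing in this loop bounds the learned policy by a supervisor's skill, which is exactly the advantage Sutton points to.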

GP: The second edition of your classic book with Andrew Barto, "Reinforcement Learning: An Introduction", is coming soon (when?). What are the main advances covered in the second edition, and can you tell us about the new chapters on the intriguing connections between reinforcement learning and psychology (Ch. 14) and neuroscience (Ch. 15)?

RS: The complete draft of the second edition is currently available on the web at richsutton.com. Andy Barto and I are putting some final finishing touches on it: validating all the references, things like that. It will be printed in a physical form early next year. 

A lot has happened in reinforcement learning in the twenty years since the first edition. Perhaps the most important of these is the huge impact reinforcement learning ideas have had on neuroscience, where the now-standard theory of brain reward systems is that they are an instance of temporal-difference learning (one of the fundamental learning methods of reinforcement learning). 

In particular, the theory now is that a primary role of the neurotransmitter Dopamine is to carry the temporal-difference error, also called the reward-prediction error. This has been a huge development with many sources, ramifications, and tests, and our treatment in the book can only summarize them. This and other developments are covered in Chapter 15, and Chapter 14 summarizes their important precursors in psychology. 
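For readers who want the mechanics, here is a minimal sketch of a TD(0) value update; the scalar `delta` below is the temporal-difference (reward-prediction) error just described. The function name and the dict-based value table are illustrative choices, not from the book:

```python
def td0_update(v, state, reward, next_state, done, alpha=0.1, gamma=0.99):
    """One TD(0) update to a dict `v` of state-value estimates; returns the TD error."""
    target = reward + (0.0 if done else gamma * v.get(next_state, 0.0))
    delta = target - v.get(state, 0.0)   # reward-prediction error: outcome minus prediction
    v[state] = v.get(state, 0.0) + alpha * delta
    return delta  # positive: better than predicted; negative: worse
```

On the neuroscience side, phasic dopamine activity is held to rise with positive prediction errors and dip with negative ones, which is the mapping the theory rests on.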

Overall the second edition is about two-thirds larger than the first. There are now five chapters on function approximation instead of one. There are the two new chapters on psychology and neuroscience. There is also a new chapter on the frontiers of reinforcement learning, including a section on its societal implications. And everything has been updated and extended throughout the book. For example, the new applications chapter covers Atari game playing and AlphaGo Zero.

GP: What is Deep Reinforcement Learning, and how is it different from RL?

RS: Deep reinforcement learning is the combination of deep learning and reinforcement learning. These two kinds of learning address largely orthogonal issues and combine nicely. In short, reinforcement learning needs methods for approximating functions from data to implement all of its components - value functions, policies, world models, state updaters - and deep learning is the latest and most successful of recently developed function approximators. Our textbook covers mainly linear function approximators, while giving the equations for the general case. We cover neural networks in the applications chapter and in one section, but to learn fully about deep reinforcement learning one would have to complement our book with, say, the Deep Learning book by Goodfellow, Bengio, and Courville. 
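To make the division of labor concrete, here is a minimal sketch of the linear case the book emphasizes: semi-gradient TD(0) with a value function v(s) = w·x(s). Deep RL keeps the same update shape but replaces the hand-built feature vector and linear weights with a neural network. The names here are illustrative, not from the book:

```python
import numpy as np

def linear_td0_update(w, x_s, reward, x_next, done, alpha=0.01, gamma=0.99):
    """Semi-gradient TD(0) step for a linear value function v(s) = w @ x(s)."""
    v_s = w @ x_s                               # current prediction
    v_next = 0.0 if done else w @ x_next        # bootstrapped next-state prediction
    delta = reward + gamma * v_next - v_s       # TD error
    w += alpha * delta * x_s                    # gradient of v_s w.r.t. w is just x_s
    return w
```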

GP: RL has had great success in games, for example with AlphaGo Zero. In what other areas do you expect RL to do well?

RS: Well, of course I believe that in some sense reinforcement learning is the future of AI. Reinforcement learning is the best representative of the idea that an intelligent system must be able to learn on its own, without constant supervision. An AI has to be able to tell for itself if it is right or wrong. Only in this way can it scale to really large amounts of knowledge and general skill. 

Read the rest of the interview on KDnuggets - including where Yann LeCun is wrong, why Prediction Learning will soon be a hot technology trend, Rich's thoughts on AGI, and more.

Exclusive: Interview with Rich Sutton, the Father of Reinforcement Learning - Dec 5, 2017.

https://www.kdnuggets.com/2017/12/interview-rich-sutton-reinforcement-learning.html


Pradeep Narasimha

Observability | Kubernetes | Cloud Native | IaC | SaaS | OSS

6 years ago

RL is the way to achieve singularity.

Luis Fernando Fuentes

Data Engineer Multicloud | Data Architect | Data Governance | Snowflake | AI Enthusiast

6 years ago

The master...

Domy C.

A.I. Deep Learning Architect & I have developed my own 3D Procedural Game Engine for over 25 years. Currently developing my own 3D game using my 3D game engine.

6 years ago

Great interview!!!
