Bygone Basis of Reinforcement Learning
An AI Generated Image by Canva Magic Studio


When we explore the roots of reinforcement learning, many of us come across the contributions of Richard S. Sutton, a famous computer scientist. It is interesting to note that Sutton was inspired by the pioneering work of the American researcher Harry Klopf. Klopf's seminal book, The Hedonistic Neuron, published in 1982, has influenced the world of reinforcement learning like few could have imagined. In this book, Klopf describes the heterostatic properties of neural networks. In an attempt to characterise consciousness, he relates the construct to wave phenomena and suggests the further general equivalences of pleasure with "entropic processes" and pain with "anti-entropic processes".

In contrast to homeostatic systems, which seek to maintain homeostasis, neurons and the systems they compose are envisaged as "heterostats", which seek to achieve "heterostasis". The concept of heterostasis can be considered to lie at the root of artificial adaptive intelligence: a "homeostat" seeks to maintain a steady state, whilst a "heterostatic" system seeks to achieve a better or optimal state. The book also delves into the Hebbian concepts of plasticity and self-organisation. Klopf further hypothesises that the capacity of the limbic system and hypothalamus to distinguish between self and other is "severely limited".

Reinforcement learning involves the problems of stochastic processes, optimisation and control, and sequential decisions. These problems share a common feature: one must overcome uncertainty by persistently steering towards the goal. Thus, reinforcement learning aims to solve problems that involve sequential optimal decisions under uncertainty. It addresses problems involving random variables that evolve over time, which are technically known as stochastic processes. The field of reinforcement learning was originally known as approximate dynamic programming (ADP) and neuro-dynamic programming (NDP). Fields such as operations research and optimal control have greatly contributed to the emergence of reinforcement learning.
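To make the phrase "random variables that evolve over time" concrete, here is a minimal sketch (in Python, purely for illustration) that simulates one trajectory of a simple Gaussian random walk; the particular process and its parameters are assumptions chosen for this example, not something prescribed by reinforcement learning itself.

```python
import random

# A toy stochastic process: the value x evolves over time, and each
# step adds Gaussian noise, so no trajectory is known in advance.
x = 0.0
trajectory = [x]
for t in range(10):
    x = x + random.gauss(0.0, 1.0)  # the next value depends on the current one plus noise
    trajectory.append(x)

print(trajectory)
```

Every run of this snippet produces a different trajectory, which is exactly the kind of uncertainty a sequential decision maker has to steer through.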

Unlike the other branches of Machine Learning (Supervised Learning and Unsupervised Learning), Reinforcement Learning is a lot more pragmatic and purpose-driven. It aims to go beyond learning the patterns and properties of the presented data and to learn which decisions to make so as to drive towards the optimisation objective. The following observation from the Stanford book on Reinforcement Learning fundamentals summarises this elegantly.

It is sometimes said that Supervised Learning and Unsupervised Learning are about “minimisation” (i.e., they minimise the fitting error of a model to the presented data), while Reinforcement Learning is about “maximisation”.

In conventional statistical learning methods, the learning agent is considered to be passive, but in reinforcement learning, the learning agent and the environment are active participants. The class of problems that reinforcement learning aims to solve can be described with a mathematical framework known as Markov Decision Processes (abbreviated as MDPs). In the MDP approach, the agent and the environment interact in a time-sequenced loop. The term agent refers to the learning algorithm, and the term environment refers to an abstract entity that serves up uncertain outcomes to the agent.
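As a rough illustration of this time-sequenced loop, the sketch below pairs a toy environment with a placeholder agent. The class names, the noisy transition rule, and the reward scheme are all assumptions made up for this example rather than part of any particular library.

```python
import random

class ToyEnvironment:
    """A made-up environment: positions 0..4 on a line; reaching position 4 pays a reward of 1."""
    def __init__(self):
        self.state = 0

    def step(self, action):
        # The outcome of an action is uncertain: with probability 0.2 the move is flipped.
        move = action if random.random() < 0.8 else -action
        self.state = min(max(self.state + move, 0), 4)
        reward = 1.0 if self.state == 4 else 0.0
        done = self.state == 4
        return self.state, reward, done

class RandomAgent:
    """A placeholder agent that chooses to move left (-1) or right (+1) at random."""
    def act(self, state):
        return random.choice([-1, 1])

# The agent-environment loop: observe the state, act, receive a reward, repeat.
env, agent = ToyEnvironment(), RandomAgent()
state, done, total_reward = env.state, False, 0.0
while not done:
    action = agent.act(state)
    state, reward, done = env.step(action)
    total_reward += reward

print("episode finished with total reward:", total_reward)
```

A real reinforcement learning agent would, of course, replace the random policy with one that improves from the rewards it observes, but the structure of the loop stays the same.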
