Lessons in A.I. from a Budding Machine Learning Engineer — A Brief Introduction to Reinforcement Learning Part II



In our last article, we introduced reinforcement learning (RL), explained what a policy is, and showed how it applies to our fast food problem domain. In this article, we will dive a bit deeper by discussing the reward and value functions. We will also close out this series and open up future conversations about more popular AI topics, as well as building and deploying real solutions.

Any discussion of RL has to go back in time to the genesis of the field and one of its early contributors, Richard Bellman. His work dates to the mid-20th century, around 1957. Building on the work of the Russian mathematician Andrey Andreyevich Markov, Bellman used Markov decision processes (MDPs) to solve a class of problems that involve sequential decisions made to reach some long-term goal. Bellman, born in Brooklyn, was the son of Jewish parents who ran a local grocery store. However, he was raised atheist, which helped fuel his desire for unconventional thinking at that time in American history. During World War II, he worked for a theoretical physics division of the military in Los Alamos, New Mexico. [2] After returning to college, Bellman defended his PhD thesis at Princeton University, on the stability of differential equations, within just 3 months. [1]

Through experimentation, rigor, and academic excellence, Bellman made many contributions to mathematics and control theory. One of them was the Bellman equation.

Bellman Equation
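
For reference, one common textbook form of the Bellman equation is sketched here, written for the value of a state s under a policy π. The notation is an assumption and may differ from the form the article originally displayed: π(a | s) is the probability the policy picks action a in state s, P(s' | s, a) is the probability of moving to state s', R(s, a, s') is the reward for that transition, and γ is a discount factor between 0 and 1.

```latex
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)
             \left[ R(s, a, s') + \gamma \, V^{\pi}(s') \right]
```

Read from the inside out: the value of a state is the reward expected on the next step plus the discounted value of wherever that step lands, averaged over the actions the policy might take and the outcomes the environment might produce.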

It looks like there is a lot going on in this equation, and in fact there is. However, we will break it down into its component parts so that you have a fundamental understanding of how it works. First, remember the pseudocode in our last article that showed how a typical RL policy is designed and subsequently implemented.
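
To make the moving parts a little more concrete, here is a minimal Python sketch of how the Bellman equation can be used to score a fixed policy, a procedure usually called iterative policy evaluation. Everything in it is hypothetical and invented for illustration: the two states, the actions, the rewards, the transition probabilities, and the discount factor of 0.9 are assumptions, not details from our fast food example.

```python
# Hypothetical two-state MDP, invented for illustration only.
GAMMA = 0.9  # discount factor (assumed value)

# TRANSITIONS[state][action] -> list of (probability, next_state, reward)
TRANSITIONS = {
    "waiting": {
        "take_order": [(1.0, "cooking", 0.0)],
    },
    "cooking": {
        "serve": [(0.8, "waiting", 10.0), (0.2, "cooking", -1.0)],
    },
}

# A fixed, deterministic policy: state -> action.
POLICY = {"waiting": "take_order", "cooking": "serve"}


def evaluate_policy(transitions, policy, gamma, tol=1e-8, max_iters=10_000):
    """Repeat the Bellman backup for a fixed policy until values settle."""
    values = {state: 0.0 for state in transitions}
    for _ in range(max_iters):
        delta = 0.0
        for state in transitions:
            action = policy[state]
            # Expected immediate reward plus discounted value of the next state.
            new_value = sum(
                prob * (reward + gamma * values[next_state])
                for prob, next_state, reward in transitions[state][action]
            )
            delta = max(delta, abs(new_value - values[state]))
            values[state] = new_value
        if delta < tol:  # stop once no state's value changed noticeably
            break
    return values


if __name__ == "__main__":
    for state, value in evaluate_policy(TRANSITIONS, POLICY, GAMMA).items():
        print(f"{state}: {value:.2f}")
```

Each pass replaces a state's value with the expected immediate reward plus the discounted value of the state that follows, which is the relationship the Bellman equation expresses. The loop stops once the values no longer change by more than the tolerance.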

Read more here....



