One Minute Overview of Reinforcement Learning
Reinforcement Learning. Image by author.

The #52weeksofdatascience newsletter covers everything from Linear Regression to Neural Networks and beyond. If you like Data Science and Machine Learning, don’t forget to subscribe and share it with your friends!

Level 1 - One Minute Overview for Data & Analytics Executives and Curious Minds

Category: Reinforcement Learning

Main Idea: Reinforcement Learning (RL) is a category of Machine Learning algorithms used for training an intelligent agent to perform tasks or achieve goals in a specific environment by maximising the expected cumulative reward.

It is similar to how babies learn about their surroundings or how we train dogs. We allow them to interact with and explore the environment and provide positive/negative rewards to encourage/discourage particular behaviour.

In this week’s newsletter, I will introduce you to RL’s main elements/terminology, so we can reference them when we look at actual RL algorithms in the upcoming newsletters.

Agent exploring its environment. Image by author.

  • Agent: an “intelligent actor” that can interact with its environment, e.g. a player in a game.
  • Environment: the “world” where that agent “lives” or operates.
  • Action Space: a list or range of actions the agent can perform.
  • State/Observation Space: a list or range of possible environment configurations. A state/observation provides information to the agent about its environment (e.g. its location).
  • Reward: an incentive (or disincentive) that we give to the agent when it performs desired (or undesired) actions in various states. For example, if the goal of the game is to catch a squirrel (see above image), then we would reward the agent for moving in the direction of the squirrel and for catching it. We can also reduce the weight of future rewards relative to present rewards by using the discount factor, gamma (γ).
  • Exploration/Exploitation, epsilon (ε): a setting that controls how much time the agent spends exploring the environment vs exploiting its existing knowledge about the environment.
  • Episode: one complete cycle from the start position to the end position. E.g., in the context of a game, an episode would last from the moment your agent starts a new level until it dies or completes the level.
  • Alpha (α): the learning rate, which influences the learning speed and convergence towards the optimal policy.
  • Policy (π): an agent’s strategy to pursue a goal.
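The terms above fit together in a simple interaction loop. The sketch below is a minimal illustration in Python, assuming a hypothetical one-dimensional version of the squirrel game. The class `SquirrelCorridor`, the reward values (+10 for the catch, -1 per step) and the function names are assumptions made for illustration and do not come from any library.

```python
import random


class SquirrelCorridor:
    """Hypothetical toy environment: a corridor with a squirrel in the last cell."""

    def __init__(self, length=6):
        self.length = length       # state space: positions 0 .. length - 1
        self.actions = [-1, +1]    # action space: step left or step right
        self.position = 0

    def reset(self):
        """Start a new episode and return the initial state."""
        self.position = 0
        return self.position

    def step(self, action):
        """Apply an action and return (next_state, reward, done)."""
        # Keep the agent inside the corridor.
        self.position = min(max(self.position + action, 0), self.length - 1)
        done = self.position == self.length - 1   # squirrel caught
        reward = 10.0 if done else -1.0           # assumed reward scheme
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode with the given policy and return the rewards collected."""
    state = env.reset()
    rewards = []
    for _ in range(max_steps):
        action = policy(state)                    # the policy maps state -> action
        state, reward, done = env.step(action)
        rewards.append(reward)
        if done:
            break
    return rewards


def random_policy(state):
    """Pure exploration: ignore the state and pick any action."""
    return random.choice([-1, +1])


def always_right(state):
    """A fixed policy that always walks towards the squirrel."""
    return +1


env = SquirrelCorridor()
rewards = run_episode(env, always_right)
print(rewards, sum(rewards))
```

Swapping `always_right` for `random_policy` shows why learning matters: the random agent usually needs many more steps, and therefore collects a lower total reward, before the episode ends.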

There are two main families of methods for training an agent in Reinforcement Learning:

  • Policy-based methods: we train the agent directly on what action to take in which state.
  • Value-based methods: we train the agent to identify which states (or state-action pairs) are more valuable so that it can be guided by value maximisation. E.g., in the game of catching a squirrel, standing one step away from a squirrel would describe a more valuable state than standing ten steps away.
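To make the value-based idea concrete, here is a minimal sketch of a single update step from Q-learning, one well-known value-based algorithm. This newsletter does not name a specific algorithm, so the choice of Q-learning, the `ACTIONS` list, the state numbers and the default values of `alpha` and `gamma` are all assumptions for illustration.

```python
from collections import defaultdict

ACTIONS = ["left", "right"]  # hypothetical action space


def q_learning_update(q_table, state, action, reward, next_state, done,
                      alpha=0.1, gamma=0.9):
    """Move Q(state, action) a fraction alpha towards the observed target."""
    # Value of the best action available from the next state (0 if the episode ended).
    best_next = 0.0 if done else max(q_table[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    q_table[(state, action)] += alpha * (target - q_table[(state, action)])
    return q_table[(state, action)]


def greedy_action(q_table, state):
    """Value maximisation: pick the action with the highest estimated value."""
    return max(ACTIONS, key=lambda a: q_table[(state, a)])


q = defaultdict(float)  # all state-action values start at 0

# The agent stands one step from the squirrel, moves right and catches it.
q_learning_update(q, state=4, action="right", reward=10.0, next_state=5, done=True)

# One step further away: the value learned for state 4 now flows back to state 3.
q_learning_update(q, state=3, action="right", reward=-1.0, next_state=4, done=False)

print(q[(4, "right")], q[(3, "right")])
```

A policy-based method would skip the value table and adjust the probability of choosing each action in each state directly.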

To summarise, we now know that Reinforcement Learning is used to teach the agent to operate within its environment and achieve a goal or objective (e.g., win a game) by providing positive, neutral or negative rewards to the agent based on the actions it takes in different states.

We balance exploration vs exploitation by specifying what proportion of the agent’s actions should be chosen randomly, and we apply a discount factor (gamma) to control the agent’s preference for short-term vs long-term rewards.
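Both settings can be sketched in a few lines of Python. The example below assumes the common epsilon-greedy rule for choosing random actions; the function names and the example action values are hypothetical.

```python
import random


def epsilon_greedy(action_values, epsilon):
    """Pick a random action with probability epsilon, else the best-valued one.

    action_values: dict mapping each action to its current estimated value.
    """
    if random.random() < epsilon:
        return random.choice(list(action_values))       # explore
    return max(action_values, key=action_values.get)    # exploit


def discounted_return(rewards, gamma):
    """Sum the rewards, weighting the reward k steps ahead by gamma ** k."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


estimates = {"left": 0.2, "right": 0.7}
print(epsilon_greedy(estimates, epsilon=0.1))   # mostly "right", sometimes random

future_rewards = [1.0, 1.0, 1.0]
print(discounted_return(future_rewards, gamma=1.0))   # every reward counts fully
print(discounted_return(future_rewards, gamma=0.5))   # later rewards count for less
```

With `epsilon=0` the agent only exploits, and with `epsilon=1` it only explores. A gamma close to 0 makes the agent short-sighted, while a gamma close to 1 makes it value long-term rewards almost as much as immediate ones.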

Finally, we train the model (teach the agent) by optimising the policy (π), which we do either through a direct policy-based method or an indirect value-based method.
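For readers who prefer notation, the goal of maximising the expected cumulative reward can be written as follows. This is the standard textbook formulation rather than something specific to this newsletter: r denotes the reward received at each step, G_t the discounted return from step t onwards, and π* the optimal policy.

```latex
\begin{align}
G_t &= \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k+1}, \qquad 0 \le \gamma \le 1 \\
\pi^{*} &= \arg\max_{\pi} \; \mathbb{E}_{\pi}\left[ G_0 \right]
\end{align}
```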

Everyday use cases: Reinforcement Learning has a wide range of applications. It can be used to train an agent to play a game (e.g. chess or Go), teach robots to perform virtual or real-life tasks, or even develop an AI for self-driving cars.

Level 2 - for Aspiring Data Scientists

Learn more about Reinforcement Learning in my in-depth article on Towards Data Science.

Level 3 - for Data Science and Analytics Professionals

This time, I focused on introducing the concepts/terminology rather than on a full Python implementation. Don’t worry, though. You’ll get some RL code to play with in the upcoming newsletters.
