One Minute Overview of Reinforcement Learning

Saulius Dobilas

Associate Director, Strats at HSBC

Published: November 2, 2022

The #52weeksofdatascience newsletter covers everything from Linear Regression to Neural Networks and beyond. If you like Data Science and Machine Learning, don’t forget to subscribe and share it with your friends!

Level 1 - One Minute Overview for Data & Analytics Executives and Curious Minds

Category: Reinforcement Learning

Main Idea: Reinforcement Learning (RL) is a category of Machine Learning algorithms used for training an intelligent agent to perform tasks or achieve goals in a specific environment by maximising the expected cumulative reward.

It is similar to how babies learn about their surroundings or how we train dogs. We allow them to interact with and explore the environment and provide positive/negative rewards to encourage/discourage particular behaviour.

In this week’s newsletter, I will introduce you to RL’s main elements/terminology, so we can reference them when we look at actual RL algorithms in the upcoming newsletters.

Agent exploring its environment. Image by author.

Agent: an “intelligent actor” that can interact with its environment, e.g. a player in a game.
Environment: the “world” where that agent “lives” or operates.
Action Space: a list or range of actions the agent can perform.
State/Observation Space: a list or range of possible environment configurations. A state/observation provides information to the agent about its environment (e.g. its location).
Reward: an incentive (or disincentive) that we give to the agent when it performs desired (undesired) actions at various states. For example, if the goal of the game is to catch a squirrel (see above image), then we would reward the agent for moving in the direction of the squirrel and catching it. We can also reduce future rewards relative to present rewards by using the discount factor, gamma (γ).
Exploration/Exploitation, epsilon (ε): enables us to set how much time the agent should spend exploring the environment vs exploiting its existing knowledge about the environment.
Episode: one complete cycle from the start position to the end position. E.g., in the context of a game, an episode would last from the moment your agent starts a new level until it dies or completes the level.
Alpha (α): the learning rate, which influences the learning speed and convergence towards the optimal policy.
Policy (π): an agent’s strategy to pursue a goal.
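To make gamma and alpha a little more concrete, here is a minimal Python sketch. The function names `discounted_return` and `nudge_estimate` and the example numbers are my own illustrative assumptions, not part of any RL library, and the update rule shown is just the generic "move the old estimate part of the way towards a new target" pattern that many RL algorithms use.

```python
def discounted_return(rewards, gamma):
    """Total reward for one episode, with the reward at step t weighted by gamma**t."""
    total = 0.0
    # Work backwards through the episode: G_t = r_t + gamma * G_{t+1}
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def nudge_estimate(old_estimate, target, alpha):
    """Move an estimate a fraction alpha of the way towards a new target."""
    return old_estimate + alpha * (target - old_estimate)


# Hypothetical episode: the agent earns a reward of 1 at each of three steps.
episode_rewards = [1.0, 1.0, 1.0]

# With gamma = 0.5, later rewards count for less: 1 + 0.5 + 0.25
episode_return = discounted_return(episode_rewards, gamma=0.5)
print(episode_return)

# With alpha = 0.5, the estimate moves halfway from 0 towards the observed return.
print(nudge_estimate(old_estimate=0.0, target=episode_return, alpha=0.5))
```

A gamma close to 1 makes the agent care about distant rewards almost as much as immediate ones, while a small alpha makes learning slower but steadier.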

There are two different methods to train an agent in Reinforcement Learning:

Policy-based methods: we train the agent directly on what action to take in which state.
Value-based methods: we train the agent to identify which states (or state-action pairs) are more valuable so that it can be guided by value maximisation. E.g., in the game of catching a squirrel, standing one step away from a squirrel would describe a more valuable state than standing ten steps away.
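The difference between the two methods can be sketched with a toy version of the squirrel game. Everything here is an assumption made for illustration: a one-dimensional track, a squirrel that sits still, a hand-written value function and a hand-written policy table. In real RL both the values and the policy would be learned from experience rather than written by hand.

```python
TRACK_LENGTH = 21                 # hypothetical 1-D track with positions 0..20
SQUIRREL_POSITION = 10            # the squirrel sits still in this toy example
ACTIONS = {"left": -1, "right": +1}


def state_value(position):
    """Toy state value: the closer to the squirrel, the more valuable the state."""
    return -abs(SQUIRREL_POSITION - position)


def value_based_action(position):
    """Value-based view: choose the action that leads to the most valuable next state."""
    def next_state_value(action):
        next_position = position + ACTIONS[action]
        # Stay on the track if the move would step off either end.
        next_position = min(max(next_position, 0), TRACK_LENGTH - 1)
        return state_value(next_position)

    return max(ACTIONS, key=next_state_value)


# Policy-based view: a direct mapping from state to action, with no values involved.
policy_table = {
    position: "right" if position < SQUIRREL_POSITION else "left"
    for position in range(TRACK_LENGTH)
    if position != SQUIRREL_POSITION      # the episode ends once the squirrel is caught
}

# One step away is a more valuable state than ten steps away.
print(state_value(9), state_value(0))

# Both views recommend the same move from position 3.
print(value_based_action(3), policy_table[3])
```

In this toy setting both approaches end up with the same behaviour. The value-based agent gets there indirectly by comparing states, while the policy-based agent stores the decision itself.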

To summarise, we now know that Reinforcement Learning is used to teach the agent to operate within its environment and achieve a goal or objective (e.g., win a game) by providing positive, neutral or negative rewards to the agent based on the actions it takes at different states.

We balance exploration vs exploitation by specifying what proportion of the agent’s actions should be chosen randomly, and we apply a discount factor (gamma) to control the agent’s preference for short-term vs long-term rewards.
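One common way to set the proportion of random actions is an epsilon-greedy rule. The sketch below is a minimal illustration under my own assumptions: the `epsilon_greedy` helper and the action values are made up, and only Python's standard `random` module is used.

```python
import random


def epsilon_greedy(action_values, epsilon, rng):
    """With probability epsilon explore (random action), otherwise exploit the best-known action."""
    if rng.random() < epsilon:
        return rng.choice(list(action_values))
    return max(action_values, key=action_values.get)


# Hypothetical estimates of how good each action is in the current state.
action_values = {"left": 0.2, "right": 0.8}

rng = random.Random(0)  # seeded so that the run is repeatable
choices = [epsilon_greedy(action_values, epsilon=0.1, rng=rng) for _ in range(1000)]

# Share of choices that went to the best-known action. It should be well above
# one half, because only about 10% of the choices are made at random.
print(choices.count("right") / len(choices))
```

Setting epsilon to 0 makes the agent purely exploitative, while setting it to 1 makes every action random. A typical approach is to start with a higher epsilon and reduce it as the agent learns more about its environment.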

Finally, we train the model (teach the agent) by optimising the policy (π), which we do either through a direct policy-based method or an indirect value-based method.

Everyday use cases: Reinforcement Learning has a wide range of applications. It can be used to train an agent to play a game (e.g. chess or Go), teach robots to perform virtual or real-life tasks, or even develop an AI for self-driving cars.

Level 2 - for Aspiring Data Scientists

Learn more about Reinforcement Learning in my in-depth article on Towards Data Science.

Level 3 - for Data Science and Analytics Professionals

No Python code this time, as I focused on introducing the concepts/terminology. Don’t worry, though. You’ll get some RL code to play with in the upcoming newsletters.
