AI Reinforcement Learning Overview

What is Reinforcement Learning?

Reinforcement Learning is a Machine Learning method concerned with how software agents should take actions in an environment. The aim of Reinforcement Learning is to maximize some portion of the cumulative reward, and this learning method helps an agent learn how to attain a complex objective or maximize a specific dimension over many steps.

How does it compare with other ML techniques?

Reinforcement learning is distinguished from other training styles, including supervised and unsupervised learning, by its goal and, consequently, by its learning approach.


This article will give you a complete overview of reinforcement learning, including MDP and Q-learning. In this RL tutorial, you will learn the following topics:

  • Terms used in Reinforcement Learning.
  • Key features of Reinforcement Learning.
  • Elements of Reinforcement Learning.
  • Approaches to implementing Reinforcement Learning.
  • How does Reinforcement Learning Work?
  • The Bellman Equation.
  • Types of Reinforcement Learning.
  • Reinforcement Learning Algorithm.
  • Markov Decision Process.
  • What is Q-Learning?
  • Difference between Supervised Learning and Reinforcement Learning.
  • Applications of Reinforcement Learning.
  • Conclusion.

Terms used in Reinforcement Learning

  • Agent: An entity that can perceive/explore the environment and act upon it.
  • Environment: The situation in which the agent is present or by which it is surrounded. In RL, we assume a stochastic environment, which means it is random in nature.
  • Action: Actions are the moves taken by an agent within the environment.
  • State: A situation returned by the environment after each action taken by the agent.
  • Reward: Feedback returned to the agent from the environment to evaluate the agent's action.
  • Policy: A strategy applied by the agent to decide the next action based on the current state.
  • Value: The expected long-term return with the discount factor, as opposed to the short-term reward.
  • Q-value: Mostly the same as the value, but it takes one additional parameter, the current action (a).
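To make these terms concrete, here is a minimal, hypothetical sketch of the agent-environment loop in Python. The policy, environment, reward values, and state numbering are invented for illustration and do not refer to any particular library.

import random

def policy(state):
    # A trivial policy: pick a random action for the current state
    return random.choice(["left", "right"])

def environment_step(state, action):
    # A toy environment: returns the next state and a reward for the action
    next_state = state + (1 if action == "right" else -1)
    reward = 1.0 if next_state == 3 else 0.0   # reaching state 3 yields a positive reward
    return next_state, reward

state = 0                  # initial state
total_reward = 0.0         # cumulative reward the agent tries to maximize
for step in range(10):
    action = policy(state)                            # the agent acts...
    state, reward = environment_step(state, action)   # ...the environment returns a new state and reward
    total_reward += reward
print("Cumulative reward:", total_reward)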

Key Features of Reinforcement Learning

  • In RL, the agent is not instructed about the environment or which actions need to be taken.
  • It is based on a trial-and-error process.
  • The agent takes the next action and changes states according to the feedback from the previous action.
  • The agent may get a delayed reward.
  • The environment is stochastic, and the agent needs to explore it to collect the maximum positive reward.

Approaches to implementing Reinforcement Learning

There are mainly three ways to implement reinforcement learning in ML, which are:


1. Value-based: The value-based approach aims to find the optimal value function, which gives the maximum value achievable at a state under any policy. In other words, the agent expects the long-term return at any state s under policy π.

2. Policy-based: The policy-based approach tries to find the optimal policy for maximum future rewards without using a value function. In this approach, the agent tries to apply a policy such that the action performed at each step helps to maximize the future reward. The policy-based approach has mainly two types of policy:

  • Deterministic: The policy (π) produces the same action for a given state.
  • Stochastic: The policy assigns probabilities to actions, and the produced action is determined by those probabilities.

3. Model-based: In the model-based approach, a virtual model of the environment is created, and the agent explores that environment to learn it. There is no single solution or algorithm for this approach because the model representation differs for each environment.

Elements of Reinforcement Learning

There are four main elements of Reinforcement Learning, which are given below:

  1. Policy
  2. Reward Signal
  3. Value Function
  4. Model of the environment


1) Policy: A policy defines the way an agent behaves at a given time. It maps the perceived states of the environment to the actions taken in those states. The policy is the core element of RL, as it alone can define the behavior of the agent. In some cases it may be a simple function or a lookup table, whereas in other cases it may involve more general computation, such as a search process. A policy can be deterministic or stochastic:

For a deterministic policy: a = π(s)

For a stochastic policy: π(a | s) = P[A_t = a | S_t = s]
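As a rough illustration (the states, actions, and probabilities below are made up for the example), a deterministic policy can be a plain lookup from state to action, while a stochastic policy samples an action from a probability distribution over actions:

import random

# Deterministic policy: each state maps to exactly one action, a = π(s)
deterministic_policy = {"s1": "left", "s2": "right"}

def act_deterministic(state):
    return deterministic_policy[state]

# Stochastic policy: π(a | s) assigns a probability to each action in a state
stochastic_policy = {
    "s1": {"left": 0.8, "right": 0.2},
    "s2": {"left": 0.1, "right": 0.9},
}

def act_stochastic(state):
    actions = list(stochastic_policy[state].keys())
    probs = list(stochastic_policy[state].values())
    return random.choices(actions, weights=probs, k=1)[0]

print(act_deterministic("s1"))   # always "left"
print(act_stochastic("s1"))      # "left" about 80% of the time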


2) Reward Signal: The goal of reinforcement learning is defined by the reward signal. At each state, the environment sends an immediate signal to the learning agent, and this signal is known as the reward signal. Rewards are given according to the good and bad actions taken by the agent. The agent's main objective is to maximize the total reward it collects for good actions. The reward signal can change the policy; for example, if an action selected by the agent leads to a low reward, the policy may change to select other actions in the future.

3) Value Function: The value function gives information about how good a situation and action are and how much reward the agent can expect. A reward is the immediate signal for each good and bad action, whereas the value function specifies which states and actions are good in the long run. The value function depends on the reward because, without reward, there could be no value. The goal of estimating values is to achieve more rewards.


4) Model: The last element of reinforcement learning is the model, which mimics the behavior of the environment. With the help of the model, one can make inferences about how the environment will behave. For example, given a state and an action, a model can predict the next state and reward.

The model is used for planning, which means it provides a way to decide on a course of action by considering all future situations before actually experiencing them. Approaches that solve RL problems with the help of a model are termed model-based approaches, whereas an approach without a model is called a model-free approach.
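As a toy sketch, a model can be used for planning by looking one step ahead and picking the action whose predicted outcome is best. The dictionary below is a hand-written, purely hypothetical model, not something learned from a real environment:

# A toy model: (state, action) -> (predicted next state, predicted reward)
model = {
    ("s1", "left"):  ("s0", 0.0),
    ("s1", "right"): ("s2", 1.0),
}

def plan_one_step(state, actions):
    # Pick the action whose predicted reward is highest, without touching the real environment
    return max(actions, key=lambda a: model[(state, a)][1])

print(plan_one_step("s1", ["left", "right"]))  # "right", because the model predicts a higher reward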

How does Reinforcement Learning Work?

To understand the working process of RL, we need to consider two main things:

  • Environment: It can be anything, such as a room, a maze, a football ground, etc.
  • Agent: An intelligent agent, such as an AI robot.
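For instance, a maze-like environment can be sketched as a small class that takes the agent's action and returns the new state and reward. This corridor example is a made-up toy, not a specific framework's API:

class CorridorMaze:
    # A toy 1-D maze: cells 0..4, with the goal at cell 4
    def __init__(self):
        self.position = 0

    def step(self, action):
        # action is "left" or "right"; the position stays inside the corridor
        self.position += 1 if action == "right" else -1
        self.position = max(0, min(4, self.position))
        reward = 1.0 if self.position == 4 else 0.0
        done = self.position == 4
        return self.position, reward, done

env = CorridorMaze()
state, done = env.position, False
while not done:
    state, reward, done = env.step("right")   # a hard-coded "agent" that always moves right
print("Reached the goal at cell", state)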

Types of Reinforcement learning

There are mainly two types of reinforcement learning, which are:

  • Positive Reinforcement
  • Negative Reinforcement

Positive Reinforcement:

Positive reinforcement means adding something to increase the tendency that the expected behavior will occur again. It has a positive impact on the behavior of the agent and increases the strength of the behavior.

This type of reinforcement can sustain changes for a long time, but too much positive reinforcement may lead to an overload of states, which can diminish the results.

Negative Reinforcement:

Negative reinforcement is the opposite of positive reinforcement, as it increases the tendency that a specific behavior will occur again by avoiding a negative condition.

It can be more effective than positive reinforcement, depending on the situation and behavior, but it provides only enough reinforcement to meet the minimum required behavior.

How to represent the agent state?

We can represent the agent state using the Markov state, which contains all the required information from the history. A state S_t is a Markov state if it satisfies the following condition:

P[S_t+1 | S_t] = P[S_t+1 | S_1, ..., S_t]

The Markov state follows the Markov property, which says that the future is independent of the past and can be defined using only the present. RL works in fully observable environments, where the agent can observe the environment and act on the new state. The complete process is known as the Markov Decision Process, which is explained below:

Markov Decision Process


A Markov Decision Process, or MDP, is used to formalize reinforcement learning problems. If the environment is completely observable, then its dynamics can be modeled as a Markov process. In an MDP, the agent constantly interacts with the environment and performs actions; after each action, the environment responds and generates a new state.

MDP is used to describe the environment for RL, and almost all RL problems can be formalized using an MDP.

An MDP is a tuple of four elements (S, A, Pa, Ra):

  • A set of finite states S
  • A set of finite actions A
  • A reward Ra(s, s') received after transitioning from state s to state s' due to action a
  • A transition probability Pa(s, s'): the probability that action a taken in state s leads to state s'
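A tiny, made-up MDP can be written down directly as dictionaries. All the state names, probabilities, and rewards below are invented for illustration:

# A toy two-state MDP
states = ["s0", "s1"]
actions = ["stay", "go"]

# P[(s, a)] maps each possible next state s' to its transition probability Pa(s, s')
P = {
    ("s0", "stay"): {"s0": 1.0},
    ("s0", "go"):   {"s0": 0.2, "s1": 0.8},
    ("s1", "stay"): {"s1": 1.0},
    ("s1", "go"):   {"s0": 0.9, "s1": 0.1},
}

# R[(s, a, s')] is the reward Ra(s, s') for the transition s -> s' under action a
R = {
    ("s0", "go", "s1"): 1.0,   # reaching s1 from s0 is rewarded
}

def reward(s, a, s_next):
    return R.get((s, a, s_next), 0.0)

print(P[("s0", "go")], reward("s0", "go", "s1"))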

An MDP relies on the Markov property, so to understand MDPs better we first need to understand that property.

Markov Property:

It says that?"If the agent is present in the current state S1, performs an action a1 and move to the state s2, then the state transition from s1 to s2 only depends on the current state and future action and states do not depend on past actions, rewards, or states."

Or, in other words, as per Markov Property, the current state transition does not depend on any past action or state. Hence, MDP is an RL problem that satisfies the Markov property. Such as in a?Chess game, the players only focus on the current state and do not need to remember past actions or states.

Finite MDP:

A finite MDP is one with finite sets of states, rewards, and actions. In RL, we consider only finite MDPs.

Markov Process:

A Markov process is a memoryless process with a sequence of random states S1, S2, ..., St that satisfies the Markov property. A Markov process is also known as a Markov chain, which is a tuple (S, P) of a state set S and a transition function P. These two components (S and P) define the dynamics of the system.
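For example, a Markov chain over two made-up weather states can be sampled as follows; the transition probabilities are arbitrary and only serve to show that the next state depends on the current state alone:

import random

# Transition function P: for each state, the probabilities of the next state
P = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def sample_chain(start, length):
    # Generate a state sequence S1, S2, ..., St using only the current state (memoryless)
    state, trajectory = start, [start]
    for _ in range(length - 1):
        next_states = list(P[state].keys())
        probs = list(P[state].values())
        state = random.choices(next_states, weights=probs, k=1)[0]
        trajectory.append(state)
    return trajectory

print(sample_chain("sunny", 5))  # e.g. ['sunny', 'sunny', 'rainy', 'rainy', 'sunny']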

Reinforcement Learning Algorithms


Reinforcement learning algorithms are mainly used in AI and gaming applications. The most commonly used algorithms are:

  • Q-Learning:
  • Q-learning is an off-policy RL algorithm used for temporal difference learning. Temporal difference learning methods are a way of comparing temporally successive predictions.
  • It learns the value function Q(s, a), which tells how good it is to take action "a" in a particular state "s".
  • Q-learning works as a loop: observe the current state, choose an action, receive the reward and the next state, and update the Q-value accordingly.


  • State Action Reward State Action (SARSA):
  • SARSA stands for State Action Reward State Action; it is an on-policy temporal difference learning method. The on-policy control method selects the action for each state while learning, using a specific policy.
  • The goal of SARSA is to calculate Qπ(s, a) for the currently followed policy π and all state-action pairs (s, a).
  • The main difference between the Q-learning and SARSA algorithms is that, unlike Q-learning, SARSA does not use the maximum reward of the next state to update the Q-value in the table (a rough sketch of both updates follows this list).
  • In SARSA, the new action and reward are selected using the same policy that determined the original action.
  • SARSA is so named because it uses the quintuple Q(s, a, r, s', a'), where:
  • s: original state
  • a: original action
  • r: reward observed while following the states
  • s' and a': new state-action pair
  • Deep Q Neural Network (DQN):
  • As the name suggests, DQN is Q-learning using neural networks.
  • For an environment with a large state space, defining and updating a Q-table is a challenging and complex task.
  • To solve this issue, we can use a DQN algorithm, where, instead of defining a Q-table, a neural network approximates the Q-values for each action and state.
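To make the difference between the two update rules concrete, here is a rough sketch. Q is assumed to be a dictionary of state-action values, alpha and gamma are the usual learning rate and discount factor, and the remaining names are illustrative:

# Sketch of the two temporal-difference updates
# Q: dict mapping (state, action) -> value; available_actions(s) lists the actions available in s

def q_learning_update(Q, s, a, r, s_next, available_actions, alpha=0.1, gamma=0.9):
    # Off-policy: bootstrap from the best action in the next state, regardless of what the agent actually does next
    best_next = max(Q[(s_next, a2)] for a2 in available_actions(s_next))
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
    # On-policy: bootstrap from the action a_next actually chosen by the current policy
    Q[(s, a)] += alpha * (r + gamma * Q[(s_next, a_next)] - Q[(s, a)])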

Now, let us look at Q-learning in more detail.

Q-Learning Explanation:

  • Q-learning is a popular model-free reinforcement learning algorithm based on the Bellman equation.
  • The main objective of Q-learning is to learn a policy that informs the agent what actions should be taken, and under what circumstances, to maximize the reward.
  • It is an off-policy RL algorithm that attempts to find the best action to take in the current state.
  • The goal of the agent in Q-learning is to maximize the value of Q.
  • The Q-value can be derived from the Bellman equation. Consider the Bellman equation given below:

V(s) = max_a [R(s, a) + γV(s')]

To perform an action, the agent gets a reward R(s, a) and ends up in a certain state s', so the Q-value equation becomes:

Q(s, a) = R(s, a) + γ max_a' Q(s', a')

Hence, we can say that V(s) = max_a [Q(s, a)].

The above formula is used to estimate the Q-values in Q-learning.

What is 'Q' in Q-learning?

The Q stands for quality in Q-learning, which means it specifies the quality of an action taken by the agent.

Q-table:

A Q-table or matrix is created while performing Q-learning. The table has one entry for each state-action pair [s, a], and all values are initialized to zero. After each action, the table is updated and the Q-values are stored in it.

The RL agent uses this Q-table as a reference to select the best action based on the Q-values.
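Putting the pieces together, a minimal tabular Q-learning sketch might look like the snippet below. The corridor environment, the epsilon-greedy exploration scheme, and the hyperparameter values are illustrative choices, not fixed parts of the algorithm:

import random
from collections import defaultdict

actions = ["left", "right"]        # toy corridor: cells 0..4, the goal is cell 4
Q = defaultdict(float)             # the Q-table: every (state, action) pair starts at zero

def step(s, a):
    s_next = max(0, min(4, s + (1 if a == "right" else -1)))
    return s_next, (1.0 if s_next == 4 else 0.0), s_next == 4

def best_action(s):
    best = max(Q[(s, a)] for a in actions)                            # best Q-value in this state
    return random.choice([a for a in actions if Q[(s, a)] == best])   # break ties randomly

alpha, gamma, epsilon = 0.1, 0.9, 0.1
for episode in range(200):
    s, done = 0, False
    while not done:
        # Epsilon-greedy: usually consult the Q-table, occasionally explore a random action
        a = random.choice(actions) if random.random() < epsilon else best_action(s)
        s_next, r, done = step(s, a)
        # Q-learning update: move Q(s, a) toward the reward plus the discounted best next Q-value
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s_next, x)] for x in actions) - Q[(s, a)])
        s = s_next

print({(s, a): round(Q[(s, a)], 2) for s in range(5) for a in actions})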

Difference between Reinforcement Learning and Supervised Learning & Unsupervised Learning

In supervised learning, the model learns from a labeled dataset: every training example comes with the correct answer, and the goal is to reproduce that answer on new data. In unsupervised learning, there are no labels at all; the goal is to find structure, such as clusters, in the data. Reinforcement learning differs from both: there is no fixed dataset and no correct answer given up front. Instead, the agent learns by interacting with the environment and receiving (possibly delayed) reward signals, and its own actions influence the data it sees next.

Reinforcement Learning Applications

  • Robotics: RL is used in robot navigation, robo-soccer, walking, juggling, etc.
  • Control: RL can be used for adaptive control, such as factory processes and admission control in telecommunications; helicopter piloting is another example of reinforcement learning.
  • Game Playing: RL can be used in game playing, such as tic-tac-toe, chess, etc.
  • Chemistry: RL can be used for optimizing chemical reactions.
  • Business: RL is now used for business strategy planning.
  • Manufacturing: In various automobile manufacturing companies, robots use deep reinforcement learning to pick goods and put them in containers.
  • Finance Sector: RL is currently used in the finance sector for evaluating trading strategies.

Why use Reinforcement Learning?

Here are the prime reasons for using Reinforcement Learning:

  • It helps you find which situations need an action.
  • It helps you discover which action yields the highest reward over the long term.
  • Reinforcement learning also provides the learning agent with a reward function.
  • It allows the agent to figure out the best method for obtaining large rewards.

Conclusion:

I am "Baishalini Sahu" Doing research & Development in Data Science areas like Supervised , Unsupervised & Reinforcement learning. From the above discussion, we can say that Reinforcement Learning is one of the most interesting and useful parts of Machine learning. Despite training difficulties, reinforcement learning finds its way to be effectively used in real business scenarios. Generally, RL is valuable when searching for optimal solutions in a constantly changing environment is needed.

Reinforcement learning is used for operations automation, machinery and equipment control and maintenance, and energy consumption optimization. The finance industry has also acknowledged the capabilities of reinforcement learning for powering AI-based trading systems. Although trial-and-error training of robots is time-consuming, it allows robots to better evaluate real-world situations, use their skills to complete tasks, and react appropriately to unexpected situations. In addition, RL provides opportunities for eCommerce players in terms of revenue optimization, fraud prevention, and customer experience enhancement via personalization.
