Markov Decision Process
Definition
The Markov Decision Process (MDP) is a mathematical framework, or model, for decision-making in discrete, stochastic, sequential environments. Like the Markov chain (MC), an MDP predicts the future state based only on the information provided by the current state. In addition, an MDP incorporates actions and rewards. At each time step, the decision maker, or agent, takes an action in the current state. In response to that action, the environment transitions randomly to a new state, which determines the immediate reward obtained by the decision maker.
MDP Model
MDPs are commonly used to describe dynamical systems and to represent the environment in the Reinforcement Learning (RL) framework.
An MDP is a tuple (S, A, P, R, γ):
• S: the set of states.
• A: the set of actions.
• P: the set of state-transition probabilities.
• R: the set of immediate rewards associated with the state-action pairs.
• γ: the discount factor, with 0 ≤ γ ≤ 1.
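As a minimal sketch of this tuple as a data structure (the field names and container types here are illustrative assumptions, not a standard API), in Python:

```python
from dataclasses import dataclass

@dataclass
class MDP:
    """Container mirroring the tuple (S, A, P, R, gamma)."""
    states: list        # S: the set of states
    actions: list       # A: the set of actions
    transitions: dict   # P: (s, a) -> {s_next: probability}
    rewards: dict       # R: (s, a) -> immediate reward
    gamma: float        # discount factor, 0 <= gamma <= 1
```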
The Agent (decision maker) interacts continually with its Environment by performing actions sequentially at each discrete time step. Each action can change the state of the Environment, and in return the Agent receives a numerical reward from the Environment.
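A small sketch of that interaction loop, reusing the MDP container above with a hypothetical two-state problem (all numbers are made up for illustration):

```python
import random

# Hypothetical two-state, two-action MDP.
mdp = MDP(
    states=["s0", "s1"],
    actions=["stay", "move"],
    transitions={
        ("s0", "stay"): {"s0": 0.9, "s1": 0.1},
        ("s0", "move"): {"s0": 0.2, "s1": 0.8},
        ("s1", "stay"): {"s1": 1.0},
        ("s1", "move"): {"s0": 0.5, "s1": 0.5},
    },
    rewards={("s0", "stay"): 0.0, ("s0", "move"): 1.0,
             ("s1", "stay"): 0.5, ("s1", "move"): 0.0},
    gamma=0.9,
)

state = "s0"
for t in range(5):
    action = random.choice(mdp.actions)        # agent acts (here: at random)
    dist = mdp.transitions[(state, action)]    # environment dynamics
    state_next = random.choices(list(dist), weights=list(dist.values()))[0]
    reward = mdp.rewards[(state, action)]      # numerical reward to the agent
    print(t, state, action, reward, state_next)
    state = state_next
```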
Transition Probability
A Markov Process is defined by (S, P), where S is the set of states and P is the state-transition probability. The process consists of a series of random states S1, S2, etc., all of which obey the Markov property: the probability of moving to the next state depends only on the current state, not on the sequence of states that preceded it.
The transition probability describes the dynamics of the MDP: for each action a, it gives the probability of moving from any state s to any successor state s′. P is therefore a set of n_a transition matrices, one per action, each of dimension n_s × n_s, whose (s, s′) entry reads

[P^a]_{ss′} = Pr[S_{t+1} = s′ | S_t = s, A_t = a].

One can verify that each row of such a matrix sums to one.
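For example (a toy 3-state matrix with made-up entries, using NumPy), the row-sum property can be checked directly:

```python
import numpy as np

# One transition matrix P^a for a hypothetical 3-state MDP.
# Row s holds the distribution over successor states s' under action a.
P_a = np.array([
    [0.7, 0.2, 0.1],
    [0.0, 0.5, 0.5],
    [0.3, 0.3, 0.4],
])

# Each row is a probability distribution, so every row sum equals one.
assert np.allclose(P_a.sum(axis=1), 1.0)
```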
Application of Markov Chain
I would like to cite a real-world example of a Markov chain application given by Prateek Sharma and Priya Chetty.
Suppose there are two types of weather in an area, 'sunny' and 'cloudy'. A news channel wants to predict the weather for the next week in its broadcast, so it hires a weather forecasting company to find out the weather for the next few weeks. Currently, the weather in that area is 'sunny'.
The probabilities for the following week are as follows: if the current week is 'sunny', there is a 0.8 chance that the next week stays 'sunny' and a 0.2 chance that it turns 'cloudy'.
Although it is predicted to be 'sunny' for the whole week, one cannot be fully sure about the next week without making some transition calculations.
The matrix below explains the transition:
Current State * Transition Matrix = Final State
S = Sunny; C = Cloudy
We conclude that there is an 80% chance that next week will be 'sunny' and a 20% chance that the weather will turn 'cloudy'. This kind of state-to-state calculation is what the Markov chain models.
If the transition matrix does not change with time, one can also predict the weather for further weeks by applying the same equation repeatedly.
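The calculation can be reproduced in a few lines of NumPy. The 'sunny' row (0.8, 0.2) follows from the figures above; the 'cloudy' row is an assumed placeholder, since those numbers are not reproduced here:

```python
import numpy as np

# States ordered as [sunny, cloudy].
T = np.array([
    [0.8, 0.2],   # sunny  -> sunny, cloudy (from the example)
    [0.6, 0.4],   # cloudy -> sunny, cloudy (assumed for illustration)
])

current = np.array([1.0, 0.0])          # current state: sunny
print(current @ T)                      # next week: [0.8, 0.2]

# With a time-invariant T, later weeks come from repeated application:
print(current @ np.linalg.matrix_power(T, 2))   # week after next
```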
MDP in AI/ML
A machine learning (ML) algorithm may be tasked with an optimization problem. Using reinforcement learning (RL), the algorithm tries to optimize the actions taken within an environment so as to maximize the potential reward. Whereas supervised learning techniques require correct input/output pairs to build a model, RL uses MDPs to achieve an optimal balance of exploration and exploitation. When the transition probabilities and rewards are unspecified or unknown, ML can still tackle the problem with RL, learning from interaction with the environment rather than from a known MDP model.
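As a hedged sketch of this idea, here is minimal tabular Q-learning with epsilon-greedy exploration; the `step(s, a) -> (s_next, reward)` environment interface and all hyperparameter values are assumptions for the example, not a fixed recipe:

```python
import random
from collections import defaultdict

def q_learning(step, actions, episodes=500, horizon=50,
               alpha=0.1, gamma=0.9, epsilon=0.1, start="s0"):
    """Learn Q-values from interaction only; P and R are never accessed."""
    Q = defaultdict(float)
    for _ in range(episodes):
        s = start
        for _ in range(horizon):
            # Epsilon-greedy: explore with probability epsilon, else exploit.
            if random.random() < epsilon:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda act: Q[(s, act)])
            s_next, r = step(s, a)
            # Temporal-difference update toward the Bellman target.
            best_next = max(Q[(s_next, act)] for act in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```

The balance between exploration and exploitation mentioned above is controlled here by epsilon: with probability epsilon the agent tries a random action, and otherwise it exploits its current value estimates.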