Journey to machine learning - part 3 Building an AI bot to play Supermario

Hello world!

This is part 3 of my machine learning series. Before diving, we have to make sure our diving gear is working properly, which means we need to know:

Where are we heading?

Where were we?

Make sure to go through part 1 and part 2 to go with the flow.

Before digging in, I would like to recall a small moment from the beautiful movie "The Shawshank Redemption".

[Image: a still from The Shawshank Redemption]

Andy Dufresne, the lead character, digs a tunnel for a couple of decades with a 7-inch rock hammer to escape through the walls of his prison cell. Inch-by-inch perseverance! No rush! And that's how this series is going to be: slow and steady, grasping things as we descend. Most of the parts might seem vague or irrelevant, but in the end the intention is to lay a strong foundation.

Okay, let's pick things up from where we left off. We need to recall some basics of one math concept, probability, to go with the flow.

I took this screenshot from Byju's.

[Image: probability formula table from Byju's]

Let's keep this formula table handy in case we need to refer back to it. Like we saw in part 2, we need to grasp some probability concepts, which are at the heart of Reinforcement Learning.

The general formula of probability is:

Probability of an event = (Number of favorable outcomes) / (Total number of possible outcomes)

The formula is:

P(E) = n(E)/n(S)

P(E) => Probability of an event

n(E) => Number of favorable outcomes

n(S) => Number of outcomes in the sample space, the set of all possible outcomes

The probability of occurrence of an event A is P(A). From that we can derive the probability of non-occurrence of the event, written P(A'), where P(A') = 1 - P(A). Now we can somehow start to guess what the symbol "s'" is supposed to mean in the RL formula. Might it be a P(non-occurrence)? Again, "might be"; we aren't sure yet. Everything we are about to learn will be correlated with Reinforcement Learning's formula.

[Image: the Reinforcement Learning formula]
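Before moving on, here is a minimal Python sketch (my own illustration, assuming a fair six-sided die; nothing Mario-specific yet) to make the formula and its complement concrete:

```python
# P(E) = n(E) / n(S) and P(E') = 1 - P(E) for a fair six-sided die,
# with the event E = "rolling an even number".
sample_space = {1, 2, 3, 4, 5, 6}                    # n(S) = 6
favorable = {x for x in sample_space if x % 2 == 0}  # E = {2, 4, 6}, so n(E) = 3

p_event = len(favorable) / len(sample_space)   # P(E) = 3/6 = 0.5
p_complement = 1 - p_event                     # P(E') = 1 - P(E) = 0.5

print(p_event, p_complement)                   # 0.5 0.5
```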

Let's solve a simple problem to grasp things even more. The "hello world" of probability is finding the probability of getting a head or a tail after tossing a coin. Let's do our "hello world"!

When a coin is tossed, the possible outcomes are either a head or a tail, so the number of possible outcomes n(S) = 2 (head or tail). To find the probability of getting a head, P(H), we use the basic formula: P(H) = favorable outcomes / total possible outcomes. The number of favorable outcomes for a head is 1, so P(H) = 1/2. Okay, this track might look completely irrelevant to what we are trying to achieve. Why on earth do we need to study probability? The intention is to build a strong foundation, period, so that we can construct as many storeys on top of it as we wish.

Let's solve the same problem with two tosses: what is the probability of getting a head, P(H), when a coin is tossed twice?

Now the possible outcomes are:

HH, HT, TH, TT => H is head and T is tail

The number of possible outcomes n(S) is 4.

Now, looking at the sample space, we can work out the probability of getting at least one head, P(H).

Number of possible outcomes n(S) = HH, HT, TH, TT => 4

Number of favorable outcomes (getting at least one head) n(E) = HH, HT, TH => 3

Now, as per the formula

P(H) = favorable outcomes / total possible outcomes

Which is P(H) = 3/4.

Now we can calculate the probability of getting the same face, P(SF):

P(SF) = 2/4 = 1/2

Because we know:

Number of favorable outcomes = HH, TT (same face) => 2

Total number of outcomes = HH, HT, TH, TT, which is 4.
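If you would like to double-check these numbers with code, here is a quick Python sketch (just an illustration; itertools enumerates the sample space for us):

```python
from itertools import product

# Enumerate the two-toss sample space and verify our hand calculations.
sample_space = list(product("HT", repeat=2))    # HH, HT, TH, TT -> n(S) = 4

at_least_one_head = [o for o in sample_space if "H" in o]   # HH, HT, TH
same_face = [o for o in sample_space if o[0] == o[1]]       # HH, TT

print(len(at_least_one_head) / len(sample_space))   # 0.75 -> P(H) = 3/4
print(len(same_face) / len(sample_space))           # 0.5  -> P(SF) = 2/4
```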

Done, crystal clear; we have somehow dusted off the probability concepts. But keep in mind, this is just a problem at a very embryonic stage, one brick in our wall. Now we have laid one of our bricks perfectly!

Alright, now we need to know about the following concepts, which are relevant to probability:

1. Probability Distribution

2. Transition Probability

Question: Why would we need to know about the above concepts? Answer: Because the Markov decision process (as we saw in part 2) depends on those two concepts. Let's see them one by one.

1. Probability Distribution:

A probability distribution is a statistical function that describes all the possible values and likelihoods that a random variable can take within a given range. A very vague and boring definition, which I stole from some random site, makes the series even slower, doesn't it? Just keep this definition in mind; we are yet to connect it to the things we already know.

Remember this picture we saw in part 2?

[Image: the agent-environment diagram from part 2]

Like we saw earlier, while progressing through an MDP (Markov Decision Process), the state S and the reward R keep changing with time t, which can be notated as St and Rt. Here, St and Rt are random variables, because the time t varies: t = 0, 1, 2, 3, 4, ... and so on. Our set of states S and set of rewards R contain finitely many values rather than infinite ones; as time steps forward we move on to St+1 and Rt+1, but always within those sets. And when a set contains finitely many values, it is called a finite set!

Here,

St and Rt => random variables

S and R => finite sets (within a range).

Now, go back to the probability distribution's definition and read it again; try to correlate the highlighted terms "random variable" and "within a range" with what we just saw. We can somehow assume things; it's okay to go on assuming. Once we are done connecting the dots, things will be crystal clear.

Once we are done correlating, we can say that the rewards at time t, Rt, and the states at time t, St, each have a probability distribution. Confusing? Just read the definition of probability distribution again, along with the highlighted terms "random variable" and "within a range".
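To make that less abstract, here is a made-up illustration in Python: a discrete probability distribution for a reward Rt, written as a plain table whose likelihoods sum to 1 (the values and probabilities are invented, not from any real environment):

```python
import random

# A made-up discrete distribution for a reward R_t: every value the
# random variable can take (within its range) gets a likelihood,
# and the likelihoods must sum to 1.
reward_distribution = {0: 0.7, 1: 0.2, 10: 0.1}   # value -> probability
assert abs(sum(reward_distribution.values()) - 1.0) < 1e-9

# Sampling R_t according to that distribution:
values = list(reward_distribution.keys())
weights = list(reward_distribution.values())
r_t = random.choices(values, weights=weights, k=1)[0]
print(r_t)   # 0 most of the time, occasionally 1 or 10
```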

Alright, now coming to point no. 2: what is transition probability?

2. Transition Probability:

As the name "transition" suggests, the probability that an agent moves from one state to another is called the "transition probability", and it is one of the building blocks of an MDP.

Before proceeding further, we need to understand the symbol '|'.

Question: Why would we need to know about that?

Answer: Because the transition probability notation has that symbol ('|'), and so far we have no clue what it is supposed to mean!

That symbol ('|') is the notation for the term "given", i.e., P(A|B) means the probability of 'A' given 'B'. That is, event 'B' is given, meaning it has already happened, and we ask how likely event 'A' is now. So we don't need to panic whenever we see this symbol ('|'), like here in the Reinforcement Learning formula.

[Image: the Reinforcement Learning formula]
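To see the "given" idea in action, here is a small Python sketch of my own (reusing the two-toss coin space from earlier; the events are just illustrative):

```python
from itertools import product

# Reading P(A|B) as "probability of A given B" on the two-toss space:
# B = "the first toss is a head" (already happened),
# A = "both tosses are heads".
sample_space = list(product("HT", repeat=2))

B = [o for o in sample_space if o[0] == "H"]     # HH, HT
A_and_B = [o for o in B if o == ("H", "H")]      # HH

print(len(A_and_B) / len(B))   # 0.5 -> once B is given, only B's outcomes count
```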

So, where were we? Yeah, we need to define the transition probability (TP) of our agent. We know what transition probability is, at least to the "somehow" extent. With that in mind, let's progress, but before that: Question: Why would we need to find the transition probability?

Answer: Our agent (Super Mario) will move from one state to another (one place to another in the game), and we need to know the transition probability of those moves.

A small recap from part 2 to connect the dots:

Our character moves from a state St to another state St+1 with the reward Rt+1. We need to find the transition probability of moving from a source state s to a destination state s'.

s => Source State

s' => Destination State

Now, the destination state s' will have a reward r, where s' is within the set of all states S and r is within the set of all rewards R, which can be denoted as s' ∈ S and r ∈ R. The symbol '∈' means "belongs to"; since I want this series to be filled with layman's definitions, I would like to shed light on even the tiny details that might already be familiar.

All right, the notation for the probability of transitioning to a state s' (destination) with reward r while taking an action a in a state s (source) reads: probability of destination state with reward, given source state with an action. Let's convert the highlighted words into notation:

s' => destination state

r => reward at the destination state

s => source state

a => action

"|" => symbol for "given" (which we saw earlier)

Let's combine all those notations, just like Professor Utonium combined things while making the Powerpuff Girls.

[Image: Professor Utonium from The Powerpuff Girls]



Sugar, spice, and everything nice! Ta-da!

p(s',r | s, a)

But hang on, doesn't the notation p(s',r | s, a) have an RHS? Of course SOME RHS is there; we are putting that SOME RHS on hold to avoid a stack overflow.

Now let's look back at the notation we got after combining the things we know in Professor Utonium's style: p(s',r | s, a). Haven't we seen this somewhere? Of course we have! It's one of the parts on the RHS of the Reinforcement Learning formula. Bazinga!

[Image: the Reinforcement Learning formula]


And now we can somehow reason about one part on the RHS of the Reinforcement Learning formula! Once we clear up the entire math behind it, we can start our coding straight away with a clear mind. I have miles to go!
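To make that RHS part tangible before the real coding starts, here is a toy Python sketch (hypothetical states, actions, and numbers; not the actual Super Mario environment) of p(s',r | s, a) as a lookup table:

```python
# Hypothetical states/actions: for each (state, action) pair we store a
# distribution over (next_state, reward) pairs, which is exactly what
# p(s', r | s, a) describes.
dynamics = {
    ("s0", "right"): {("s1", 1): 0.8, ("s0", 0): 0.2},
    ("s0", "jump"):  {("s1", 1): 0.5, ("s0", 0): 0.5},
}

def p(s_next, r, s, a):
    """Look up p(s', r | s, a); 0.0 for pairs the table has never seen."""
    return dynamics.get((s, a), {}).get((s_next, r), 0.0)

print(p("s1", 1, "s0", "right"))   # 0.8

# Each conditional distribution sums to 1 over all (s', r) pairs:
for dist in dynamics.values():
    assert abs(sum(dist.values()) - 1.0) < 1e-9
```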





