Journey to machine learning - part 2: Building an AI bot to play Super Mario
Hello world! This is part 2 of my machine learning series.
Let's start digging in without further ado. I need to understand the mathematical concepts behind our progress. The main reason behind that intention is simple: I just want to know what exactly is happening when I do
from sklearn.tree import DecisionTreeClassifier
One could write a whole book about the decision tree classifier algorithm, which is just an import away when using machine learning libraries. On the other hand, I don't want to build a library from the math behind it. My perspective here is something like this: I just tasted a recipe at a restaurant and now I'm curious about the ingredients behind it, but I don't want to cook that recipe myself. That curiosity is the fine line between importing a library and understanding the math behind it. Nothing fancy! Just a glimpse behind the scenes.
Okay, one might get the following images on the first page of Google results when searching for anything related to Reinforcement Learning.
The first image looks like a complete UFO to me. To put things into perspective, that UFO-looking image is the mathematical concept behind Reinforcement Learning. Let's progress with the attitude: "This is Sparta!!!"
If I had to describe my interest in math in a single sentence, it would be this: "I regret bunking my math classes back in my college days."
I thought learning engineering mathematics was a waste of time, which was a big mistake. Love is everywhere, they said. Nope! It's maths!
Okay, jokes apart, now coming to point no. 1 from part 1:
- What is Reinforcement Learning (RL) / Why did I pick this approach?
So, with a layman's definition and from a machine learning newbie's perspective: "Reinforcement Learning is a machine learning approach that lets our program/neural-network/model make a sequence of decisions and learn to pick the ones that lead to a positive outcome."
And coming to "Why did I pick this approach?":
Because I want my program/neural-network/model to learn to play the game by itself, rather than having a human play the game through its success scenarios and using those results as a training data set for my neural network, which would fall under the machine learning approach called "supervised learning".
Okay, let's talk some machine learning terms here. In Reinforcement Learning, an "agent" performs certain "actions" in an "environment"; the environment is described by "states", and "rewards" are collected along the way. That's the answer to how Reinforcement Learning picks the good decisions out of all the possible ones: using "rewards".
Let's correlate those terms with our use case (a rough code sketch of the whole loop follows the list):
Agent => Super Mario character
Actions => Running / Jumping / Shooting
Environment => The game
States => Position of our Super Mario character in the game world
Rewards => Score (by surviving / collecting coins / killing enemies)
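Just to see how those five terms fit together, here is a rough, hypothetical sketch of the agent-environment loop. The MarioEnv class, its methods and the rewards are all made up for illustration; they are not from any specific library or from the actual game.

```python
import random

class MarioEnv:
    """A made-up stand-in for the game environment, used only to illustrate the loop."""

    def reset(self):
        # Put Mario back at the start of the level and return the initial state.
        return "start_state"

    def step(self, action):
        # The environment reacts to the action with a new state, a reward and a done flag.
        next_state = f"state_after_{action}"
        reward = random.choice([0, 1, 5])      # e.g. just surviving / a coin / an enemy killed
        done = random.random() < 0.05          # the episode ends once in a while
        return next_state, reward, done

env = MarioEnv()                               # Environment => the game
state = env.reset()                            # State => where Mario is right now
total_reward = 0

while True:
    action = random.choice(["run", "jump", "shoot"])   # Agent => picks an Action
    state, reward, done = env.step(action)             # the game answers with State + Reward
    total_reward += reward                             # Rewards => the score we accumulate
    if done:
        break

print("Episode finished with total score:", total_reward)
```

The loop itself is the whole story: the agent picks an action, the environment answers with a new state and a reward, and the score keeps accumulating until the episode ends.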
These RL terms remind me of training my dog to do certain things. I give her commands and let her do something. If she does it the way I expect, she gets a treat, and if she does really well, the treat gets bigger!
Now we can somehow relate the things we've observed so far to the second image. But we are not there yet, let's keep progressing! This is Sparta!!!
Alright, since we keep hearing about "decisions" again and again in Reinforcement Learning, this is where the Markov Decision Process comes into the picture, which is point no. 3 from part 1:
- What is a Markov Decision Process? (RL relies on this)
Reinforcement Learning (RL) relies on the Markov Decision Process (MDP) to work through its decision-making process.
The Markov Decision Process is part of the mathematical field called "probability". I had "Probability and Queuing Theory" in my 4th semester, so I can somewhat reckon with these things. So, probability is the core math concept behind RL, and that brings us to the answer to point no. 2 from part 1:
- What is the math behind RL?
Our Super Mario character will face a certain scenario every time an action takes place. The actions will be "running" and "jumping" for now. These scenarios are the "states". So there will be n states in our game:
state1, state2, state3, ..., staten
or, for short,
s1, s2, s3, ..., sn.
Each state from s1 to sn will earn a certain reward, which is the score our bot collects while playing the game. So the n rewards will be denoted as
r1, r2, r3, ..., rn.
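With the states s1…sn and rewards r1…rn written down, we are already holding most of the pieces of an MDP. For reference, and this is the standard textbook definition rather than anything taken from the images above, an MDP is usually written as a tuple:

$$
\text{MDP} = (S, A, P, R, \gamma), \qquad
P(s' \mid s, a) = \Pr\big(S_{t+1} = s' \mid S_t = s,\ A_t = a\big)
$$

Here S is the set of states, A the set of actions, P the probability of landing in state s' after taking action a in state s, R the rewards, and γ (gamma) a discount factor that makes far-away rewards count a little less than immediate ones.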
Question: Why on earth are we framing things like the ones above?
Answer: so that we can relate them to the math formula of RL.
The formula of RL is quite intimidating, but in the end it is just a formula with some notations that are unknown to me. With that in mind, let's understand the MDP, since it's the core math concept RL relies on. I made a rough workout of it and will refine it as time moves on.
Now, from the above image, time "t" is incremented as we move from one state to another, which is denoted by "t+1". So the next state can be denoted as "St+1" ('S' is the state's notation) and the next reward as "Rt+1" ('R' is the reward's notation). Now we can observe a few more things from the second image we saw earlier.
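Putting the time index on one line, the back-and-forth the image describes can be written (in the usual RL notation, my own rendering rather than the image's exact symbols) as:

$$
S_t \xrightarrow{\;A_t\;} R_{t+1},\ S_{t+1} \xrightarrow{\;A_{t+1}\;} R_{t+2},\ S_{t+2} \longrightarrow \cdots
$$

Every action A_t moves us one tick forward in time and hands back the next reward and the next state.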
The repeated images, terms and notations might feel vague, but that's how we connect the dots and speed up the learning process.
Here, our bot needs to understand which action performed in which state will fetch the maximum rewards. To analyze this, we need to keep track of the reward (r) earned as each state (s) progresses. Each state has a reward (r) and one more value called "v", which represents the rewards we expect to collect from the states that follow, and that's where "V" comes into the formula. So we need to compute V(s), aka the "value function", which is denoted in the formula as V(s). Our agent can perform various actions (a) in a state (s), and the rule that gives the probability of picking a particular action (a) in a particular state (s) is called the "policy", denoted in the formula as "π" (pi).
Again, the above paragraph might be vague, but just read it a few times until you build some intuition. We will dig deeper into the value function later, so for now just go with it!
RL is nothing but finding the "value function" (V(s)) and the "policy" (π), which tell us, respectively, how good a particular state is and the probability of picking a particular action from that state. Sounds like we've heard that somewhere, doesn't it? And that's what we've got on the LHS of the formula! Phew!
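I can't paste the image here, but the formula it shows is, in its standard textbook form, the Bellman expectation equation for the state-value function. This is my reconstruction from the usual notation, so the symbols in the image may differ slightly:

$$
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)\,\Big[ R(s, a, s') + \gamma\, V^{\pi}(s') \Big]
$$

Read it as: the value of a state is the average, over the actions the policy might pick and the states the game might land in, of the immediate reward plus the (discounted) value of wherever we end up.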
Again, let's connect the dots from the above formula!
- V(s) => Value function
- "π" => Policy
- "a" => Action
- "s" => State
To continue with the flow, we really, really need to grasp a few more things regarding "probability", which sits on top of all of this in the end.
Like a cow chewing its cud, I am going to recall the "probability" concepts I studied, aka bunked, in my college days. I need the basics of probability so that I can go with the flow and grasp the RHS of the formula, and everything that comes after it, without much hassle. Miles to go!!
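To name the two probability ideas the RHS actually leans on (standard definitions, nothing Mario-specific): conditional probability and the expected value of a reward.

$$
P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad
\mathbb{E}[R] = \sum_{r} r \cdot P(R = r)
$$

Those sums over π(a|s) and P(s'|s,a) in the formula above are, at heart, just this kind of conditional-probability-weighted average.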