Journey to machine learning - part 2: Building an AI bot to play Super Mario
Hello world! This is part 2 of my machine learning series.
Let's start digging in without further ado. I need to understand the mathematical concepts behind our progress. The main reason behind that intention is simple: I just want to know what exactly is happening when I do
from sklearn.tree import DecisionTreeClassifier
One could write a whole book about the decision tree classifier algorithm, which is just an import away when using machine learning libraries. On the other hand, I don't want to build a library from the math behind it. My perspective here is something like this: I just tasted a recipe at a restaurant and now I'm curious about the ingredients behind it, but I don't want to cook that recipe myself. That curiosity is the fine line between importing a library and understanding the math behind it. Nothing fancy! Just a glimpse behind the scenes.
Okay, one might get the following images on the first page of Google results when searching for anything related to Reinforcement Learning.
The first image looks like a complete UFO to me. To put things into perspective, that UFO-looking image is the mathematical concept behind Reinforcement Learning. Let's progress with the attitude: "This is Sparta!!!"
If I had to describe my interest in math in a single sentence, it would be this: "I regret bunking my math classes back in my college days."
I thought learning engineering mathematics was a waste of time, which was a big mistake. Love is everywhere, they said. Nope! It's maths!
Okay, jokes apart, now coming to point no. 1 from part 1:
- What is Reinforcement Learning (RL) / Why did I pick this approach?
So, with a layman's definition and from a machine learning newbie's perspective: "Reinforcement Learning is a machine learning approach that lets our program/neural-network/model make a sequence of decisions and learn to pick the ones that lead to a positive outcome."
And coming to "Why did I pick this approach?":
Because I want my program/neural-network/model to learn to play the game by itself, rather than having a human play the game through its success scenarios and using those results as a training data set for my neural network, which would fall under the machine learning approach called "supervised learning".
Okay, let's talk some machine learning terms here. In Reinforcement Learning, an "agent" performs certain "actions" in an "environment"; the environment is described by "states", and "rewards" are collected along the way. That's the answer to how Reinforcement Learning picks the good decisions out of all the possible ones: using "rewards".
Let's correlate those terms with our use case (a rough code sketch of the whole loop follows the list):
Agent => Super Mario character
Actions => Running / Jumping / Shooting
Environment => The game
States => Position of our Super Mario character in the game world
Rewards => Score (by surviving / collecting coins / killing enemies)
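Just to see how those five terms fit together, here is a rough, hypothetical sketch of the agent-environment loop. The MarioEnv class, its methods and the rewards are all made up for illustration; they are not from any specific library or from the actual game.

```python
import random

class MarioEnv:
    """A made-up stand-in for the game environment, used only to illustrate the loop."""

    def reset(self):
        # Put Mario back at the start of the level and return the initial state.
        return "start_state"

    def step(self, action):
        # The environment reacts to the action with a new state, a reward and a done flag.
        next_state = f"state_after_{action}"
        reward = random.choice([0, 1, 5])      # e.g. just surviving / a coin / an enemy killed
        done = random.random() < 0.05          # the episode ends once in a while
        return next_state, reward, done

env = MarioEnv()                               # Environment => the game
state = env.reset()                            # State => where Mario is right now
total_reward = 0

while True:
    action = random.choice(["run", "jump", "shoot"])   # Agent => picks an Action
    state, reward, done = env.step(action)             # the game answers with State + Reward
    total_reward += reward                             # Rewards => the score we accumulate
    if done:
        break

print("Episode finished with total score:", total_reward)
```

The loop itself is the whole story: the agent picks an action, the environment answers with a new state and a reward, and the score keeps accumulating until the episode ends.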
These RL terms remind me of training my dog to do certain things. I give her commands and let her do something. If she does it the way I expect, she gets a treat, and if she does really well, the treat gets bigger!
Now we can somehow relate the things we've observed so far to the second image. But we are not there yet, let's keep progressing! This is Sparta!!!
Alright, since we keep hearing about "decisions" again and again in Reinforcement Learning, this is where the Markov Decision Process comes into the picture, which is point no. 3 from part 1:
- What is a Markov Decision Process? (RL relies on this)
Reinforcement Learning (RL) relies on the Markov Decision Process (MDP) to work through its decision-making process.
The Markov Decision Process is part of the mathematical field called "probability". I had "Probability and Queuing Theory" in my 4th semester, so I can somewhat reckon with these things. So, probability is the core math concept behind RL, and that brings us to the answer to point no. 2 from part 1:
- What is the math behind RL?
Our Super Mario character will face a certain scenario every time an action takes place. The actions will be "running" and "jumping" for now. These scenarios are the "states". So there will be n states in our game:
state1, state2, state3, ..., staten
or, for short,
s1, s2, s3, ..., sn.
Each state from s1 to sn will earn a certain reward, which is the score our bot collects while playing the game. So the n rewards will be denoted as
r1, r2, r3, ..., rn.
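With the states s1…sn and rewards r1…rn written down, we are already holding most of the pieces of an MDP. For reference, and this is the standard textbook definition rather than anything taken from the images above, an MDP is usually written as a tuple:

$$
\text{MDP} = (S, A, P, R, \gamma), \qquad
P(s' \mid s, a) = \Pr\big(S_{t+1} = s' \mid S_t = s,\ A_t = a\big)
$$

Here S is the set of states, A the set of actions, P the probability of landing in state s' after taking action a in state s, R the rewards, and γ (gamma) a discount factor that makes far-away rewards count a little less than immediate ones.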
Question: Why on earth are we framing things like the ones above?
Answer: so that we can relate them to the math formula of RL.
The formula of RL is quite intimidating, but in the end it is just a formula with some notations that are unknown to me. With that in mind, let's understand the MDP, since it's the core math concept RL relies on. I made a rough workout of it and will refine it as time moves on.
Now, from the above image, time "t" is incremented as we move from one state to another, which is denoted by "t+1". So the next state can be denoted as "St+1" ('S' is the state's notation) and the next reward as "Rt+1" ('R' is the reward's notation). Now we can observe a few more things from the second image we saw earlier.
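Putting the time index on one line, the back-and-forth the image describes can be written (in the usual RL notation, my own rendering rather than the image's exact symbols) as:

$$
S_t \xrightarrow{\;A_t\;} R_{t+1},\ S_{t+1} \xrightarrow{\;A_{t+1}\;} R_{t+2},\ S_{t+2} \longrightarrow \cdots
$$

Every action A_t moves us one tick forward in time and hands back the next reward and the next state.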
The repeated images, terms and notations might feel vague, but that's how we connect the dots and speed up the learning process.
Here, our bot needs to understand which action performed in which state will fetch the maximum rewards. To analyze this, we need to keep track of the reward (r) earned as each state (s) progresses. Each state has a reward (r) and one more value called "v", which represents the rewards we expect to collect from the states that follow, and that's where "V" comes into the formula. So we need to compute V(s), aka the "value function", which is denoted in the formula as V(s). Our agent can perform various actions (a) in a state (s), and the rule that gives the probability of picking a particular action (a) in a particular state (s) is called the "policy", denoted in the formula as "π" (pi).
Again, the above paragraph might be vague, but just read it a few times until you build some intuition. We will dig deeper into the value function later, so for now just go with it!
RL is nothing but finding the "value function" (V(s)) and the "policy" (π), which tell us, respectively, how good a particular state is and the probability of picking a particular action from that state. Sounds like we've heard that somewhere, doesn't it? And that's what we've got on the LHS of the formula! Phew!
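I can't paste the image here, but the formula it shows is, in its standard textbook form, the Bellman expectation equation for the state-value function. This is my reconstruction from the usual notation, so the symbols in the image may differ slightly:

$$
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)\,\Big[ R(s, a, s') + \gamma\, V^{\pi}(s') \Big]
$$

Read it as: the value of a state is the average, over the actions the policy might pick and the states the game might land in, of the immediate reward plus the (discounted) value of wherever we end up.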
Again, let's connect the dots from the above formula!
- V(s) => Value function
- "π" => Policy
- "a" => Action
- "s" => State
To continue with the flow, we really, really need to grasp a few more things regarding "probability", which sits on top of all of this in the end.
Like a cow chewing its cud, I am going to recall the "probability" concepts I studied, aka bunked, in my college days. I need the basics of probability so that I can go with the flow and grasp the RHS of the formula, and everything that comes after it, without much hassle. Miles to go!!
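To name the two probability ideas the RHS actually leans on (standard definitions, nothing Mario-specific): conditional probability and the expected value of a reward.

$$
P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad
\mathbb{E}[R] = \sum_{r} r \cdot P(R = r)
$$

Those sums over π(a|s) and P(s'|s,a) in the formula above are, at heart, just this kind of conditional-probability-weighted average.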