Reinforcement learning with Neural Network use case

Introduction:

Reinforcement learning is learning from interaction with the environment. Here the learner is called the Agent. Everything outside the Agent is called the Environment. The Agent performs actions continuously and the Environment responds to all those actions and presents new situations to the Agent. Furthermore, the Environment gives feedback for every action in the form of a numeric value called a reward. The Agent's goal is to maximize this reward. A complete specification of an environment defines a task, which is one instance of the reinforcement learning problem.

Moreover, the Agent and Environment interact at discrete time steps t = 0, 1, 2, 3, ... At each time step t, the Agent receives some representation of the Environment's state, S_t ∈ S, where S is the set of possible states. On that basis, it selects an action, A_t ∈ A(S_t), where A(S_t) is the set of actions available in state S_t. One time step later, in part as a consequence of its action, the Agent receives a numerical reward, R_{t+1} ∈ R, and finds itself in a new state, S_{t+1}.
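
To make this interaction concrete, here is a minimal sketch of that loop in Java. The Environment, Agent, and StepResult types are hypothetical and exist only for illustration; they are not part of any particular library.

// Minimal sketch of the agent-environment loop described above.
interface Environment {
    int reset();                    // returns the initial state S_0
    StepResult step(int action);    // applies A_t, returns R_{t+1} and S_{t+1}
    boolean isTerminal();
}

class StepResult {
    final double reward;   // R_{t+1}
    final int nextState;   // S_{t+1}
    StepResult(double reward, int nextState) { this.reward = reward; this.nextState = nextState; }
}

interface Agent {
    int selectAction(int state);    // choose A_t from A(S_t)
    void observe(int state, int action, double reward, int nextState);
}

class InteractionLoop {
    static double runEpisode(Environment env, Agent agent) {
        int state = env.reset();
        double totalReward = 0.0;
        while (!env.isTerminal()) {
            int action = agent.selectAction(state);    // A_t
            StepResult result = env.step(action);      // Environment responds
            agent.observe(state, action, result.reward, result.nextState);
            totalReward += result.reward;              // the Agent tries to maximize this
            state = result.nextState;                  // move on to S_{t+1}
        }
        return totalReward;
    }
}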

Let's take an example of GridWorld; people who are into reinforcement learning love to think about it. The GridWorld shown in Figure 1.9 is a 3 x 4 grid. For the purpose of this discussion, think of this world as a kind of game: you start from a state called the start state and you are able to execute actions, in this case up, down, left, and right. Here, the green square represents your goal, the red square represents failure, and the black square is one you cannot enter; it acts as a wall. If you reach the green square (the goal), the world is over and you begin from the start state again. The same holds for the red square: if you reach the red square (failure), the world is over and you have to start over again. This means you cannot go through the red square to get to the green square. The purpose here is to roam around this world in such a way that you eventually reach the goal state and under all circumstances avoid the red square. You can go up, down, left, and right, but if you are on a boundary state such as (1,3) and you try to go up or left, you just stay where you are. If you try to go right, you end up in the next square.
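
For concreteness, the layout just described can be written down as a small map. The sketch below is only my own rendering of that layout following the description above; the symbols and coordinates are not taken from any library.

// A plain-text layout of the 3 x 4 GridWorld described above:
// S = start state, G = goal (green), F = failure (red), # = wall (black), . = empty square
class GridWorldMap {
    static final char[][] MAP = {
        {'.', '.', '.', 'G'},   // top row: goal in the top-right corner
        {'.', '#', '.', 'F'},   // middle row: the wall and the failure square
        {'S', '.', '.', '.'}    // bottom row: start in the bottom-left corner
    };
}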

What is the shortest sequence of actions that gets us from the start state to the goal state? There are two options:

  1. Up, up, right, right, right
  2. Right, right, up, up, right

Both answers are correct, taking five steps to reach the goal state.

The previous question was very easy because each time you take an action, it does exactly what you expect it to do. Now let's introduce a little bit of uncertainty into this GridWorld problem. When you execute an action, it executes correctly with a probability of 0.8. This means 80 percent of the time when you take an action, it works as expected and goes up, down, right, or left. But 20 percent of the time, it incorrectly causes you to move at a right angle to the intended direction: if you move up, there is a probability of 0.1 (10 percent) of going left and 0.1 (10 percent) of going right. Now, given this uncertainty, how reliable is the sequence up, up, right, right, right at getting you to the goal state?
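
As a rough illustration of this noisy action model (the class and method names here are made up purely for the sketch), sampling the action that actually happens could look like this:

import java.util.Random;

// The intended move happens with probability 0.8; with probability 0.1 each,
// the agent slips 90 degrees to the left or to the right of the intended direction.
class NoisyGridWorld {
    // Directions: 0 = up, 1 = right, 2 = down, 3 = left
    static int sampleActualDirection(int intended, Random rng) {
        double r = rng.nextDouble();
        if (r < 0.8) {
            return intended;              // action works as expected
        } else if (r < 0.9) {
            return (intended + 3) % 4;    // slip 90 degrees to the left
        } else {
            return (intended + 1) % 4;    // slip 90 degrees to the right
        }
    }
}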

To calculate it, we need to do some math. The correct answer is 0.32776. Let me explain how this value is computed. From the start state we need to go up, up, right, right, right. Each of those actions works as it is supposed to with a probability of 0.8, so (0.8)^5 = 0.32768. Now we have the probability that the entire sequence works as intended. As you may have noticed, 0.32768 is not equal to the correct answer of 0.32776; there is a very small difference of 0.00008. This difference comes from the uncertainty: we also need to account for the probability of reaching the goal even when the sequence does not execute as intended.

Let's go through this again. Is there any way you could have ended up at the goal from that sequence of commands without following the intended path? Actions can have unintended consequences, and they often do. Suppose you are in the start state and you go up in the first step; there is a probability of 0.1 that you will actually go to the right. From there, if you go up, there is again a probability of 0.1 that you will actually go to the right.

From there, the next thing we do is take the right action as per our intended sequence, but that can actually go up with a probability of 0.1. Then the next right action can again cause an up to happen with probability 0.1. And finally, the last right might execute correctly with a probability of 0.8 to bring us to the goal state: 0.1 × 0.1 × 0.1 × 0.1 × 0.8 = 0.00008. Now add both of them and you get the correct answer: 0.32768 + 0.00008 = 0.32776.

What we did in the first case was come up with the sequence up, up, right, right, right, planned out in a world where nothing can go wrong; it is an ideal world. But once we introduce this notion of uncertainty or randomness, we have to do something other than work out the right answer in advance and then just go. Either we execute the sequence and, whenever we drift away from it, re-plan and come up with a new sequence from wherever we happened to end up, or we come up with some way to incorporate these uncertainties or probabilities up front, so that we never have to re-think when something goes wrong.
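
A tiny snippet to check the arithmetic above (just the calculation, nothing more):

// Probability of the intended path plus the one unintended path that also reaches the goal.
public class SequenceReliability {
    public static void main(String[] args) {
        double intendedPath = Math.pow(0.8, 5);       // 0.32768
        double slipPath = Math.pow(0.1, 4) * 0.8;     // 0.00008
        System.out.println(intendedPath + slipPath);  // prints approximately 0.32776
    }
}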

There is a framework that is very commonly used for capturing these uncertainties directly: it is called a Markov Decision Process (MDP).
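
Roughly speaking, an MDP specifies a set of states, the actions available in each state, the transition probabilities, and the rewards. The interface below is only a hypothetical sketch of those pieces (the names are mine, not from any library); BURLAP's SADomain, used in the sample code later, plays a similar role in practice.

import java.util.List;

// Hypothetical sketch of what an MDP specifies; for illustration only.
interface MarkovDecisionProcess<S, A> {
    List<S> states();                                // the set of states S
    List<A> actions(S state);                        // A(s): actions available in state s
    double transitionProbability(S s, A a, S next);  // P(s' | s, a)
    double reward(S s, A a, S next);                 // R(s, a, s')
}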

Exploration versus exploitation:

Exploration implies behaviors characterized by search, discovery, risk taking, research, and improvement, while exploitation implies behaviors characterized by refinement, implementation, efficiency, production, and selection.

Exploration and exploitation become a major problem when you are learning about the environment while performing many different actions (possibilities). The dilemma is how much more exploration is required, because when you try to explore the environment, you are likely to keep collecting negative rewards. Ideal learning requires that you sometimes make bad choices: sometimes the agent has to perform random actions to explore the environment, and sometimes it gets a positive reward while at other times it gets a smaller or negative one. The exploration-exploitation dilemma is really a trade-off (a minimal code sketch of one way to handle it follows the examples below).

The following are some examples in real life for exploration versus exploitation:

  1. Restaurant selection:
  • Exploitation: Go to your favorite restaurant
  • Exploration: Try a new restaurant
  2. Online banner advertisements:
  • Exploitation: Show the most successful advert
  • Exploration: Show a different advert
  3. Oil drilling:
  • Exploitation: Drill at the best-known location
  • Exploration: Drill at a new location
  4. Game playing:
  • Exploitation: Play the move you believe is best
  • Exploration: Play an experimental move
  5. Clinical trial:
  • Exploitation: Choose the best treatment so far
  • Exploration: Try a new treatment
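
A very common, simple way to handle this trade-off in code is ε-greedy action selection: with a small probability ε the agent explores a random action, and otherwise it exploits the action with the highest estimated value. This is a minimal sketch, assuming you already maintain an array of value estimates for the available actions:

import java.util.Random;

// Epsilon-greedy action selection: explore with probability epsilon,
// otherwise exploit the action with the highest estimated value.
class EpsilonGreedy {
    static int selectAction(double[] estimatedValues, double epsilon, Random rng) {
        if (rng.nextDouble() < epsilon) {
            // Exploration: try a random action
            return rng.nextInt(estimatedValues.length);
        }
        // Exploitation: pick the action currently believed to be best
        int best = 0;
        for (int a = 1; a < estimatedValues.length; a++) {
            if (estimatedValues[a] > estimatedValues[best]) {
                best = a;
            }
        }
        return best;
    }
}

A typical choice is a small ε such as 0.1, often decayed over time so that the agent explores less as its value estimates improve.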

Neural network and reinforcement learning

How do neural networks and reinforcement learning fit together? What is the relationship between these two topics? Let me explain. The structure of a neural network is like any other kind of network: there are interconnected nodes, called neurons, and edges that join them together. A neural network is organized in layers: an input layer, one or more hidden layers, and an output layer.

In reinforcement learning, convolutional networks are used to recognize an agent's state when the input is visual. Let's take an example: the screen that Mario is on. That is, the network is performing the classical task of image recognition. Don't confuse this with the supervised use of a convolutional network, though: the network derives a different interpretation from images in reinforcement learning than it does in supervised learning. In supervised learning, the network tries to match the image to an output variable or category, that is, it applies a label to the image by mapping names to pixels.

In supervised learning, the network gives the probability of the image with respect to each label: you give it any picture and it predicts, in percentages, the likelihood of it being a cat or a dog. Shown an image of a dog, it might decide that the picture is 75 percent likely to be a dog and 25 percent likely to be a cat.
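
As a rough illustration of where such percentages come from, a network's output layer typically turns raw scores into probabilities with a softmax. The snippet below uses two invented scores, chosen only to reproduce roughly the 75/25 split mentioned above:

// Turning raw output-layer scores into label probabilities with a softmax.
public class SoftmaxExample {
    static double[] softmax(double[] scores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) max = Math.max(max, s);
        double sum = 0.0;
        double[] exps = new double[scores.length];
        for (int i = 0; i < scores.length; i++) {
            exps[i] = Math.exp(scores[i] - max);  // subtract max for numerical stability
            sum += exps[i];
        }
        for (int i = 0; i < exps.length; i++) exps[i] /= sum;
        return exps;
    }

    public static void main(String[] args) {
        double[] scores = {1.1, 0.0};              // invented raw scores for {dog, cat}
        double[] probs = softmax(scores);
        System.out.printf("dog: %.2f, cat: %.2f%n", probs[0], probs[1]);  // about 0.75 and 0.25
    }
}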


Sample code:

package project;

import burlap.domain.singleagent.gridworld.GridWorldDomain;
import burlap.domain.singleagent.gridworld.GridWorldVisualizer;
import burlap.domain.singleagent.gridworld.state.GridAgent;
import burlap.domain.singleagent.gridworld.state.GridLocation;
import burlap.domain.singleagent.gridworld.state.GridWorldState;
import burlap.mdp.core.state.State;
import burlap.mdp.singleagent.SADomain;
import burlap.shell.visual.VisualExplorer;
import burlap.visualizer.Visualizer;

public class HelloWorld {

    public static void main(String[] args) {

        // 11x11 grid world
        GridWorldDomain gridworld = new GridWorldDomain(11, 11);

        // layout with four rooms
        gridworld.setMapToFourRooms();

        // stochastic transitions with a 0.9 success rate
        gridworld.setProbSucceedTransitionDynamics(0.9);

        // now we will create the grid world domain
        SADomain sad = gridworld.generateDomain();

        // initial state setup: agent in the bottom-left corner, goal location in the top-right
        State st = new GridWorldState(new GridAgent(0, 0), new GridLocation(10, 10, "loc0"));

        // now we will set up the visualizer and visual explorer
        Visualizer vis = GridWorldVisualizer.getVisualizer(gridworld.getMap());
        VisualExplorer ve = new VisualExplorer(sad, vis, st);

        // set the control keys "a w d s" to move the agent
        ve.addKeyAction("a", GridWorldDomain.ACTION_WEST, "");
        ve.addKeyAction("w", GridWorldDomain.ACTION_NORTH, "");
        ve.addKeyAction("d", GridWorldDomain.ACTION_EAST, "");
        ve.addKeyAction("s", GridWorldDomain.ACTION_SOUTH, "");

        ve.initGUI();
    }
}
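
Running this class opens a BURLAP visual explorer window showing the four-rooms grid world; you can then steer the agent around with the a, w, d, and s keys and watch the stochastic transitions in action.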
