AWS DeepRacer Tips and Tricks: How to build a powerful rewards function with AWS Lambda and Photoshop
AWS DeepRacer, AWS SAM, Machine Learning


The IVE DeepRacer team is a group of Year 1 students from the Higher Diploma in Cloud and Data Centre Administration. We won the Champion (Peter), 1st Runner-up (Eddie), 5th place (Bo Zuo Li), and the 8th and 10th place awards. I am sure no one thought they could win any prize beforehand, because all of them are just Year 1, non-degree students!

One of the most painful parts of training a DeepRacer model is re-implementing everything that could be done easily with a Python library (e.g., geometry), because you cannot import any third-party Python library in the reward function.
It is undifferentiated heavy lifting, and we don't want to reinvent the wheel!

Now, let us tell you our secret weapon for the AWS Hong Kong DeepRacer League!

Our trick is to move the reward function logic into another application, where we can use all the powerful Python libraries!

deepracerrewardfunctionapi is an AWS SAM application with API Gateway and a Lambda function. With AWS Lambda, you can do nearly anything you like inside your reward function. The reward function in the DeepRacer console just works as a proxy that sends the data to the Lambda function. A Lambda layer contains all the powerful Python libraries, including NumPy, sympy, sklearn, and Pillow.

Before training, we use Photoshop to draw the target path on the track map and add it to the Lambda function source folder.


Our reward function:

import urllib.request
import urllib.parse
import json

def reward_function(params):
    # API Gateway endpoint of the deepracerrewardfunctionapi application
    url = 'https://XXXXX.execute-api.us-east-1.amazonaws.com/Prod/reward/'
    # Pack all simulation parameters into a single JSON query-string value
    query_string = urllib.parse.urlencode({"json": json.dumps(params)})
    url = url + "?" + query_string
    # Call the Lambda function and return the reward it computed
    with urllib.request.urlopen(url) as response:
        response_text = response.read().decode('utf-8')
        result = json.loads(response_text)
        return float(result["reward"])
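
On the Lambda side, the handler just decodes the params back out of the query string and runs the real logic. A minimal sketch, following API Gateway's proxy integration event shape; compute_reward is a hypothetical placeholder for the logic described below:

import json

def lambda_handler(event, context):
    # API Gateway's proxy integration delivers the query string here
    params = json.loads(event["queryStringParameters"]["json"])
    # Run the real reward logic with NumPy, sympy, sklearn, and Pillow
    reward = compute_reward(params)  # hypothetical helper
    return {
        "statusCode": 200,
        "body": json.dumps({"reward": reward})
    }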

The Most POWERFUL Reward Function

First, we use Pillow to load the path map image.
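
A one-liner sketch of this step, assuming the Photoshop path map is deployed next to the handler as track.png (the file name is our assumption):

from PIL import Image

# Path map drawn in Photoshop, deployed with the Lambda function code
track_map = Image.open("track.png").convert("RGB")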

[Image: the track path map drawn in Photoshop, with waypoints marked]

Next, convert the point from the simulation world coordinate system to the image coordinate system (a sketch of the conversion follows the sample params below). In the above image, the black dot represents the position of the DeepRacer, now next to waypoint 15.

params = {
    'all_wheels_on_track': True,
    'x': 7,
    'y': 1,
    'distance_from_center': 0,
    'heading': 60,
    'progress': 0,
    'steps': 1,
    'speed': 0.5,
    'steering_angle': 6,
    'track_width': 0.2,
    'waypoints': waypoints,
    'closest_waypoints': [0, 1],
    'is_left_of_center': True,
    'is_reversed': False,
}
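
The conversion itself is just a scale plus a flip of the y-axis, since image coordinates grow downwards from the top-left corner. A minimal sketch; the scale, offsets, and image height here are made-up values that depend on how the map was drawn:

def sim_to_image(x, y, scale=100, x_offset=1.0, y_offset=1.0, height=800):
    # Scale simulation metres to pixels, then flip the y-axis because
    # image coordinates grow downwards from the top-left corner
    px = int((x + x_offset) * scale)
    py = int(height - (y + y_offset) * scale)
    return px, py

car_px, car_py = sim_to_image(params['x'], params['y'])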

Next, crop a circle around the DeepRacer from the map.

[Image: a circular crop of the map around the DeepRacer's position]
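
A sketch of the crop, using Pillow's rectangular crop plus an ellipse mask to get a circle; the radius is an assumed value:

from PIL import Image, ImageDraw

def crop_circle(img, cx, cy, radius=100):
    # Crop the square bounding box around the car first
    square = img.crop((cx - radius, cy - radius, cx + radius, cy + radius))
    # Then mask out everything outside the inscribed circle
    mask = Image.new("L", square.size, 0)
    ImageDraw.Draw(mask).ellipse((0, 0, 2 * radius, 2 * radius), fill=255)
    background = Image.new("RGB", square.size, (0, 0, 0))
    return Image.composite(square, background, mask)

cropped = crop_circle(track_map, car_px, car_py)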


Then rotate the image so that the x-axis becomes the heading direction.

[Image: the cropped circle rotated so the heading direction lies along the x-axis]
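
Pillow's rotate turns the image counter-clockwise, so a sketch of this step is a single call; the sign of the angle depends on how the y-axis was flipped during the coordinate conversion:

# Rotate so that the car's heading lies along the +x axis of the crop
rotated = cropped.rotate(-params['heading'])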

Extract the RGB color points, and return the failure reward if the number of color points is below a threshold. Then use LinearRegression from sklearn to fit a regression line over all the color points.

[Image: color points with the black regression line and the orange steering line]

The heading is the x-axis, the black line is the regression line, and the orange line represents the current steering angle. Convert the slope of the regression line into degrees: this is the target direction.
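
A sketch of the extraction and the regression, where the colored-pixel test and both thresholds are our assumptions:

import math
import numpy as np
from sklearn.linear_model import LinearRegression

def target_direction(rotated_img, min_points=20):
    # Find pixels that are strongly colored (far from grey), which is
    # where the Photoshop path was drawn
    px = np.asarray(rotated_img, dtype=int)
    r, g, b = px[..., 0], px[..., 1], px[..., 2]
    colored = (np.abs(r - g) + np.abs(g - b) + np.abs(b - r)) > 100
    ys, xs = np.nonzero(colored)

    # Too few color points means the car has left the drawn path
    if len(xs) < min_points:
        return None  # the caller returns the failure reward

    # Fit the regression line over all color points; its slope,
    # converted to degrees, is the target direction
    model = LinearRegression().fit(xs.reshape(-1, 1), ys)
    return math.degrees(math.atan(model.coef_[0]))

target_dir = target_direction(rotated)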

Now, convert all the data into sympy Geometry objects, and all of the calculations become very simple. Get the angle between the target direction and the current steering angle line, i.e. the angle between the black line and the orange line. Positive means the car needs to turn left, and negative means it needs to turn right.
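
For example, with sympy's geometry module the signed angle can be sketched like this (the sign handling is our illustration):

import math
from sympy.geometry import Line, Point

def angle_to_target(target_dir_deg, steering_deg):
    # Build the regression (black) and steering (orange) lines through
    # the car's position as sympy Geometry objects
    car = Point(0, 0)
    black = Line(car, slope=math.tan(math.radians(target_dir_deg)))
    orange = Line(car, slope=math.tan(math.radians(steering_deg)))
    # angle_between gives an unsigned angle; restore the sign so that
    # positive means "turn left" and negative means "turn right"
    angle = math.degrees(float(black.angle_between(orange)))
    return angle if target_dir_deg >= steering_deg else -angle

turn_deg = angle_to_target(target_dir, params['steering_angle'])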

The reward consists of 3 components:

  1. Distance reward – uses NumPy and sympy to calculate the perpendicular distance of the DeepRacer from the regression line, smoothed over a Gaussian distribution (see the sketch after this list).
  2. Speed reward – speed / max_speed * 100
  3. Track reward – explained below
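
A sketch of the distance component, assuming the regression line and the car's position have already been wrapped as sympy objects; the standard deviation is a tuning value we made up:

import math
from sympy.geometry import Line, Point

def distance_reward(regression_line, car_point, sigma=20.0):
    # Perpendicular distance of the car from the regression line, via sympy
    d = float(regression_line.distance(car_point))
    # Smooth the reward with a Gaussian so it decays gently off the line
    return 100.0 * math.exp(-(d ** 2) / (2 * sigma ** 2))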

The track reward consists of 3 color rules: green favours going straight, blue penalizes turning right, and red promotes turning left.

The final reward gives additional marks according to progress.
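
Putting the last two pieces together, a sketch of the track reward and the final sum, reusing distance_reward from above; all the pixel tests, scores, and the progress bonus here are illustrative values, not our real tuning:

import numpy as np

MAX_SPEED = 1.0  # assumed maximum speed of the action space

def track_reward(rotated_img, turn_deg):
    # Count the dominant-color pixels in the crop
    px = np.asarray(rotated_img, dtype=int)
    r, g, b = px[..., 0], px[..., 1], px[..., 2]
    reds = np.count_nonzero((r > g + 50) & (r > b + 50))
    greens = np.count_nonzero((g > r + 50) & (g > b + 50))
    blues = np.count_nonzero((b > r + 50) & (b > g + 50))

    if reds >= greens and reds >= blues:
        return 50.0 if turn_deg > 0 else 5.0    # red promotes turning left
    if blues >= greens:
        return 5.0 if turn_deg < 0 else 50.0    # blue penalizes turning right
    return 50.0 if abs(turn_deg) < 5 else 5.0   # green favours going straight

def final_reward(params, rotated_img, regression_line, car_point, turn_deg):
    reward = distance_reward(regression_line, car_point)
    reward += params['speed'] / MAX_SPEED * 100  # speed reward
    reward += track_reward(rotated_img, turn_deg)
    reward += params['progress']  # extra marks according to progress
    return float(reward)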

Please note that this is not our actual reward function; we just want to share the trick, not the model!

Degree or not doesn't matter, and you can be strong in AWS from any background!
Everyone who works hard and stays creative can become a winner!

Remark:

Beware of the Lambda deployment package size limit of 250 MB (unzipped, including layers). If you need to add more libraries, you can dockerize your code and wrap it inside a Fargate web application with the CDK.
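
For reference, a minimal CDK sketch of that Fargate wrapper; the directory name and the sizing values are assumptions:

from aws_cdk import App, Stack
from aws_cdk import aws_ecs as ecs
from aws_cdk import aws_ecs_patterns as ecs_patterns
from constructs import Construct

class RewardServiceStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs):
        super().__init__(scope, construct_id, **kwargs)
        # Build the Docker image from a local Dockerfile and run it
        # behind an Application Load Balancer on Fargate
        ecs_patterns.ApplicationLoadBalancedFargateService(
            self, "RewardService",
            task_image_options=ecs_patterns.ApplicationLoadBalancedTaskImageOptions(
                image=ecs.ContainerImage.from_asset("./reward_app"),
                container_port=80,
            ),
            cpu=256,
            memory_limit_mib=512,
        )

app = App()
RewardServiceStack(app, "RewardServiceStack")
app.synth()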
