AWS DeepRacer Models For Beginners
Introduction
In the previous part, we presented a brief description of cloud computing, machine learning, and reinforcement learning with AWS DeepRacer. We learned about the specifications and theoretical knowledge needed to build a model, select the hyperparameters, and write a reward function using the environment parameters.
This part presents a step-by-step manual on how to create, train, and evaluate a model on AWS DeepRacer. Moreover, we explain how to read and interpret the performance of a model during the training and evaluation phases.
Create a model
The first step to start with reinforcement learning on DeepRacer is creating a model. To start, we go to the AWS Console and type DeepRacer in the search bar as follows:
From the DeepRacer console, select “Create model”.
Another option is to use the side menu bar in the DeepRacer console, select "Your models", and then select "Create model".
Step 1: Specifying the model name and environment
On the "Create model" page, we have to enter a name for the model under the training details. We can also add training job descriptions, but it is optional. Visit the?Tagging page?to learn more about tags.
The next part is to select a racing track as a training environment. A training environment specifies the conditions that the agent will be trained with.?There are many shapes of tracks that can be used as a training environment which varies in complexity. As beginners, we can start with a simple track that consisted of basic shapes and smooth turns. We can gradually increase the track complexity when we become more familiar with DeepRacer.
Step 2: Choosing race type and training algorithm
After naming the model, we need to select a race type to train the model on. AWS offers three different types of races. "Time trial" is the easiest race type, as it only considers completing the track in the least amount of time. In the "Object avoidance" race, we aim to complete the track while avoiding random static objects placed along the track. Finally, "Head-to-head" racing is the most challenging, where we face moving objects along the track, namely other players racing on the same track.
We will consider the time trial race as the selected option for the remaining instructions. Once the race type is selected, we need to choose the training algorithm. DeepRacer provides two training algorithms: Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC). PPO can be used with either continuous or discrete action spaces, while SAC supports only continuous action spaces. You can learn more about the training algorithms from the DeepRacer documentation.
Besides the training algorithm, we need to select the hyperparameters (see this link for more details about hyperparameters). You can watch this video to learn more about how to choose proper hyperparameters for the model. A small sketch of what these settings look like is shown below.
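To make the choice concrete, the sketch below lists the PPO hyperparameters that the DeepRacer console exposes, written as a plain Python dictionary. The values are illustrative assumptions (roughly in line with the console defaults at the time of writing) rather than recommendations, and the key names are paraphrased rather than copied from the console.

```python
# Illustrative PPO hyperparameter settings for a first DeepRacer model.
# Values are assumptions (close to typical console defaults), not prescriptions.
ppo_hyperparameters = {
    "gradient_descent_batch_size": 64,         # experiences sampled per gradient update
    "number_of_epochs": 10,                    # passes over the batch per policy update
    "learning_rate": 0.0003,                   # step size for gradient descent
    "entropy": 0.01,                           # exploration bonus added to the loss
    "discount_factor": 0.999,                  # weight of future rewards vs immediate reward
    "loss_type": "huber",                      # alternative: "mean_squared_error"
    "experience_episodes_per_iteration": 20,   # episodes collected before each policy update
}
```

A higher discount factor makes the agent care about rewards further into the future, while a higher entropy value encourages more exploration early in training.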
Step 3: Defining the action space
The action space specifies the actions the agent can take inside the environment. For DeepRacer, this means the range of steering angles and speeds of the vehicle. There are two types of action spaces: discrete and continuous. A discrete action space lets us select a fixed number of steering angles and speeds that the car can choose from at a single state. The final action space consists of every combination of steering angle and speed based on the values specified earlier. The following figure shows six possible actions that the agent can take at a single state of the car, and the sketch below shows how such a space can be enumerated.
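As an illustration (the specific angles and speeds here are assumptions, not console defaults), a discrete action space built from three steering angles and two speeds yields six actions, matching the idea in the figure above:

```python
# Hypothetical discrete action space: 3 steering angles x 2 speeds = 6 actions.
steering_angles = [-30.0, 0.0, 30.0]   # degrees; negative steers right, positive steers left
speeds = [1.0, 2.0]                    # metres per second

discrete_action_space = [
    {"steering_angle": angle, "speed": speed}
    for angle in steering_angles
    for speed in speeds
]

# At every state, the agent picks exactly one of these six actions.
for i, action in enumerate(discrete_action_space):
    print(f"Action {i}: steer {action['steering_angle']} deg, speed {action['speed']} m/s")
```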
Unlike the discrete action space, the continuous action space does not have a fixed number of actions. We only specify the minimum and maximum for both the steering angle and the speed. The following figure shows what the continuous action space looks like.
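In contrast, a continuous action space is fully described by its bounds; the agent can output any steering angle and speed within them. The ranges below are illustrative assumptions only:

```python
# Hypothetical continuous action space: only minimum and maximum bounds are defined;
# the agent may choose any value within these ranges at each state.
continuous_action_space = {
    "steering_angle": {"min": -30.0, "max": 30.0},  # degrees
    "speed": {"min": 0.5, "max": 3.0},              # metres per second
}
```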
Step 4: Choosing a vehicle
Here we can specify the vehicle shell and sensor configuration that will run on the track. DeepRacer provides two agents by default with different specifications. We can also create our own custom agents and set the configuration, such as the camera type and whether to add a LIDAR sensor. The original DeepRacer vehicle was selected for the model, as shown in the following figure.
Step 5: Customising the reward function
The last step in creating a model is to choose a reward function and the training time. AWS provides some simple examples of reward functions. We can either select one of the example reward functions and modify it, or create our own reward function from scratch, as in the sketch below.
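As a starting point, the sketch below follows the style of AWS's "follow the centre line" example: the agent is rewarded for staying close to the centre of the track. The params dictionary is passed in by the environment and contains the input parameters discussed in the previous part; the keys used here, "track_width" and "distance_from_center", are among those parameters.

```python
def reward_function(params):
    """Reward the agent for staying close to the centre line of the track."""
    track_width = params["track_width"]                    # width of the track in metres
    distance_from_center = params["distance_from_center"]  # car's distance from the centre line

    # Three markers at increasing distances from the centre line.
    marker_1 = 0.1 * track_width
    marker_2 = 0.25 * track_width
    marker_3 = 0.5 * track_width

    # The closer the car stays to the centre, the larger the reward.
    if distance_from_center <= marker_1:
        reward = 1.0
    elif distance_from_center <= marker_2:
        reward = 0.5
    elif distance_from_center <= marker_3:
        reward = 0.1
    else:
        reward = 1e-3  # very close to the edge, or off track

    return float(reward)
```

This kind of function is a reasonable baseline for a time trial; later parts of the series look at reward functions that also take speed and steering into account.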
Finally, depending on the track and reward function, we need to choose an appropriate training time.
We also select the maximum time we want the model to train for; setting a stopping time helps control the cost of training. Usually, the longer a model is trained, the better it performs. However, too much training can also cause overfitting, where the model is trained to a point where it cannot generalise to testing environments. Therefore, selecting the proper training time is crucial for the model's performance. You can learn more about overfitting and underfitting in this link.
Finally, before we press "Create model", there is an option that allows the model to be submitted to the DeepRacer league automatically after training completes. After confirming the choices, we can click "Create model". AWS will start building the model and begin the training process for the selected amount of time.
Training Analysis
AWS DeepRacer utilises SageMaker to train the model behind the scenes and leverages RoboMaker to simulate the agent's interaction with the environment.
Once you've submitted your training task, wait for it to be initialised and then executed. The initialisation process takes roughly 6 minutes to change the state from "Initialising" to "In progress".
To track the progress of your training job, look at the Reward graph and the Simulation video stream, as shown in the figure below.
You can refresh the Reward graph using the refresh button next to it until the training job is completed. The generated chart shows three lines: the average reward, the average percentage of track completion during training, and the average percentage of track completion during evaluation.
Each of these metrics is an indicator of how the model is performing during the training phase. You can learn more about the reward graph in this video.
The following video shows a training demo of how the model learns over time through trial and error at each run.
Evaluation
Once training is completed, you can evaluate the model. When the training task finishes, the model is ready. If training does not finish, the model can still be in a Ready state if it was trained up to the point of failure.
Step 1: Click on Start evaluation as shown below.
Step 2: On the Evaluate model page, choose a track under Evaluation criteria. You can choose the track you used while training the model, or evaluate your model on any track; however, selecting the track most similar to the one used in training will yield the best results.
Step 3: Under Race type on the Evaluate model page, choose the racing type you used to train the model.
Step 4: For your initial model, turn off the Submit model after evaluation option under Virtual Race Submission on the Evaluate model page. Later, you can enable this option if you want to participate in a racing event. Then choose Start evaluation to begin creating and initialising the evaluation job. The startup process takes roughly 3 minutes.
Step 5: As the evaluation progresses, the evaluation outcome, including the trial time and track completion rate, is displayed under "Evaluation" after each trial. You can watch how the agent performs on the chosen track in the simulation video stream window. For the particular evaluation below, each trial ended because the car went off track. This means that we need to train a better model in order to complete the race.
The following video shows a demo of the evaluation process. As you can see, all three trials finish with "Off track", which means the agent did not complete the race. This indicates that we need a better reward function and a longer training time for the agent to learn.
Recap
In this part of the series, we have followed a step-by-step guide to train and evaluate a model on AWS DeepRacer. We have learned how to select the most appropriate options while creating, training, and evaluating the model.
What is next?
In the following part of the series, we will explain four different reward functions in detail. We will show how we logically build a model and select the hyperparameters.
Acknowledgement
This article was prepared by students at AWS Academy@Western Sydney University.