NVIDIA Mixed Precision - Loss & Accuracy - Part 2
Andrew Antonopoulos
Senior Solutions Architect at Sony Professional Solutions Europe
Part 1 explained how NVIDIA's mixed precision can help reduce power consumption. However, we also need to consider accuracy and loss, the two most well-known and discussed metrics in machine learning.
Accuracy is a method for measuring a classification model's performance. It is the fraction of predictions in which the predicted value equals the true value. Accuracy is often graphed and monitored during the training phase, though the reported value usually refers to the overall or final model accuracy.
A loss function, also known as a cost function, takes into account the probabilities or uncertainty of a prediction based on how much it deviates from the true value.
Loss is a summation of the errors made for each sample in the training or validation set. The goal during the training process is to minimise this value. Unlike accuracy, loss may be used in both classification and regression problems.
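To make the distinction concrete, here is a minimal sketch with made-up numbers for a small 3-class batch, using categorical cross-entropy as the loss (a common choice for classification, assumed here for illustration): accuracy only checks whether the top prediction is correct, while the loss also reflects how confident each prediction is.

```python
import numpy as np

# Made-up labels and predicted probabilities for a 3-class problem.
y_true = np.array([0, 2, 1, 2])          # true class labels
y_prob = np.array([                      # predicted class probabilities
    [0.8, 0.1, 0.1],                     # confident and correct
    [0.5, 0.1, 0.4],                     # wrong: argmax is class 0
    [0.3, 0.5, 0.2],                     # correct, but less confident
    [0.1, 0.2, 0.7],                     # correct
])

# Accuracy: fraction of samples whose most probable class matches
# the true label.
accuracy = (y_prob.argmax(axis=1) == y_true).mean()

# Loss: mean categorical cross-entropy, i.e. -log of the probability
# the model assigned to the true class, so confident mistakes cost more.
loss = -np.log(y_prob[np.arange(len(y_true)), y_true]).mean()

print(f"accuracy={accuracy:.3f}  loss={loss:.3f}")  # accuracy=0.750  loss=0.547
```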
Most of the time, accuracy increases as loss decreases, but this is not always the case: accuracy and loss have different definitions and measure different things.
A new test was performed using the following hyper-parameters for the benchmarking:
Benchmarking
After the benchmarking model completed training, the accuracy graph was the following:
and the loss graph:
Additionally, the following image presents the same results for each epoch (up to the 25th):
The same dataset and hyper-parameters were used for the experiment, but the main difference between the two tests is the usage of mixed precision.
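The article does not include the training code, but assuming a TensorFlow/Keras setup (the model below is a placeholder, not the benchmarked network), enabling mixed precision is typically a one-line global policy change made before the model is built:

```python
import tensorflow as tf

# Compute in float16 where safe, keep variables in float32 for
# numerical stability; this is what distinguishes the two tests.
tf.keras.mixed_precision.set_global_policy("mixed_float16")

# Layers created after the policy is set pick it up automatically.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu", input_shape=(784,)),
    # Keep the final softmax in float32 so the output probabilities
    # are not distorted by float16 rounding.
    tf.keras.layers.Dense(10, activation="softmax", dtype="float32"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```

With Keras's built-in optimisers, loss scaling (needed to keep small float16 gradients from underflowing) is applied automatically under this policy.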
Experiment
For the experiment, the accuracy graph was the following:
and the loss graph:
The detailed accuracy and loss per epoch were the following:
Additionally, when we fit an ML model with a validation split, the data is divided into two parts for every epoch: training data and validation data. The model is trained on the training data and validated on the validation data, reporting loss and accuracy for the training split, and validation loss and validation accuracy for the validation split.
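As a sketch of that behaviour, assuming Keras's validation_split argument with synthetic stand-in data (the article's real dataset and model are not shown), the fit history exposes exactly those four per-epoch metrics:

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-in data; the article's actual dataset is not shown.
x_train = np.random.rand(1000, 784).astype("float32")
y_train = np.random.randint(0, 10, size=(1000,))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu", input_shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# validation_split=0.2 holds out 20% of the data: every epoch reports
# loss/accuracy on the training split and val_loss/val_accuracy on the
# held-out split, i.e. the four curves compared in the tables above.
history = model.fit(x_train, y_train, epochs=5, batch_size=64,
                    validation_split=0.2, verbose=0)

print(history.history["loss"][-1], history.history["accuracy"][-1])
print(history.history["val_loss"][-1], history.history["val_accuracy"][-1])
```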
By comparing the results for the training data, we can see that the loss is slightly better when using mixed precision (0.094318 for mixed precision and 0.094806 for 32-bit floating point). Conversely, accuracy is slightly better when using a 32-bit floating point (0.996538 for 32-bit floating point and 0.996030 for mixed precision).
However, with the validation data, both the validation loss and validation accuracy are better when using 32-bit floating point.
The learning rate (the last column in the above tables) directly influences model convergence, stability, and overall performance metrics such as accuracy and loss. In both cases, the same learning rate scheduler was used. At the beginning of every epoch, the callback gets the updated learning rate value from the schedule function, given the current epoch and current learning rate, and applies the updated rate to the optimiser.
To achieve maximum performance, "learning rate decay" was used, which means decreasing the learning rate as training progresses. In this way, we get a faster learning algorithm without the risk of it failing to converge to a minimum loss value, as shown in the sketch below.
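A minimal sketch, assuming Keras's LearningRateScheduler callback and an exponential decay rule (the article does not give the exact schedule or rates used; the data and model are toys just to drive the callback):

```python
import numpy as np
import tensorflow as tf

# Assumed decay rule: hold the initial rate for the first epochs,
# then decay it exponentially.
def schedule(epoch, lr):
    return lr if epoch < 3 else lr * float(tf.math.exp(-0.1))

# At the beginning of every epoch, Keras calls schedule(epoch, lr)
# with the current epoch and learning rate, and applies the returned
# value to the optimiser, which is the behaviour described above.
lr_callback = tf.keras.callbacks.LearningRateScheduler(schedule, verbose=1)

# Toy data and model.
x = np.random.rand(256, 20).astype("float32")
y = np.random.randint(0, 2, size=(256,))
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="binary_crossentropy")

history = model.fit(x, y, epochs=6, callbacks=[lr_callback], verbose=0)

# The applied rate is recorded per epoch, matching the learning-rate
# column in the tables above.
print(history.history["lr"])
```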