Building a Traffic sign classifier using TensorFlow with Convolutional neural networks

In this post I’ll describe my experience training a model to classify traffic signs using deep learning and TensorFlow, along with some key points and recommendations. It is suitable for readers with some knowledge of Python, TensorFlow and machine learning.

Introduction

Neural networks are a beautiful, biologically inspired programming paradigm that enables a computer to learn from observational data. Deep learning is a powerful set of techniques for learning in neural networks, and together they currently provide the best solutions to many problems, including image recognition.

A convolutional neural network (CNN) consists of several layers with different filters, each of which picks up different qualities of a small patch of the image. The subsequent layers sit higher in the hierarchy and respond to increasingly complex ideas, built by grouping together adjacent pixels and treating them as a collective; eventually the CNN classifies the image by combining these larger, more complex objects. The CNN learns all of this on its own. Because the same filter weights are shared across every position in the image, it also gives us some translation invariance and a much smaller, more scalable model than a fully connected network.
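To make the "smaller, more scalable model" point concrete, the following back-of-the-envelope count compares the parameters of a 5x5 convolution with those of a fully connected layer producing an output of the same size. The layer sizes are illustrative assumptions, not the exact layers of the model in this post.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels):
    """Weights + biases of a conv layer: one kernel per output channel, shared over all positions."""
    return kernel_h * kernel_w * in_channels * out_channels + out_channels


def dense_params(in_units, out_units):
    """Weights + biases of a fully connected layer."""
    return in_units * out_units + out_units


# A 5x5 convolution mapping a 32x32x1 image to 6 feature maps (28x28x6 output, 'valid' padding).
conv = conv_params(5, 5, 1, 6)
# A dense layer producing an output of the same size (28*28*6 units) from the flattened image.
dense = dense_params(32 * 32 * 1, 28 * 28 * 6)

print(f"conv layer:  {conv:,} parameters")   # conv layer:  156 parameters
print(f"dense layer: {dense:,} parameters")  # dense layer: 4,821,600 parameters
```

The convolution needs roughly 30,000 times fewer parameters for the same output size, because each small kernel is reused at every image position.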

Data Set and Exploration

The dataset I used for this project comes from the German Traffic Sign Dataset, one of the largest traffic sign datasets available online.

The German Traffic Sign data is provided as a 4D array containing the raw pixel data of the traffic sign images, with shape (num examples, width, height, channels). 'labels' is a 1D array containing the label/class id of each traffic sign, and a separate mapping gives the sign name for each id.

I used the os library to load the sample images for the 'Train', 'Validation' and 'Test' data, and the pandas library to calculate summary statistics of the traffic sign data set:

  • Number of training examples = 34799
  • Number of validation examples = 4410
  • Number of testing examples = 12630
  • Image data shape = (32, 32, 3)
  • Number of classes = 43
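The statistics above were computed with pandas; an equivalent NumPy-only sketch, assuming the images are loaded as a 4D array (num examples, width, height, channels) and the labels as a 1D array of class ids, could look like this. The function name `summarize` and the random stand-in arrays are illustrative.

```python
import numpy as np


def summarize(X_train, y_train, X_valid, X_test):
    """Basic dataset statistics for images stored as a 4D array (N, H, W, C)."""
    return {
        "n_train": len(X_train),
        "n_valid": len(X_valid),
        "n_test": len(X_test),
        "image_shape": X_train.shape[1:],       # shape of a single image
        "n_classes": len(np.unique(y_train)),   # number of distinct class ids
    }


# Small random stand-in arrays with the same layout as the real data set.
rng = np.random.default_rng(0)
X_train = rng.integers(0, 256, size=(100, 32, 32, 3)).astype(np.uint8)
y_train = np.arange(100) % 43          # every one of the 43 class ids appears
X_valid = rng.integers(0, 256, size=(20, 32, 32, 3)).astype(np.uint8)
X_test = rng.integers(0, 256, size=(30, 32, 32, 3)).astype(np.uint8)

stats = summarize(X_train, y_train, X_valid, X_test)
print(stats)
```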

Here is a sample of the dataset, displaying the first image from each class:






The image quality is not very good, but each class includes a variety of angles and lighting conditions.

Below is an exploratory visualization of the 'Train' and 'Test' data sets: a bar chart showing the distribution of images across the classes (different traffic signs):

Pre-process the Data Set

After inspecting the data, I noticed that the number of signs per class is very unbalanced: it ranges from 210 samples for the smallest class to 2250 samples for the largest one.

I decided to generate additional data to raise the number of samples in the dataset by proactively performing some data augmentation (in deep learning, more data is almost always good), and to apply some data preprocessing.

Data augmentation exploits the data we already have by creating modified copies of it. Here are some ways to do that:

  1. Rotating our data: We could perform random rotations of our images, possibly drawing the rotation angle from a normal distribution to avoid biasing the newly created images.
  2. Flipping and cropping images (while preserving the relevant parts of the image).
  3. Adding noise or applying filters.
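Items 2 and 3 can be sketched with plain NumPy, assuming uint8 images of shape (32, 32, 3). The crop size and noise level are arbitrary choices for illustration. Note that horizontal flipping is only safe for signs whose meaning does not change when mirrored: a flipped "Turn left ahead" sign becomes "Turn right ahead".

```python
import numpy as np

rng = np.random.default_rng(42)


def flip_horizontal(img):
    """Mirror left-right. Only safe for signs whose meaning is unchanged by mirroring."""
    return img[:, ::-1]


def random_crop_resize(img, crop=28):
    """Take a random crop x crop window and scale it back to the original size (nearest neighbour)."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    patch = img[top:top + crop, left:left + crop]
    rows = np.arange(h) * crop // h   # source row for each output row
    cols = np.arange(w) * crop // w   # source column for each output column
    return patch[rows[:, None], cols]


def add_gaussian_noise(img, sigma=8.0):
    """Add zero-mean Gaussian noise and clip back to the valid uint8 range."""
    noisy = img.astype(np.float32) + rng.normal(0.0, sigma, size=img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)


img = rng.integers(0, 256, size=(32, 32, 3)).astype(np.uint8)  # stand-in image
augmented = [flip_horizontal(img), random_crop_resize(img), add_gaussian_noise(img)]
print([a.shape for a in augmented])  # every output keeps the (32, 32, 3) shape
```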


Data preprocessing can use the following techniques:

  1. Normalization
  2. Standardization
  3. Brightness augmentation
  4. Histogram Equalization
  5. PCA/ZCA whitening
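Three of these techniques are short enough to sketch in NumPy. This is an illustrative version assuming uint8 input images; for histogram equalization, `cv2.equalizeHist` is the usual library call, and the hand-written function below only shows the idea behind it.

```python
import numpy as np


def normalize(img):
    """Min-max scale pixel values from [0, 255] to [0, 1]."""
    return img.astype(np.float32) / 255.0


def standardize(img):
    """Shift to zero mean and unit variance (per image)."""
    img = img.astype(np.float32)
    return (img - img.mean()) / (img.std() + 1e-8)


def equalize_histogram(gray):
    """Histogram equalization for a single-channel uint8 image."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]            # cumulative count at the darkest value present
    denom = cdf[-1] - cdf_min
    if denom == 0:                       # constant image: nothing to equalize
        return gray.copy()
    lut = np.clip(np.round((cdf - cdf_min) / denom * 255), 0, 255).astype(np.uint8)
    return lut[gray]                     # map every pixel through the lookup table


rng = np.random.default_rng(0)
dark = rng.integers(40, 90, size=(32, 32)).astype(np.uint8)  # low-contrast stand-in image
eq = equalize_histogram(dark)
print(dark.min(), dark.max(), "->", eq.min(), eq.max())     # contrast stretched to the full range
```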

I counted the number of images in each class (from the lower to the upper bound) and multiplied each class's images by a factor based on its original count, to raise the number of training samples.
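The exact multiplication rule is not spelled out here; one simple, hypothetical scheme is to bring every class up to roughly the size of the largest class, as sketched below with `np.bincount`. The toy labels reuse the 210 and 2250 counts of the smallest and largest classes mentioned above.

```python
import numpy as np


def augmentation_factors(labels, target=None):
    """Per-class multiplier so that each class reaches roughly `target` samples.

    Hypothetical scheme: classes already at or above the target get a factor of 1.
    """
    counts = np.bincount(labels)                  # number of samples per class id
    if target is None:
        target = counts.max()                     # default: match the largest class
    factors = np.maximum(1, np.ceil(target / np.maximum(counts, 1))).astype(int)
    return counts, factors


# Toy labels: class 0 is rare, class 2 is common.
labels = np.array([0] * 210 + [1] * 1000 + [2] * 2250)
counts, factors = augmentation_factors(labels)
print("smallest class:", counts.min(), "largest class:", counts.max())
print("factors:", factors.tolist())  # factors: [11, 3, 1]
```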

I used the cv2 library to apply perspective transforms and rotations to create the new augmented images.

All the images were also converted to grayscale, changing their shape from (32, 32, 3) to (32, 32, 1). I noticed that the model's accuracy was higher this way, and it also shortens the model's runtime.
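The conversion itself is a weighted sum of the colour channels. A NumPy sketch, assuming RGB input batches, that keeps an explicit channel axis so the (32, 32, 1) shape still fits a convolutional input layer:

```python
import numpy as np


def to_grayscale(images):
    """Convert a batch of RGB images (N, H, W, 3) to grayscale (N, H, W, 1).

    Uses the ITU-R BT.601 luma weights, the same ones cv2.COLOR_RGB2GRAY uses.
    """
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = images.astype(np.float32) @ weights   # contracts the channel axis -> (N, H, W)
    return gray[..., np.newaxis]                 # keep a channel axis -> (N, H, W, 1)


batch = np.random.default_rng(0).integers(0, 256, size=(5, 32, 32, 3)).astype(np.uint8)
gray = to_grayscale(batch)
print(batch.shape, "->", gray.shape)  # (5, 32, 32, 3) -> (5, 32, 32, 1)
```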

Here is a sample of the grayscale dataset, displaying the first image from each class:

I normalized the data to get higher validation and training accuracy, since normalization speeds up convergence. Each pixel value is scaled from [0, 255] to [0, 1].

I then divided the data into training (80%) and validation (20%) sets, and shuffled each of them randomly.
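A NumPy sketch of this step: images and labels are shuffled with the same permutation (so each image keeps its label), and 20% is held out for validation. `sklearn.model_selection.train_test_split` does the same in one call; the helper name `shuffle_split` is illustrative.

```python
import numpy as np


def shuffle_split(X, y, valid_fraction=0.2, seed=0):
    """Shuffle X and y with one shared permutation, then hold out `valid_fraction` for validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    X, y = X[order], y[order]                  # same order keeps every image paired with its label
    n_train = len(X) - int(round(len(X) * valid_fraction))
    return X[:n_train], X[n_train:], y[:n_train], y[n_train:]


X = np.arange(100).reshape(100, 1)   # toy "images" whose value equals their label
y = np.arange(100)
X_tr, X_va, y_tr, y_va = shuffle_split(X, y)
print(len(X_tr), len(X_va))  # 80 20
```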

After preprocessing, the data set distribution is as follows:

  • Number of training examples = 71081
  • Number of validation examples = 17769
  • Number of testing examples = 12630
  • Image data shape = (32, 32, 1)
  • Number of classes = 43

Here is a bar chart showing the distribution of images across the classes before and after this process:

Model Architecture

My model consisted of the following layers:


Model training

To train the model, I used the LeNet architecture as a baseline, with an additional 2D convolution layer and some parameter tuning. The batch size was set to 100, and I initially used 100 epochs. The learning rate (0.001) and the mean (0) and sigma (0.1) used for weight initialization were left at their default LeNet values.
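Since the exact layer table is not reproduced here, the following is only a shape walk-through of the standard LeNet-5 feature extractor on a 32x32x1 input, plus one hypothetical way to "split the first convolution into two": two stacked 3x3 convolutions, which together cover the same 5x5 receptive field. The kernel sizes and channel counts of the variant are assumptions.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size - kernel + 2 * padding) // stride + 1


def trace(layers, size=32, channels=1):
    """Print the output shape after each layer; return the flattened feature count."""
    for name, kind, kernel, stride, out_ch in layers:
        size = conv_out(size, kernel, stride)
        if kind == "conv":
            channels = out_ch            # pooling keeps the channel count
        print(f"{name:<8} -> {size}x{size}x{channels}")
    return size * size * channels


# Classic LeNet-5 feature extractor ('valid' padding).
lenet = [
    ("conv1", "conv", 5, 1, 6),
    ("pool1", "pool", 2, 2, None),
    ("conv2", "conv", 5, 1, 16),
    ("pool2", "pool", 2, 2, None),
]
# Hypothetical variant: the first 5x5 conv replaced by two stacked 3x3 convs.
split_first = [("conv1a", "conv", 3, 1, 6), ("conv1b", "conv", 3, 1, 6)] + lenet[1:]

print("flattened:", trace(lenet))        # 5x5x16 = 400 features feed the fully connected layers
print("flattened:", trace(split_first))
```

Both versions end with the same 400 flattened features, so the fully connected part of LeNet (120, 84 and finally 43 outputs, one per class) can stay unchanged.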

Model training is quite an art. I started with the LeNet architecture, and since the training accuracy wasn't high enough, I began by changing the default batch size and number of epochs, which improved the results. Converting the dataset to grayscale gave somewhat better results and also made the model much faster. Adding more images to the dataset (in the preprocessing step) also improved the results. Adding dropout after each layer improved the accuracy as well; I did not add it after the max-pooling steps, since pooling already discards most of the activations in each window (although, unlike dropout, it does so deterministically). The ReLU activation function produced better results than sigmoid. Finally, I found that splitting LeNet's first convolution layer into two 2D convolution layers helped achieve higher accuracy.
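A NumPy sketch of two ingredients from this paragraph, ReLU and (inverted) dropout, assuming a keep probability of 0.5. The key detail is that dropout is applied only during training and must be switched off at inference; in TensorFlow 1.x LeNet-style code this usually means feeding a keep probability of 1.0 when evaluating.

```python
import numpy as np


def relu(x):
    """Rectified linear unit: negative values become 0."""
    return np.maximum(x, 0.0)


def dropout(x, keep_prob, training, rng):
    """Inverted dropout: zero units at random during training and rescale the survivors.

    At inference time (training=False) it is the identity, so predictions are deterministic.
    """
    if not training or keep_prob >= 1.0:
        return x
    mask = rng.random(x.shape) < keep_prob   # keep each unit with probability keep_prob
    return x * mask / keep_prob              # rescale so the expected activation is unchanged


rng = np.random.default_rng(0)
activations = relu(rng.normal(size=(4, 8)))
train_out = dropout(activations, keep_prob=0.5, training=True, rng=rng)
eval_out = dropout(activations, keep_prob=0.5, training=False, rng=rng)
print("unchanged at inference:", np.array_equal(eval_out, activations))  # True
```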

The following elements can be implemented and tuned to improve the algorithm's performance:

  1. Diagnostics.
  2. Weight Initialization.
  3. Learning Rate.
  4. Activation Functions.
  5. Network Topology.
  6. Batches and Epochs.
  7. Regularization.
  8. Optimization and Loss.
  9. Early Stopping.


Model Evaluation

I left the learning rate and the mean at the LeNet default values, since changing them slightly did not give better results. There are many other parameters in the model that can be changed: padding, stride, filters, fully connected layer shapes, weight and bias initialization and more. I tuned them many times until I was satisfied with the results. I used AdamOptimizer instead of GradientDescentOptimizer because it uses momentum (a running average of past gradients) together with per-parameter adaptive learning rates. I used TensorFlow's softmax_cross_entropy_with_logits function to measure the probability error.
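What `softmax_cross_entropy_with_logits` computes can be written out in a few lines of NumPy: turn the logits into probabilities with softmax, then take the negative log-probability of the true class. The TensorFlow function returns one loss per example, which is typically averaged with `tf.reduce_mean`; the sketch below does that averaging directly. The toy logits are made up.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def softmax_cross_entropy(logits, one_hot_labels):
    """Mean of -sum(labels * log softmax(logits)), computed via log-sum-exp."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return float(-(one_hot_labels * log_probs).sum(axis=-1).mean())


logits = np.array([[2.0, 1.0, 0.1],    # fairly confident and correct (true class 0)
                   [0.5, 0.4, 3.0]])   # confident and wrong          (true class 1)
labels = np.eye(3)[[0, 1]]             # one-hot targets
print(softmax(logits).round(3))
print("loss:", round(softmax_cross_entropy(logits, labels), 3))
```

The confidently wrong second example contributes most of the loss, which is exactly the behaviour that pushes the optimizer to correct it.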


Training the Model

I trained the model with TensorFlow, using a batch size of 100 and 60 epochs.
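The epoch/batch structure is the same regardless of framework: reshuffle once per epoch, then walk through the data in slices of the batch size. A framework-agnostic NumPy sketch; the generator name is illustrative, and the optimizer step that TensorFlow would run on each batch is left as a comment (`training_op` there is a hypothetical name).

```python
import numpy as np


def iterate_minibatches(X, y, batch_size=100, rng=None):
    """Yield shuffled (X_batch, y_batch) pairs that cover the data set once, i.e. one epoch."""
    if rng is None:
        rng = np.random.default_rng()
    order = rng.permutation(len(X))        # reshuffle at the start of every epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], y[idx]


X = np.zeros((250, 32, 32, 1), dtype=np.float32)   # stand-in preprocessed images
y = np.arange(250) % 43
rng = np.random.default_rng(0)
EPOCHS = 2                                           # the post used 60; 2 keeps the demo short
for epoch in range(EPOCHS):
    sizes = []
    for X_batch, y_batch in iterate_minibatches(X, y, batch_size=100, rng=rng):
        # e.g. sess.run(training_op, feed_dict={...}) in TensorFlow 1.x (training_op is hypothetical)
        sizes.append(len(X_batch))
    print(f"epoch {epoch}: batch sizes {sizes}")     # [100, 100, 50]
```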

Training was performed on an Amazon g2.2xlarge GPU server, and it took about 16 minutes.

My final model results were:

  • training set accuracy of 96.2%
  • validation set accuracy of 95.2%
  • test set accuracy of 93.4%






The validation accuracy plateaued at around epoch 60, so to avoid overfitting and save training time I reduced the number of epochs from 100 to 60.
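Cutting the epochs by hand after looking at the curve is a manual form of early stopping (item 9 in the tuning list above). An automatic version, sketched in plain Python, stops once validation accuracy has not improved for a given number of epochs; the `patience` and `min_delta` values and the accuracy curve are made up for illustration.

```python
def early_stopping_epoch(val_accuracies, patience=5, min_delta=0.001):
    """Return the epoch (0-based) at which training would stop.

    Stops once validation accuracy has not improved by at least `min_delta`
    for `patience` consecutive epochs; otherwise returns the last epoch.
    """
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, acc in enumerate(val_accuracies):
        if acc > best + min_delta:
            best = acc                       # new best: reset the counter
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_accuracies) - 1


# Made-up validation curve that rises and then plateaus.
history = [0.80, 0.88, 0.92, 0.94, 0.950, 0.9502, 0.9498, 0.9505, 0.9501, 0.9500]
print(early_stopping_epoch(history, patience=3))  # 7
```

In practice the weights from the best epoch (here epoch 4) would be saved and restored, rather than keeping the weights from the epoch where training stopped.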

Low accuracy on both the training and validation sets implies underfitting, while high accuracy on the training set but low accuracy on the validation set implies overfitting.


Testing the Model on New Images

To get more insight into how the model is working, I downloaded 10 pictures of German traffic signs from the web and used the model to predict the traffic sign type.

Load and Output the Images

Here are 10 German traffic signs that I found on the web:








Some of the images (image:0 and image:5) might be difficult to classify because each of them contains two signs combined. The classifier might label such an image as one of the signs or as neither. The third image (image:2) shows a sign that is not part of the training dataset classes, so the classifier lacks information about it and will probably not label it correctly. In the other images, the sign's post occupies a large part of the picture. Since the images need to go through the same preprocessing stage (grayscale, normalization, resizing to 32x32, etc.) before running the model on them, the sign in these images might end up significantly smaller than the signs in the training dataset.
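Because new images must go through the same pipeline, one practical mitigation for the "small sign" problem is to crop around the sign before resizing. A NumPy sketch assuming RGB input; the crop box is a hypothetical, hand-picked value, and the nearest-neighbour resize keeps the example dependency-free (`cv2.resize` would normally be used instead).

```python
import numpy as np


def preprocess_web_image(img, box=None, size=32):
    """Apply training-style preprocessing (crop, resize, grayscale, normalize) to an RGB image.

    box: optional (top, left, bottom, right) crop around the sign, chosen by hand here
    (hypothetical) so the sign fills the frame as it does in the training images.
    """
    if box is not None:
        top, left, bottom, right = box
        img = img[top:bottom, left:right]
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size                   # nearest-neighbour resize
    cols = np.arange(size) * w // size
    small = img[rows[:, None], cols].astype(np.float32)
    gray = small @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    return (gray / 255.0)[..., np.newaxis]               # (size, size, 1), values in [0, 1]


photo = np.random.default_rng(0).integers(0, 256, size=(240, 180, 3)).astype(np.uint8)
x = preprocess_web_image(photo, box=(20, 30, 140, 150))   # 120x120 region around the sign
print(x.shape)  # (32, 32, 1)
```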

The image backgrounds, the brightness of the signs and their rotation angles can also be challenging.

Model's predictions on the New Images

Here are the results of the prediction:

The model correctly predicted 5 of the 10 traffic signs, an accuracy of 50%. This is far below the 93.4% accuracy on the test set: the model didn't perform well on half of the new images.

The images whose sign was not included in the training dataset at all (no suitable class) were labeled incorrectly. The other images differed from the training data (angles, cropping, etc.), so the model also failed to classify some of them correctly. I noticed that the model classified 20% of the images as 'Dangerous curve'. One of these, image:3 (Double curve), is understandable since the two signs are quite similar, but I was disappointed that the model failed on image:5 (Speed limit 30).

Softmax probabilities prediction

To see how certain the model is about each prediction, we look at its softmax probabilities. Below are visualizations of the top 5 softmax probabilities for each image, along with the sign type of each probability:
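In a TensorFlow graph this is typically `tf.nn.top_k(tf.nn.softmax(logits), k=5)`, which returns the values and indices of the largest probabilities. The same selection in NumPy, with random stand-in logits for 10 images and 43 classes, looks like this:

```python
import numpy as np


def top_k(probabilities, k=5):
    """Return (values, indices) of the k largest probabilities per row, highest first."""
    idx = np.argsort(probabilities, axis=1)[:, ::-1][:, :k]
    vals = np.take_along_axis(probabilities, idx, axis=1)
    return vals, idx


rng = np.random.default_rng(0)
logits = rng.normal(size=(10, 43))                  # stand-in logits: 10 images, 43 classes
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)           # softmax over the classes
values, classes = top_k(probs, k=5)
for i, (v, c) in enumerate(zip(values, classes)):
    print(f"image:{i}", [f"{cls}: {p:.2f}" for cls, p in zip(c, v)])
```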

For almost all the images the model was very certain, with a top probability above 50%, even when the prediction was wrong.

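I noticed that for some of the images I got different predictions each time I ran the model (also with high probabilities), which is quite strange. If the predictions were made with the same trained weights, one possible cause is dropout still being active at prediction time, so it is worth checking that it is disabled during inference.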

For the images that were predicted correctly, the probability was above 90%, which is pretty satisfying.

Augmenting the training set definitely helped improve model performance. I used rotation and translation as data augmentation techniques; after testing the model on new images from the web, I realized that it's also important to use zoom, flips and color perturbation.
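Colour perturbation is easy to add in the same spirit as the other augmentations. A NumPy sketch with assumed jitter ranges (they are not tuned values); zoom can be implemented similarly, as a random crop followed by a resize back to 32x32.

```python
import numpy as np

rng = np.random.default_rng(0)


def color_jitter(img, brightness=0.2, contrast=0.2, channel_shift=10.0):
    """Randomly perturb brightness, contrast and per-channel colour of a uint8 RGB image."""
    x = img.astype(np.float32)
    x *= 1.0 + rng.uniform(-brightness, brightness)                   # global brightness
    mean = x.mean()
    x = (x - mean) * (1.0 + rng.uniform(-contrast, contrast)) + mean  # contrast around the mean
    x += rng.uniform(-channel_shift, channel_shift, size=3)           # small per-channel colour shift
    return np.clip(x, 0, 255).astype(np.uint8)


img = rng.integers(0, 256, size=(32, 32, 3)).astype(np.uint8)  # stand-in image
out = color_jitter(img)
print(out.shape, out.dtype)  # (32, 32, 3) uint8
```

Colour jitter should be applied before the grayscale conversion, otherwise the per-channel shift collapses into a plain brightness change.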

The source code can be found here. The Jupyter notebook runs on Python 3.5 and uses the following libraries:

In the next post I'll describe more advanced techniques to address this challenge.


Thanks for reading,

Shmulik Willinger
