Image Recognition with Transfer Learning
Abstract
This article explores the concept of transfer learning in machine learning. Transfer learning means taking the knowledge acquired from solving one problem and applying it to a different but related problem. Instead of starting the learning process from scratch for each new task, this technique lets models reuse previously learned representations or features, so they benefit from existing knowledge and reach better performance on the new task.
The full source code is in my GitHub repository.
Introduction
While studying machine learning at Holberton School, I was assigned the task of developing a model to classify the CIFAR-10 dataset. CIFAR-10 is a commonly used dataset consisting of 32x32 color images in 10 classes, with 6,000 images per class. The model must reach a validation accuracy of 87% or higher; that is, on a set of images the model has never seen, it has to classify at least 87% of them correctly. The assignment also involves preprocessing the data for the model: the training images that serve as input to the network, and the labels the model has to learn to predict.
The overall idea is to use another, previously trained model to achieve better results with mine. In other words, I would transfer the learning from another network to my own, gaining its capacity without having to build it from scratch, to achieve satisfactory results.
Materials and Methods
The first thing I did was code the function to preprocess the data. For this I took advantage of the Keras ResNet-50 application, whose preprocessing function normalizes the CIFAR-10 input images the way that network expects. The expected outputs, or labels, should also be in a form the model can easily interpret, so I used the Keras ‘to_categorical’ function to turn the labels into one-hot encoded arrays. Here’s what the preprocess function looks like (TensorFlow’s Keras is imported as K):
def preprocess_data(X, Y):
    # Normalize the images the way ResNet-50 expects its inputs
    X_p = K.applications.resnet50.preprocess_input(X)
    # One-hot encode the labels for the 10 CIFAR-10 classes
    Y_p = K.utils.to_categorical(Y, 10)
    return X_p, Y_p
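Here’s a quick usage sketch (my addition, assuming the standard Keras CIFAR-10 loader) showing the function applied to both splits of the dataset:

(x_train, y_train), (x_test, y_test) = K.datasets.cifar10.load_data()
# Preprocess images and one-hot encode labels for both splits
x_train, y_train = preprocess_data(x_train, y_train)
x_test, y_test = preprocess_data(x_test, y_test)
print(x_train.shape, y_train.shape)  # (50000, 32, 32, 3) (50000, 10)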
Now that the dataset can be preprocessed for training, I began to build the model. The standard approach for image recognition is a convolutional neural network architecture. But first we must make sure the model takes the input matrices at a suitable size, so the first layer of the model must be a resizing layer. Keras provides a ‘Lambda’ layer that allows you to apply a custom function to the input data. In this case, a lambda function defined inline resizes the images with the Keras backend method ‘resize_images’, scaling the input image tensor x by a factor of 2 in both the height and width dimensions, like this:
model.add(K.layers.Lambda(lambda x: K.backend.resize_images(x, 2, 2, 'channels_last'), input_shape=(32, 32, 3)))
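To sanity-check this layer (a small sketch of my own, not part of the original code), you can build a one-layer model and confirm that the output doubles the spatial dimensions:

probe = K.models.Sequential()
probe.add(K.layers.Lambda(
    lambda x: K.backend.resize_images(x, 2, 2, 'channels_last'),
    input_shape=(32, 32, 3)))
# The 32x32 inputs should come out as 64x64, channels unchanged
print(probe.output_shape)  # (None, 64, 64, 3)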
With that in place, I had to bring in another neural network, in this case ResNet-50, which has already been trained to recognize images. The idea is to take advantage of an already trained model for a new task without leaving behind the learning it has already achieved. The name ResNet-50 indicates that the model belongs to the family of Residual Networks and is 50 layers deep, combining convolutional layers with pooling, fully connected layers and shortcut connections. Since the model is so convenient to reuse, I extended it with new layers with the objective of classifying CIFAR-10. I also froze the ResNet-50 layers that were already suitable for my model, since I didn’t want those useful pieces to be modified during training.
This technique of building a new network on top of an already trained one is called transfer learning. This is how I applied ResNet-50 to my network, right after the Lambda layer:
input_t = K.Input(shape=(32, 32, 3))
resnet50 = K.applications.ResNet50(include_top=False, input_tensor=input_t)
# Freeze the first 143 layers so their pretrained weights stay untouched
for layer in resnet50.layers[:143]:
    layer.trainable = False
model.add(K.layers.Lambda(lambda x: K.backend.resize_images(x, 2, 2, 'channels_last'), input_shape=(32, 32, 3)))
model.add(resnet50)
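As a quick sanity check (my addition), you can count how many layers ended up frozen versus trainable:

frozen = sum(1 for layer in resnet50.layers if not layer.trainable)
print(f"{frozen} of {len(resnet50.layers)} ResNet-50 layers are frozen")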
What a convolutional neural network does is take an image as input, represented as a matrix of pixel values, and convolve it to extract important features, such as the shapes within the image. Think of how humans see things: we don’t necessarily need to look at every detail or every color in an image; the silhouette of something is often enough to know what it is. A machine can do the same if you give it the right tools.
I also added some regularization and supporting layers: Batch Normalization for good training speed and stability, and Dropout to help the model generalize to unseen data. As the purpose of this model is classification, I chose the RMSprop optimizer with categorical cross-entropy loss.
Here’s the code of the full architecture and compilation of the model:
model = K.models.Sequential()

input_t = K.Input(shape=(32, 32, 3))
resnet50 = K.applications.ResNet50(include_top=False, input_tensor=input_t)
# Freeze the first 143 layers of the pretrained base
for layer in resnet50.layers[:143]:
    layer.trainable = False

# Upscale the 32x32 inputs to 64x64 before feeding them to ResNet-50
model.add(K.layers.Lambda(lambda x: K.backend.resize_images(x, 2, 2, 'channels_last'), input_shape=(32, 32, 3)))
model.add(resnet50)
model.add(K.layers.Flatten())

# Classifier head: dense blocks with batch normalization and dropout
model.add(K.layers.BatchNormalization())
model.add(K.layers.Dense(256, activation='relu'))
model.add(K.layers.Dropout(0.5))
model.add(K.layers.BatchNormalization())
model.add(K.layers.Dense(128, activation='relu'))
model.add(K.layers.Dropout(0.5))
model.add(K.layers.BatchNormalization())
model.add(K.layers.Dense(64, activation='relu'))
model.add(K.layers.Dropout(0.5))
model.add(K.layers.BatchNormalization())
model.add(K.layers.Dense(10, activation='softmax'))

model.compile(optimizer=K.optimizers.RMSprop(lr=2e-5),
              loss='categorical_crossentropy', metrics=['accuracy'])
For the training I added a callback that monitors the validation accuracy and saves the best result seen so far. I trained for 15 epochs with a batch size of 512.
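Here is a sketch of that training setup; the checkpoint file name and the exact callback configuration are my assumptions, since the text only states what is monitored:

# Save the weights that achieve the best validation accuracy so far
# (the metric is named 'val_acc' in older Keras versions)
checkpoint = K.callbacks.ModelCheckpoint('cifar10_best.h5',
                                         monitor='val_accuracy',
                                         save_best_only=True,
                                         mode='max')
history = model.fit(x_train, y_train,
                    batch_size=512,
                    epochs=15,
                    validation_data=(x_test, y_test),
                    callbacks=[checkpoint])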
Results
After some hours of training I obtained successful results: roughly 90% validation accuracy, higher than requested. In the end it was an interesting experiment to build my own model on top of a previously trained one. It could likely achieve an even better outcome, since it uses regularizers like Dropout that are excellent at preventing overfitting, so it could safely be trained for several hours longer.
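For reference, the reported figure can be checked with a standard evaluation call (a sketch assuming the preprocessed test split from earlier serves as the validation set):

loss, acc = model.evaluate(x_test, y_test, batch_size=512)
print(f"validation accuracy: {acc:.2%}")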
Discussion
This is an excellent showcase of how powerful and creative machine learning models can be built, and the best part is that you don’t need to create everything from scratch. Even my model could serve as the starting point for other tasks more complex than classifying CIFAR-10.