Face Recognition with VGG16 Transfer Learning
MLOps Task 4
Task Description:
Create a project that uses transfer learning to solve problems such as Face Recognition and Image Classification, using existing deep learning models like VGG16, VGG19, and ResNet.
Face Recognition : Face recognition is a method of identifying or verifying the identity of an individual using their face. Face recognition systems can be used to identify people in photos, video, or in real-time.
Challenges when creating a deep learning model from scratch:
- Lots of Data Needed to Train Model
- Lots of Computing Power Needed
Transfer Learning
Transfer learning lets us add new classes to a pre-trained model without building the model from scratch. Training a model from the beginning takes a lot of resources, so we reuse already-trained weights to train our model.
- In transfer learning the convolutional layers of the pre-trained model are frozen, so they are not updated while training the new model.
- We add a new Dense layer (or retrain the existing layer) before the final activation so the model can predict the new classes we added, as sketched below.
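As a rough illustration of these two points, here is a minimal sketch (the full face-recognition version appears later in this post): the VGG16 base is frozen and a small trainable head is added for the new classes. The 3-class head is only an assumption for this example.

from keras.applications import VGG16
from keras.layers import Dense, Flatten
from keras.models import Model

# Load the pre-trained convolutional base (no FC head)
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze the convolutional layers so their weights are not retrained
for layer in base.layers:
    layer.trainable = False

# Add a new trainable head for our own classes (3 classes assumed here)
x = Flatten()(base.output)
output = Dense(3, activation='softmax')(x)
sketch_model = Model(inputs=base.input, outputs=output)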
VGG16:
VGG16 is a convolutional neural network model proposed by K. Simonyan and A. Zisserman from the University of Oxford in the paper “Very Deep Convolutional Networks for Large-Scale Image Recognition”. The model achieves 92.7% top-5 test accuracy on ImageNet, a dataset of over 14 million images belonging to 1000 classes. It was one of the famous models submitted to ILSVRC-2014. It improves over AlexNet by replacing large kernel-sized filters (11 and 5 in the first and second convolutional layers, respectively) with multiple 3×3 kernel-sized filters one after another. VGG16 was trained for weeks on NVIDIA Titan Black GPUs.
ARCHITECTURE :
The input to the conv1 layer is a fixed-size 224 x 224 RGB image. The image is passed through a stack of convolutional (conv.) layers, where the filters are used with a very small receptive field: 3×3 (which is the smallest size to capture the notion of left/right, up/down, center). In one of the configurations, it also utilizes 1×1 convolution filters, which can be seen as a linear transformation of the input channels (followed by non-linearity). The convolution stride is fixed to 1 pixel; the spatial padding of the conv. layer input is such that the spatial resolution is preserved after convolution, i.e. the padding is 1 pixel for 3×3 conv. layers. Spatial pooling is carried out by five max-pooling layers, which follow some of the conv. layers (not all the conv. layers are followed by max-pooling). Max-pooling is performed over a 2×2 pixel window, with stride 2.
Three Fully-Connected (FC) layers follow a stack of convolutional layers (which has a different depth in different architectures): the first two have 4096 channels each, the third performs 1000-way ILSVRC classification and thus contains 1000 channels (one for each class). The final layer is the soft-max layer. The configuration of the fully connected layers is the same in all networks.
All hidden layers are equipped with the rectification (ReLU) non-linearity. It is also noted that none of the networks (except for one) contain Local Response Normalisation (LRN); such normalization does not improve the performance on the ILSVRC dataset, but leads to increased memory consumption and computation time.
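If Keras is available, a quick way to see the layer stack described above is to load the full model (with its FC head) and print its summary; note that the first call downloads the ImageNet weights (roughly 500 MB). This snippet is only an illustration, not part of the face-recognition project.

from keras.applications import VGG16

# Full VGG16 with the three FC layers and the 1000-way softmax
full_vgg = VGG16(weights='imagenet', include_top=True)
full_vgg.summary()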
Data Collection :
We have two main directories inside the dataset folder, one for Training and one for Testing. In both training and testing we have three directories, one per class. The dataset of face images was collected with the haarcascade_frontalface_default.xml pre-trained model; a Haar cascade is the model used to detect the face in each frame.
# Sample Data Collection
import cv2
import numpy as np

# Load HAAR face classifier
face_classifier = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

def face_extractor(img):
    # Function detects faces and returns the cropped face
    # If no face is detected, it returns None
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_classifier.detectMultiScale(gray, 1.3, 5)
    if len(faces) == 0:
        return None
    # Crop all faces found (the last one detected is returned)
    for (x, y, w, h) in faces:
        cropped_face = img[y:y+h, x:x+w]
    return cropped_face

# Initialize Webcam
cap = cv2.VideoCapture(0)
count = 0

# Collect 100 samples of your face from webcam input
while True:
    ret, frame = cap.read()
    face = face_extractor(frame)
    if face is not None:
        count += 1
        face = cv2.resize(face, (224, 224))
        #face = cv2.cvtColor(face, cv2.COLOR_BGR2GRAY)
        # Save file in specified directory with a unique name.
        # Here the captured images are saved in a folder named charan.
        file_name_path = './dataset/charan/face' + str(count) + '.jpg'
        cv2.imwrite(file_name_path, face)
        # Put count on images and display live count
        cv2.putText(face, str(count), (50, 50), cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow('Face Cropper', face)
    else:
        print("Face not found")
    if cv2.waitKey(1) == 13 or count == 100:  # 13 is the Enter Key
        break

cap.release()
cv2.destroyAllWindows()
print("Collecting Samples Complete")
Data Set :
The dataset contains 3 different classes to predict: Siva, Bharath and Charan. The images from this folder are split 80% for Training and 20% for Testing.
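A small helper like the one below could perform that 80/20 split. The folder names ('dataset', 'Train', 'Val') and lowercase class folders are assumptions based on the paths used elsewhere in this post, so adjust them to your own layout.

import os
import random
import shutil

# Hypothetical split script: copy 80% of each class into Train/ and 20% into Val/
classes = ['siva', 'bharath', 'charan']
random.seed(42)
for cls in classes:
    src = os.path.join('dataset', cls)
    images = os.listdir(src)
    random.shuffle(images)
    split = int(0.8 * len(images))
    for subset, files in [('Train', images[:split]), ('Val', images[split:])]:
        dest = os.path.join('dataset', subset, cls)
        os.makedirs(dest, exist_ok=True)
        for f in files:
            shutil.copy(os.path.join(src, f), os.path.join(dest, f))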
Model :
Loading the VGG16 Model
from keras.applications import VGG16

# VGG16 was designed to work on 224 x 224 pixel input images
img_rows = 224
img_cols = 224

# Loads the VGG16 model
model = VGG16(weights='imagenet',
              include_top=False,
              input_shape=(img_rows, img_cols, 3))
The weights of the model are loaded from ImageNet.
Inspecting each layer
Every layer is trainable by default. Let's check.
# Let's print our layers
for (i, layer) in enumerate(model.layers):
    print(str(i) + " " + layer.__class__.__name__, layer.trainable)
Let's freeze all the layers of the VGG16 base
We freeze these layers so that their pre-trained weights are not retrained; only the new head we add on top will be trained.
from keras.applications import VGG16

# VGG16 was designed to work on 224 x 224 pixel input images
img_rows = 224
img_cols = 224

# Re-loads the VGG16 model without the top or FC layers
model = VGG16(weights='imagenet',
              include_top=False,
              input_shape=(img_rows, img_cols, 3))

# Here we freeze all the layers of the VGG16 base
# Layers are set to trainable as True by default
for layer in model.layers:
    layer.trainable = False

# Let's print our layers
for (i, layer) in enumerate(model.layers):
    print(str(i) + " " + layer.__class__.__name__, layer.trainable)
Let's make a function that returns our FC head
In this function we build the new top (head) that will be placed on the pre-trained VGG16 base to perform our new task.
def addTopModel(bottom_model, num_classes, D=256):
    """Creates the top or head of the model that will be
    placed on top of the bottom layers."""
    top_model = bottom_model.output
    top_model = Flatten(name="flatten")(top_model)
    top_model = Dense(D, activation="relu")(top_model)
    top_model = Dropout(0.3)(top_model)
    top_model = Dense(num_classes, activation="softmax")(top_model)
    return top_model
Let's add our FC Head back onto VGG
After building the new head we attach it to the pre-trained base to form the complete model.
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D, ZeroPadding2D
from keras.layers.normalization import BatchNormalization
from keras.models import Model

# I want to detect faces with 3 different classes
num_classes = 3

FC_Head = addTopModel(model, num_classes)

modelnew = Model(inputs=model.input, outputs=FC_Head)

print(modelnew.summary())
Model: "model_2" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= input_5 (InputLayer) (None, 224, 224, 3) 0 _________________________________________________________________ block1_conv1 (Conv2D) (None, 224, 224, 64) 1792 _________________________________________________________________ block1_conv2 (Conv2D) (None, 224, 224, 64) 36928 _________________________________________________________________ block1_pool (MaxPooling2D) (None, 112, 112, 64) 0 _________________________________________________________________ block2_conv1 (Conv2D) (None, 112, 112, 128) 73856 _________________________________________________________________ block2_conv2 (Conv2D) (None, 112, 112, 128) 147584 _________________________________________________________________ block2_pool (MaxPooling2D) (None, 56, 56, 128) 0 _________________________________________________________________ block3_conv1 (Conv2D) (None, 56, 56, 256) 295168 _________________________________________________________________ block3_conv2 (Conv2D) (None, 56, 56, 256) 590080 _________________________________________________________________ block3_conv3 (Conv2D) (None, 56, 56, 256) 590080 _________________________________________________________________ block3_pool (MaxPooling2D) (None, 28, 28, 256) 0 _________________________________________________________________ block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160 _________________________________________________________________ block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808 _________________________________________________________________ block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808 _________________________________________________________________ block4_pool (MaxPooling2D) (None, 14, 14, 512) 0 _________________________________________________________________ block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808 _________________________________________________________________ block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808 _________________________________________________________________ block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808 _________________________________________________________________ block5_pool (MaxPooling2D) (None, 7, 7, 512) 0 _________________________________________________________________ flatten (Flatten) (None, 25088) 0 _________________________________________________________________ dense_3 (Dense) (None, 256) 6422784 _________________________________________________________________ dropout_2 (Dropout) (None, 256) 0 _________________________________________________________________ dense_4 (Dense) (None, 3) 771 ================================================================= Total params: 21,138,243 Trainable params: 6,423,555 Non-trainable params: 14,714,688 _________________________________________________________________
None
Loading our Dataset
Load the dataset which we collected for face recognition.
from keras.preprocessing.image import ImageDataGenerator

train_data_dir = '/content/drive/My Drive/dataset/Train/'
validation_data_dir = '/content/drive/My Drive/dataset/Val/'

train_datagen = ImageDataGenerator(
      rescale=1./255,
      rotation_range=20,
      width_shift_range=0.2,
      height_shift_range=0.2,
      horizontal_flip=True,
      fill_mode='nearest')

validation_datagen = ImageDataGenerator(rescale=1./255)

# Change the batch size according to your system RAM
train_batchsize = 16
val_batchsize = 10

train_generator = train_datagen.flow_from_directory(
      train_data_dir,
      target_size=(img_rows, img_cols),
      batch_size=train_batchsize,
      class_mode='categorical')

validation_generator = validation_datagen.flow_from_directory(
      validation_data_dir,
      target_size=(img_rows, img_cols),
      batch_size=val_batchsize,
      class_mode='categorical',
      shuffle=False)
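It can be worth checking how flow_from_directory mapped each class folder to an output index, since the model's softmax outputs follow this order. The mapping in the comment below is only an example; Keras assigns indices alphabetically by folder name.

# Optional sanity check: which index was assigned to each class folder
print(train_generator.class_indices)
# e.g. {'Bharath': 0, 'Charan': 1, 'Siva': 2}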
Training our top layers
In this step we are going to train the model.
from keras.optimizers import RMSprop
from keras.callbacks import ModelCheckpoint, EarlyStopping

checkpoint = ModelCheckpoint("face_recog.h5",
                             monitor="val_loss",
                             mode="min",
                             save_best_only=True,
                             verbose=1)

earlystop = EarlyStopping(monitor='val_loss',
                          min_delta=0,
                          patience=3,
                          verbose=1,
                          restore_best_weights=True)

# We put our callbacks into a callback list
callbacks = [earlystop, checkpoint]

# Note we use a very small learning rate
modelnew.compile(loss='categorical_crossentropy',
                 optimizer=RMSprop(lr=0.001),
                 metrics=['accuracy'])

nb_train_samples = 1190
nb_validation_samples = 170
epochs = 3
batch_size = 16

history = modelnew.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    callbacks=callbacks,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size)

modelnew.save("face_recog.h5")
Epoch 1/3
74/74 [==============================] - 667s 9s/step - loss: 1.5254 - accuracy: 0.7796 - val_loss: 1.2937 - val_accuracy: 0.9400
Epoch 00001: val_loss improved from inf to 1.29368, saving model to face_recog.h5
Epoch 2/3
74/74 [==============================] - 659s 9s/step - loss: 0.2183 - accuracy: 0.9333 - val_loss: 1.5331 - val_accuracy: 0.7700
Epoch 00002: val_loss did not improve from 1.29368
Epoch 3/3
74/74 [==============================] - 655s 9s/step - loss: 0.1356 - accuracy: 0.9544 - val_loss: 6.5446e-06 - val_accuracy: 0.9800
Epoch 00003: val_loss improved from 1.29368 to 0.00001, saving model to face_recog.h5
Our model is saved as face_recog.h5.
Prediction:
Let's predict from the model we created.
import os

os.system("tput setaf 34")
print("WAIT TO ENTER INTO PREDICTION STEP .....")
os.system("tput setaf 8")

from keras.models import load_model
from keras.preprocessing import image
import numpy as np

model = load_model('face_recog.h5')

os.system("tput setaf 27")
print("Copy and Paste the image HERE to predict .....")

while True:
    os.system("tput setaf 11")
    file = input("Enter the absolute path to photo :")
    img_width, img_height = 224, 224
    img = image.load_img(file, target_size=(img_width, img_height))
    img = image.img_to_array(img)
    img = np.expand_dims(img, axis=0)
    pred = model.predict(img)
    if pred[0][0] == 1.0:
        print('You are Bharath')
    elif pred[0][1] == 1.0:
        print('You are Charan.')
    else:
        print('You are Siva')
    enter = input("Enter 0 to exit :")
    if enter == "0":  # input() returns a string, so compare with "0"
        exit()
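Comparing the softmax output against exactly 1.0 only works when the prediction is fully saturated. A slightly more robust variant, shown here as a sketch, picks the class with the highest probability; the label order is an assumption and should be verified against train_generator.class_indices.

labels = ['Bharath', 'Charan', 'Siva']  # assumed alphabetical order; verify with class_indices
pred = model.predict(img)
print('You are', labels[int(np.argmax(pred[0]))])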
When we run this Python file, it will ask for the path of the image to predict.
I gave an image of Charan for prediction and it predicted correctly.
I did this task a month back and have now written this article with the details. Link to my post
GitHub Link:
Thank you Vimal Daga Sir.
I did this task under the guidance of Vimal Daga Sir, in the MLOps training by Linux World Informatics Pvt Ltd.