Face Recognition with VGG16 Transfer Learning
MLOps Task 4
Task Description:
Create a project that uses transfer learning to solve problems such as Face Recognition and Image Classification, using existing deep learning models like VGG16, VGG19, and ResNet.
Face Recognition : Face recognition is a method of identifying or verifying the identity of an individual using their face. Face recognition systems can be used to identify people in photos, video, or in real-time.
Challenges when creating a deep learning model from scratch:
- Lots of Data Needed to Train Model
- Lots of Computing Power Needed
Transfer Learning
Transfer learning lets us add new classes to a pre-trained model without building the model from scratch. Training a model from the beginning takes a lot of resources, so we reuse already-trained weights to train our model.
- In transfer learning the convolutional layers of the pre-trained model are frozen, so they are not updated while training the new model.
- We add a new Dense layer (or retrain the existing layer) before the final activation so the model can predict the new classes we added, as sketched below.
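As a rough illustration of these two points, here is a minimal sketch (the full face-recognition version appears later in this post): the VGG16 base is frozen and a small trainable head is added for the new classes. The 3-class head is only an assumption for this example.

from keras.applications import VGG16
from keras.layers import Dense, Flatten
from keras.models import Model

# Load the pre-trained convolutional base (no FC head)
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Freeze the convolutional layers so their weights are not retrained
for layer in base.layers:
    layer.trainable = False

# Add a new trainable head for our own classes (3 classes assumed here)
x = Flatten()(base.output)
output = Dense(3, activation='softmax')(x)
sketch_model = Model(inputs=base.input, outputs=output)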
VGG16:
VGG16 is a convolutional neural network model proposed by K. Simonyan and A. Zisserman from the University of Oxford in the paper “Very Deep Convolutional Networks for Large-Scale Image Recognition”. The model achieves 92.7% top-5 test accuracy on ImageNet, a dataset of over 14 million images belonging to 1000 classes. It was one of the famous models submitted to ILSVRC-2014. It improves over AlexNet by replacing large kernel-sized filters (11 and 5 in the first and second convolutional layers, respectively) with multiple 3×3 kernel-sized filters one after another. VGG16 was trained for weeks on NVIDIA Titan Black GPUs.
ARCHITECTURE :
The input to the conv1 layer is a fixed-size 224 x 224 RGB image. The image is passed through a stack of convolutional (conv.) layers, where the filters are used with a very small receptive field: 3×3 (which is the smallest size to capture the notion of left/right, up/down, center). In one of the configurations, it also utilizes 1×1 convolution filters, which can be seen as a linear transformation of the input channels (followed by non-linearity). The convolution stride is fixed to 1 pixel; the spatial padding of the conv. layer input is such that the spatial resolution is preserved after convolution, i.e. the padding is 1 pixel for 3×3 conv. layers. Spatial pooling is carried out by five max-pooling layers, which follow some of the conv. layers (not all the conv. layers are followed by max-pooling). Max-pooling is performed over a 2×2 pixel window, with stride 2.
Three Fully-Connected (FC) layers follow a stack of convolutional layers (which has a different depth in different architectures): the first two have 4096 channels each, the third performs 1000-way ILSVRC classification and thus contains 1000 channels (one for each class). The final layer is the soft-max layer. The configuration of the fully connected layers is the same in all networks.
All hidden layers are equipped with the rectification (ReLU) non-linearity. It is also noted that none of the networks (except for one) contain Local Response Normalisation (LRN); such normalization does not improve the performance on the ILSVRC dataset, but leads to increased memory consumption and computation time.
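If Keras is available, a quick way to see the layer stack described above is to load the full model (with its FC head) and print its summary; note that the first call downloads the ImageNet weights (roughly 500 MB). This snippet is only an illustration, not part of the face-recognition project.

from keras.applications import VGG16

# Full VGG16 with the three FC layers and the 1000-way softmax
full_vgg = VGG16(weights='imagenet', include_top=True)
full_vgg.summary()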
Data Collection :
We have two main directories inside the dataset folder, one for Training and one for Testing. In both training and testing we have three directories, one per class. The dataset of face images was collected with the haarcascade_frontalface_default.xml pre-trained model; a Haar cascade is the model used to detect the face in each frame.
# Sample Data Collection
import cv2
import numpy as np

# Load HAAR face classifier
face_classifier = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

def face_extractor(img):
    # Function detects faces and returns the cropped face
    # If no face is detected, it returns None
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_classifier.detectMultiScale(gray, 1.3, 5)
    if len(faces) == 0:
        return None
    # Crop all faces found (the last one detected is returned)
    for (x, y, w, h) in faces:
        cropped_face = img[y:y+h, x:x+w]
    return cropped_face

# Initialize Webcam
cap = cv2.VideoCapture(0)
count = 0

# Collect 100 samples of your face from webcam input
while True:
    ret, frame = cap.read()
    face = face_extractor(frame)
    if face is not None:
        count += 1
        face = cv2.resize(face, (224, 224))
        #face = cv2.cvtColor(face, cv2.COLOR_BGR2GRAY)
        # Save file in specified directory with a unique name.
        # Here the captured images are saved in a folder named charan.
        file_name_path = './dataset/charan/face' + str(count) + '.jpg'
        cv2.imwrite(file_name_path, face)
        # Put count on images and display live count
        cv2.putText(face, str(count), (50, 50), cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow('Face Cropper', face)
    else:
        print("Face not found")
    if cv2.waitKey(1) == 13 or count == 100:  # 13 is the Enter Key
        break

cap.release()
cv2.destroyAllWindows()
print("Collecting Samples Complete")
Data Set :
The dataset contains 3 different classes to predict: Siva, Bharath and Charan. The images from this folder are split 80% for Training and 20% for Testing.
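A small helper like the one below could perform that 80/20 split. The folder names ('dataset', 'Train', 'Val') and lowercase class folders are assumptions based on the paths used elsewhere in this post, so adjust them to your own layout.

import os
import random
import shutil

# Hypothetical split script: copy 80% of each class into Train/ and 20% into Val/
classes = ['siva', 'bharath', 'charan']
random.seed(42)
for cls in classes:
    src = os.path.join('dataset', cls)
    images = os.listdir(src)
    random.shuffle(images)
    split = int(0.8 * len(images))
    for subset, files in [('Train', images[:split]), ('Val', images[split:])]:
        dest = os.path.join('dataset', subset, cls)
        os.makedirs(dest, exist_ok=True)
        for f in files:
            shutil.copy(os.path.join(src, f), os.path.join(dest, f))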
Model :
Loading the VGG16 Model
from keras.applications import VGG16

# VGG16 was designed to work on 224 x 224 pixel input images
img_rows = 224
img_cols = 224

# Loads the VGG16 model
model = VGG16(weights='imagenet',
              include_top=False,
              input_shape=(img_rows, img_cols, 3))
The weights of the model are loaded from ImageNet.
Inspecting each layer
Every layer is trainable by default. Let's check.
# Let's print our layers
for (i, layer) in enumerate(model.layers):
    print(str(i) + " " + layer.__class__.__name__, layer.trainable)
Let's freeze all the layers of the VGG16 base
We freeze these layers so that their pre-trained weights are not retrained; only the new head we add on top will be trained.
from keras.applications import VGG16

# VGG16 was designed to work on 224 x 224 pixel input images
img_rows = 224
img_cols = 224

# Re-loads the VGG16 model without the top or FC layers
model = VGG16(weights='imagenet',
              include_top=False,
              input_shape=(img_rows, img_cols, 3))

# Here we freeze all the layers of the VGG16 base
# Layers are set to trainable as True by default
for layer in model.layers:
    layer.trainable = False

# Let's print our layers
for (i, layer) in enumerate(model.layers):
    print(str(i) + " " + layer.__class__.__name__, layer.trainable)
Let's make a function that returns our FC head
In this function we build the new top (head) that will be placed on the pre-trained VGG16 base to perform our new task.
def addTopModel(bottom_model, num_classes, D=256):
    """Creates the top or head of the model that will be
    placed on top of the bottom layers."""
    top_model = bottom_model.output
    top_model = Flatten(name="flatten")(top_model)
    top_model = Dense(D, activation="relu")(top_model)
    top_model = Dropout(0.3)(top_model)
    top_model = Dense(num_classes, activation="softmax")(top_model)
    return top_model
Let's add our FC Head back onto VGG
After building the new head we attach it to the pre-trained base to form the complete model.
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D, ZeroPadding2D
from keras.layers.normalization import BatchNormalization
from keras.models import Model

# I want to detect faces with 3 different classes
num_classes = 3

FC_Head = addTopModel(model, num_classes)

modelnew = Model(inputs=model.input, outputs=FC_Head)

print(modelnew.summary())
Model: "model_2" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= input_5 (InputLayer) (None, 224, 224, 3) 0 _________________________________________________________________ block1_conv1 (Conv2D) (None, 224, 224, 64) 1792 _________________________________________________________________ block1_conv2 (Conv2D) (None, 224, 224, 64) 36928 _________________________________________________________________ block1_pool (MaxPooling2D) (None, 112, 112, 64) 0 _________________________________________________________________ block2_conv1 (Conv2D) (None, 112, 112, 128) 73856 _________________________________________________________________ block2_conv2 (Conv2D) (None, 112, 112, 128) 147584 _________________________________________________________________ block2_pool (MaxPooling2D) (None, 56, 56, 128) 0 _________________________________________________________________ block3_conv1 (Conv2D) (None, 56, 56, 256) 295168 _________________________________________________________________ block3_conv2 (Conv2D) (None, 56, 56, 256) 590080 _________________________________________________________________ block3_conv3 (Conv2D) (None, 56, 56, 256) 590080 _________________________________________________________________ block3_pool (MaxPooling2D) (None, 28, 28, 256) 0 _________________________________________________________________ block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160 _________________________________________________________________ block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808 _________________________________________________________________ block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808 _________________________________________________________________ block4_pool (MaxPooling2D) (None, 14, 14, 512) 0 _________________________________________________________________ block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808 _________________________________________________________________ block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808 _________________________________________________________________ block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808 _________________________________________________________________ block5_pool (MaxPooling2D) (None, 7, 7, 512) 0 _________________________________________________________________ flatten (Flatten) (None, 25088) 0 _________________________________________________________________ dense_3 (Dense) (None, 256) 6422784 _________________________________________________________________ dropout_2 (Dropout) (None, 256) 0 _________________________________________________________________ dense_4 (Dense) (None, 3) 771 ================================================================= Total params: 21,138,243 Trainable params: 6,423,555 Non-trainable params: 14,714,688 _________________________________________________________________
None
Loading our Dataset
Load the dataset which we collected for face recognition.
from keras.preprocessing.image import ImageDataGenerator

train_data_dir = '/content/drive/My Drive/dataset/Train/'
validation_data_dir = '/content/drive/My Drive/dataset/Val/'

train_datagen = ImageDataGenerator(
      rescale=1./255,
      rotation_range=20,
      width_shift_range=0.2,
      height_shift_range=0.2,
      horizontal_flip=True,
      fill_mode='nearest')

validation_datagen = ImageDataGenerator(rescale=1./255)

# Change the batch size according to your system RAM
train_batchsize = 16
val_batchsize = 10

train_generator = train_datagen.flow_from_directory(
      train_data_dir,
      target_size=(img_rows, img_cols),
      batch_size=train_batchsize,
      class_mode='categorical')

validation_generator = validation_datagen.flow_from_directory(
      validation_data_dir,
      target_size=(img_rows, img_cols),
      batch_size=val_batchsize,
      class_mode='categorical',
      shuffle=False)
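It can be worth checking how flow_from_directory mapped each class folder to an output index, since the model's softmax outputs follow this order. The mapping in the comment below is only an example; Keras assigns indices alphabetically by folder name.

# Optional sanity check: which index was assigned to each class folder
print(train_generator.class_indices)
# e.g. {'Bharath': 0, 'Charan': 1, 'Siva': 2}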
Training our top layers
In this step we are going to train the model.
from keras.optimizers import RMSprop
from keras.callbacks import ModelCheckpoint, EarlyStopping

checkpoint = ModelCheckpoint("face_recog.h5",
                             monitor="val_loss",
                             mode="min",
                             save_best_only=True,
                             verbose=1)

earlystop = EarlyStopping(monitor='val_loss',
                          min_delta=0,
                          patience=3,
                          verbose=1,
                          restore_best_weights=True)

# We put our callbacks into a callback list
callbacks = [earlystop, checkpoint]

# Note we use a very small learning rate
modelnew.compile(loss='categorical_crossentropy',
                 optimizer=RMSprop(lr=0.001),
                 metrics=['accuracy'])

nb_train_samples = 1190
nb_validation_samples = 170
epochs = 3
batch_size = 16

history = modelnew.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    callbacks=callbacks,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size)

modelnew.save("face_recog.h5")
Epoch 1/3
74/74 [==============================] - 667s 9s/step - loss: 1.5254 - accuracy: 0.7796 - val_loss: 1.2937 - val_accuracy: 0.9400
Epoch 00001: val_loss improved from inf to 1.29368, saving model to face_recog.h5
Epoch 2/3
74/74 [==============================] - 659s 9s/step - loss: 0.2183 - accuracy: 0.9333 - val_loss: 1.5331 - val_accuracy: 0.7700
Epoch 00002: val_loss did not improve from 1.29368
Epoch 3/3
74/74 [==============================] - 655s 9s/step - loss: 0.1356 - accuracy: 0.9544 - val_loss: 6.5446e-06 - val_accuracy: 0.9800
Epoch 00003: val_loss improved from 1.29368 to 0.00001, saving model to face_recog.h5
Our model is saved as face_recog.h5.
Prediction:
Let's predict from the model we created.
import os

os.system("tput setaf 34")
print("WAIT TO ENTER INTO PREDICTION STEP .....")
os.system("tput setaf 8")

from keras.models import load_model
from keras.preprocessing import image
import numpy as np

model = load_model('face_recog.h5')

os.system("tput setaf 27")
print("Copy and Paste the image HERE to predict .....")

while True:
    os.system("tput setaf 11")
    file = input("Enter the absolute path to photo :")
    img_width, img_height = 224, 224
    img = image.load_img(file, target_size=(img_width, img_height))
    img = image.img_to_array(img)
    img = np.expand_dims(img, axis=0)
    pred = model.predict(img)
    if pred[0][0] == 1.0:
        print('You are Bharath')
    elif pred[0][1] == 1.0:
        print('You are Charan.')
    else:
        print('You are Siva')
    enter = input("Enter 0 to exit :")
    if enter == "0":  # input() returns a string, so compare with "0"
        exit()
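Comparing the softmax output against exactly 1.0 only works when the prediction is fully saturated. A slightly more robust variant, shown here as a sketch, picks the class with the highest probability; the label order is an assumption and should be verified against train_generator.class_indices.

labels = ['Bharath', 'Charan', 'Siva']  # assumed alphabetical order; verify with class_indices
pred = model.predict(img)
print('You are', labels[int(np.argmax(pred[0]))])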
When we run this Python file, it will ask for the path of the image to predict.
I gave an image of Charan for prediction and it predicted correctly.
I did this task a month back and have now written this article with the details. Link to my post
GitHub Link:
Thank you Vimal Daga Sir.
I did this task under the guidance of Vimal Daga Sir, in the MLOps training by Linux World Informatics Pvt Ltd.