A Gentle Introduction to the Siamese Neural Network Architecture
What are Siamese Neural Networks?
Siamese Neural Networks, or SNNs, are among the most popular neural network architectures that use the one-shot learning strategy: they can predict multiple classes from very little data. This ability has made siamese neural networks popular in real-world applications such as security, face recognition, signature verification, and more.
So how does the neural network architecture of Siamese networks make this possible?
Siamese Neural Networks: An Overview
A siamese network consists of two or more identical subnetworks: neural networks with the same architecture, configuration, and weights. During training, parameter updates are mirrored across the subnetworks, so their weights always stay identical.
The purpose of having identical subnetworks is to train the model on a similarity function that measures how different the feature vector of one image is from that of the other. Because the model learns to compare images rather than to classify them outright, it can be trained without much data.
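To make the weight sharing concrete, below is a minimal PyTorch sketch of a siamese network. The encoder layout and embedding size are illustrative assumptions, not a prescribed architecture; the essential point is that both inputs pass through the same module, so the twin subnetworks share weights by construction.

```python
# Minimal siamese network sketch in PyTorch. Layer sizes and the
# 128-dimensional embedding are illustrative assumptions.
import torch
import torch.nn as nn

class SiameseNetwork(nn.Module):
    def __init__(self, embedding_dim: int = 128):
        super().__init__()
        # A single shared encoder: both inputs pass through this SAME
        # module, so the "twin" subnetworks are identical by construction.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.LazyLinear(embedding_dim),
        )

    def forward(self, x1: torch.Tensor, x2: torch.Tensor):
        # One backward pass updates the shared parameters once,
        # keeping both subnetworks in sync automatically.
        return self.encoder(x1), self.encoder(x2)

model = SiameseNetwork()
a = torch.randn(8, 1, 28, 28)  # a batch of 8 grayscale 28x28 images
b = torch.randn(8, 1, 28, 28)
emb_a, emb_b = model(a, b)
distance = torch.pairwise_distance(emb_a, emb_b)  # Euclidean distance per pair
```

The Euclidean distance between the two embeddings is exactly what the loss functions discussed below operate on.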
Why Use Siamese Neural Networks?
With siamese neural networks, the common class imbalance problem can be addressed, since the network does not need many samples of a given class in the training data.
Moreover, a new class can be added after the siamese neural network has been trained and deployed, without retraining the entire network from scratch. Since the model learns how similar or dissimilar image pairs are, samples from a new class can simply be added to the trained siamese network and training resumed: the network compares the new images against the existing classes and updates its weights and fully connected layer accordingly.
This behaviour is unique to architectures that use one-shot learning; other categories of neural networks would have to be retrained from scratch on a large, class-balanced dataset to reach comparable performance. The sketch below illustrates the idea.
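As a rough illustration (not the article's own code), the following sketch registers a brand-new class from a single labeled example and classifies by nearest reference embedding. `model` is the siamese network sketched earlier; the helper names are hypothetical.

```python
# Hypothetical sketch: adding a new class to a trained siamese encoder
# without retraining, by storing one reference embedding per class.
import torch

references: dict[str, torch.Tensor] = {}  # class name -> reference embedding

@torch.no_grad()
def add_class(name: str, example: torch.Tensor) -> None:
    # A single labeled example is enough to register a new class.
    references[name] = model.encoder(example.unsqueeze(0)).squeeze(0)

@torch.no_grad()
def classify(image: torch.Tensor) -> str:
    emb = model.encoder(image.unsqueeze(0)).squeeze(0)
    # The class whose reference embedding is nearest wins.
    return min(references, key=lambda n: torch.dist(emb, references[n]).item())
```

In practice one would also resume training on pairs involving the new class, as described above, but even this embed-and-compare step works because the network was trained to measure similarity, not to predict a fixed set of classes.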
But how does a siamese network learn from such a small set of samples? Let's look at the architecture and how siamese neural networks are trained.
Siamese Neural Network Architecture Explained
As described above, the architecture below shows the two identical subnetworks that make up a siamese neural network. The feature vectors from both subnetworks are compared using a loss function L. There are two strategies for training the siamese network, each using a different loss function, but both must satisfy two requirements.
First, the feature vectors of similar and dissimilar pairs should be descriptive, informative, and distinct enough from each other that the separation between classes can be learned effectively.
And second, the feature vectors of similar image pairs should be close enough, and those of dissimilar pairs far enough apart, that the model can quickly learn semantic similarity.
To make sure the model learns such feature vectors quickly, the loss function should reward learning both similarity and dissimilarity strongly enough. This is where the siamese strategy helps: by comparing one image with all the other images, the model learns what "similar" means and how to recognize dissimilar pairs.
Cross-entropy loss cannot provide this kind of signal, as it works on a per-class prediction basis; mean squared error likewise does not capture the information our goal requires. The most commonly used loss functions are the contrastive loss and the triplet loss. Let's look at each of them in detail.
Contrastive Loss Function
The contrastive loss is a distance-based loss function that updates the weights so that two similar feature vectors end up with a minimal Euclidean distance, while the distance between two dissimilar vectors is maximized.
In the equation shown below, y indicates whether the vectors are dissimilar, and Dw is the Euclidean distance between them. When the vectors are dissimilar (y = 1), minimizing the loss means minimizing the second term, which requires Dw to grow: more distance between dissimilar vectors is encouraged. We want these vectors to be at least m units apart, and once they are, the max(0, m − Dw) term defaults to 0 so no further penalty is computed.
Similarly, if the vectors are similar (y=0), the loss function must minimize Dw.
L(y, Dw) = (1 − y) · ½ · Dw² + y · ½ · [max(0, m − Dw)]²

Contrastive Loss Function in Siamese Neural Networks
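Translated directly into code, the loss might look like the following PyTorch sketch, keeping the article's convention that y = 1 marks a dissimilar pair and y = 0 a similar one; the default margin value is an assumption.

```python
# Contrastive loss sketch in PyTorch. y is a float tensor of 0s (similar
# pairs) and 1s (dissimilar pairs); the margin m of 1.0 is only an
# illustrative default.
import torch

def contrastive_loss(emb1: torch.Tensor, emb2: torch.Tensor,
                     y: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    d = torch.pairwise_distance(emb1, emb2)                       # D_w
    similar = (1 - y) * 0.5 * d.pow(2)                            # pulls similar pairs together
    dissimilar = y * 0.5 * torch.clamp(margin - d, min=0).pow(2)  # pushes dissimilar pairs apart, up to m
    return (similar + dissimilar).mean()
```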
However, because this function only pushes vectors either close together or far apart in a binary fashion, it cannot tell us how similar two vectors are. Another loss function lets us learn both a similarity score and dissimilarity in a better way.
Triplet Loss Function in Siamese Network
By using the triplet loss, we can tell how similar an image is to others, within or outside its class. The siamese network learns a similarity ranking from the scores computed this way.
For this, the loss is computed by comparing a given image (called the anchor) with a positive image (which is similar to the anchor) and a negative image (which is dissimilar to it). By computing the distance for each of these pairs, the model learns what similarity looks like and how far the given image must stay from the other classes.
So, in the equation below, f(A) is the embedding of the anchor image, and f(P) and f(N) are the embeddings of the positive and negative images, respectively. For the loss to be minimized, the distance term with f(N) must be maximized and the one with f(P) minimized, which aligns with the strategy of pulling similar pairs closer and pushing dissimilar pairs further apart. α is a margin hyperparameter that forces the negative to be at least α further from the anchor than the positive.
L = max(||f(A) − f(P)||² − ||f(A) − f(N)||² + α, 0)

Triplet Loss Function in Siamese Networks
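A short PyTorch sketch of this loss follows, with f_a, f_p, and f_n standing in for f(A), f(P), and f(N); the default margin of 0.2 is an assumption. PyTorch also ships a built-in torch.nn.TripletMarginLoss, which uses plain rather than squared distances by default.

```python
# Triplet loss sketch in PyTorch, using squared Euclidean distances as
# in the equation above. alpha is the margin hyperparameter.
import torch

def triplet_loss(f_a: torch.Tensor, f_p: torch.Tensor,
                 f_n: torch.Tensor, alpha: float = 0.2) -> torch.Tensor:
    pos_dist = (f_a - f_p).pow(2).sum(dim=1)  # ||f(A) - f(P)||^2
    neg_dist = (f_a - f_n).pow(2).sum(dim=1)  # ||f(A) - f(N)||^2
    # Loss is zero once the negative is at least alpha farther than the positive.
    return torch.clamp(pos_dist - neg_dist + alpha, min=0).mean()
```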
Read here for further explanation on the Triplet Loss Function in Siamese Networks
Pros and Cons of Siamese Neural Networks
As we saw when getting introduced to siamese neural networks, they offer many benefits over conventional CNNs in certain specific tasks.
Advantages of Siamese Network
Semantic Similarity: Firstly, siamese networks do not learn from training errors or mispredictions but from semantic similarity. This encourages the model to learn increasingly better embeddings that represent the images in the support set and bring related concepts close together in the feature space. By learning such a feature space, much as textual models learn word embeddings, the model learns concepts and attempts to understand why certain images are more similar than others, instead of just extracting static features using convolutions.
Class Imbalance: The biggest benefit directly applicable to the real world is the capability of delivering benchmark performance on very little data. With the per-class data requirement reduced, the class imbalance problem also vanishes.
Siamese Neural Network for Face Recognition
Face recognition is just another image recognition or classification task. One-shot learning is particularly applicable here because, in practical cases, it is nearly impossible to gather sufficient samples of one person's face (one label). Face recognition is often used as an attendance system or a security measure to restrict access to buildings and offices to employees only.
In this setting, not only is it impractical to collect many images of one person to reach a decent success rate, but granting access to a new employee would mean retraining the entire CNN from scratch and risking the existing performance. A siamese network sidesteps both problems, as the sketch below suggests.
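Here is a hedged sketch of what such an access-control flow could look like with a trained siamese encoder: enrolling an employee is a single forward pass, and verification reduces to a distance threshold. `model` refers to the encoder sketched earlier; the threshold value and helper names are assumptions, and the threshold would be tuned on held-out genuine/impostor pairs.

```python
# Hypothetical one-shot face verification flow with a trained siamese
# encoder. Enrolment stores one embedding per employee; verification
# compares the probe embedding against it.
import torch

THRESHOLD = 0.8  # assumed value; tune on validation pairs
enrolled: dict[str, torch.Tensor] = {}  # employee id -> face embedding

@torch.no_grad()
def enroll(employee_id: str, photo: torch.Tensor) -> None:
    # Adding a new employee is one forward pass; no retraining needed.
    enrolled[employee_id] = model.encoder(photo.unsqueeze(0)).squeeze(0)

@torch.no_grad()
def verify(employee_id: str, probe: torch.Tensor) -> bool:
    emb = model.encoder(probe.unsqueeze(0)).squeeze(0)
    return torch.dist(emb, enrolled[employee_id]).item() < THRESHOLD
```

The same embed-and-threshold pattern carries over to the signature verification use case below.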
Siamese Neural Network for Image Classification
Signature verification is a common use of image classification in the context of one-shot learning. A signature verification system checks the authenticity of a given signature against one existing in a dataset; based on the signatures' similarity, the sample can be classified as genuine or forged. With this task widely prevalent in banks and financial institutions worldwide, siamese networks quickly became the go-to solution for an otherwise manually laborious job.
Are Siamese Neural Networks Supervised?
Yes, siamese networks are trained in a supervised fashion: they need labeled information to know whether the images they compare are similar. However, siamese networks can also be tuned to learn in a self-supervised (SSL: self-supervised learning) manner.