BxD Primer Series: Transfer Learning Techniques
Hey there!
Welcome to the BxD Primer Series, where we cover topics such as machine learning models, neural nets, GPT, ensemble models, and hyper-automation in a 'one-post-one-topic' format. Today's post is on Transfer Learning Techniques. Let's get started:
The What:
Transfer learning involves using pre-trained models as a starting point for new, related tasks. It allows developers to take advantage of the knowledge captured by neural networks trained on large datasets and apply it to new problems that have limited data.
Different transfer learning techniques can be used depending on the nature of the new problem and the available resources. Common techniques are (more details later):
- Feature extraction
- Fine-tuning
- Domain adaptation
- Multi-task learning
- One/few-shot learning
Applications of Transfer Learning:
Some examples where transfer learning has worked successfully:
- Computer vision: models pre-trained on ImageNet reused for classification, detection, and segmentation
- Natural language processing: pre-trained language models such as BERT and GPT fine-tuned for sentiment analysis, question answering, and translation
- Speech recognition: acoustic models trained on large corpora adapted to new speakers, accents, or languages
- Medical imaging: models pre-trained on natural images adapted to scans with limited labeled data
And many more…
Feature Extraction in Transfer Learning:
The process of feature extraction involves taking a pre-trained neural network and removing the output layer that was trained for the original task. The remaining layers of the network can then be used as a fixed feature extractor that maps input data to a set of high-level features capturing important patterns in the data.
The output of this feature extractor is then fed into a new model that is trained to perform a different task on the new data. The advantage is that the pre-trained network has already learned to recognize important features in the data, and this knowledge is leveraged to improve the performance of the new model. This requires significantly less data for the new model and is computationally efficient.
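As a concrete illustration, here is a minimal PyTorch sketch of feature extraction, assuming a torchvision ResNet-18 pre-trained on ImageNet and a hypothetical 10-class target task:

import torch
import torch.nn as nn
from torchvision import models

# Load a network pre-trained on ImageNet and remove its original output layer.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Identity()            # drop the classification head

# Freeze the backbone so it acts as a fixed feature extractor.
for param in backbone.parameters():
    param.requires_grad = False

# A new, small model is trained on top of the extracted features.
classifier = nn.Linear(512, 10)        # ResNet-18 features are 512-dimensional

def predict(x):
    with torch.no_grad():              # backbone weights are never updated
        features = backbone(x)         # high-level features from the pre-trained net
    return classifier(features)        # only this layer is trained on the new data

Only the small classifier is optimized here, which is why this approach needs so little data and compute.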
Fine-tuning in Transfer Learning:
Fine-tuning involves taking a pre-trained model and adapting it to a new task by re-training some or all of its layers with a new dataset. It is used when the pre-trained model needs to be adapted to a task different from the one it was originally trained on.
Fine-tuning drastically reduces the amount of labeled data required for training and speeds up the training process by initializing the model with pre-trained weights.
Common methods for fine-tuning a pre-trained model (a minimal sketch follows this list):
- Replace the output layer and train only the new layer, keeping all pre-trained layers frozen
- Unfreeze and re-train the last few layers, which capture the most task-specific features, while keeping earlier layers frozen
- Re-train the entire network with a small learning rate so that the pre-trained weights change only gradually
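The sketch below, again assuming a torchvision ResNet-18 and a hypothetical 10-class task, shows the second method: the deepest block and a new head are re-trained with a small learning rate while everything else stays frozen:

import torch.nn as nn
import torch.optim as optim
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 10)   # new task-specific head

# Freeze all layers, then unfreeze only the deepest block and the new head.
for param in model.parameters():
    param.requires_grad = False
for param in model.layer4.parameters():
    param.requires_grad = True
for param in model.fc.parameters():
    param.requires_grad = True

# A small learning rate keeps the pre-trained weights from changing too abruptly.
optimizer = optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4
)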
Note: The main difference between feature extraction and fine-tuning is the degree to which the pre-trained model is adapted to the new task.
Domain Adaptation in Transfer Learning:
Domain adaptation is the process of adapting a pre-trained model to a new domain with different data distributions, without the need to retrain the model from scratch on new data.
For example, a model trained to recognize faces in images captured by a high-quality camera may not perform well when applied to low-quality images captured by a surveillance camera. In this case, domain adaptation can be used to adapt the pre-trained model to the new domain of low-quality images.
Aligning data distributions, style transfer (from source to target), and learning domain-invariant representations are common approaches to domain adaptation.
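As one illustration of distribution alignment, the sketch below combines the usual task loss on labeled source data with a simple moment-matching penalty between source and target features; the extractor, classifier, data batches, and the weight lam are all hypothetical placeholders:

import torch

def moment_matching_loss(source_feats, target_feats):
    # Penalize the gap between the mean feature vectors of the two domains.
    return torch.norm(source_feats.mean(dim=0) - target_feats.mean(dim=0)) ** 2

def adaptation_step(extractor, classifier, criterion, optimizer,
                    source_x, source_y, target_x, lam=0.1):
    source_feats = extractor(source_x)       # labeled source-domain batch
    target_feats = extractor(target_x)       # unlabeled target-domain batch

    task_loss = criterion(classifier(source_feats), source_y)
    align_loss = moment_matching_loss(source_feats, target_feats)

    loss = task_loss + lam * align_loss      # joint objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

More sophisticated methods replace the moment-matching term with adversarial domain classifiers or MMD losses, but the structure of the objective stays the same.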
Multi-task Transfer Learning:
Multi-task learning trains a single model to perform multiple tasks simultaneously. The model learns shared representations that are useful across the different tasks.
This is done by training the model on a joint objective function that combines the loss functions of all tasks. The shared layers of the model learn to extract features that are useful for all tasks, while the task-specific layers learn to perform their individual tasks.
For example, in a computer vision application, a multi-task learning model could be trained for object detection and image segmentation simultaneously. The shared layers of the model learn to extract relevant features from input images, while the task-specific layers learn to predict the locations and labels of objects in the image and to segment its different regions.
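A minimal sketch of this pattern is below; the heads are simplified classification-style stand-ins for real detection and segmentation heads, and the layer sizes and loss weights are illustrative assumptions:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskModel(nn.Module):
    def __init__(self, num_classes=10, num_regions=5):
        super().__init__()
        # Shared layers learn features useful for both tasks.
        self.shared = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        # Task-specific heads.
        self.detect_head = nn.Linear(16, num_classes)
        self.segment_head = nn.Linear(16, num_regions)

    def forward(self, x):
        features = self.shared(x)
        return self.detect_head(features), self.segment_head(features)

def joint_loss(det_out, seg_out, det_target, seg_target, w_det=1.0, w_seg=1.0):
    # Joint objective: a weighted sum of the per-task losses.
    return (w_det * F.cross_entropy(det_out, det_target)
            + w_seg * F.cross_entropy(seg_out, seg_target))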
One/Few-Shot Transfer Learning:
One/Few-shot learning involves training a model to recognize new objects or patterns from only one or a few examples, rather than the hundreds or thousands of examples typically required for traditional machine learning tasks.
For example, it could be used to quickly recognize new products in a retail setting, or to identify new types of tumors in medical imaging when only a few examples are available.
In one-shot learning, a pre-trained model is typically used as a feature extractor. The extracted features can then be used to train a new model that recognizes new objects from only a few examples.
One approach is to use a Siamese network, which consists of two identical neural networks that share the same weights. During training, the two networks are fed pairs of examples, some drawn from the same class and some from different classes, and the objective is to learn a similarity metric that distinguishes same-class pairs from different-class pairs. The trained model can then recognize a new class from only one or a few reference examples by comparing a query example against them.
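Here is a minimal sketch of a Siamese encoder with a contrastive loss, assuming single-channel images and an illustrative embedding size:

import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEncoder(nn.Module):
    def __init__(self, embed_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(32 * 16, embed_dim),
        )

    def forward(self, x1, x2):
        # The same weights embed both inputs of the pair.
        return self.net(x1), self.net(x2)

def contrastive_loss(z1, z2, same_class, margin=1.0):
    # same_class is 1.0 for pairs from the same class, 0.0 otherwise.
    dist = F.pairwise_distance(z1, z2)
    return torch.mean(same_class * dist.pow(2)
                      + (1 - same_class) * F.relu(margin - dist).pow(2))

At recognition time, a query example is embedded and compared against the one or few reference examples of each new class; the closest match wins.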
Overcoming Catastrophic Forgetting:
Catastrophic forgetting is a phenomenon where a neural network trained on one task tends to forget its existing knowledge when learning a new task. This can occur when the network is fine-tuned on a new dataset or when new layers are added to the network.
Techniques to avoid catastrophic forgetting (a sketch of one such regularizer follows this list):
- Freeze the layers that encode knowledge from the original task and train only new layers
- Rehearsal/replay: mix examples from the old task into training on the new task
- Regularization methods such as Elastic Weight Consolidation (EWC), which penalize changes to weights that were important for the old task
- Knowledge distillation, where the new model is trained to match the old model's outputs on old-task inputs
- Progressive networks, which add new capacity for the new task while keeping old weights fixed
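Below is a minimal sketch of an EWC-style penalty, assuming the old-task parameter values and (diagonal) Fisher information estimates have already been saved in dictionaries; lam is an illustrative regularization strength:

import torch

def ewc_penalty(model, old_params, fisher, lam=1000.0):
    # old_params / fisher: dicts of tensors recorded after training the old task.
    penalty = 0.0
    for name, param in model.named_parameters():
        if name in fisher:
            penalty = penalty + (fisher[name] * (param - old_params[name]).pow(2)).sum()
    return (lam / 2.0) * penalty

def training_loss(model, criterion, outputs, targets, old_params, fisher):
    # New-task loss plus a regularizer that resists forgetting the old task.
    return criterion(outputs, targets) + ewc_penalty(model, old_params, fisher)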
Note: Negative transfer is another challenge in transfer learning; it occurs when the pre-trained model is not well-suited to the target task and actually hinders performance. This usually happens when there are significant differences between the source and target tasks.
The Why:
Reasons to use transfer learning techniques:
- Far less labeled data is needed for the new task, since the model starts from learned representations
- Training is faster and cheaper than training from scratch, because the model is initialized with pre-trained weights
- Performance and generalization often improve, especially when target data is scarce
- Knowledge captured from large, high-quality datasets and proven architectures is reused rather than rebuilt
The Why Not:
Reasons to not use transfer learning techniques:
- Risk of negative transfer when the source and target tasks or domains differ significantly
- Pre-trained models can be large, making them expensive to store, fine-tune, and deploy
- The pre-trained architecture and input format constrain design choices for the new task
- Biases and errors present in the source data can carry over into the new model
In the next edition, we will wrap up the primer series.
Let us know your feedback!
Until then,
Have a great time!