FIFTY Transfer Learning Models (for Deep Neural Networks) From Keras & PyTorch with Useful Links (for advanced ML Practitioners) - Shailendra Kadre
Note from the author: This is a long document compiled by Shailendra Kadre from numerous web blogs, articles, research papers, Kaggle kernels, and more. It is intended as a reference document for your day-to-day work with TL models. So far I have not seen any compilation of this kind that brings all the Keras and PyTorch TL models together in one place. I hope you will find it useful.
“Conventional machine learning and deep learning algorithms, so far, have been traditionally designed to work in isolation. These algorithms are trained to solve specific tasks. The models have to be rebuilt from scratch once the feature-space distribution changes. Transfer learning is the idea of overcoming the isolated learning paradigm and utilizing knowledge acquired for one task to solve related ones.” … Dipanjan (DJ) Sarkar
Transfer learning finds applications in compute-intensive fields such as Computer Vision, NLP, and Audio/Speech processing.
In the description that follows, I assume the reader is already familiar with Transfer Learning (TL). Given below is a comprehensive listing of Transfer Learning models.
Before we get into a more comprehensive list of 50 Transfer Learning Models, make sure that you are at least familiar with the following most popular and successful Computer Vision models from Keras. The theory is given along with the code for implementation.
10 Advanced Deep Learning Architectures Data Scientists Should Know!
ImageNet: VGGNet, ResNet, Inception, and Xception with Keras
https://www.pyimagesearch.com/2017/03/20/imagenet-vggnet-resnet-inception-xception-keras/
The following are some important study results that may be used to compare the performance of 12 different models.
The study results showed that, on the whole, the VGG16-FT was the optimal model among the 12 models, as it had the highest working accuracy of 98% and the fastest truck image classification speed of 41.1 images/s. In addition, the VGG16-BF, InceptionV3-BF, and Xception-BF all reached the satisfactory goal of a working accuracy of over 95%.
Reference Paper: Deep Learning Model Comparison for Vision-Based Classification of Full/Empty-Load Trucks in Earthmoving Operations, https://www.researchgate.net/publication/337265632_Deep_Learning_Model_Comparison_for_Vision-Based_Classification_of_FullEmpty-Load_Trucks_in_Earthmoving_Operations
Refer to https://keras.io/examples/ for excellent code examples from Keras.
AutoML for large scale image classification and object detection: https://ai.googleblog.com/2017/11/automl-for-large-scale-image.html
Base model selection:
kaggle.com/c/imet-2019-fgvc6/discussion/89744
https://predictivehacks.com/object-detection-with-pre-trained-models-in-keras/
Transfer Learning Models from Keras
Xception
Xception is an extension of the Inception architecture that replaces the standard Inception modules with depthwise separable convolutions. Among the VGG, ResNet, Inception, and Xception families covered here, Xception has the smallest weight serialization at only 91 MB.
Many practitioners choose the Xception network because it offers high accuracy with relatively few parameters; networks of comparable or higher accuracy usually require much more compute, so the trade-off is favourable.
It is commonly used for image classification with Keras. It has been shown that this architecture, dubbed Xception, slightly outperforms Inception V3 on the ImageNet dataset (which Inception V3 was designed for), and significantly outperforms Inception V3 on a larger image classification dataset comprising 350 million images and 17,000 classes. Since the Xception architecture has the same number of parameters as Inception V3, the performance gains come not from increased capacity but from a more efficient use of model parameters.
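Below is a minimal, hedged sketch (not from the original article) of running the pretrained Keras Xception model on a single image; the file name 'elephant.jpg' is just a placeholder for any local image.

import numpy as np
from tensorflow.keras.applications.xception import Xception, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image

model = Xception(weights='imagenet')            # 299x299 input, 1000 ImageNet classes

img = image.load_img('elephant.jpg', target_size=(299, 299))   # placeholder file name
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
preds = model.predict(x)
print(decode_predictions(preds, top=3)[0])      # [(class_id, class_name, probability), ...]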
Useful Links:
https://keras.io/api/applications/xception/
https://www.pyimagesearch.com/2017/03/20/imagenet-vggnet-resnet-inception-xception-keras/
https://www.kaggle.com/abnera/transfer-learning-keras-xception-cnn
VGG16
The VGG-16 model is a 16-layer (convolution and fully connected) network built on the ImageNet database. The popular VGG-16 model, created by the Visual Geometry Group at the University of Oxford, specializes in building very deep convolutional networks for large-scale visual recognition.
We still use VGG in many deep learning image classification problems; however, smaller network architectures are often more desirable (such as SqueezeNet, GoogLeNet, etc.).
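As a quick illustration of transfer learning with VGG16, here is a minimal sketch assuming a hypothetical 10-class problem with 224x224 RGB inputs: the pretrained convolutional base is frozen and only a small new classification head is trained.

from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base.trainable = False                           # keep the ImageNet features fixed

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation='relu'),
    layers.Dense(10, activation='softmax'),      # hypothetical 10 target classes
])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# model.fit(train_ds, validation_data=val_ds, epochs=5)   # supply your own datasets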
Useful Links:
https://keras.io/api/applications/vgg/#vgg16-function
https://www.pyimagesearch.com/2017/03/20/imagenet-vggnet-resnet-inception-xception-keras/
https://www.kaggle.com/keras/vgg16/home
VGG19
The concept of VGG19 is the same as VGG16, except that the network is 19 layers deep. The “16” and “19” refer to the number of weight layers in the network; VGG19 simply has three more conv3 layers. The deeper the network, the more complex the patterns it can extract from the given data.
VGG-19 is a convolutional neural network that is trained on more than a million images from the ImageNet database [1]. The network is 19 layers deep and can classify images into 1000 object categories, such as a keyboard, mouse, pencil, and many animals. As a result, the network has learned rich feature representations for a wide range of images.
We still use VGG in many deep learning image classification problems; however, smaller network architectures are often more desirable (such as SqueezeNet, GoogLeNet, etc.).
Useful Links:
https://keras.io/api/applications/vgg/#vgg19-function
https://www.pyimagesearch.com/2017/03/20/imagenet-vggnet-resnet-inception-xception-keras/
https://www.kaggle.com/keras/vgg19/home
ResNet50
In 2015, Microsoft Research Asia (MRA) [26] proposed a very deep neural network model called Residual Networks (ResNet) to ensure that the performance of the top layers is as good as that of the lower layers, without vanishing-gradient or optimization problems. It allows building networks deep enough to extract complex patterns from the data while maintaining an optimum accuracy improvement. MRA succeeded in developing very deep neural networks of fifty or more convolution layers, such as ResNet50, ResNet101, and ResNet152.
ResNet is one of the most powerful deep neural networks and achieved outstanding performance in the ILSVRC 2015 classification challenge. ResNet has also shown excellent generalization on other recognition tasks, winning first place in ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation in the ILSVRC and COCO 2015 competitions. There are many variants of the ResNet architecture, i.e. the same concept but with a different number of layers: ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-110, ResNet-152, ResNet-164, ResNet-1202, etc. The name ResNet followed by a two- or more-digit number simply indicates the ResNet architecture with that number of neural network layers.
Even though ResNet is much deeper than VGG16 and VGG19, the model size is actually substantially smaller due to the usage of global average pooling rather than fully-connected layers — this reduces the model size down to 102MB for ResNet50.
Perhaps three of the more popular models are as follows:
· VGG (e.g. VGG16 or VGG19).
· GoogLeNet (e.g. InceptionV3).
· Residual Network (e.g. ResNet50).
These models are widely used for transfer learning both because of their performance and because they introduced specific architectural innovations, namely consistent and repeating structures (VGG), inception modules (GoogLeNet), and residual modules (ResNet). A rough size comparison of the three is sketched below.
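The sketch below simply builds each of the three architectures (weights=None, so nothing is downloaded) and prints its parameter count; the exact numbers are whatever Keras reports for your installed version.

from tensorflow.keras.applications import VGG16, InceptionV3, ResNet50

for name, ctor in [('VGG16', VGG16), ('InceptionV3', InceptionV3), ('ResNet50', ResNet50)]:
    m = ctor(weights=None, include_top=True)     # weights=None: just build the graph, no download
    print(f'{name}: {m.count_params():,} parameters')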
Useful Links:
https://keras.io/api/applications/resnet/#resnet50-function
https://www.pyimagesearch.com/2017/03/20/imagenet-vggnet-resnet-inception-xception-keras/
https://www.kaggle.com/keras/resnet50/home
https://cv-tricks.com/keras/understand-implement-resnets/
ResNet101
ResNet-101 is a convolutional neural network that is 101 layers deep. A deeper ResNet encoder generally produces better results than its shallower counterparts, performance-wise.
Useful Links:
https://keras.io/api/applications/resnet/#resnet101-function
https://www.kaggle.com/pytorch/resnet101
ResNet152
A deeper ResNet encoder generally produces better results than its shallower counterparts. Both training and validation accuracies obtained with ResNet152 surpassed those obtained with ResNet50 and ResNet101 (after modifications such as fine-tuning).
The winner of the ImageNet competition in 2015 was ResNet152, i.e. the Residual Network variant with 152 layers. In this post, we will cover the concept of ResNet50, which can be generalized to any other variant of ResNet. Before explaining the deep residual network, I would like to talk about plain deep networks (networks with more and more convolution, pooling, and activation layers stacked one over the other). From 2013 onwards, the deep learning community started building deeper networks because they were able to achieve higher accuracy values. Furthermore, deeper networks can represent more complex features, so model robustness and performance can increase. However, simply stacking up more layers did not work for researchers. While training deeper networks, the problem of accuracy degradation was observed: adding more layers to the network either caused the accuracy to saturate or made it abruptly decrease. The culprit behind this degradation was the vanishing-gradient effect, which is only observed in deeper networks.
Useful Links:
https://keras.io/api/applications/resnet/#resnet152-function
https://www.kaggle.com/prasunmishra/resnet152-transfer-learning-pre-trained-weights
https://gist.github.com/flyyufelix/7e2eafb149f72f4d38dd661882c554a6
ResNet50V2
So far we have discussed ResNet50 version 1. Now we will discuss ResNet50 version 2, which is all about using pre-activation of the weight layers instead of post-activation.
The major differences between ResNet V1 and ResNet V2 are as follows:
I. ResNet V1 adds a second non-linearity after the addition operation between x and F(x). ResNet V2 removes this last non-linearity, clearing the path from input to output in the form of an identity connection.
II. ResNet V2 applies Batch Normalization and ReLU activation to the input before the multiplication with the weight matrix (the convolution operation), whereas ResNet V1 performs the convolution first, followed by Batch Normalization and ReLU activation.
ResNet V2 mainly focuses on making the second non-linearity an identity mapping, i.e. the output of the addition between the identity mapping and the residual mapping is passed as-is to the next block for further processing. In ResNet V1, by contrast, the output of the addition passes through a ReLU activation before being transferred to the next block as input. A simplified sketch of both block styles is given below.
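The sketch below is a simplified illustration of the two block styles described above (same number of filters in and out, stride 1, no bottleneck); it is not the exact library implementation.

from tensorflow.keras import layers

def residual_block_v1(x, filters):
    # ResNet V1 style: Conv -> BN -> ReLU, with a ReLU applied after the addition
    y = layers.Conv2D(filters, 3, padding='same')(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    y = layers.BatchNormalization()(y)
    return layers.ReLU()(layers.Add()([x, y]))       # non-linearity after the addition

def residual_block_v2(x, filters):
    # ResNet V2 style: BN -> ReLU -> Conv (pre-activation); the addition output is left as-is
    y = layers.BatchNormalization()(x)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    return layers.Add()([x, y])                      # identity path stays clear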
Useful Links:
https://keras.io/api/applications/resnet/#resnet50v2-function
https://www.kaggle.com/mathormad/resnet50-v2-keras-focal-loss-mix-up
https://www.kaggle.com/nguyenhoa/dog-cat-classifier-resnet50v2-tf-keras-gradcam
ResNet101V2
The primary difference between ResNetV2 and the original (V1) is that V2 uses batch normalization before each weight layer.
Useful Links:
https://keras.io/api/applications/resnet/#resnet101v2-function
https://www.codeproject.com/Articles/5252014/Transfer-Learning-with-TensorFlow-2
https://www.kaggle.com/jbeltranleon/xrays-multi-resnet101v2/input (the write-up is not in English; the code can be used as-is, and Google Translate can help with the text)
ResNet152V2
The primary difference between ResNetV2 and the original (V1) is that V2 uses batch normalization before each weight layer.
Useful Links:
https://keras.io/api/applications/resnet/#resnet152v2-function
https://scisharp.github.io/Keras.NET/api/Keras.Applications.ResNetV2.ResNet152V2.html
https://www.kaggle.com/urayukitaka/comparing-resnet-model - Important Resnet Models Compared using Keras
Summary - Key Features of ResNet:
I. ResNet uses Batch Normalization at its core. The Batch Normalization adjusts the input layer to increase the performance of the network. The problem of covariate shift is mitigated.
II. ResNet makes use of the Identity Connection, which helps to protect the network from vanishing gradient problem.
III. Deep Residual Network uses bottleneck residual block design to increase the performance of the network.
Note: Why it’s Difficult to Run ResNet Yourself and How MissingLink Can Help
ResNet can have from dozens to thousands of convolutional layers and can take a long time to train and execute, from hours to several weeks in extreme cases. You may need to distribute a ResNet model across multiple GPUs, and if performance is still insufficient, scale out to multiple machines.
However, you’ll find that running a deep learning model on multiple machines is difficult:
- On-premises, you need to set up multiple machines for deep learning, manually run experiments and carefully watch resource utilization
- In the cloud, you can spin up machines quickly, but need to build and test machine images, and manually run experiments on each machine. You’ll need to “babysit” your machines to ensure an experiment is always running, and avoid wasting money with expensive GPU machines.
InceptionV3
Inception-v3 is a convolutional neural network for assisting in image analysis and object detection, and it got its start as a module for GoogLeNet. The weights for Inception V3 are smaller than both VGG and ResNet, coming in at 96 MB. The network is 48 layers deep. You can load a pretrained version trained on more than a million images from the ImageNet database; it can classify images into 1000 object categories, such as keyboard, mouse, pencil, and many animals. As a result, the network has learned rich feature representations for a wide range of images. The network has an image input size of 299-by-299.
There are four versions of the Inception architecture. The first GoogLeNet is Inception-v1, but there are numerous naming inconsistencies in the Inception-v3 paper that have led to wrong descriptions of the Inception versions.
“The Inception deep convolutional architecture was introduced as GoogLeNet in (Szegedy et al. 2015a), here named Inception-v1. Later the Inception architecture was refined in various ways, first by the introduction of batch normalization (Ioffe and Szegedy 2015) (Inception-v2). Later by additional factorization ideas in the third iteration (Szegedy et al. 2015b) which will be referred to as Inception-v3 in this report.”
Inception helps with the classification of objects in computer vision. One such use is in the life sciences, where it aids leukemia research.
Inception-v3 is sometimes called the best-performing high-resolution image classifier based on convolutional neural networks available today. Inception can be a good architecture to deploy on devices with limited processing power, until a better model surpasses the results Inception has shown. It also gives good accuracy when recognizing facial expressions from low-resolution images.
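A common transfer-learning pattern with InceptionV3 is to use it as a fixed feature extractor; the minimal sketch below (the file name 'my_image.jpg' is a placeholder) turns an image into a 2048-dimensional embedding by dropping the top classifier and average-pooling the last feature maps.

import numpy as np
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
from tensorflow.keras.preprocessing import image

extractor = InceptionV3(weights='imagenet', include_top=False, pooling='avg')

img = image.load_img('my_image.jpg', target_size=(299, 299))    # InceptionV3 expects 299x299
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
embedding = extractor.predict(x)
print(embedding.shape)                                           # (1, 2048)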
Useful Links:
https://keras.io/api/applications/inceptionv3/
https://www.pyimagesearch.com/2017/03/20/imagenet-vggnet-resnet-inception-xception-keras/
https://arxiv.org/abs/1512.00567
Inception-v3 on Google Cloud for reference about the architecture:
https://cloud.google.com/tpu/docs/inception-v3-advanced
InceptionResNetV2
Very deep convolutional networks have been central to the largest advances in image recognition performance in recent years. One example is the Inception architecture, which has been shown to achieve very good performance at relatively low computational cost. Recently, the introduction of residual connections in conjunction with a more traditional architecture yielded state-of-the-art performance in the 2015 ILSVRC challenge; its performance was similar to the latest-generation Inception-v3 network. This raises the question of whether there is any benefit in combining the Inception architecture with residual connections. Here we give clear empirical evidence that training with residual connections accelerates the training of Inception networks significantly. There is also some evidence of residual Inception networks outperforming similarly expensive Inception networks without residual connections by a thin margin.
“In order to spur even further progress in the field, today we are happy to announce the release of Inception-ResNet-v2, a convolutional neural network (CNN) that achieves a new state of the art in terms of accuracy on the ILSVRC image classification benchmark. Inception-ResNet-v2 is a variation of our earlier Inception V3 model which borrows some ideas from Microsoft's ResNet papers [1][2]. The full details of the model are in our arXiv preprint Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning.”… Reference: https://ai.googleblog.com/2016/08/improving-inception-and-image.html
Residual connections allow shortcuts in the model and have allowed researchers to successfully train even deeper neural networks, which has led to even better performance. This has also enabled a significant simplification of the Inception blocks.
Useful Links:
https://keras.io/api/applications/inceptionresnetv2/
https://medium.com/@mannasiladittya/building-inception-resnet-v2-in-keras-from-scratch-a3546c4d93f0
https://www.kaggle.com/keras/inceptionresnetv2
https://www.kaggle.com/byrachonok/pretrained-inceptionresnetv2-base-classifier
MobileNet
Following are the advantages of using MobileNet over other state-of-the-art deep learning models.
· Reduced network size - 17MB.
· Reduced number of parameters - 4.2 million.
· Faster in performance and are useful for mobile applications.
· Small, low-latency convolutional neural network.
Advantages always come with some disadvantages, and with MobileNet it is accuracy. Yes, even though MobileNet has reduced size, fewer parameters, and faster performance, it is less accurate than other state-of-the-art networks, as discussed in the paper. But don't worry: there is only a slight reduction in accuracy compared with the other networks.
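MobileNet also exposes a width multiplier (alpha) that uniformly thins every layer; the small sketch below builds the network at a few alpha values (weights=None, so nothing is downloaded) and prints how the parameter count shrinks.

from tensorflow.keras.applications import MobileNet

for alpha in (1.0, 0.5, 0.25):
    m = MobileNet(alpha=alpha, weights=None, input_shape=(224, 224, 3))
    print(f'alpha={alpha}: {m.count_params():,} parameters')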
Useful Links:
https://keras.io/api/applications/mobilenet/
https://towardsdatascience.com/transfer-learning-using-mobilenet-and-keras-c75daf7ff299
https://www.kaggle.com/satian/keras-mobilenet-starter
https://www.kaggle.com/hsinwenchang/keras-mobilenet-data-augmentation-visualize
MobileNetV2
MobileNetV2 is a lightweight model for image classification. It typically outperforms MobileNetV1, NASNet, and ShuffleNet V1.
“Last year we introduced MobileNetV1, a family of general purpose computer vision neural networks designed with mobile devices in mind to support classification, detection and more. The ability to run deep networks on personal mobile devices improves user experience, offering anytime, anywhere access, with additional benefits for security, privacy, and energy consumption. As new applications emerge allowing users to interact with the real world in real time, so does the need for ever more efficient neural networks.
Today, we are pleased to announce the availability of MobileNetV2 to power the next generation of mobile vision applications. MobileNetV2 is a significant improvement over MobileNetV1 and pushes the state of the art for mobile visual recognition including classification, object detection and semantic segmentation. MobileNetV2 is released as part of TensorFlow-Slim Image Classification Library, or you can start exploring MobileNetV2 right away in Colaboratory. Alternately, you can download the notebook and explore it locally using Jupyter. MobileNetV2 is also available as modules on TF-Hub, and pretrained checkpoints can be found on github.
MobileNetV2 builds upon the ideas from MobileNetV1 [1], using depthwise separable convolution as efficient building blocks. However, V2 introduces two new features to the architecture: 1) linear bottlenecks between the layers, and 2) shortcut connections between the bottlenecks.”… Reference: https://ai.googleblog.com/2018/04/mobilenetv2-next-generation-of-on.html
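The following is a simplified sketch (not the exact library implementation) of the inverted residual block described in the quote above: a 1x1 expansion, a 3x3 depthwise convolution, a linear 1x1 bottleneck, and a shortcut when the shapes allow it. The function name and expansion factor are illustrative.

from tensorflow.keras import layers

def inverted_residual(x, out_channels, expansion=6, stride=1):
    in_channels = x.shape[-1]
    y = layers.Conv2D(in_channels * expansion, 1, padding='same')(x)   # 1x1 expansion
    y = layers.BatchNormalization()(y)
    y = layers.ReLU(max_value=6.0)(y)
    y = layers.DepthwiseConv2D(3, strides=stride, padding='same')(y)   # depthwise 3x3
    y = layers.BatchNormalization()(y)
    y = layers.ReLU(max_value=6.0)(y)
    y = layers.Conv2D(out_channels, 1, padding='same')(y)              # linear bottleneck, no activation
    y = layers.BatchNormalization()(y)
    if stride == 1 and in_channels == out_channels:
        y = layers.Add()([x, y])                                       # shortcut between the bottlenecks
    return y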
https://keras.io/api/applications/mobilenet/#mobilenetv2-function
https://www.kaggle.com/devang/transfer-learning-with-keras-and-mobilenet-v2
https://developer.ridgerun.com/wiki/index.php?title=Keras_with_MobilenetV2_for_Deep_Learning
https://github.com/xiaochus/MobileNetV2
DenseNet121
Although the DenseNet architecture is relatively new, it is a logical extension of ResNet.
“Densely Connected Convolutional Networks [1], DenseNets, are the next step on the way to keep increasing the depth of deep convolutional networks.
We have seen how we have gone from LeNet with 5 layers, to VGG with 19 layers, and to ResNets surpassing 100 and even 1,000 layers.
Problems arise with CNNs when they go deeper. This is because the path for information from the input layer to the output layer (and for the gradient in the opposite direction) becomes so long that the signal can vanish before reaching the other side.
DenseNets simplify the connectivity pattern between layers introduced in other architectures:
· Highway Networks
· Residual Networks
· Fractal Networks
The authors solve the problem ensuring maximum information (and gradient) flow. To do it, they simply connect every layer directly with each other.
Instead of drawing representational power from extremely deep or wide architectures, DenseNets exploit the potential of the network through feature reuse.
What problem do DenseNets solve?
Counter-intuitively, by connecting layers this way DenseNets require fewer parameters than an equivalent traditional CNN, as there is no need to learn redundant feature maps.
Furthermore, some variations of ResNets have shown that many layers barely contribute and can be dropped. In fact, the number of parameters in ResNets is large because every layer has its own weights to learn. Instead, DenseNet layers are very narrow (e.g. 12 filters), and they just add a small set of new feature maps.
Another problem with very deep networks is that they are hard to train, because of the mentioned flow of information and gradients. DenseNets solve this issue since each layer has direct access to the gradients from the loss function and to the original input image.” Reference Link: https://towardsdatascience.com/understanding-and-visualizing-densenets-7f688092391a
DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring less memory and computation to achieve high performance.
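The connectivity pattern described above can be sketched in a few lines of Keras; this is an illustrative toy dense block (the function name and growth rate are arbitrary), not the library implementation: each layer adds only a small set of new feature maps and is concatenated with everything that came before it.

from tensorflow.keras import layers

def dense_block(x, num_layers=4, growth_rate=12):
    for _ in range(num_layers):
        y = layers.BatchNormalization()(x)
        y = layers.ReLU()(y)
        y = layers.Conv2D(growth_rate, 3, padding='same')(y)   # only `growth_rate` new feature maps
        x = layers.Concatenate()([x, y])                       # every layer sees all previous outputs
    return x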
Various DenseNet models compared: https://arxiv.org/pdf/1608.06993.pdf
Useful Links:
https://keras.io/api/applications/densenet/#densenet121-function
https://www.kaggle.com/pytorch/densenet121
https://www.kaggle.com/vinicioswentz/keras-densenet-malaria
https://www.kaggle.com/ashishpatel26/lung-opacity-classification-using-densenet-121
https://www.kaggle.com/phmagic/keras-densenet121-multi-label-baseline
https://github.com/keras-team/keras-applications/blob/master/keras_applications/densenet.py
https://pythonawesome.com/densenet-implementation-in-keras/
https://towardsdatascience.com/densenet-2810936aeebb
DenseNet169
Various DenseNet models compared: https://arxiv.org/pdf/1608.06993.pdf
Useful Links:
https://www.kaggle.com/jaymin71/keras-densenet169
https://keras.io/api/applications/densenet/#densenet169-function
https://keras.io/api/applications/densenet/
https://www.kaggle.com/luisda2994/densenet169-transfer-learning
DenseNet201
Various DenseNet models compared: https://arxiv.org/pdf/1608.06993.pdf
Useful Links:
https://keras.io/api/applications/densenet/#densenet201-function
https://www.kaggle.com/gbellport/pre-trained-densenet
https://www.kaggle.com/grecs2001/landcover-classification-keras-densenet201
NASNetMobile
NASNet refers to Neural Architecture Search Network. It is a family of models whose architectures were designed automatically, by learning the model architectures directly on the dataset of interest.
NASNet-Mobile is a convolutional neural network that is trained on more than a million images from the ImageNet database [1]. The network can classify images into 1000 object categories, such as keyboard, mouse, pencil, and many animals. As a result, the network has learned rich feature representations for a wide range of images. The network has an image input size of 224-by-224.
The NASNetMobile architecture consists of a set of blocks built with neural network cells. A block is an operational module that includes transformations known from image-classifying neural networks, including normal convolutions, separable convolutions, max pooling, average pooling, identity mapping, etc. The network has been trained to assign an image to 1 of 1000 categories that include animals, flowers, and furniture. As a result, the network has ‘learned’ rich feature representations for a wide range of images.
Refer to the following for NASNet description:
AutoML for large scale image classification and object detection: https://ai.googleblog.com/2017/11/automl-for-large-scale-image.html
Useful Links:
https://keras.io/api/applications/nasnet/#nasnetmobile-function
https://www.kaggle.com/anjanatiha/malaria-detection-using-keras-accuracy-95
https://www.kaggle.com/CVxTz/cnn-starter-nasnet-mobile-0-9709-lb
NASNetLarge
NASNet-Large is a convolutional neural network that is trained on more than a million images from the ImageNet database [1]. The network can classify images into 1000 object categories, such as keyboard, mouse, pencil, and many animals. As a result, the network has learned rich feature representations for a wide range of images. The network has an image input size of 331-by-331.
https://keras.io/api/applications/nasnet/#nasnetlarge-function
https://www.kaggle.com/anjanatiha/malaria-detection-using-keras-accuracy-95
EfficientNetB0
“Convolutional neural networks (CNNs) are commonly developed at a fixed resource cost, and then scaled up in order to achieve better accuracy when more resources are made available. For example, ResNet can be scaled up from ResNet-18 to ResNet-200 by increasing the number of layers, and recently, GPipe achieved 84.3% ImageNet top-1 accuracy by scaling up a baseline CNN by a factor of four. The conventional practice for model scaling is to arbitrarily increase the CNN depth or width, or to use larger input image resolution for training and evaluation. While these methods do improve accuracy, they usually require tedious manual tuning, and still often yield suboptimal performance. What if, instead, we could find a more principled method to scale up a CNN to obtain better accuracy and efficiency?
In our ICML 2019 paper, “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks”, we propose a novel model scaling method that uses a simple yet highly effective compound coefficient to scale up CNNs in a more structured manner. Unlike conventional approaches that arbitrarily scale network dimensions, such as width, depth and resolution, our method uniformly scales each dimension with a fixed set of scaling coefficients. Powered by this novel scaling method and recent progress on AutoML, we have developed a family of models, called EfficientNets, which surpass state-of-the-art accuracy with up to 10x better efficiency (smaller and faster).
Compound Model Scaling: A Better Way to Scale Up CNNs
In order to understand the effect of scaling the network, we systematically studied the impact of scaling different dimensions of the model. While scaling individual dimensions improves model performance, we observed that balancing all dimensions of the network—width, depth, and image resolution—against the available resources would best improve overall performance.
The first step in the compound scaling method is to perform a grid search to find the relationship between different scaling dimensions of the baseline network under a fixed resource constraint (e.g., 2x more FLOPS). This determines the appropriate scaling coefficient for each of the dimensions mentioned above. We then apply those coefficients to scale up the baseline network to the desired target model size or computational budget.
(Figure in the original post: comparison of different scaling methods. Conventional scaling methods (b)-(d) arbitrarily scale a single dimension of the network, whereas the compound scaling method uniformly scales up all dimensions in a principled way.)
This compound scaling method consistently improves model accuracy and efficiency for scaling up existing models such as MobileNet (+1.4% imagenet accuracy), and ResNet (+0.7%), compared to conventional scaling methods.
By providing significant improvements to model efficiency, we expect EfficientNets could potentially serve as a new foundation for future computer vision tasks. Therefore, we have open-sourced all EfficientNet models, which we hope can benefit the larger machine learning community. You can find the EfficientNet source code and TPU training scripts here.”
Reference: https://ai.googleblog.com/2019/05/efficientnet-improving-accuracy-and.html
In 2012, AlexNet won the ImageNet Large Scale Visual Recognition Competition (ILSVRC) beating the nearest competitor by nearly 10% in top-5 accuracy on ImageNet dataset. AlexNet used a whopping 62 million parameters!
Soon people figured out the obvious ways in which AlexNet was not efficient. GoogleNet, the winner of ILSVRC 2014, used only 6.8 million parameters while being substantially more accurate than AlexNet.
After these initial inefficiencies were recognized and fixed, accuracy improvements in subsequent years came at the expense of an increased number of model parameters.
The EfficientNet scaling equations suggest that we can apply model scaling to any CNN architecture. While that is true, the authors found that the choice of the initial baseline model makes a difference in the final result. A small sketch of the compound-scaling arithmetic is given below.
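As a rough illustration of the compound-scaling rule from the EfficientNet paper, the sketch below uses the base coefficients reported for the EfficientNet-B0 baseline (alpha=1.2 for depth, beta=1.1 for width, gamma=1.15 for resolution, chosen so that alpha * beta^2 * gamma^2 is approximately 2, i.e. each increment of the compound coefficient phi roughly doubles FLOPS); the actual B1-B7 models additionally involve rounding and other implementation details.

alpha, beta, gamma = 1.2, 1.1, 1.15     # grid-searched on the B0 baseline (EfficientNet paper)

def compound_scaling(phi):
    depth_mult = alpha ** phi           # more layers
    width_mult = beta ** phi            # more channels per layer
    resolution_mult = gamma ** phi      # larger input images
    return depth_mult, width_mult, resolution_mult

for phi in range(4):                    # phi = 0 is the B0 baseline
    d, w, r = compound_scaling(phi)
    print(f'phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}')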
Useful Links:
https://keras.io/api/applications/efficientnet/#efficientnetb0-function
https://www.kaggle.com/fanconic/skin-cancer-efficientnetb0
https://www.kaggle.com/micajoumathematics/fine-tuning-efficientnetb0-on-cifar-100
EfficientNetB1
All EfficientNet Models Compared:
https://www.dlology.com/blog/transfer-learning-with-efficientnet/
Useful Links:
https://www.kaggle.com/ixeption/transferlearning-with-efficientnetb1-keras-80
https://keras.io/api/applications/efficientnet/#efficientnetb1-function
EfficientNetB2
All EfficientNet Models Compared:
https://www.dlology.com/blog/transfer-learning-with-efficientnet/
Useful Links:
https://www.kaggle.com/rsmits/keras-efficientnet-b2-starter-code
https://www.kaggle.com/xhlulu/efficientnet-keras-source-code
https://keras.io/api/applications/efficientnet/#efficientnetb2-function
EfficientNetB3
All EfficientNet Models Compared:
https://www.dlology.com/blog/transfer-learning-with-efficientnet/
Useful Links:
https://www.kaggle.com/fanconic/efficientnetb3-regression-keras/input
https://www.kaggle.com/rsmits/keras-efficientnet-b3-training-inference
https://www.kaggle.com/xhlulu/efficientnet-keras-source-code
https://keras.io/api/applications/efficientnet/#efficientnetb3-function
EfficientNetB4
All EfficientNet Models Compared:
https://www.dlology.com/blog/transfer-learning-with-efficientnet/
Useful Links:
https://www.kaggle.com/mobassir/keras-efficientnetb4-for-intracranial-hemorrhage
https://www.kaggle.com/xhlulu/efficientnet-keras-source-code
https://keras.io/api/applications/efficientnet/#efficientnetb4-function
EfficientNetB5
All EfficientNet Models Compared:
https://www.dlology.com/blog/transfer-learning-with-efficientnet/
Useful Links:
https://www.kaggle.com/womeigaibian/efficientnetb5-with-keras-aptos-2019
https://www.kaggle.com/xhlulu/efficientnet-keras-source-code
https://keras.io/api/applications/efficientnet/#efficientnetb5-function
https://www.kaggle.com/carlolepelaars/efficientnetb5-with-keras-aptos-2019
EfficientNetB6
Small notes on how to use B6/B7 with the Keras EfficientNet package:
In a Kaggle kernel, at the time of writing,
!pip3 install efficientnet
installs the old 0.0.4 version. Instead, use
!git clone https://github.com/qubvel/efficientnet.git
to get the latest version, which includes the new B6/B7 implementations and pretrained weights.
How to use (models can be built with either the Keras or TensorFlow frameworks, via efficientnet.keras / efficientnet.tfkeras):
import efficientnet.keras as efn
model = efn.EfficientNetB6(weights='imagenet')
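Note that recent TensorFlow releases (2.3 and later, to my knowledge) also ship EfficientNetB0-B7 directly in tf.keras.applications, so the external package is no longer the only route:

from tensorflow.keras.applications import EfficientNetB6

model = EfficientNetB6(weights='imagenet')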
All EfficientNet Models Compared:
https://www.dlology.com/blog/transfer-learning-with-efficientnet/
Useful Links:
https://www.kaggle.com/xhlulu/efficientnet-keras-source-code
https://www.kaggle.com/calebeverett/efficientnetb6-with-transformation
https://keras.io/api/applications/efficientnet/#efficientnetb6-function
https://pypi.org/project/efficientnet/
EfficientNetB7
All EfficientNet Models Compared:
https://www.dlology.com/blog/transfer-learning-with-efficientnet/
Useful Links:
https://www.kaggle.com/xhlulu/efficientnet-keras-source-code
https://www.kaggle.com/cansleepless/keras-efficientnetb7
https://keras.io/api/applications/efficientnet/#efficientnetb7-function
https://www.kaggle.com/qinhui1999/iwildcam-2020-efficientnetb7-tpu-starter-v1
https://www.kaggle.com/c/aptos2019-blindness-detection/discussion/100186
Disclaimer: This article is a loose compilation from numerous web sources, articles, papers, and Kaggle kernels, with due respect and credit to all the contributors. The author makes no claim about the accuracy or effectiveness of the information given here, although all care has been taken to make it as correct and complete as possible. Readers use this information at their own risk and discretion. The author is in no way responsible for any consequences or losses of any kind. Happy Learning & Good Luck!!!
Other Transfer Learning Models from PyTorch or other sources
Note to other contributors: I have done detailing at some level for Keras. For PyTorch and the other models given below, I will leave the detailing to some other contributor like you. As you got my work for free, I would request that if you complete the detailing for the models below, you also leave your work open source so that everyone can learn.
AlexNet
AlexNet, the pioneering deep learning model trained on the ImageNet database to categorize 1,000 different objects, made an impressive breakthrough in 2012 [24].
VGG
Bidirectional Encoder Representations from Transformers (BERT) by Google
Conv3D
Caffe Model Zoo
FastText
GloVe
Google’s Inception V3 Model
GoogLeNet
GOOGLENET: NETWORK IN NETWORK: After the VGG-16 show, Google gave birth to GoogLeNet (Inception-V1), the other champion of ILSVRC-2014, with a higher accuracy value than its predecessors. Unlike the prior networks, GoogLeNet has a somewhat unusual architecture. Firstly, networks such as VGG-16 have convolution layers stacked one over the other, but GoogLeNet arranges the convolution and pooling layers in parallel to extract features using different kernel sizes. The overall intention was to increase the depth of the network and gain a higher performance level than the previous winners of the ImageNet classification challenge. Secondly, the network uses 1×1 convolution operations to control the size of the volume passed on for further processing in each inception module. The inception module is a collection of convolution and pooling operations performed in parallel so that features can be extracted at different scales. Thirdly, the network has only about 7 million parameters (6.8 million, as noted earlier), which makes GoogLeNet a much less parameter-heavy model than AlexNet and VGG-16. Fourthly, the network uses a Global Average Pooling layer in place of fully connected layers. Ultimately, GoogLeNet achieved the lowest top-5 error of 6.67% in ILSVRC-2014.
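A simplified sketch of a GoogLeNet-style inception module is shown below (the branch filter counts are illustrative only): parallel 1x1, 3x3, and 5x5 convolution branches plus a pooling branch, with 1x1 convolutions used to control the channel count, all concatenated along the channel axis.

from tensorflow.keras import layers

def inception_module(x):
    b1 = layers.Conv2D(64, 1, padding='same', activation='relu')(x)

    b2 = layers.Conv2D(96, 1, padding='same', activation='relu')(x)    # 1x1 reduces channels first
    b2 = layers.Conv2D(128, 3, padding='same', activation='relu')(b2)

    b3 = layers.Conv2D(16, 1, padding='same', activation='relu')(x)
    b3 = layers.Conv2D(32, 5, padding='same', activation='relu')(b3)

    b4 = layers.MaxPooling2D(3, strides=1, padding='same')(x)
    b4 = layers.Conv2D(32, 1, padding='same', activation='relu')(b4)

    return layers.Concatenate()([b1, b2, b3, b4])                      # merge all branches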
Google’s AutoML
ImageNet
InceptionNet
MobileNetV2
MNASNet
ResNet
ResNeXt
ResNeXt uses a different residual building block, which has several parallel paths of stacked layers, with their outputs merged via addition. ResNeXt introduces a new hyperparameter called “cardinality”, which defines how many parallel paths exist in each block.
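A minimal PyTorch sketch for trying a pretrained ResNeXt from torchvision is given below (the "32x4d" suffix encodes a cardinality of 32 with 4-channel-wide paths); treat it as an assumption-laden example rather than a full recipe.

import torch
from torchvision import models

model = models.resnext50_32x4d(pretrained=True)   # newer torchvision versions prefer weights='DEFAULT'
model.eval()                                      # inference mode
with torch.no_grad():
    out = model(torch.randn(1, 3, 224, 224))      # dummy batch of one 224x224 RGB image
print(out.shape)                                  # torch.Size([1, 1000])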
RCNN (Region-Based CNN)
ShuffleNetV2
SqueezeNet
Universal Sentence Encoder by Google
Word2Vec
Wide ResNet
YOLO v2 & v3
Compilation by Shailendra Kadre, HP Inc., Bangalore.
Connect with me on LinkedIn https://www.dhirubhai.net/in/shailendra-kadre-19b2719/
----
For more details on transfer learning, please refer to the following informative article by Dipanjan (DJ) Sarkar.
END OF DOCUMENT