Network Morphism

Training a deep network is very time-consuming.

Network morphism morphs a parent network into a child network, allowing fast knowledge transfer.

This is a systematic study of how to morph a well-trained neural network into a new one so that its network function is completely preserved.

The child network is able to achieve the performance of the parent network immediately, and its performance continues to improve as training goes on.

The proposed scheme allows any network morphism in an expanding mode for arbitrary non-linear neurons, covering depth, width, kernel size, and subnet morphing operations. The proposed algorithms work for both classic multi-layer perceptron models and convolutional neural networks.

This is fundamentally different from existing work on network knowledge transfer, such as pre-training, which only facilitates convergence or adapts to new datasets, possibly with a total change of the network function.

Mathematically, a morphism is a structure-preserving map from one mathematical structure to another.
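For networks, the structure being preserved is the function itself; written out (the notation here is mine, in the spirit of the paper):

```latex
% Network morphism: for every input x, the child network \tilde{f} with
% morphed parameters \tilde{\theta} computes the same output as the parent f.
\forall x:\quad \tilde{f}\big(x;\ \tilde{\theta}\big) \;=\; f\big(x;\ \theta\big)
```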

Network morphism vs Net2Net

  • Net2Net’s discussion is limited to width and depth changes, while NetMorph studies a variety of morphing types, including depth, width, kernel size, and subnet changes.
  • Net2Net needs to separately consider depth and width changes, while NetMorph is able to simultaneously conduct depth, width, and kernel size morphing in a single operation.
  • NetMorph is the first to handle arbitrary non-linear activation functions. Moreover, NetMorph is the first to make it possible to embed non-identity layers.

General Network Morphism


Network morphism for classic (fully connected) neural networks is equivalent to a matrix decomposition problem: G = F_{l+1} · F_l, where the parent layer's weight matrix G is decomposed into the weight matrices of the two child layers, F_{l+1} and F_l.
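As a minimal sketch of this decomposition (plain NumPy; variable names are mine): the parent matrix G is split into two child matrices whose product reproduces it exactly, so inserting the new layer leaves the network function unchanged. Here the hidden width h is at least the input width, so a pseudo-inverse gives an exact factorization.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, h = 20, 10, 30          # h >= n_in so an exact split exists
G = rng.normal(size=(n_out, n_in))   # parent layer: n_in -> n_out

# Pick F_l at random, then solve for F_{l+1} so that F_{l+1} @ F_l == G.
F_l = rng.normal(size=(h, n_in))
F_lp1 = G @ np.linalg.pinv(F_l)      # exact because F_l has full column rank

x = rng.normal(size=(n_in,))
print(np.allclose(F_lp1 @ (F_l @ x), G @ x))  # True: function preserved
```

When a non-linear activation sits between the two new layers, the paper uses a parametric family (e.g., PReLU) whose initial parameter makes it the identity, so the equality above still holds at morphing time.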


Kernel Size Morphing

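Kernel size morphing grows a k×k kernel into a larger K×K one by padding the kernel with zeros, which leaves the layer's output unchanged. A minimal single-channel sketch in NumPy/SciPy ('same' padding hides the bookkeeping; names are mine):

```python
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))          # one input feature map
k3 = rng.normal(size=(3, 3))         # parent 3x3 kernel

# Morph 3x3 -> 5x5 by zero-padding the kernel: the new taps are zero, so
# they contribute nothing and the output is unchanged.
k5 = np.pad(k3, 1)                   # 5x5 kernel, original weights centered

y3 = correlate2d(x, k3, mode='same')
y5 = correlate2d(x, k5, mode='same')
print(np.allclose(y3, y5))           # True: function preserved
```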

Subnet Morphing

Modern networks are going deeper and deeper, and it is challenging to manually design tens or even hundreds of layers. One elegant strategy is to first design a subnet template and then construct the network from these subnets (for example, the Inception module of GoogLeNet). Sequential subnet morphing morphs a single layer into multiple sequential layers, as sketched below.
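A minimal sketch of the idea, in the degenerate identity-based case (the paper's algorithms also embed non-identity layers): a single convolutional layer is morphed into a sequential subnet by appending a 1x1 convolution initialized to the identity over channels. All names here are mine.

```python
import numpy as np
from scipy.signal import correlate2d

def conv(x, k):
    """'Same'-padded conv: x is (C_in, H, W), k is (C_out, C_in, kh, kw)."""
    return np.stack([
        sum(correlate2d(x[c], k[o, c], mode='same') for c in range(x.shape[0]))
        for o in range(k.shape[0])
    ])

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 8, 8))
k_parent = rng.normal(size=(4, 3, 3, 3))       # parent: one 3x3 conv layer

# Child subnet: the parent conv followed by a 1x1 conv whose channel-mixing
# matrix is the identity, so the two-layer subnet computes the same function.
k_one = np.zeros((4, 4, 1, 1))
k_one[np.arange(4), np.arange(4), 0, 0] = 1.0

print(np.allclose(conv(x, k_parent), conv(conv(x, k_parent), k_one)))  # True
```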

Experiments

Experiment 1: Using the MNIST data set, the parent model achieved 92.29% accuracy, which is taken as the baseline. This model is then morphed into a multi-layer perceptron (MLP) by adding a PReLU hidden layer with h = 50 hidden neurons.
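A sketch of how such a morph can preserve the parent's function exactly (dimensions follow the experiment; the SVD-based split is one way to satisfy G = F2 · F1 when h = 50 is smaller than the input width, and the PReLU slope starts at 1 so it acts as the identity):

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.normal(size=(10, 784))       # parent: a single 784 -> 10 layer
h = 50                               # hidden width of the morphed child

# rank(G) <= 10 <= h, so an exact split G = F2 @ F1 exists; build it via SVD.
U, s, Vt = np.linalg.svd(G, full_matrices=False)
F1 = np.zeros((h, 784)); F1[:len(s)] = np.sqrt(s)[:, None] * Vt
F2 = np.zeros((10, h));  F2[:, :len(s)] = U * np.sqrt(s)

prelu = lambda z, a=1.0: np.where(z > 0, z, a * z)   # slope 1 == identity

x = rng.normal(size=(784,))
print(np.allclose(G @ x, F2 @ prelu(F1 @ x)))        # True at morphing time
```

During subsequent training the PReLU slope is free to move away from 1, which is where the child's extra capacity comes from.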


NetMorph works much better than Net2Net: it continues to improve the accuracy from 92% to 97%, while Net2Net improves it only to 94%.

Experiment 2: Using CIFAR10, the adopted baseline network is the Caffe cifar10_quick model, with an accuracy of 78.15%. A unified notation is used: for example, cifar_111 denotes cifar10_quick, which has three convolutional layers and two fully connected layers.


We can see the superiority of NetMorph over Net2Net.

Examining the performance of NetMorph for subnet morphing, we can see that it achieves an additional performance improvement.


Note: the sharp drops and increases in the training curves are caused by changes in the learning rate.

Experiment 3: Evaluating kernel size and width morphing on CIFAR10.
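For width morphing, here is a minimal function-preserving recipe (one simple choice; NetMorph also supports non-zero embeddings for the new weights): new hidden units get random incoming weights and zero outgoing weights, so they do not affect the output until training updates them.

```python
import numpy as np

rng = np.random.default_rng(0)
F1 = rng.normal(size=(50, 784))      # parent hidden layer, width 50
F2 = rng.normal(size=(10, 50))
relu = lambda z: np.maximum(z, 0.0)

# Widen 50 -> 80: new units get random incoming weights and zero outgoing
# weights, so the network function is unchanged at morphing time.
F1w = np.vstack([F1, rng.normal(size=(30, 784))])
F2w = np.hstack([F2, np.zeros((10, 30))])

x = rng.normal(size=(784,))
print(np.allclose(F2 @ relu(F1 @ x), F2w @ relu(F1w @ x)))  # True
```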


Conclusions 

  • Network morphism is able to morph a well-trained parent network into a new child network, with the network function completely preserved.
  • The proposed algorithms enable the morphing of any continuous non-linear activation neurons.
  • Extensive experiments have been carried out to demonstrate the effectiveness of the proposed network morphism scheme.

The child network has the potential to grow into a more powerful one in a short time. (This also applies to humans!)


Bonus: a more recent paper from the same authors: Modularized Morphing of Neural Networks.



Best Regards
