Network Morphism

Training a deep network is very time-consuming.

Network morphism morphs a parent network into a child network, allowing fast knowledge transfer.

This is a systematic study of how to morph a well-trained neural network into a new one so that its network function is completely preserved.

The child network is able to achieve the performance of the parent network immediately, and its performance continues to improve as training goes on.

The proposed scheme allows any network morphism in an expanding mode for arbitrary non-linear neurons, covering depth, width, kernel size, and subnet morphing operations. The proposed algorithms work for both classic multi-layer perceptron models and convolutional neural networks.

This is fundamentally different from existing work on network knowledge transfer, such as pre-training, which only facilitates convergence or adapts to new datasets, possibly with a total change of the network function.

Mathematically, a morphism is a structure-preserving map from one mathematical structure to another.
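For networks, the structure being preserved is the function itself; written out (the notation here is mine, in the spirit of the paper):

```latex
% Network morphism: for every input x, the child network \tilde{f} with
% morphed parameters \tilde{\theta} computes the same output as the parent f.
\forall x:\quad \tilde{f}\big(x;\ \tilde{\theta}\big) \;=\; f\big(x;\ \theta\big)
```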

Network morphism vs Net2Net

  • Net2Net’s discussion is limited to width and depth changes, while NetMorph studies a variety of morphing types, including depth, width, kernel size, and subnet changes.
  • Net2Net needs to separately consider depth and width changes, while NetMorph is able to simultaneously conduct depth, width, and kernel size morphing in a single operation.
  • NetMorph is the first to handle arbitrary non-linear activation functions. Moreover, NetMorph is the first to make it possible to embed non-identity layers.

General Network Morphism


Network morphism for classic (fully connected) neural networks is equivalent to a matrix decomposition problem: G = F_{l+1} · F_l, where the parent layer's weight matrix G is decomposed into the weight matrices of the two child layers, F_{l+1} and F_l.
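As a minimal sketch of this decomposition (plain NumPy; variable names are mine): the parent matrix G is split into two child matrices whose product reproduces it exactly, so inserting the new layer leaves the network function unchanged. Here the hidden width h is at least the input width, so a pseudo-inverse gives an exact factorization.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out, h = 20, 10, 30          # h >= n_in so an exact split exists
G = rng.normal(size=(n_out, n_in))   # parent layer: n_in -> n_out

# Pick F_l at random, then solve for F_{l+1} so that F_{l+1} @ F_l == G.
F_l = rng.normal(size=(h, n_in))
F_lp1 = G @ np.linalg.pinv(F_l)      # exact because F_l has full column rank

x = rng.normal(size=(n_in,))
print(np.allclose(F_lp1 @ (F_l @ x), G @ x))  # True: function preserved
```

When a non-linear activation sits between the two new layers, the paper uses a parametric family (e.g., PReLU) whose initial parameter makes it the identity, so the equality above still holds at morphing time.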


Kernel Size Morphing

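Kernel size morphing grows a k×k kernel into a larger K×K one by padding the kernel with zeros, which leaves the layer's output unchanged. A minimal single-channel sketch in NumPy/SciPy ('same' padding hides the bookkeeping; names are mine):

```python
import numpy as np
from scipy.signal import correlate2d

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))          # one input feature map
k3 = rng.normal(size=(3, 3))         # parent 3x3 kernel

# Morph 3x3 -> 5x5 by zero-padding the kernel: the new taps are zero, so
# they contribute nothing and the output is unchanged.
k5 = np.pad(k3, 1)                   # 5x5 kernel, original weights centered

y3 = correlate2d(x, k3, mode='same')
y5 = correlate2d(x, k5, mode='same')
print(np.allclose(y3, y5))           # True: function preserved
```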

Subnet Morphing

Modern networks are going deeper and deeper, and it is challenging to manually design tens or even hundreds of layers. One elegant strategy is to first design a subnet template and then construct the network from these subnets (for example, the Inception module of GoogLeNet). Sequential subnet morphing morphs a single layer into multiple sequential layers, as sketched below.
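A minimal sketch of the idea, in the degenerate identity-based case (the paper's algorithms also embed non-identity layers): a single convolutional layer is morphed into a sequential subnet by appending a 1x1 convolution initialized to the identity over channels. All names here are mine.

```python
import numpy as np
from scipy.signal import correlate2d

def conv(x, k):
    """'Same'-padded conv: x is (C_in, H, W), k is (C_out, C_in, kh, kw)."""
    return np.stack([
        sum(correlate2d(x[c], k[o, c], mode='same') for c in range(x.shape[0]))
        for o in range(k.shape[0])
    ])

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 8, 8))
k_parent = rng.normal(size=(4, 3, 3, 3))       # parent: one 3x3 conv layer

# Child subnet: the parent conv followed by a 1x1 conv whose channel-mixing
# matrix is the identity, so the two-layer subnet computes the same function.
k_one = np.zeros((4, 4, 1, 1))
k_one[np.arange(4), np.arange(4), 0, 0] = 1.0

print(np.allclose(conv(x, k_parent), conv(conv(x, k_parent), k_one)))  # True
```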

Experiments

Experiment 1: Using the MNIST data set, the parent model achieved 92.29% accuracy, which is taken as the baseline. This model is then morphed into a multi-layer perceptron (MLP) by adding a PReLU hidden layer with h = 50 hidden neurons.
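A sketch of how such a morph can preserve the parent's function exactly (dimensions follow the experiment; the SVD-based split is one way to satisfy G = F2 · F1 when h = 50 is smaller than the input width, and the PReLU slope starts at 1 so it acts as the identity):

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.normal(size=(10, 784))       # parent: a single 784 -> 10 layer
h = 50                               # hidden width of the morphed child

# rank(G) <= 10 <= h, so an exact split G = F2 @ F1 exists; build it via SVD.
U, s, Vt = np.linalg.svd(G, full_matrices=False)
F1 = np.zeros((h, 784)); F1[:len(s)] = np.sqrt(s)[:, None] * Vt
F2 = np.zeros((10, h));  F2[:, :len(s)] = U * np.sqrt(s)

prelu = lambda z, a=1.0: np.where(z > 0, z, a * z)   # slope 1 == identity

x = rng.normal(size=(784,))
print(np.allclose(G @ x, F2 @ prelu(F1 @ x)))        # True at morphing time
```

During subsequent training the PReLU slope is free to move away from 1, which is where the child's extra capacity comes from.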


NetMorph works much better than Net2Net: it continues to improve the accuracy from 92% to 97%, while Net2Net improves it only to 94%.

Experiment 2: Using CIFAR10, the adopted baseline network is the Caffe cifar10_quick model, with an accuracy of 78.15%. A unified notation is used: for example, cifar_111 denotes cifar10_quick, which has three convolutional layers and two fully connected layers.


We can see the superiority of NetMorph over Net2Net.

Examining the performance of NetMorph for subnet morphing, we can see that it achieves an additional performance improvement.


Note: the sharp drops and increases in the training curves are caused by changes in the learning rate.

Experiment 3: Evaluating kernel size and width morphing on CIFAR10.
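For width morphing, here is a minimal function-preserving recipe (one simple choice; NetMorph also supports non-zero embeddings for the new weights): new hidden units get random incoming weights and zero outgoing weights, so they do not affect the output until training updates them.

```python
import numpy as np

rng = np.random.default_rng(0)
F1 = rng.normal(size=(50, 784))      # parent hidden layer, width 50
F2 = rng.normal(size=(10, 50))
relu = lambda z: np.maximum(z, 0.0)

# Widen 50 -> 80: new units get random incoming weights and zero outgoing
# weights, so the network function is unchanged at morphing time.
F1w = np.vstack([F1, rng.normal(size=(30, 784))])
F2w = np.hstack([F2, np.zeros((10, 30))])

x = rng.normal(size=(784,))
print(np.allclose(F2 @ relu(F1 @ x), F2w @ relu(F1w @ x)))  # True
```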


Conclusions 

  • Network morphism is able to morph a well-trained parent network into a new child network, with the network function completely preserved.
  • The proposed algorithms enable the morphing of any continuous non-linear activation neurons.
  • Extensive experiments have been carried out to demonstrate the effectiveness of the proposed network morphism scheme.

The child network has the potential to grow into a more powerful one in a short time. (This also applies to humans!)


Bonus: a more recent paper from the same authors: Modularized Morphing of Neural Networks.



Best Regards
