Why do we need normalization of images, before feeding them into Deep Neural Networks? Why can't we just do end-to-end learning?
Neil Pradhan
Full Stack Data & AI Engineer | Master's degree from KTH with specialization in Machine Learning
In general, normalization speeds up the learning process. But as we know, there is no free lunch, and everything comes at a cost: in some cases normalization can discard important information and hurt training (for instance, if your classification problem depends on brightness information). Here I have tried to summarize what I found about the normalization process, including a paper that avoids normalization while claiming comparable performance, a paper that gives guidelines on choosing a normalization technique based on input image noise, and a paper that analyses the significance of normalizing within the layers of a deep neural network.
There are different types of Normalization.
When the input is normalized:
- We scale the image pixels to [0,1] or [-1,1]
Intuitive reasoning: it helps keep a single global learning rate for the batch, since backpropagation can be affected by inputs with very different value ranges (some high, some low if not normalized) during gradient calculation. Normalizing the input therefore reduces learning time (fewer epochs).
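As a minimal sketch (assuming 8-bit RGB images and NumPy, neither of which is specified in the question), the scaling itself is a one-liner per convention:

```python
import numpy as np

# Hypothetical batch of 8-bit RGB images with pixel values in [0, 255]
images = np.random.randint(0, 256, size=(4, 224, 224, 3), dtype=np.uint8)

# Scale to [0, 1]
images_01 = images.astype(np.float32) / 255.0

# Scale to [-1, 1]
images_pm1 = images.astype(np.float32) / 127.5 - 1.0
```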
When Normalization happens within the hidden layers:
- Batch Normalization addresses the vanishing/exploding gradient problem, but it becomes unreliable when the batch size is small (Ioffe & Szegedy, 2015)
- Layer Normalization helps stabilize the hidden-state dynamics in recurrent neural networks
- Instance Normalization and Group Normalization are commonly used for style transfer and for CNNs (e.g., when small batches make Batch Normalization unreliable), respectively
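To make the differences concrete, here is a small sketch using PyTorch layers (my choice for illustration, not something the cited papers prescribe); the four variants differ only in which axes the mean and variance are computed over:

```python
import torch
import torch.nn as nn

x = torch.randn(8, 32, 28, 28)  # (batch, channels, height, width)

# Batch Norm: statistics per channel, computed across the whole batch
bn = nn.BatchNorm2d(num_features=32)

# Layer Norm: statistics per sample, computed over the normalized shape
ln = nn.LayerNorm(normalized_shape=[32, 28, 28])

# Instance Norm: statistics per sample and per channel (popular in style transfer)
inorm = nn.InstanceNorm2d(num_features=32)

# Group Norm: channels split into groups, independent of the batch size
gn = nn.GroupNorm(num_groups=8, num_channels=32)

for layer in (bn, ln, inorm, gn):
    assert layer(x).shape == x.shape  # shape is preserved; only the statistics differ
```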
When Normalization happens on the weights of the network:
- Weight Normalization reparameterizes each weight vector into a magnitude and a direction to accelerate training (Salimans & Kingma, 2016)
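As a rough sketch of what this looks like in practice (using PyTorch's torch.nn.utils.weight_norm as one example implementation; the paper itself is framework-agnostic), the weight is rewritten as w = g · v / ‖v‖, so its magnitude g and direction v are learned separately:

```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

# Wrap a layer so its weight is reparameterized as w = g * v / ||v||
layer = weight_norm(nn.Linear(in_features=128, out_features=64))

x = torch.randn(16, 128)
y = layer(x)                 # forward pass is unchanged

# The layer now learns a magnitude (g) and a direction (v) instead of w directly
print(layer.weight_g.shape)  # torch.Size([64, 1])
print(layer.weight_v.shape)  # torch.Size([64, 128])
```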
(Yu & Spiliopoulos, 2022) claim that normalization within hidden layers affects the outer layers more than the inner layers with regard to learning speed and test accuracy, whereas (Shao et al., 2020) propose a way to avoid normalization altogether while claiming equivalent performance.
(Kociołek et al., 2020) ran experiments showing how different types of noise should be treated with different normalization techniques, and give general guidance on when to use which technique. I am still looking to learn more in this area, so feel free to add comments.
References:
Ioffe, S., & Szegedy, C. (2015). Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift (arXiv:1502.03167). arXiv. https://doi.org/10.48550/arXiv.1502.03167
Kociołek, M., Strzelecki, M., & Obuchowicz, R. (2020). Does image normalization and intensity resolution impact texture classification? Computerized Medical Imaging and Graphics, 81, 101716. https://doi.org/10.1016/j.compmedimag.2020.101716
Salimans, T., & Kingma, D. P. (2016). Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks (arXiv:1602.07868). arXiv. https://doi.org/10.48550/arXiv.1602.07868
Shao, J., Hu, K., Wang, C., Xue, X., & Raj, B. (2020). Is normalization indispensable for training deep neural networks? Advances in Neural Information Processing Systems, 33, 13434–13444. https://papers.nips.cc/paper/2020/hash/9b8619251a19057cff70779273e95aa6-Abstract.html
Yu, J., & Spiliopoulos, K. (2022). Normalization effects on deep neural networks (arXiv:2209.01018). arXiv. https://doi.org/10.48550/arXiv.2209.01018