Internal Covariate Shift and Batch Normalization

Internal Covariate Shift

Internal covariate shift [1,2,3] refers to the phenomenon where the distribution of the inputs to each layer of a deep neural network changes during training, because the parameters of the preceding layers are being updated. Each layer must then keep adapting to a moving input distribution. This can slow convergence and hurt performance on the training set, and it can also make it harder for the network to generalize to new data.
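The effect can be illustrated with a small sketch. The one-unit layer, the `layer_output` helper, and the "before" and "after" parameter values below are hypothetical and chosen only for illustration. The network input stays fixed, yet the statistics of what the next layer receives change once the first layer's parameters change.

```python
import random
import statistics

def layer_output(inputs, weight, bias):
    """Output of a hypothetical one-unit linear layer followed by ReLU."""
    return [max(0.0, weight * x + bias) for x in inputs]

random.seed(0)
# A fixed batch of network inputs drawn from a standard normal distribution.
batch = [random.gauss(0.0, 1.0) for _ in range(1000)]

# Assumed first-layer parameters before and after some training updates.
before = layer_output(batch, weight=0.5, bias=0.0)
after = layer_output(batch, weight=2.0, bias=1.0)

# These activations are the inputs seen by the second layer.
mean_before, std_before = statistics.fmean(before), statistics.pstdev(before)
mean_after, std_after = statistics.fmean(after), statistics.pstdev(after)

print(f"before update: mean={mean_before:.3f}, std={std_before:.3f}")
print(f"after update:  mean={mean_after:.3f}, std={std_after:.3f}")
```

The second layer sees a different mean and spread after the update, even though the data fed to the network has not changed.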

Training Issues Due to Internal Covariate Shift

If internal covariate shift is not handled properly, it can lead to the following problems (among others):

  • Generalization issues: Generalization in deep learning refers to the ability of a trained model to perform well on unseen data. A model that generalizes well makes accurate predictions on data it has never seen before, while a model that overfits the training data may perform poorly on new data.
  • Gradient-flow-related issues: These include (a) vanishing gradients, (b) exploding gradients, (c) ineffective convergence of models, (d) overfitting, and (e) unstable training.
  • Learning-rate-related issues: Slow learning and slow convergence during training are also major problems in this area.

Tutorials

In the following tutorials, I explain the issues caused by internal covariate shift in detail and show how Batch Normalization helps to solve them.
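As a rough orientation, here is a minimal sketch of the training-mode Batch Normalization transform described in [1], applied to a single feature over one mini-batch. The function name `batch_norm` and the sample values are assumptions made for illustration. A real layer would also learn `gamma` and `beta` and would track running statistics for use at inference time.

```python
import math
import statistics

def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature over a mini-batch, then scale and shift it."""
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)  # biased (population) variance
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]

# Two mini-batches of the same feature with very different statistics.
batch_a = [0.1, 0.4, 0.2, 0.3]
batch_b = [10.0, 40.0, 20.0, 30.0]

norm_a = batch_norm(batch_a)
norm_b = batch_norm(batch_b)

for name, normed in (("a", norm_a), ("b", norm_b)):
    print(f"batch {name}: mean={statistics.fmean(normed):.4f}, "
          f"std={statistics.pstdev(normed):.4f}")
```

Both mini-batches end up with a mean close to 0 and a standard deviation close to 1, so the next layer receives inputs on a comparable scale regardless of how the earlier layers have shifted.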


References:

  1. Ioffe, S., & Szegedy, C. (2015, June). Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning (pp. 448-456). PMLR.
  2. Awais, M., Iqbal, M. T. B., & Bae, S. H. (2020). Revisiting internal covariate shift for batch normalization. IEEE Transactions on Neural Networks and Learning Systems, 32(11), 5082-5092.
  3. Schneider, S., Rusak, E., Eck, L., Bringmann, O., Brendel, W., & Bethge, M. (2020). Improving robustness against common corruptions by covariate shift adaptation. Advances in Neural Information Processing Systems, 33, 11539-11551.
