Rotation Invariance in Neural Nets

A recent CVPR paper, "Strike (With) a Pose: Neural Networks Are Easily Fooled by Strange Poses of Familiar Objects" (Alcorn et al., CVPR 2019), elegantly and forcefully demonstrates the importance of invariance in deep learning models. Column 1 shows the classification accuracy of a recent CNN-based ImageNet classifier (Inception): one can see it does a great job. However, it does very badly on images that look closely related to our eyes but are rotated. This is primarily due to the lack of rotation invariance (and other invariances) in modern CNNs. As one knows from signal processing, convolution gives us translation equivariance, which pooling and striding turn into approximate translation invariance. Scale invariance (think of how far the object is from the camera) is only partially handled by down-sampling.
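
To make the equivariance point concrete, here is a minimal sketch, assuming numpy and scipy (my choice of tools, not code from the paper): shifting the input shifts the convolution output correspondingly, while rotating the input does not simply rotate the output.

```python
import numpy as np
from scipy.ndimage import convolve, shift, rotate

# Illustrative random image and kernel.
rng = np.random.default_rng(0)
img = rng.random((32, 32))
kernel = rng.random((3, 3))

# Translation: conv(shift(img)) == shift(conv(img)), up to boundary effects.
a = convolve(shift(img, (0, 5)), kernel)
b = shift(convolve(img, kernel), (0, 5))
print(np.allclose(a[:, 8:-8], b[:, 8:-8]))   # True away from the borders

# Rotation: conv(rotate(img)) != rotate(conv(img)) for a generic kernel.
c = convolve(rotate(img, 30, reshape=False), kernel)
d = rotate(convolve(img, kernel), 30, reshape=False)
print(np.allclose(c, d))                     # False: no rotation equivariance
```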

Google Inception-v3 classifier results

The shocking fact reported by this study is how badly a very modern CNN handles even small changes in pose:


Q: "We found a change in rotation as small as 8.02? can cause an object to be misclassified. Along the spatial dimensions, a translation resulting in the object moving as few as 2 pixels horizontally or 4.5 px vertically also caused the DNN to misclassify. Lastly, along the z-axis, a change in “size” (i.e., the area of the object’s bounding box) of only 5.4% can cause an object to be misclassified."

Part of the issue lies in the data-augmentation pipeline used by these models during training. Typically random cropping and mirroring are used; perhaps some add randomized intensity sampling. Basically none of them attempts any scaling or rotation. One might ask "why not?". The simple, sad fact is that they do not help if all you want is to win the ImageNet competition!
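
As an illustration, here is a minimal sketch assuming a PyTorch/torchvision setup (my choice of library, not anything the paper prescribes), contrasting the typical pipeline with one that adds the missing rotation and scale jitter:

```python
import torchvision.transforms as T

# Typical ImageNet training pipeline: random crop + mirror only.
baseline = T.Compose([
    T.RandomResizedCrop(224),    # random crop, resized back to 224x224
    T.RandomHorizontalFlip(),    # mirror
    T.ToTensor(),
])

# The same pipeline with explicit rotation and scale jitter added.
rotation_aware = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.RandomAffine(degrees=30, scale=(0.7, 1.3)),  # rotate up to +/-30 deg, rescale 0.7-1.3x
    T.ColorJitter(brightness=0.2),                 # randomized intensity, as some pipelines do
    T.ToTensor(),
])
```

The degrees and scale ranges above are illustrative values, not tuned settings; the point is that the extra jitter costs one line.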

Some conscientious researchers are trying to tackle these problems. However, they are a distinct minority:

  • "Learning rotation invariance in deep hierarchies using circular symmetric filters" Kohli et al ICASSP
  • "TI-POOLING: transformation-invariant pooling for feature learning in Convolutional Neural Network" Laptev et al arXiv: 1604.06318 2016
  • "Oriented Response Networks" Zhou et al. CVPR 2017.

We should support them because they are fighting a lonely war.


Rob Markovic

Inventor, Technology Counsel, Sigma Male, Discerning

5y

If we had truly, perfectly explainable AI, this would be much more obvious and we could move on to different network model types that don't have these serious issues.

Jimmy Jose

ML Architect | NLP | Deep Learning | Generative AI | Data Scientist | OCR | Computer Vision | RAG | LLM

5y

More courses need to come out on data augmentation, but it's not really sexy. It's a bit scary that a lot of these models could be deployed by people who are not aware that an issue like this could arise. Thanks for sharing this :)

Matt Challacombe

Quantum. Biochemistry. Applied Maths & HPC. Science as crypto-asset.

5y

See also failures in Quantum Chemistry around locally spherical grids, with assumed levels of rotational invariance massively violated: https://cen.acs.org/physical-chemistry/computational-chemistry/Density-functional-theory-error-discovered/97/web/2019/07

It is not the guideline as such. It is because the rules (the objective) of ImageNet are such that the score will not be improved by those augmentations, which points out the artificial nature of these sorts of competitions. This also applies to most Kaggle competitions.
