Rotation Invariance in Neural Nets
A recent CVPR paper, "Strike (With) a Pose", elegantly and forcefully demonstrates the importance of invariance in deep learning models. Column 1 shows the classification accuracy of a recent CNN-based ImageNet classifier (Inception). One can see it does a great job. However, it performs very badly on images that look closely related to our eyes but are rotated. This is primarily due to the lack of rotation invariance (and other invariances) in modern CNNs. As we know from signal processing, convolution gives us translation invariance (strictly speaking, translation equivariance, which pooling turns into approximate invariance). Scale invariance (think of how far the object is from the camera) is partially handled by down-sampling.
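The translation/rotation asymmetry can be seen directly with a toy convolution. The sketch below (pure NumPy, illustrative only) checks that shifting the input shifts the output in lockstep, while rotating the input gives no corresponding guarantee:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 'valid'-mode 2-D correlation, just to make the sliding window explicit."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
img = rng.random((8, 8))
kernel = rng.random((3, 3))

# Translation equivariance: shifting the input shifts the output the same way,
# so the interior of the shifted output matches the original output shifted by 1.
out = conv2d_valid(img, kernel)
out_shifted = conv2d_valid(np.roll(img, 1, axis=1), kernel)
assert np.allclose(out_shifted[:, 1:], out[:, :-1])

# No analogous guarantee for rotation: rotating the input does not simply
# rotate the output (that would require rotating the kernel as well).
out_rot = conv2d_valid(np.rot90(img), kernel)
print(np.allclose(np.rot90(out), out_rot))
```

This is exactly why a learned filter bank responds consistently to a shifted object but not to a rotated one.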
The shocking fact reported by this study is how badly even a very modern CNN handles these transformations:
"We found a change in rotation as small as 8.02° can cause an object to be misclassified. Along the spatial dimensions, a translation resulting in the object moving as few as 2 pixels horizontally or 4.5 px vertically also caused the DNN to misclassify. Lastly, along the z-axis, a change in "size" (i.e., the area of the object's bounding box) of only 5.4% can cause an object to be misclassified."
Part of the issue lies in the data-augmentation pipeline used to train these models. Typically random cropping and mirroring are used, and perhaps some add randomized intensity sampling. Essentially none of them attempts any scaling or rotation. One might ask "why not?". The simple, sad fact is that they do not help if all you want is to win the ImageNet competition!
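For contrast, here is what an augmentation step with rotation and scale jitter could look like. This is a minimal NumPy-only sketch, not any particular framework's pipeline: the function names and parameter ranges are illustrative, 90-degree rotations stand in for arbitrary-angle rotation, and rescaling is done with nearest-neighbour indexing plus crop/pad:

```python
import numpy as np

def random_resize(img, scale, out_size):
    """Nearest-neighbour rescale, then crop or zero-pad back to out_size (toy version)."""
    h, w = img.shape
    nh, nw = max(1, int(h * scale)), max(1, int(w * scale))
    rows = np.arange(nh) * h // nh          # map output rows back to input rows
    cols = np.arange(nw) * w // nw
    resized = img[rows][:, cols]
    out = np.zeros((out_size, out_size), dtype=img.dtype)
    ch, cw = min(out_size, nh), min(out_size, nw)
    out[:ch, :cw] = resized[:ch, :cw]
    return out

def augment(img, rng):
    """Illustrative pipeline: the standard mirror, plus the rotation and scale
    jitter the post argues is usually missing."""
    if rng.random() < 0.5:
        img = img[:, ::-1]                   # horizontal mirror (standard)
    img = np.rot90(img, k=rng.integers(0, 4))  # rotation jitter (90-degree stand-in)
    img = random_resize(img, scale=rng.uniform(0.8, 1.2), out_size=img.shape[0])
    return img

rng = np.random.default_rng(0)
img = rng.random((32, 32))
aug = augment(img, rng)
print(aug.shape)  # (32, 32)
```

A real pipeline would use a library's interpolating rotation/resize ops, but the point is the same: the extra two lines of jitter are cheap; they just don't raise the leaderboard score.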
Some conscientious researchers are trying to tackle these problems. However, they are a distinct minority:
- "Learning rotation invariance in deep hierarchies using circular symmetric filters", Kohli et al., ICASSP
- "TI-POOLING: transformation-invariant pooling for feature learning in Convolutional Neural Networks", Laptev et al., arXiv:1604.06318, 2016
- "Oriented Response Networks", Zhou et al., CVPR 2017
We should support them because they are fighting a lonely war.
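To give a flavour of this line of work, the core idea of TI-pooling can be sketched in a few lines: run every transformed copy of the input through the same feature extractor and take an element-wise max, so the result is invariant to the chosen transform set by construction. Here a single linear-plus-ReLU map stands in for the paper's CNN; everything else is illustrative:

```python
import numpy as np

def features(img, w):
    """Stand-in feature extractor: one linear map + ReLU (not the paper's CNN)."""
    return np.maximum(0.0, w @ img.ravel())

def ti_pool_features(img, w, transforms):
    """TI-pooling idea (Laptev et al.): extract features from each transformed
    copy of the input and max-pool element-wise across copies."""
    return np.max([features(t(img), w) for t in transforms], axis=0)

# Transform set: the four 90-degree rotations.
rotations = [lambda x, k=k: np.rot90(x, k) for k in range(4)]

rng = np.random.default_rng(0)
img = rng.random((8, 8))
w = rng.random((16, 64))

f1 = ti_pool_features(img, w, rotations)
f2 = ti_pool_features(np.rot90(img), w, rotations)
assert np.allclose(f1, f2)  # invariant to 90-degree rotations by construction
```

Rotating the input only permutes which copy wins the max, so the pooled features are unchanged; the trade-off is one forward pass per transform.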
Comments:

If we had truly, perfectly explainable AI, this would be much more obvious, and we could move on to different network model types that don't have these serious issues.
More courses need to cover data augmentation, but it's not really sexy. It's a bit scary that many of these models could be deployed by people who are not aware that such an issue could arise. Thanks for sharing this :)
See also failures in quantum chemistry around locally spherical grids, where assumed levels of rotational invariance are massively violated: https://cen.acs.org/physical-chemistry/computational-chemistry/Density-functional-theory-error-discovered/97/web/2019/07
It is not the guideline as such. It is because the rules (objective) of ImageNet are such that the score will not be improved by those augmentations, which points out the artificial nature of this sort of competition. This also applies to most Kaggle contests.