Small input distortions can mislead deep learning models

Deep learning models achieve outstanding performance in many computer vision tasks such as image classification, segmentation, and object detection. But are they robust to minor distortions in the input? In other words, can deep learning models still perform well when the images are even slightly corrupted? Several studies show that image distortions that do not affect the human visual system can mislead even state-of-the-art deep learning models.

One form of input distortion that can mislead deep learning models is common corruptions, which easily occur in the real world due to camera, lighting, or object-related problems, e.g., camera shake and focus shift, fast-moving objects, reduced resolution, and changing lighting conditions. Gaussian noise, motion blur, defocus blur, contrast change, brightness change, fog, and rotation are some of the common corruptions that can degrade model performance [1, 2]. Considering that deep learning models are also used in the perception systems of autonomous vehicles, vulnerability to such distortions can have fatal consequences. A Tesla in Autopilot mode was involved in a fatal crash in 2016 after it failed to recognize a white truck against a bright spring sky [3].
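As a concrete illustration, the sketch below applies one of the corruptions listed above, additive Gaussian noise, to an image. The float image array, the [0, 1] value range, and the sigma level are assumptions made for this example rather than a reference implementation.

```python
import numpy as np

def gaussian_noise(image, sigma=0.08):
    """Add zero-mean Gaussian noise to a float image with values in [0, 1]."""
    noisy = image + np.random.normal(0.0, sigma, image.shape)
    # Clip back to the valid pixel range after adding noise.
    return np.clip(noisy, 0.0, 1.0)
```

Benchmarks such as ImageNet-C [1] apply corruptions like this at several severity levels and measure how far a model's accuracy drops.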

Another type of distortion is a crafted perturbation added to the input image in order to confuse deep neural networks. Such corrupted images are called adversarial examples and are obtained by applying adversarial attacks to the input image. Adversarial attack algorithms try to fool a classifier by finding the smallest additive distortion in RGB space [4]. Deep learning systems may encounter adversarial perturbations through either a malicious attack or accidental data corruption. A group of hackers at McAfee Advanced Threat Research made a Tesla detect a 35 mph speed limit sign as 85 mph by making minor changes to the sign [5]. Szegedy et al. highlight that deep learning models are vulnerable to adversarial examples, i.e., the performance of a deep learning model may drop significantly when a small perturbation is added to the input image [6]. Adversarial perturbations are usually not noticeable to the human eye.
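In an untargeted setting, this search for the smallest misleading distortion can be written as a generic optimization problem (a schematic formulation only; the attacks summarized below differ in the norm they use and in how they approximate a solution):

```latex
\min_{\delta} \; \|\delta\|_{p}
\quad \text{subject to} \quad
f(x + \delta) \neq f(x), \qquad x + \delta \in [0, 1]^{n}
```

Here x is the input image, δ the perturbation, f the classifier's predicted label, and p a chosen norm (typically 2 or ∞).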

There are several types of adversarial attacks. White-box attacks require access to the model architecture and parameters, while black-box attacks do not require any information about the targeted model. Targeted attacks aim to make the model classify the input as a particular class rather than the actual class; untargeted attacks are only intended to deceive the model and do not have a target class. Some common adversarial attack algorithms are summarized below. The Fast Gradient Sign Method (FGSM) [7] computes the gradient of the loss with respect to the input and adds a small perturbation in the direction of the sign of that gradient, i.e., the direction that increases the loss. DeepFool [8] is another white-box attack, which estimates the decision boundary and adds the minimum perturbation needed to push the input beyond it. The Basic Iterative Method (BIM) [9] is based on the same idea as FGSM; while FGSM takes a single step, BIM works iteratively, recomputing the gradient direction at each step. Carlini-Wagner [10] finds a small change sufficient to alter the classification result by formulating the search as an optimization problem.
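A minimal PyTorch sketch of a single FGSM step is shown below; the model, image, label, and epsilon values are placeholders for this illustration, not tied to any particular system. BIM would simply repeat this step with a smaller step size, clipping the result to stay within an epsilon-ball around the original image.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, image, label, epsilon=0.03):
    """One FGSM step: perturb the input in the direction that increases the loss."""
    image = image.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Move each pixel by epsilon in the direction of the sign of its gradient.
    adv_image = image + epsilon * image.grad.sign()
    # Keep pixel values in the valid [0, 1] range.
    return adv_image.clamp(0.0, 1.0).detach()
```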

Creating deep learning models that are robust against such input distortions is critical for making intelligent systems safe and reliable.

REFERENCES

[1] Hendrycks, D., & Dietterich, T. (2019). Benchmarking neural network robustness to common corruptions and perturbations. arXiv preprint arXiv:1903.12261.

[2] Michaelis, C., Mitzkus, B., Geirhos, R., Rusak, E., Bringmann, O., Ecker, A. S., ... & Brendel, W. (2019). Benchmarking robustness in object detection: Autonomous driving when winter is coming. arXiv preprint arXiv:1907.07484.

[3] Yadron, D., & Tynan, D. (2016, June 30). Tesla driver dies in first fatal crash while using Autopilot Mode. The Guardian. Retrieved June 8, 2022, from https://www.theguardian.com/technology/2016/jun/30/tesla-autopilot-death-self-driving-car-elon-musk

[4] Carlini, N., Katz, G., Barrett, C., & Dill, D. L. (2017). Ground-Truth Adversarial Examples. arXiv preprint arXiv:1709.10207.

[5] Lambert, F. (2020, February 19). Tesla autopilot gets tricked into accelerating from 35 to 85 MPH with modified speed limit sign. Electrek. Retrieved June 8, 2022, from https://electrek.co/2020/02/19/tesla-autopilot-tricked-accelerate-speed-limit-sign/

[6] Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2013). Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199.

[7] Goodfellow, I. J., Shlens, J., & Szegedy, C. (2014). Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572.

[8] Moosavi-Dezfooli, S. M., Fawzi, A., & Frossard, P. (2016). DeepFool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2574-2582).

[9] Kurakin, A., Goodfellow, I., & Bengio, S. (2016). Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533.

[10] Carlini, N., & Wagner, D. (2017, May). Towards evaluating the robustness of neural networks. In 2017 IEEE Symposium on Security and Privacy (SP) (pp. 39-57). IEEE.
