This report explores adversarial attacks targeting deep image classification models, specifically ResNet-34 and DenseNet-121 trained on the ImageNet-1K dataset. We implement and assess the effectiveness of the Fast Gradient Sign Method (FGSM), Projected Gradient Descent (PGD), and localized patch-based attacks, all constrained by strict perturbation budgets and, in the patch setting, spatial limitations. Our results reveal significant declines in both top-1 and top-5 accuracy and emphasize the cross-architecture transferability of adversarial inputs. We detail our approach and share our findings and insights.
Introduction
This study investigates the susceptibility of a ResNet-34 model (trained on ImageNet-1K) to adversarial attacks, with the goal of degrading classification accuracy without introducing visible changes to the images. The transferability of these attacks to a DenseNet-121 model is also examined.
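To make the threat model concrete, the sketch below shows a single-step FGSM perturbation in PyTorch against a pretrained torchvision ResNet-34. It is a minimal sketch, not our exact implementation: the ε value (4/255), the assumption that inputs lie in [0, 1], and the omission of ImageNet normalization are illustrative simplifications.

```python
# Minimal FGSM sketch (illustrative; epsilon, preprocessing, and variable names
# are assumptions, not the exact settings used in this report).
import torch.nn.functional as F
from torchvision import models

# Pretrained source model; ImageNet normalization is assumed to be folded into
# the model and is omitted here for brevity.
model = models.resnet34(weights="IMAGENET1K_V1").eval()

def fgsm_attack(model, x, y, epsilon=4 / 255):
    """One-step FGSM: shift each pixel by +/- epsilon along the sign of the loss gradient.

    x: batch of images in [0, 1], shape (N, 3, H, W); y: ground-truth labels.
    """
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()
    # Clamp so the perturbed result is still a valid image.
    return x_adv.clamp(0, 1).detach()
```

The ε-bounded sign step keeps the perturbation visually inconspicuous while directly increasing the classification loss.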
Our key findings are as follows:
Iterative attacks (PGD) are slightly more effective than single-step FGSM (a PGD sketch follows this list).
Patch attacks remain powerful despite perturbing only a small region of the image.
Adversarial examples transfer well across architectures (ResNet ↔ DenseNet), posing real-world risks.
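The sketch below illustrates the iterative variant (L∞-bounded PGD) together with a simple cross-architecture transfer check that crafts examples on ResNet-34 and evaluates them on DenseNet-121. The step size, iteration count, and ε are illustrative assumptions, and the random start of standard PGD is omitted for brevity; a patch attack can be obtained from a similar loop by restricting updates to a fixed spatial mask instead of an ε-ball.

```python
# PGD sketch with a cross-architecture transfer check (hyperparameters are
# illustrative assumptions; the random start of standard PGD is omitted).
import torch
import torch.nn.functional as F
from torchvision import models

def pgd_attack(model, x, y, epsilon=4 / 255, alpha=1 / 255, steps=10):
    """Iterative L-infinity PGD: repeated signed-gradient steps projected back
    into the epsilon-ball around the original image."""
    x_orig = x.clone().detach()
    x_adv = x_orig.clone()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()
            x_adv = x_orig + (x_adv - x_orig).clamp(-epsilon, epsilon)  # project into the epsilon-ball
            x_adv = x_adv.clamp(0, 1)  # keep a valid image
    return x_adv.detach()

# Transfer check: craft on ResNet-34, evaluate on DenseNet-121.
source = models.resnet34(weights="IMAGENET1K_V1").eval()
target = models.densenet121(weights="IMAGENET1K_V1").eval()

def transfer_accuracy(x, y):
    """Target-model accuracy on examples crafted against the source model
    (lower accuracy indicates stronger transfer)."""
    x_adv = pgd_attack(source, x, y)
    with torch.no_grad():
        preds = target(x_adv).argmax(dim=1)
    return (preds == y).float().mean().item()
```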
Mitigation
Defense strategies include:
Adversarial training (a minimal training-loop sketch is given at the end of this section)
Input transformations
Robust architecture design
However, these approaches often struggle against adaptive attacks, making robust defense an ongoing research challenge.
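As an illustration of the first strategy, the sketch below shows one epoch of adversarial training that generates FGSM examples on the fly. It is a minimal sketch rather than the defense evaluated in this report, and it assumes the fgsm_attack helper from the earlier sketch as well as an existing model, data loader, and optimizer.

```python
# Minimal adversarial-training sketch (illustrative; reuses the fgsm_attack
# helper from the earlier sketch and assumes `model`, `loader`, and
# `optimizer` are already defined).
import torch.nn.functional as F

def adversarial_training_epoch(model, loader, optimizer, epsilon=4 / 255, device="cpu"):
    """Train for one epoch on adversarial examples generated on the fly (FGSM for speed)."""
    for x, y in loader:
        x, y = x.to(device), y.to(device)
        # Craft adversarial examples against the current parameters.
        model.eval()
        x_adv = fgsm_attack(model, x, y, epsilon=epsilon)
        # Update the model on the perturbed batch.
        model.train()
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x_adv), y)
        loss.backward()
        optimizer.step()
```

In practice, common variants mix clean and adversarial batches or use a stronger inner attack such as PGD, trading training cost for robustness.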
Conclusion
The vulnerability of deep image classification models to adversarial attacks has been systematically evaluated. Using ResNet-34 and DenseNet-121 trained on a subset of ImageNet-1K, we implemented and analyzed FGSM, PGD, and localized patch attacks under strict perturbation constraints. Our results demonstrated that even small perturbations can significantly degrade model performance, with adversarial examples transferring effectively across architectures. While methods such as adversarial training offer partial defenses, the persistence of these vulnerabilities highlights the need for continued research into more robust and resilient learning systems.