Deep neural networks, despite their remarkable performance on image classification tasks, remain critically vulnerable to adversarial examples — imperceptibly perturbed inputs engineered to induce misclassification. In this paper, we propose and evaluate a dual-layer defense framework against adversarial attacks on image classifiers trained on the ImageNet-1K Mini dataset. Our approach combines a convolutional autoencoder as a preprocessing denoising step with a ResNet-50 classifier augmented by a novel block-switching mechanism to disrupt the adversarial gradient signal. We evaluate the system under Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) attacks across a range of perturbation budgets ε ∈ {0.01, 0.02, 0.03, 0.05, 0.07, 0.10}. Our defense recovers +23.08% accuracy against FGSM and +54.44% against PGD on in-distribution data, and achieves full (100%) recovery on an out-of-distribution generalisation test (83.3% baseline). We further employ Gradient-weighted Class Activation Mapping (GRAD-CAM) to visually explain the disruption of model attention under attack and its restoration after defense. All experiments are conducted on the Kaggle T4 GPU platform, and the full pipeline is deployed as an interactive Streamlit web application.
Introduction
While deep learning models have achieved very high performance on visual recognition tasks, they remain vulnerable to adversarial examples: small, imperceptible changes to input images that can mislead a model into making incorrect predictions. This vulnerability poses serious risks in safety-critical applications such as autonomous driving, medical diagnosis, and security systems.
Existing defenses against such attacks are often ineffective, degrade clean accuracy, or fail to generalize. To address these limitations, we propose a robust and interpretable defense pipeline. The approach combines a denoising autoencoder that removes adversarial noise, a modified ResNet-50 with a block-switching mechanism that disrupts the gradients attackers rely on, and evaluation against strong attacks such as FGSM and PGD. We also employ GRAD-CAM visualizations to understand how attacks shift model attention and how the defense restores it. A minimal sketch of this dual-layer pipeline is given below.
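As a concrete illustration, the following PyTorch sketch shows one way such a dual-layer defense can be assembled. It is not our exact architecture or training configuration: the autoencoder widths, the number of switchable blocks, and the choice of layer4 as the switched stage are assumptions made for readability.

import copy
import random

import torch
import torch.nn as nn
from torchvision.models import resnet50


class DenoisingAutoencoder(nn.Module):
    """Small convolutional autoencoder used as an input-purification step."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))


class BlockSwitchingResNet50(nn.Module):
    """ResNet-50 whose final residual stage is drawn at random per forward pass,
    so gradients computed against one copy do not transfer cleanly to the others."""

    def __init__(self, num_switch_blocks: int = 3, num_classes: int = 1000):
        super().__init__()
        self.backbone = resnet50(weights=None, num_classes=num_classes)
        # Independent copies of layer4; in practice each copy would be
        # fine-tuned separately before deployment.
        self.switch_blocks = nn.ModuleList(
            [copy.deepcopy(self.backbone.layer4) for _ in range(num_switch_blocks)]
        )

    def forward(self, x):
        self.backbone.layer4 = random.choice(self.switch_blocks)
        return self.backbone(x)


class DefensePipeline(nn.Module):
    """Denoise the input first, then classify with the block-switching model."""

    def __init__(self):
        super().__init__()
        self.denoiser = DenoisingAutoencoder()
        self.classifier = BlockSwitchingResNet50()

    def forward(self, x):
        return self.classifier(self.denoiser(x))


if __name__ == "__main__":
    model = DefensePipeline().eval()
    logits = model(torch.rand(1, 3, 224, 224))  # dummy image in [0, 1]
    print(logits.shape)  # torch.Size([1, 1000])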
We review prior work on adversarial attacks and defenses, including FGSM [2], PGD [3], adversarial training, and preprocessing-based techniques, and we emphasize the importance of explainability in understanding model behavior under attack. The two attacks used in our evaluation are sketched below.
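For reference, both attacks can be sketched in their standard formulations (single-step FGSM [2] and iterative PGD with an L-infinity projection [3]). The step size, iteration count, and random-start choice below are illustrative defaults, not the exact settings used in our experiments.

import torch
import torch.nn.functional as F


def fgsm_attack(model, x, y, eps):
    """Single-step FGSM: x_adv = x + eps * sign(grad_x loss)."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad = torch.autograd.grad(loss, x)[0]
    x_adv = x + eps * grad.sign()
    return x_adv.clamp(0, 1).detach()


def pgd_attack(model, x, y, eps, alpha=None, steps=10):
    """Iterative PGD with projection back onto the eps-ball around x."""
    alpha = alpha if alpha is not None else eps / 4
    x_adv = (x + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)  # random start
    for _ in range(steps):
        x_adv = x_adv.clone().detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps)  # project to eps-ball
        x_adv = x_adv.clamp(0, 1)
    return x_adv.detach()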
For experimentation, the system is trained and tested on the ImageNet-1K Mini dataset [15], which contains diverse categories but exhibits class imbalance. Exploratory preprocessing confirms that standard ImageNet normalization is suitable, although the dataset shows higher per-class variability; the corresponding transforms are sketched below.
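A minimal sketch of the preprocessing, assuming the standard ImageNet normalization statistics and the class-per-folder layout of the Kaggle "imagenetmini-1000" dataset [15]; the directory path and crop sizes are illustrative rather than our exact pipeline.

from torchvision import datasets, transforms

IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),                           # scales pixels to [0, 1]
    transforms.Normalize(IMAGENET_MEAN, IMAGENET_STD),
])

# ImageFolder expects one subdirectory per class; the path below is an
# illustrative Kaggle input path, not necessarily the dataset's exact layout.
dataset = datasets.ImageFolder(
    "/kaggle/input/imagenetmini-1000/imagenet-mini/train",
    transform=preprocess,
)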
Overall, we present a comprehensive and effective strategy for improving the robustness, interpretability, and generalization of deep learning models against adversarial attacks.
Conclusion
We have presented a dual-layer defense framework against adversarial attacks on deep image classifiers, combining a convolutional autoencoder for perturbation suppression with a block-switching ResNet-50. Our system demonstrates strong recovery rates — +23.08% for FGSM and +54.44% for PGD on in-distribution data, and full 100% recovery on an out-of-distribution generalisation test. GRAD-CAM analysis provides interpretable evidence of both the attack’s disruption mechanism and the defense’s restoration effect. The full pipeline is deployed as an interactive Streamlit application for real-time demonstration, and all experiments are reproducible via public Kaggle notebooks [16]–[18].
References
[1] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus, “Intriguing properties of neural networks,” in Proc. Int. Conf. Learn. Representations (ICLR), 2014.
[2] I. J. Goodfellow, J. Shlens, and C. Szegedy, “Explaining and harnessing adversarial examples,” in Proc. Int. Conf. Learn. Representations (ICLR), 2015.
[3] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu, “Towards deep learning models resistant to adversarial attacks,” in Proc. Int. Conf. Learn. Representations (ICLR), 2018.
[4] N. Carlini and D. Wagner, “Towards evaluating the robustness of neural networks,” in Proc. IEEE Symp. Security and Privacy (S&P), 2017, pp. 39–57.
[5] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
[6] O. Russakovsky et al., “ImageNet large scale visual recognition challenge,” Int. J. Comput. Vis., vol. 115, no. 3, pp. 211–252, 2015.
[7] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, “Grad-CAM: Visual explanations from deep networks via gradient-based localization,” in Proc. IEEE Int. Conf. Computer Vision (ICCV), 2017, pp. 618–626.
[8] D. Meng and H. Chen, “MagNet: A two-pronged defense against adversarial examples,” in Proc. ACM Conf. Computer and Communications Security (CCS), 2017, pp. 135–147.
[9] P. Samangouei, M. Kabkab, and R. Chellappa, “Defense-GAN: Protecting classifiers against adversarial attacks using generative models,” in Proc. Int. Conf. Learn. Representations (ICLR), 2018.
[10] C. Guo, M. Rana, M. Cisse, and L. van der Maaten, “Countering adversarial images using input transformations,” in Proc. Int. Conf. Learn. Representations (ICLR), 2018.
[11] G. S. Dhillon, K. Azizzadenesheli, Z. C. Lipton, J. Bernstein, J. Kossaifi, A. Khanna, and A. Anandkumar, “Stochastic activation pruning for robust adversarial defense,” in Proc. Int. Conf. Learn. Representations (ICLR), 2018.
[12] A. Athalye, N. Carlini, and D. Wagner, “Obfuscated gradients give a false sense of security: Circumventing defenses to adversarial examples,” in Proc. Int. Conf. Machine Learning (ICML), 2018, pp. 274–283.
[13] Y. Luo, X. Boix, G. Roig, T. Poggio, and Q. Zhao, “Foveation-based mechanisms alleviate adversarial examples,” arXiv preprint arXiv:1511.06292, 2015.
[14] H. Zhang and J. Wang, “Defense against adversarial attacks using feature scattering-based adversarial training,” in Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), 2019.
[15] I. Figotin, “ImageNet Mini 1000,” Kaggle Dataset, 2021. [Online]. Available: https://www.kaggle.com/datasets/ifigotin/imagenetmini-1000
[16] Adwaith R, “Adversarial Attack and Defense on ML Models — Notebook 1,” Kaggle, 2024. [Online]. Available: https://www.kaggle.com/code/radwaith/adversarial-attack-and-defense-on-ml-models
[17] Adwaith R, “Adversarial Attack and Defense on ML Models — Notebook 2,” Kaggle, 2024. [Online]. Available: https://www.kaggle.com/code/radwaith/adversarial-attack-and-defense-on-ml-models-nb2
[18] Adwaith R, “Generalization Test on Hand-Picked Data,” Kaggle, 2024. [Online]. Available: https://www.kaggle.com/code/radwaith/generalization-test-on-hand-picked-data