The rapid expansion of social media has accelerated the spread of manipulated and fake images, threatening the integrity of digital content. Conventional detection methods often falter against advanced manipulation techniques. This research presents SahAI, a fake image detection model that uses a pre-trained Vision Transformer (ViT) to classify images as real or fake. By adapting the ViT architecture with a custom classifier, SahAI achieves high detection accuracy with minimal retraining. For each image, the model outputs a real-or-fake label together with a confidence score. SahAI attains a training accuracy of 99.12% and a test accuracy of 97.53%, positioning it as a practical tool for verifying the authenticity of social media content.
Introduction
The rise of AI-driven techniques such as deepfake generation and face swapping has increased the prevalence of digitally manipulated images, raising concerns about media authenticity, especially on social media. Traditional detection methods, which are mainly CNN-based, struggle to identify subtle manipulations spread across an image because of their limited grasp of global context. To overcome this, this study introduces SahAI, a fake image detection model built on a pre-trained Vision Transformer (ViT), which uses self-attention to capture global image features more effectively.
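To make the global-context argument concrete, the following minimal sketch implements scaled dot-product self-attention in plain Python. It is an illustration under stated assumptions, not SahAI's code: the function names, the toy token values, the 224x224 input size, and the use of identity query/key/value projections (a real ViT layer learns these projections and uses multiple heads) are all assumptions made here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    Assumption: learned projections and multiple heads are omitted so the
    core mechanism stays visible.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    weights, outputs = [], []
    for query in tokens:
        # Score this token against every token in the image, near or far.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale for key in tokens
        ]
        row = softmax(scores)
        weights.append(row)
        # The output is the attention-weighted average of all token vectors.
        outputs.append(
            [sum(w * tok[d] for w, tok in zip(row, tokens)) for d in range(dim)]
        )
    return weights, outputs


# ViT-B/16 cuts the image into 16x16 patches; 224x224 input is an assumption.
num_patches = (224 // 16) ** 2  # 196 patch tokens

# Four toy 2-D "patch embeddings": the first and last are far apart in the
# sequence but similar in content.
toy_tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
attn_weights, attn_out = self_attention(toy_tokens)
```

In this toy input the first token attends to the distant last token as strongly as to itself, and more strongly than to its immediate neighbors, which is the long-range behavior that a small convolutional kernel cannot express in a single layer.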
SahAI fine-tunes ViT-B/16 with a custom classifier head for binary classification (real vs. fake). It was trained on a Kaggle dataset of 2,041 images (roughly half real, half fake), using data augmentation and the Adam optimizer. After five epochs, SahAI reached 99.12% accuracy on the training data and 97.53% on the test data, with precision and recall both at approximately 97.3%, indicating reliable performance.
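As a rough illustration of how a confidence-based output and the reported metrics can be computed, the sketch below converts two raw classifier scores into a label with a softmax confidence, then derives accuracy, precision, and recall from predictions. The function names, the choice of "fake" as the positive class, the softmax-based confidence, and the example logits and labels are assumptions for illustration; they are not taken from SahAI's implementation or data.

```python
import math


def classify(logit_real, logit_fake):
    """Turn two raw classifier scores into a label and a softmax confidence."""
    peak = max(logit_real, logit_fake)
    exp_real = math.exp(logit_real - peak)
    exp_fake = math.exp(logit_fake - peak)
    p_fake = exp_fake / (exp_real + exp_fake)
    label = "fake" if p_fake >= 0.5 else "real"
    # Confidence is the probability assigned to the predicted class.
    confidence = p_fake if label == "fake" else 1.0 - p_fake
    return label, confidence


def binary_metrics(y_true, y_pred, positive="fake"):
    """Accuracy, precision, and recall with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Made-up logits and labels for illustration only; not the paper's data.
example_logits = [(2.0, -1.0), (-0.5, 1.5), (0.2, 0.1), (-2.0, 3.0)]
y_true = ["real", "fake", "fake", "fake"]
y_pred = [classify(r, f)[0] for r, f in example_logits]
metrics = binary_metrics(y_true, y_pred)
```

Reporting precision and recall alongside accuracy matters here because the two error types carry different costs: a missed fake lowers recall, while a real image flagged as fake lowers precision.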
The results suggest that the Vision Transformer’s ability to capture long-range dependencies helps it detect image manipulations more reliably than CNN-based approaches, making SahAI an effective and efficient model for improving the trustworthiness of digital images on social media.
Conclusion
SahAI offers an effective solution for detecting fake images on social media, using the Vision Transformer’s global feature extraction to achieve high accuracy (99.12% training, 97.53% test). Its simplicity and effectiveness make it a practical tool for content verification. Future enhancements could include:
- Adding localization capabilities to identify tampered regions using techniques like Grad-CAM.
- Integrating additional feature extractors, such as DenseNet, for hybrid modeling.
- Optimizing the model for real-time deployment on social media platforms.
This research lays a foundation for advancing digital forensics and for strengthening trust in online visual content.