Abstract
Deepfake technology, fuelled by rapid advances in generative adversarial networks (GANs) and diffusion models, has introduced unprecedented challenges to digital media authenticity and public trust. This paper presents a comprehensive, systematic review of deep-learning-based deepfake detection techniques, synthesising findings from more than 100 peer-reviewed publications spanning 2018–2026. We propose a unified taxonomy that organises detection approaches along three axes: spatial, temporal, and multi-modal. Key architectures examined include Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM), Vision Transformers (ViTs), GANs repurposed for detection, diffusion-model fingerprinting, and ensemble hybrid frameworks. For each category we critically evaluate detection accuracy, generalisation ability, and computational overhead, drawing on standardised benchmarks such as FaceForensics++, Celeb-DF v2, and the DeepFake Detection Challenge (DFDC). We further discuss the persistent open problems (cross-dataset generalisation, adversarial robustness, explainability, and real-time deployment) and outline a multi-stakeholder mitigation ecosystem involving platform responsibility, community-driven verification, and legislative governance. This review is intended to serve as a reference for researchers and practitioners working to safeguard the integrity of digital media.
Introduction
Deepfakes are highly realistic synthetic media created with AI techniques such as GANs, autoencoders, and diffusion models, in which a person’s face, voice, or behaviour is digitally generated rather than recorded. As generation quality has advanced, deepfakes have become a major societal threat: they are used for political misinformation, financial fraud, identity theft, and non-consensual content, raising serious concerns about trust in digital media.
While detection methods have developed alongside generation techniques, they remain at a disadvantage because creating deepfakes is easier than reliably detecting them. Early forensic approaches based on hand-crafted visual artifacts have become less effective against modern AI-generated content.
This paper provides a structured overview of deepfake detection, including a taxonomy of methods and a comparison of state-of-the-art models. Detection approaches are grouped into three categories:
Spatial methods, which analyse individual frames for visual artifacts using CNNs.
Temporal methods, which examine motion inconsistencies across video frames using RNNs (including LSTMs) or Transformers.
Multi-modal methods, which combine visual, audio, and physiological signals (such as heartbeat patterns recovered via remote photoplethysmography, rPPG) to improve detection robustness.
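To make the physiological signal in the multi-modal category more concrete, the following is a minimal sketch of the kind of raw measurement an rPPG-based detector might start from: the dominant pulse frequency in a per-frame colour trace. It is an illustration under stated assumptions, not the method of any reviewed paper. The function name `dominant_pulse_bpm`, the use of the mean green-channel value over a face region, and the 0.7 to 4 Hz search band are all assumptions made for this example; real detectors use far richer spatial and spectral features.

```python
import math


def dominant_pulse_bpm(green_means, fps, lo_hz=0.7, hi_hz=4.0, step_hz=0.05):
    """Estimate the dominant pulse rate (beats per minute) from a sequence of
    per-frame mean green-channel values (hypothetical input, one value per frame).

    A direct Fourier projection is evaluated on a grid of candidate
    frequencies inside a plausible heart-rate band, and the strongest one wins.
    """
    n = len(green_means)
    if n == 0:
        raise ValueError("green_means must not be empty")
    mean = sum(green_means) / n
    centred = [g - mean for g in green_means]  # remove the constant (DC) offset

    best_freq, best_power = lo_hz, -1.0
    steps = int(round((hi_hz - lo_hz) / step_hz))
    for k in range(steps + 1):
        freq = lo_hz + k * step_hz
        # Project the signal onto a cosine and a sine at this frequency.
        re = sum(c * math.cos(2 * math.pi * freq * i / fps) for i, c in enumerate(centred))
        im = sum(c * math.sin(2 * math.pi * freq * i / fps) for i, c in enumerate(centred))
        power = re * re + im * im
        if power > best_power:
            best_freq, best_power = freq, power
    return best_freq * 60.0


if __name__ == "__main__":
    # Synthetic 10-second trace at 30 fps with a 1.2 Hz pulse component.
    fps = 30
    trace = [100.0 + 0.5 * math.sin(2 * math.pi * 1.2 * i / fps) for i in range(300)]
    print(dominant_pulse_bpm(trace, fps))
```

A detector would not threshold this number directly. It is one possible input feature: a trace with no clear peak in the heart-rate band, or peaks that disagree between face regions, is the sort of inconsistency a physiological branch could pass on to the fusion stage.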
The paper explains deepfake generation through three model families: autoencoders, GANs (such as StyleGAN), and diffusion models. Of these, diffusion models are more stable and produce more realistic output, but they are computationally heavier.
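The stability and cost trade-off between GANs and diffusion models can be read off their standard training formulations. The equations below are the usual textbook forms from the original GAN and denoising diffusion probabilistic model (DDPM) papers, given here as background and not as notation taken from this review; the symbols $G$, $D$, $\beta_t$, $\bar{\alpha}_t$, and $\epsilon_\theta$ follow those papers.

```latex
% GAN: a two-player minimax game between generator G and discriminator D
\min_{G}\,\max_{D}\; V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_{z}}\big[\log\big(1 - D(G(z))\big)\big]

% DDPM: fixed forward (noising) process with variance schedule \beta_1, \dots, \beta_T
q(x_t \mid x_{t-1}) = \mathcal{N}\!\big(x_t;\ \sqrt{1 - \beta_t}\, x_{t-1},\ \beta_t I\big)

% Closed form for any step t, with \bar{\alpha}_t = \prod_{s=1}^{t} (1 - \beta_s)
q(x_t \mid x_0) = \mathcal{N}\!\big(x_t;\ \sqrt{\bar{\alpha}_t}\, x_0,\ (1 - \bar{\alpha}_t) I\big)

% Simplified DDPM training loss: predict the noise \epsilon that was added
L_{\text{simple}} = \mathbb{E}_{t,\, x_0,\, \epsilon}
  \Big[\big\lVert \epsilon - \epsilon_\theta\big(\sqrt{\bar{\alpha}_t}\, x_0
  + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,\ t\big) \big\rVert^2\Big]
```

The GAN objective is a saddle-point problem, which is one common explanation for its training instability, whereas the DDPM loss is a plain regression on the added noise. The cost appears at sampling time: a GAN produces an image in one forward pass of $G$, while a DDPM reverses the chain step by step, with one network evaluation per step.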
Modern detectors increasingly rely on deep learning architectures such as CNNs, Vision Transformers (ViTs), and hybrid CNN-RNN systems, which reach roughly 95–98% accuracy on benchmark datasets such as Celeb-DF v2. However, no single method is fully reliable across all scenarios.
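Because headline accuracy depends on the decision threshold and on the dataset used, it helps to see how such figures are computed. The sketch below implements accuracy and a rank-based ROC AUC from per-video scores in plain Python. The function names and the toy labels and scores are hypothetical and are not results from any benchmark; labels use 1 for fake and 0 for real as an assumed convention.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of items classified correctly when score >= threshold means 'fake' (1)."""
    if not labels or len(labels) != len(scores):
        raise ValueError("labels and scores must be non-empty and the same length")
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def roc_auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen fake receives a
    higher score than a randomly chosen real item, counting ties as one half."""
    fake_scores = [s for y, s in zip(labels, scores) if y == 1]
    real_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not fake_scores or not real_scores:
        raise ValueError("need at least one fake and one real example")
    wins = 0.0
    for f in fake_scores:
        for r in real_scores:
            if f > r:
                wins += 1.0
            elif f == r:
                wins += 0.5
    return wins / (len(fake_scores) * len(real_scores))


if __name__ == "__main__":
    # Toy, made-up scores for four videos (1 = fake, 0 = real).
    toy_labels = [1, 1, 0, 0]
    toy_scores = [0.9, 0.4, 0.6, 0.1]
    print(accuracy(toy_labels, toy_scores), roc_auc(toy_labels, toy_scores))
```

Running the same two functions on scores from the training benchmark and from an unseen benchmark is a simple way to expose the reliability gap noted above: a model can look strong on the first and weaker on the second.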
The paper also introduces a general detection pipeline with four stages: preprocessing, feature extraction, multi-modal fusion, and final classification (real vs. fake). Comparative studies show that newer hybrid and multi-modal models outperform earlier approaches and generalise more effectively across datasets such as DFDC and Celeb-DF v2.
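The four pipeline stages can be sketched as a small skeleton. This is an illustrative outline, not the pipeline implementation from the paper: the `Frame` type, the `Branch` callables, the weighted-average late fusion rule, and the 0.5 threshold are all assumptions made for the example. In a real system each branch would wrap a trained model (for instance a CNN for spatial cues or an RNN for temporal cues) that performs feature extraction and returns a fake-probability.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical frame type: a greyscale image as rows of pixel intensities in [0, 255].
Frame = List[List[float]]
# Hypothetical branch: maps preprocessed frames to a fake-probability in [0, 1].
Branch = Callable[[Sequence[Frame]], float]


def preprocess(frame: Frame) -> Frame:
    """Stage 1: scale pixel intensities from [0, 255] to [0, 1]."""
    return [[pixel / 255.0 for pixel in row] for row in frame]


def late_fusion(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Stage 3: weighted average of per-branch scores (one simple fusion rule among many)."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("fusion weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def detect(
    frames: Sequence[Frame],
    branches: Dict[str, Branch],
    weights: Dict[str, float],
    threshold: float = 0.5,
) -> Tuple[str, float]:
    """Run preprocessing, per-branch scoring, fusion, and classification."""
    clean = [preprocess(f) for f in frames]                       # 1. preprocessing
    scores = {name: fn(clean) for name, fn in branches.items()}   # 2. feature extraction and scoring
    fused = late_fusion(scores, weights)                          # 3. multi-modal fusion
    label = "fake" if fused >= threshold else "real"              # 4. classification
    return label, fused


if __name__ == "__main__":
    # Stub branches returning fixed scores stand in for trained models.
    stub_branches = {"spatial": lambda f: 0.8, "temporal": lambda f: 0.6}
    stub_weights = {"spatial": 1.0, "temporal": 1.0}
    print(detect([[[0.0, 255.0], [128.0, 64.0]]], stub_branches, stub_weights))
```

Late fusion is used here only because it keeps the branches independent and easy to swap. Feature-level (early) fusion, where embeddings from each modality are concatenated before a shared classifier, is an equally valid design.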
Conclusion
This paper has presented a systematic review of deepfake detection using deep learning, synthesising findings from over 100 peer-reviewed publications into a unified taxonomy organised around spatial, temporal, and multi-modal detection paradigms.
The evidence suggests several priorities for the next generation of research: (1) generalisation-first training strategies; (2) lightweight and explainable architectures deployable at platform scale; (3) richer, more diverse benchmark datasets; and (4) interdisciplinary collaboration among computer scientists, forensic practitioners, legal scholars, and policymakers.