The rapid advancement of artificial intelligence and deep learning has enabled the creation of highly realistic manipulated videos, commonly known as deepfakes. Such videos are increasingly used for misinformation, political manipulation, identity fraud, cyber harassment, and social engineering attacks, which makes detecting manipulated content a major challenge in digital media security. This paper presents a real-time deepfake video detection system based on deep learning. The proposed method combines a pre-trained ResNeXt convolutional neural network (CNN) for frame-level feature extraction with a Long Short-Term Memory (LSTM) network for temporal sequence analysis. The system is trained on a balanced dataset built from the FaceForensics++, Deepfake Detection Challenge (DFDC), and Celeb-DF datasets. During preprocessing, videos are split into frames, facial regions are extracted, and the resulting frame sequences are analyzed in order. By learning the spatial and temporal inconsistencies present in deepfake videos, the model identifies manipulated videos effectively. Experimental results show that the system achieves high detection accuracy and performs effectively in real time. A web-based interface built with Django allows users to upload videos and obtain predictions with confidence scores.
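The preprocessing steps summarized above (splitting a video into frames, extracting the facial region, and forming a frame sequence for the model) can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: frames are already decoded into RGB arrays, face bounding boxes come from an unspecified face detector, and the sequence length, 112-pixel face size, nearest-neighbour resizing, and ImageNet normalization constants are illustrative choices rather than values reported in the paper. All function names are hypothetical.

```python
import numpy as np

# ImageNet channel statistics commonly used with pre-trained backbones
# (assumption: the paper does not state its normalization constants).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def sample_frame_indices(num_frames: int, seq_len: int) -> list[int]:
    """Pick seq_len frame indices spread evenly across a video of num_frames frames."""
    if num_frames <= 0 or seq_len <= 0:
        raise ValueError("num_frames and seq_len must be positive")
    # Indices repeat when the video has fewer frames than seq_len.
    return [int(i * num_frames / seq_len) for i in range(seq_len)]


def crop_face(frame: np.ndarray, box: tuple[int, int, int, int]) -> np.ndarray:
    """Crop an (H, W, 3) frame to box = (top, left, bottom, right), clipped to the frame.

    The box is assumed to come from a separate face detector (not specified here).
    """
    h, w = frame.shape[:2]
    top, left, bottom, right = box
    top, left = max(0, top), max(0, left)
    bottom, right = min(h, bottom), min(w, right)
    if bottom <= top or right <= left:
        raise ValueError("empty face box")
    return frame[top:bottom, left:right]


def resize_nearest(img: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour resize of an (H, W, C) image to (size, size, C)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols[None, :]]


def normalize(face: np.ndarray) -> np.ndarray:
    """Scale uint8 pixels to [0, 1], then standardize each RGB channel."""
    x = face.astype(np.float32) / 255.0
    return (x - IMAGENET_MEAN) / IMAGENET_STD


def preprocess_video(frames, boxes, seq_len: int = 20, size: int = 112) -> np.ndarray:
    """Return a (seq_len, 3, size, size) float32 array of normalized face crops."""
    if len(frames) != len(boxes):
        raise ValueError("need one face box per frame")
    clip = []
    for idx in sample_frame_indices(len(frames), seq_len):
        face = crop_face(frames[idx], boxes[idx])
        face = resize_nearest(face, size)
        clip.append(normalize(face).transpose(2, 0, 1))  # HWC -> CHW for PyTorch
    return np.stack(clip)
```

Uniform sampling keeps the sequence length fixed regardless of video duration, which lets a recurrent model consume clips in batches; a production pipeline would typically use a real face detector and an interpolating resize instead.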
Introduction
This paper addresses the detection of deepfake videos using deep learning. Deepfakes are highly realistic AI-generated or manipulated videos created with models such as generative adversarial networks (GANs) and tools such as FaceSwap and DeepFaceLab, which can alter a person's face, expressions, or identity. Although these technologies have useful applications in entertainment and education, they are widely misused for misinformation, fraud, blackmail, and other cybercrimes.
Because deepfakes are difficult to distinguish from real videos, this paper proposes an automated detection system. It uses a hybrid deep learning approach that combines a ResNeXt CNN, which extracts spatial facial features from individual video frames, with an LSTM network [8], which learns temporal inconsistencies across consecutive frames. The system classifies each video as real or fake and is deployed as a web application for real-time use.
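Prior work has relied on benchmark datasets such as FaceForensics++ [1], DFDC [2], and Celeb-DF [3], and has proposed methods that detect manipulation artifacts or facial inconsistencies, for example face-warping artifacts [4], capsule-network features [5], and recurrent convolutional models of frame sequences [6]. These methods, however, have limitations in handling advanced or high-quality deepfakes.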
Conclusion
This paper presented a deep learning-based framework for detecting manipulated facial videos using a hybrid ResNeXt and Long Short-Term Memory (LSTM) architecture. The proposed system combines spatial feature extraction with temporal sequence analysis to identify deepfake videos. The preprocessing pipeline of frame extraction, face detection, cropping, and normalization improved the quality of the input data and, in turn, model performance. The ResNeXt CNN extracted manipulation artifacts from individual facial frames, while the LSTM network captured temporal inconsistencies across sequential frames. The model was trained on the FaceForensics++, Deepfake Detection Challenge (DFDC), and Celeb-DF benchmark datasets. Experimental evaluation showed that the proposed framework achieved high accuracy, precision, recall, and F1-score while maintaining stable real-time prediction.
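The four evaluation metrics named above have standard definitions in terms of confusion-matrix counts, where TP, TN, FP, and FN are true positives, true negatives, false positives, and false negatives over N videos: accuracy = (TP + TN) / N, precision P = TP / (TP + FP), recall R = TP / (TP + FN), and F1 = 2PR / (P + R). The pure-Python sketch below computes them from per-video labels, assuming label 1 means "fake"; the label encoding and the function name `binary_metrics` are assumptions, not details from the paper.

```python
def binary_metrics(y_true: list[int], y_pred: list[int], positive: int = 1) -> dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (positive = fake, by assumption)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)
    accuracy = correct / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

For example, with true labels `[1, 1, 1, 0, 0, 0]` and predictions `[1, 1, 0, 0, 0, 1]`, there are 2 true positives, 1 false positive, and 1 false negative, so accuracy is 4/6 and precision, recall, and F1 are each 2/3.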
Integrating the trained model into a web-based application makes the framework usable in practical deployments. The framework thus contributes toward preserving trust in digital media and reducing the harmful impact of manipulated content on online platforms. As deepfake generation technologies continue to evolve, reliable detection systems are essential for maintaining authenticity and preventing misinformation, and the proposed research offers an effective and scalable approach for real-world deepfake detection applications.
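The web application reports each prediction together with a confidence score. One common way to derive such a score, assumed here because the paper does not specify it, is to apply a softmax to the classifier's two output logits and report the probability of the predicted class. The function name and label strings below are hypothetical, and the Django view that would call this function is omitted.

```python
import math


def predict_with_confidence(
    logits: list[float], labels: tuple[str, ...] = ("REAL", "FAKE")
) -> tuple[str, float]:
    """Turn classifier logits into a label and a softmax confidence in [0, 1]."""
    if len(logits) != len(labels):
        raise ValueError("need exactly one logit per label")
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

For logits `[0.0, 2.0]` this returns `"FAKE"` with confidence 1 / (1 + e^-2), roughly 0.88. Softmax scores from deep networks are often overconfident, so a deployed system may calibrate them before display.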
References
[1] A. Rössler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and M. Nießner, “FaceForensics++: Learning to Detect Manipulated Facial Images,” in Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2019.
[2] B. Dolhansky, J. Bitton, B. Pflaum, J. Lu, R. Howes, M. Wang, and C. Canton Ferrer, “The DeepFake Detection Challenge (DFDC) Dataset,” arXiv preprint arXiv:2006.07397, 2020.
[3] Y. Li, X. Yang, P. Sun, H. Qi, and S. Lyu, “Celeb-DF: A Large-Scale Challenging Dataset for DeepFake Forensics,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[4] Y. Li and S. Lyu, “Exposing DeepFake Videos by Detecting Face Warping Artifacts,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2019.
[5] H. H. Nguyen, J. Yamagishi, and I. Echizen, “Capsule-Forensics: Using Capsule Networks to Detect Forged Images and Videos,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019.
[6] E. Sabir, J. Cheng, A. Jaiswal, W. AbdAlmageed, I. Masi, and P. Natarajan, “Recurrent Convolutional Strategies for Face Manipulation Detection in Videos,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2019, pp. 80–87.
[7] K. He, X. Zhang, S. Ren, and J. Sun, “Deep Residual Learning for Image Recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[8] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[9] D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” arXiv preprint arXiv:1412.6980, 2014.