The rapid growth of deepfake technology has created serious challenges for digital trust and online security. Deepfakes are synthetic videos generated with advanced artificial intelligence techniques, especially Generative Adversarial Networks (GANs), which can realistically manipulate faces, expressions, and speech. Traditional detection systems mainly analyze individual video frames, so they often miss inconsistencies that only appear across consecutive frames. This study proposes a hybrid deep learning model that combines Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks to analyze both the spatial and the temporal features of videos. EfficientNetB0 extracts visual features from each frame, while stacked LSTM layers learn motion patterns and sequential inconsistencies. The model achieved 88.83% accuracy, 100% recall, and a 91.95% F1-score, compared with 72% accuracy for a CNN-only baseline. A Flask-based web application was also developed for practical deployment. These results indicate that combining spatial and temporal learning substantially improves deepfake detection over CNN-only approaches.
Introduction
Advances in artificial intelligence have enabled the creation of highly realistic deepfake videos, which pose significant risks such as misinformation, identity theft, and reputational damage. Traditional deepfake detection methods mainly analyze individual video frames, so they cannot capture temporal inconsistencies such as unnatural blinking, lip-sync errors, and irregular facial movements. To address this limitation, this study proposes a hybrid CNN–LSTM deep learning framework that combines spatial feature extraction with temporal sequence analysis for more accurate deepfake detection. A user-friendly web application is also developed that allows users to upload videos and receive authenticity predictions.
The proposed system processes an uploaded video in four stages: frame extraction, preprocessing, CNN-based spatial feature extraction with EfficientNetB0, and LSTM-based temporal modeling of motion patterns across the video sequence. Frames are resized to 32×32 pixels and grouped into sequences of 15 frames. Stacked LSTM layers then classify each sequence as real or fake, with dropout and regularization applied to reduce overfitting. The model is trained with the Adam optimizer and binary cross-entropy loss on real and fake videos collected from Kaggle and other sources.
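The sequencing step and the training objective described above can be illustrated with a short, framework-free Python sketch. It assumes that sequences are formed from consecutive, non-overlapping windows of 15 frames and that leftover frames at the end of a video are discarded; the text does not specify either choice. It also assumes label 1 means fake and 0 means real. The helper names `make_sequences` and `binary_cross_entropy` are hypothetical, and the loss is written out by hand rather than taken from TensorFlow so that the formula is visible.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")

SEQ_LEN = 15  # sequence length stated in the text


def make_sequences(frames: Sequence[T], seq_len: int = SEQ_LEN) -> List[List[T]]:
    """Split a video's frames into consecutive, non-overlapping windows.

    Assumption (not stated in the text): trailing frames that do not fill
    a complete window are dropped.
    """
    if seq_len <= 0:
        raise ValueError("seq_len must be positive")
    n_full = len(frames) // seq_len
    return [list(frames[i * seq_len:(i + 1) * seq_len]) for i in range(n_full)]


def binary_cross_entropy(labels: Sequence[int], probs: Sequence[float],
                         eps: float = 1e-7) -> float:
    """Mean binary cross-entropy; assumed encoding: 1 = fake, 0 = real."""
    if not labels or len(labels) != len(probs):
        raise ValueError("labels and probs must be non-empty and equal length")
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


if __name__ == "__main__":
    # A hypothetical 40-frame video yields two full 15-frame sequences.
    seqs = make_sequences(list(range(40)))
    print(len(seqs), seqs[1][0], seqs[1][-1])  # 2 15 29
    # Confident, correct predictions give a small loss.
    print(round(binary_cross_entropy([1, 0], [0.9, 0.1]), 4))  # 0.1054
```

In the actual system each element of a sequence would be a preprocessed 32×32 frame (or its EfficientNetB0 feature vector) rather than an integer; integers are used here only to make the windowing easy to inspect.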
Experimental results show that the hybrid CNN–LSTM model achieves 88.83% accuracy, 85.09% precision, 100% recall, a 91.95% F1-score, and a 90.09% ROC-AUC, outperforming a CNN-only baseline (72% accuracy) by 16.83 percentage points in accuracy. With fake as the positive class, the 100% recall means that no fake video was misclassified as real, while the 85.09% precision means that some real videos were flagged as fake. The LSTM component captures temporal artifacts such as frame flickering, lip-sync mismatches, irregular facial motion, and progressive visual anomalies, which frame-based models cannot detect because they examine each frame in isolation. Although challenges remain with high-quality GAN-generated videos, occlusions, and real-time processing, the proposed system offers a practical approach to deepfake detection. A Flask-based web application built with TensorFlow, OpenCV, and PostgreSQL further demonstrates how the framework can be applied to real-world digital media authentication.
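The reported metrics follow the standard confusion-matrix definitions, sketched below in plain Python. The sketch assumes fake is the positive class, and the function names are hypothetical. As a consistency check, the F1-score implied by the reported precision (85.09%) and recall (100%) is 2PR/(P+R) ≈ 91.94%, which agrees with the reported 91.95% to within rounding. The ROC-AUC function uses the rank-based (Mann–Whitney) formulation: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half.

```python
from typing import Dict, Sequence


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision, recall and F1, with 'fake' as the positive class (assumed)."""
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def f1_from_pr(precision: float, recall: float) -> float:
    """F1-score as the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based ROC-AUC: P(positive score > negative score), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Consistency check against the reported precision and recall.
    print(round(f1_from_pr(0.8509, 1.0), 4))  # 0.9194
    # Hypothetical counts: no missed fakes (fn = 0) gives recall 1.0.
    print(classification_metrics(tp=8, fp=2, tn=5, fn=0))
    # Hypothetical scores for two fake (1) and two real (0) samples.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))  # 0.75
```

The pairwise AUC loop is quadratic in the number of samples, which is fine for illustration; library implementations sort the scores instead.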
Conclusion
This study presents a hybrid CNN–LSTM model for deepfake detection that combines spatial and temporal learning. Experimental results show a clear improvement over a frame-based CNN-only baseline, with accuracy rising from 72% to 88.83% and recall reaching 100%.
Future work will focus on audio–visual multimodal detection, transformer-based temporal modeling, real-time optimization, adversarial robustness, and continuous learning from user feedback.