Face Morph Attack Detection Using LSTM-CNN Hybrid Model

Authors: K Rohith, K Nagarjuna, B Venu Yadav, Mr. G. Rama Chandra Kumar

DOI Link: https://doi.org/10.22214/ijraset.2025.71126

Abstract

The spread of deepfake content poses serious challenges to online safety and trust, especially on social media and news websites. This paper proposes a robust deep learning system for accurate detection of deepfake videos. The system relies on a hybrid architecture that combines two components: EfficientNetB2 for spatial feature analysis of the faces in each video frame, and LSTM-CNN layers for analysis of how those features change over time. Each video frame is analyzed to determine whether it is real or fake. The system was developed on a balanced dataset of videos clearly labeled as real or fake; to counter any remaining class imbalance, we applied techniques such as class weighting. We further improved the system's detection ability by tuning the classification threshold. We also designed a user-friendly interface that lets users upload their videos and get results in real time without technical expertise, making the tool accessible to anyone who wants to check whether a video is authentic. By combining state-of-the-art feature extraction, sequence modeling, and an easy-to-use interface, the system offers a reliable tool for deepfake detection. Tests show high accuracy (90.4% on the test set) and consistent performance across different types of videos, which makes the system suitable for real-world use in digital forensic analysis and media authenticity verification.

Introduction

Overview

Digital face manipulation, especially face morphing (blending two or more facial images into one), poses a growing threat to biometric systems such as those used for passport issuance, access control, and digital identity verification. Because a morph resembles each of its contributing faces, it can deceive facial recognition algorithms into matching it to more than one person, enabling identity fraud.

This study introduces a deep learning-powered system that detects face morphing attacks in both images and videos using a hybrid CNN-LSTM architecture, presented through a user-friendly web interface for practical and real-time use.

Key Contributions

Detection Model: Hybrid deep learning model combining:
- EfficientNetB2 for spatial feature extraction
- LSTM for capturing temporal/sequential features in videos
Real-Time Deployment: A web interface lets users upload facial images or videos for instant morph detection.
Security Application: Enhances security in biometrics, border control, law enforcement, and forensics by converting complex model predictions into actionable insights.

Literature Review Insights

Previous efforts have explored various techniques:

CNN models detect morphs in static images but lack temporal context.
SVMs and facial landmark methods offer some accuracy but lack adaptability.
Hybrid models (CNN + LSTM) show improved results but are often computationally heavy or not deployed on usable platforms.
Mobile/Android apps exist but use simplistic rule-based classifiers and lack robustness.

The proposed work fills the gap by integrating deep feature learning, temporal processing, and web deployment into a complete and scalable system.

Methodology

A. Dataset Collection

Datasets: FaceForensics++, FRLL, MorGAN
Types: Real and morphed images, facial video sequences
Tools: Open-source morphing software, facial metadata

B. Preprocessing

Face detection (MTCNN/dlib)
Frame extraction from videos (25 frames/video)
Normalization and resizing (e.g., 224×224)
Dataset split: 70% training, 15% validation, 15% test
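The frame-sampling and dataset-split steps above can be sketched in plain Python. This is a minimal illustration, not the authors' code: evenly spaced sampling, the helper names `sample_frame_indices` and `split_dataset`, and the fixed seed are assumptions. The returned indices would then be read with a video decoder such as OpenCV's `cv2.VideoCapture` before face detection and resizing.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sample_frame_indices(total_frames: int, num_frames: int = 25) -> List[int]:
    """Hypothetical helper: pick num_frames indices spread evenly across a video.

    If the video is shorter than num_frames, indices repeat so every clip
    still yields a fixed-length sequence for the LSTM.
    """
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    step = total_frames / num_frames
    # Take the frame at the centre of each of num_frames equal-width segments.
    return [min(int(step * i + step / 2), total_frames - 1) for i in range(num_frames)]


def split_dataset(items: Sequence[T], train: float = 0.70, val: float = 0.15,
                  seed: int = 42) -> Tuple[List[T], List[T], List[T]]:
    """Hypothetical helper: shuffle whole items (e.g. videos) and split 70/15/15."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```

The sketch splits whole videos rather than individual frames, so frames of one clip cannot appear in both the training and test sets.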

C. Feature Extraction

EfficientNetB2, without its classification head, maps each face crop to a 1408-dimensional embedding
Each video is represented as an array of shape (25, 1408): one embedding for each of its 25 extracted frames
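A minimal Keras sketch of the per-frame feature extractor, assuming TensorFlow's `tf.keras.applications.EfficientNetB2` with global average pooling, which yields the 1408-dimensional vectors described above. The helper names are hypothetical, and `weights=None` is used only so the example runs offline; a practical system would load pretrained (for example ImageNet) weights.

```python
import numpy as np
import tensorflow as tf

IMG_SIZE = 224      # matches the 224x224 resize used in preprocessing
FEATURE_DIM = 1408  # width of EfficientNetB2's pooled feature vector


def build_frame_encoder(weights=None) -> tf.keras.Model:
    """EfficientNetB2 without its classifier head, pooled to one vector per frame.

    weights=None keeps this sketch offline; pass weights="imagenet" for pretrained weights.
    """
    return tf.keras.applications.EfficientNetB2(
        include_top=False,
        weights=weights,
        input_shape=(IMG_SIZE, IMG_SIZE, 3),
        pooling="avg",  # global average pooling -> (batch, 1408)
    )


def frames_to_sequence(frames: np.ndarray, encoder: tf.keras.Model) -> np.ndarray:
    """Hypothetical helper: (num_frames, 224, 224, 3) face crops -> (num_frames, 1408).

    Keras EfficientNet models include their own rescaling/normalization layers,
    so raw pixel values in [0, 255] are passed in unchanged.
    """
    return encoder.predict(frames.astype("float32"), verbose=0)
```

For a 25-frame clip, `frames_to_sequence` returns the (25, 1408) array that the sequence model consumes.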

D. Model Architecture

Image-Based Detection: CNN + EfficientNetB2 + Dense layers
Video-Based Detection: CNN-LSTM Hybrid
Best model: EfficientNetB2 + LSTM, selected on F1-score, precision, recall, and AUC
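The video branch can be sketched as an LSTM head over the (25, 1408) feature sequences. The layer sizes, dropout rate, learning rate, and the choice of Adam as optimizer are illustrative assumptions, not values reported in this paper.

```python
import tensorflow as tf

NUM_FRAMES = 25     # frames sampled per video
FEATURE_DIM = 1408  # EfficientNetB2 embedding size per frame


def build_sequence_classifier(lstm_units: int = 128, dropout: float = 0.3) -> tf.keras.Model:
    """LSTM head over per-frame embeddings that outputs P(morphed) for a whole video."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(NUM_FRAMES, FEATURE_DIM)),
        tf.keras.layers.LSTM(lstm_units),  # final hidden state summarizes the frame sequence
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),  # genuine (0) vs morphed (1)
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # assumed settings
        loss="binary_crossentropy",
        metrics=[
            "accuracy",
            tf.keras.metrics.Precision(name="precision"),
            tf.keras.metrics.Recall(name="recall"),
            tf.keras.metrics.AUC(name="auc"),
        ],
    )
    return model
```

Tracking precision, recall, and AUC during training mirrors the criteria used above to pick the best model.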

E. Real-Time Inference

Extracts frames from uploaded videos (or takes a single uploaded image), runs them through the trained model, and applies a tuned classification threshold to the predicted probability to improve accuracy
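One common way to tune a classification threshold is to sweep candidate values over validation predictions and keep the one with the highest F1-score. The sketch below assumes that procedure and uses hypothetical helper names; the paper does not state exactly how its thresholds were chosen.

```python
from typing import List, Optional, Sequence, Tuple


def f1_at_threshold(probs: Sequence[float], labels: Sequence[int], threshold: float) -> float:
    """F1-score when every probability >= threshold is predicted 'morphed' (label 1)."""
    tp = fp = fn = 0
    for p, y in zip(probs, labels):
        predicted = 1 if p >= threshold else 0
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == 0:
            fp += 1
        elif predicted == 0 and y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def tune_threshold(probs: Sequence[float], labels: Sequence[int],
                   candidates: Optional[List[float]] = None) -> Tuple[float, float]:
    """Return (threshold, F1) for the candidate with the best validation F1."""
    if candidates is None:
        candidates = [i / 100 for i in range(5, 96)]  # 0.05, 0.06, ..., 0.95
    best = max(candidates, key=lambda t: f1_at_threshold(probs, labels, t))
    return best, f1_at_threshold(probs, labels, best)


def label_from_probability(prob: float, threshold: float) -> str:
    """Map the model's P(morphed) for one image or video to a user-facing label."""
    return "morphed" if prob >= threshold else "genuine"
```

At inference time only `label_from_probability` runs, with the threshold fixed from the validation sweep.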

F. Web Interface

Built with HTML, CSS, JavaScript (frontend) and Flask (backend)
Deployed via Render or Heroku
Features: drag-and-drop upload, live prediction, responsive UI
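A minimal Flask sketch of an upload-and-predict endpoint. The `/predict` route, the `file` form field, the allowed extensions, the 0.5 threshold, and `predict_morph_probability` are hypothetical placeholders; the stub would be replaced by the face detection, EfficientNetB2 feature extraction, and classifier steps described in the Methodology.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)
ALLOWED_EXTENSIONS = {"jpg", "jpeg", "png", "mp4", "avi", "mov"}  # assumed formats
THRESHOLD = 0.5  # hypothetical; replace with the threshold tuned on validation data


def predict_morph_probability(data: bytes, filename: str) -> float:
    """Hypothetical stub for the real pipeline; always returns P(morphed) = 0.0."""
    return 0.0


def allowed_file(filename: str) -> bool:
    return "." in filename and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS


@app.route("/predict", methods=["POST"])
def predict():
    upload = request.files.get("file")
    if upload is None or upload.filename == "":
        return jsonify(error="no file uploaded"), 400
    if not allowed_file(upload.filename):
        return jsonify(error="unsupported file type"), 400
    prob = predict_morph_probability(upload.read(), upload.filename)
    label = "morphed" if prob >= THRESHOLD else "genuine"
    return jsonify(label=label, probability=prob, threshold=THRESHOLD)


if __name__ == "__main__":
    app.run()
```

The frontend's drag-and-drop widget would send the file as `multipart/form-data` to this route and render the returned JSON.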

Results & Performance

A. Model Performance

Test Accuracy: 90.4%
Validation Accuracy: 91.8%
Recall: 89.6%
F1-Score: 89.8%
Validation/test loss: approximately 0.31–0.35
Confusion matrix shows strong detection with minor misclassification of blurred or occluded genuine faces
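The reported metrics all follow from the confusion matrix. The sketch below treats "morphed" as the positive class (an assumption); the counts in the usage note are hypothetical and are not the paper's results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Binary confusion matrix with 'morphed' as the positive class (assumption)."""
    tp: int  # morphs correctly flagged
    fp: int  # genuine faces wrongly flagged as morphs
    tn: int  # genuine faces correctly accepted
    fn: int  # morphs missed (accepted as genuine)

    def accuracy(self) -> float:
        total = self.tp + self.fp + self.tn + self.fn
        return (self.tp + self.tn) / total if total else 0.0

    def precision(self) -> float:
        predicted_pos = self.tp + self.fp
        return self.tp / predicted_pos if predicted_pos else 0.0

    def recall(self) -> float:
        actual_pos = self.tp + self.fn
        return self.tp / actual_pos if actual_pos else 0.0

    def f1(self) -> float:
        # Harmonic mean of precision and recall, expressed directly in counts.
        denom = 2 * self.tp + self.fp + self.fn
        return 2 * self.tp / denom if denom else 0.0
```

For example, `ConfusionMatrix(tp=90, fp=10, tn=80, fn=20)` gives accuracy 0.85, precision 0.90, recall about 0.818, and F1 about 0.857.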

B. Training Graphs

Training curves show steady convergence; early stopping and learning-rate decay were used to avoid overfitting, and class weighting to compensate for class imbalance
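These training safeguards can be sketched with standard Keras callbacks and a balanced class-weight dictionary. The patience values, decay factor, and the use of `ReduceLROnPlateau` as the learning-rate decay schedule are assumptions; the weight formula is the same "balanced" heuristic scikit-learn uses for `class_weight="balanced"`.

```python
from collections import Counter
from typing import Dict, List, Sequence

import tensorflow as tf


def balanced_class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """weight_c = n_samples / (n_classes * count_c): rarer classes get larger weights."""
    counts = Counter(int(y) for y in labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * count) for cls, count in counts.items()}


def training_callbacks() -> List[tf.keras.callbacks.Callback]:
    """Early stopping plus learning-rate decay when validation loss plateaus."""
    return [
        # Stop after 5 epochs without val_loss improvement and keep the best weights.
        tf.keras.callbacks.EarlyStopping(
            monitor="val_loss", patience=5, restore_best_weights=True),
        # Halve the learning rate after 2 stagnant epochs, down to a floor of 1e-6.
        tf.keras.callbacks.ReduceLROnPlateau(
            monitor="val_loss", factor=0.5, patience=2, min_lr=1e-6),
    ]
```

With a hypothetical model and arrays, these plug into `model.fit(x_train, y_train, validation_data=(x_val, y_val), class_weight=balanced_class_weights(y_train), callbacks=training_callbacks())`.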

Applications

Passport & Border Control: Prevent fraudulent document use at checkpoints
Secure Access Control: Authenticate identity in corporate and public infrastructure
Digital Identity Verification: Support services like eKYC, banking, and online account setup
Law Enforcement & Forensics: Validate authenticity of facial evidence in legal cases
Biometric Research: Enable further academic and industrial exploration of morph detection
Compliance Testing: Verify biometric systems’ resistance to morphing attacks

Advantages

High detection accuracy using advanced deep learning
Real-time usability in critical environments
Web-accessible interface for widespread deployment
Enhanced biometric security across sectors
Transparent reporting of performance through training graphs and confusion matrices
Scalable and modular, easily integrated into existing systems

Copyright

Copyright © 2025 K Rohith, K Nagarjuna, B Venu Yadav, Mr. G. Rama Chandra Kumar. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET71126

Publish Date : 2025-05-16

ISSN : 2321-9653

Publisher Name : IJRASET
