Ijraset Journal For Research in Applied Science and Engineering Technology
Authors: Ewan Ulysses P F, Varun Prakash M, Dr. Raghavendra V
DOI Link: https://doi.org/10.22214/ijraset.2025.70664
Deepfakes, AI-manipulated media, are rapidly eroding information integrity and public trust. This work aims to strengthen deepfake detection using a MobileNet and LSTM network. MobileNet's lightweight CNN architecture extracts visual features from individual images and video frames, picking up subtle cues in texture and facial structure. An LSTM network then analyzes temporal inconsistencies that image-based methods cannot detect. The hybrid model is trained on datasets of real and deepfake media and is thus adaptable to emerging deepfake techniques. A user-facing interface analyzes media in real time, returning an analysis score and visual feedback on the identified artifacts. What sets this system apart is its versatility across images and videos and its real-time capability, making it a suitable choice for practical use in social media, journalism, and law enforcement, where it can help combat the spread of misinformation and safeguard the authenticity of digital media.
Overview
The rapid advancement of AI has led to the creation of deepfake technology, capable of producing highly realistic fake videos and images. Initially viewed as creative tools, deepfakes now raise serious concerns in politics, media, and public trust due to their potential for spreading misinformation and manipulating public opinion.
At the core of deepfake generation are Generative Adversarial Networks (GANs), which can produce near-authentic media with minimal expertise. Current detection methods focus mainly on visual inconsistencies, such as lighting or facial alignment, but often fail against more sophisticated deepfakes. Most also analyze single images or frames, overlooking the temporal dynamics in videos.
Proposed Solution
This project presents a hybrid deepfake detection model that combines:
MobileNet: A lightweight CNN for spatial feature extraction (e.g., texture anomalies, misaligned facial features).
LSTM (Long Short-Term Memory): A type of recurrent neural network for temporal analysis across video frames (e.g., motion inconsistencies, facial expression shifts).
This dual-focus model captures both spatial and temporal irregularities, increasing robustness and adaptability to new types of deepfakes. It’s designed to run in real-time, making it suitable for practical uses such as journalism, social media monitoring, and public safety.
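A minimal sketch of how such a dual-branch model could be assembled in TensorFlow/Keras is shown below. The 16-frame clip length, 128-unit LSTM, and frozen backbone are illustrative assumptions, not the authors' published configuration.

```python
# Sketch of a MobileNet-LSTM hybrid: a per-frame CNN feature extractor
# feeding an LSTM that classifies the whole clip as real or fake.
import tensorflow as tf
from tensorflow.keras import layers, models

FRAMES, H, W, C = 16, 224, 224, 3  # assumed clip shape

# Spatial branch: MobileNet with ImageNet weights, applied per frame.
backbone = tf.keras.applications.MobileNet(
    include_top=False, weights="imagenet",
    pooling="avg", input_shape=(H, W, C))
backbone.trainable = False  # freeze for transfer learning

inputs = layers.Input(shape=(FRAMES, H, W, C))
# Inputs assumed normalized to [0, 1]; rescale to [-1, 1] for MobileNet.
x = layers.Rescaling(scale=2.0, offset=-1.0)(inputs)
# TimeDistributed runs the CNN independently on every frame.
frame_features = layers.TimeDistributed(backbone)(x)

# Temporal branch: LSTM aggregates the per-frame feature sequence.
x = layers.LSTM(128)(frame_features)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)  # real vs. fake score

model = models.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=["accuracy",
                       tf.keras.metrics.Precision(),
                       tf.keras.metrics.Recall()])
```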
Methodology
Data Collection & Preprocessing:
Collected a large and diverse dataset of real and fake videos.
Augmented data through rotation, noise addition, etc.
Normalized video frames and ensured balanced distribution.
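The following sketch illustrates how this preprocessing step might look in Python with OpenCV; the clip length, rotation range, and noise level are illustrative assumptions.

```python
# Frame sampling, normalization, and simple augmentation (rotation plus
# Gaussian noise), as described above. Values are assumptions.
import cv2
import numpy as np

def load_frames(video_path, num_frames=16, size=(224, 224)):
    """Sample num_frames evenly from a video and normalize to [0, 1]."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    indices = np.linspace(0, max(total - 1, 0), num_frames).astype(int)
    frames = []
    for idx in indices:
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.resize(frame, size)
        frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        frames.append(frame.astype(np.float32) / 255.0)  # normalize
    cap.release()
    return np.stack(frames) if frames else None

def augment(frames, max_angle=10.0, noise_std=0.02):
    """Apply one shared random rotation plus Gaussian noise to a clip."""
    angle = np.random.uniform(-max_angle, max_angle)
    h, w = frames.shape[1:3]
    rot = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    rotated = np.stack([cv2.warpAffine(f, rot, (w, h)) for f in frames])
    noisy = rotated + np.random.normal(0.0, noise_std, rotated.shape)
    return np.clip(noisy, 0.0, 1.0).astype(np.float32)
```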
Feature Extraction (MobileNet):
Uses pre-trained weights (e.g., from ImageNet).
Captures spatial inconsistencies in individual frames.
Lightweight and efficient for real-time use.
Temporal Analysis (LSTM):
Analyzes sequential patterns across frames.
Identifies temporal anomalies like inconsistent motion or facial dynamics.
Model Training:
Combined loss functions: Binary Cross-Entropy + Temporal Loss.
Used adversarial training and learning rate scheduling to improve robustness.
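The temporal term of the combined objective is not spelled out here, so the sketch below assumes a smoothness penalty on per-frame scores (which presumes the LSTM is run with `return_sequences=True` and a per-frame head); `lambda_t` is a hypothetical weighting coefficient, and the exponential learning-rate decay is one common way to realize the scheduling mentioned above.

```python
# Hedged sketch of "Binary Cross-Entropy + Temporal Loss" with an
# assumed smoothness penalty, plus assumed LR scheduling.
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()

def combined_loss(y_true, clip_score, frame_scores, lambda_t=0.1):
    """clip_score: (batch, 1); frame_scores: (batch, T, 1) per frame."""
    classification = bce(y_true, clip_score)
    # Temporal term (assumption): penalize abrupt jumps between
    # consecutive per-frame scores of the same clip.
    temporal = tf.reduce_mean(
        tf.square(frame_scores[:, 1:] - frame_scores[:, :-1]))
    return classification + lambda_t * temporal

# Learning-rate scheduling via assumed exponential decay.
schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-4, decay_steps=1000, decay_rate=0.9)
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)
```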
User Interface:
Allows users to upload images or videos.
Provides real-time detection score and visual explanations.
Designed to be accessible for non-technical users.
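As a rough illustration of such an interface, a hypothetical Flask endpoint could accept an upload and return the score and label; the route, field names, and the reuse of `model` and `load_frames` from the earlier sketches are all assumptions, not the authors' implementation.

```python
# Hypothetical upload-and-analyze endpoint wrapping the hybrid model.
import os
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/analyze", methods=["POST"])
def analyze():
    """Accept an uploaded image or video and return a detection score."""
    media = request.files["media"]
    path = os.path.join("/tmp", media.filename)
    media.save(path)
    clip = load_frames(path)  # from the preprocessing sketch above
    if clip is None:
        return jsonify({"error": "could not decode media"}), 400
    score = float(model.predict(clip[None, ...])[0, 0])
    return jsonify({"deepfake_score": score,
                    "label": "fake" if score >= 0.5 else "real"})

if __name__ == "__main__":
    app.run()  # development server only, not for production
```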
Evaluation & Deployment:
Tested on varied media sources.
Incorporated user feedback for interface improvements.
Continuous updates with online learning and multimodal capabilities.
Performance Results
The proposed hybrid was compared against standalone and baseline models:
| Model | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|---|
| Standalone MobileNet | 84.5 | 82.3 | 85.0 | 83.6 |
| Standalone LSTM | 80.2 | 78.9 | 80.0 | 79.4 |
| CNN Baseline | 83.1 | 81.5 | 82.0 | 81.7 |
| Proposed Hybrid | 91.8 | 89.4 | 90.5 | 89.9 |
Interpretation:
The MobileNet-LSTM hybrid model significantly outperforms others in all metrics.
This shows the importance of combining spatial and temporal analysis for detecting deepfakes.
Literature Review Highlights
Existing models often specialize in either spatial or temporal features, limiting their effectiveness.
Some use Transformers or biometric analysis (e.g., iris scans), which are effective but computationally heavy.
Hybrid models and transfer learning show promise but often lack real-time performance or adaptability.
Conclusion
This work introduces a hybrid MobileNet-LSTM model that delivers a robust and effective solution for deepfake detection by combining spatial and temporal analysis. The MobileNet component efficiently extracts fine-grained spatial features while the LSTM network captures temporal patterns across sequential frames, so manipulations in both images and videos are detected. The experimental results confirm the superiority of the proposed model over standalone architectures and traditional CNN-based methods in accuracy, precision, recall, and F1-score. The MobileNet-LSTM model reaches an accuracy of 91.8% and proves robust and adaptable to different deepfake techniques. With a precision of 89.4% and a recall of 90.5%, it detects manipulated content reliably while keeping both false positives and false negatives low. The resulting balanced F1-score of 89.9% shows that it generalizes well across varied real-world testing scenarios, making it well suited to a wide range of applications. Because the model is lightweight and supports real-time detection, it serves applications that demand timely and accurate analysis, such as social media monitoring, digital forensics, journalism, and public safety. The user-friendly interface further improves usability by giving non-technical users an intuitive way to obtain detection results and visual explanations of detected anomalies, which helps build trust and transparency. Overall, the proposed hybrid MobileNet-LSTM model overcomes the spatial and temporal limitations of existing deepfake detection systems, and its real-time capability, high performance, and accessible design make it an appropriate tool against the persistent problem of deepfake media. Later work may incorporate multimodal inputs, e.g. audio and textual analysis, to further enhance detection accuracy and flexibility in dynamic digital environments.
Future Work
Future work can improve the scalability, adaptability, and robustness of the MobileNet-LSTM deepfake detection model in real-world settings. Pruning techniques and lightweight architectures would enable large-scale real-time deployment, particularly in resource-constrained environments such as mobile devices. To keep pace with evolving deepfake generation, online learning or transfer learning should be incorporated so that the model does not require full retraining whenever a new manipulation technique emerges. Detection accuracy can be raised further by extending the model with multimodal detection, using joint audio and visual analysis to check consistency across channels, particularly for video content. To foster user trust and adoption, the model should also become more transparent and interpretable through visualization techniques that highlight manipulated regions or detected temporal inconsistencies. Robustness to subtle manipulation attempts can likewise be strengthened by addressing adversarial robustness through adversarial training. Expanded testing on diverse scenarios, including live media streams, would validate the model's performance beyond curated test datasets and address the ethical and privacy concerns raised in journalism and law enforcement. With these advances, MobileNet-LSTM can continue to serve as a powerful tool for deepfake detection in an ever-evolving media landscape.
References
[1] Chan, C. P. M., Lee, S. W., & See, J. (2021). DeepFake Video Detection Using Deep Learning and Iris Detection. 2021 IEEE International Conference on Image Processing (ICIP), Anchorage, AK, USA, pp. 1239–1243. doi: 10.1109/ICIP42928.2021.9506664.
[2] Dong, J., Wang, W., Tang, Y., Zhang, Y., & Liu, H. (2021). Deepfake Video Detection Using Inception-ResNet and Attention Mechanism. 2021 IEEE International Conference on Multimedia and Expo (ICME), Shenzhen, China, pp. 1–6. doi: 10.1109/ICME51207.2021.9428434.
[3] Husen, M. N., Kurniawan, A., & Pamungkas, M. (2020). Deepfake Detection on Video Sequences Using Inception-ResNet-v2 and LSTM. 2020 3rd International Conference on Intelligent Autonomous Systems (ICoIAS), Singapore, pp. 134–138. doi: 10.1109/ICoIAS49312.2020.9081914.
[4] Sabir, M., Cheng, J., Jaiswal, A., Wu, Y., Nataraj, L., & Chandrasekaran, S. (2020). Recurrent Convolutional Strategies for Face Manipulation Detection in Videos. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, pp. 1643–1652. doi: 10.1109/CVPRW50498.2020.00203.
[5] Korshunov, P., & Marcel, S. (2020). Human vs. Machine: Benchmarking Humans Against Deepfake Detection Systems. 2020 8th International Conference on Biometrics Theory, Applications and Systems (BTAS), Washington, DC, USA, pp. 1–6. doi: 10.1109/BTAS48898.2020.9528289.
[6] Chugh, R., Agarwal, V., Subramanian, S., & Ramakrishnan, K. R. (2021). Not Made For Each Other: Combining CNN and Transformers for Deepfake Detection. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, pp. 15024–15033. doi: 10.1109/ICCV48922.2021.01514.
[7] Mittal, S., Verma, A., & Jain, R. (2020). Transfer Learning for Deepfake Detection. 2020 6th International Conference on Signal Processing and Communication (ICSC), Noida, India, pp. 69–74. doi: 10.1109/ICSC48311.2020.9182767.
[8] Wang, W., Zhang, Y., & Liu, H. (2020). Deep Neural Networks for Deepfake Detection: A Survey. 2020 IEEE International Conference on Artificial Intelligence and Knowledge Engineering (AIKE), Phuket, Thailand, pp. 290–296. doi: 10.1109/AIKE48582.2020.00048.
[9] Luo, X., Lv, J., Song, H., Yu, Z., & Yang, G. (2020). Dual-Stream CNNs for Forgery Detection in DeepFake Videos. 2020 IEEE International Conference on Image Processing (ICIP), Abu Dhabi, UAE, pp. 2556–2560. doi: 10.1109/ICIP40778.2020.9191035.
[10] Wang, X., Li, Y., & Jiang, H. (2019). Fake Video Detection with Convolutional Neural Networks. 2019 10th International Conference on Advanced Computational Intelligence (ICACI), Guilin, China, pp. 182–186. doi: 10.1109/ICACI.2019.8778503.
[11] Rosetti, N. (2020). Deepfake Detection Using LSTMs, Transformers and Video-Level Artifacts.
[12] Stroebel, L., Llewellyn, M., Hartley, T., Ip, T. S., & Ahmed, M. (2023). A Systematic Literature Review on the Effectiveness of Deepfake Detection Techniques. Journal of Cyber Security Technology, 7(2), 83–113.
[13] Kaushal, A., Singh, S., Negi, S., & Chhaukar, S. (2022). A Comparative Study on DeepFake Detection Algorithms. 2022 4th International Conference on Advances in Computing, Communication Control and Networking (ICAC3N), pp. 854–860. IEEE.
[14] Wen, Y., Lei, Z., Yang, Y., Liu, C., & Ma, M. (2022). Multi-Path GMM-MobileNet Based on Attack Algorithms and Codecs for Synthetic Speech and Deepfake Detection. INTERSPEECH, pp. 4795–4799.
[15] Abhineswari, M., Charan, K. S., & Shrikarti, B. N. (2024). Deep Fake Detection Using Transfer Learning: A Comparative Study of Multiple Neural Networks. 2024 International Conference on Signal Processing, Computation, Electronics, Power and Telecommunication (IConSCEPT), pp. 1–6. IEEE.
Copyright © 2025 Ewan Ulysses P F, Varun Prakash M, Dr. Raghavendra V. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET70664
Publish Date : 2025-05-09
ISSN : 2321-9653
Publisher Name : IJRASET