Hybrid ResNet-CNN Framework for Real-Time Deepfake Image and Video Detection

Authors: Amuthavalli G, Dr. I. Shahanaz Begum, Poonggazhal TTK

DOI Link: https://doi.org/10.22214/ijraset.2026.82491

Abstract

With the rise of online digital communication channels, the exchange of images and videos on the web is at an all-time high, which raises concerns about the authenticity of shared media. Recent advances in AI have given rise to deepfakes: highly realistic altered images and videos that can mimic real people and events. Such forged media can be used for misinformation, identity theft, cybercrime, political manipulation, and other malicious activities against individuals and society. There is therefore a need for solutions that detect such alterations and identify deepfakes. This study proposes a deep-learning-based solution for detecting deepfake images and videos that combines convolutional neural network (CNN) layers with a ResNet architecture. The CNN layers extract both low-level and high-level spatial features, whereas the ResNet structure mitigates the vanishing-gradient problem and makes it possible to train a deeper model. Compared with conventional training approaches, the proposed model is reported to be simpler to train and to converge faster. Experiments on several popular datasets indicate that the proposed system outperforms conventional methods in both accuracy and false positive rate for identifying deepfake videos, and that it requires very little time to produce an output once a video is provided as input. The proposed model also performs well across varied input conditions (e.g., the resolution, the amount of compression, and the type of alteration applied to the input). Because it can detect deepfake videos in real time, the proposed model is well suited for integration into applications such as social media monitoring, law enforcement, digital verification, and cybersecurity.

Introduction

The rapid growth of digital communication and artificial intelligence has increased the creation and sharing of digital media, but it has also led to the rise of manipulated images and videos known as deepfakes. These AI-generated forgeries can be used for misinformation, identity theft, cybercrime, and reputation damage. Traditional forgery detection methods based on manual inspection or low-level image features are no longer effective against advanced manipulation techniques such as GAN-based deepfakes. Existing automated detection systems also face challenges related to scalability, accuracy, real-time performance, and adaptability to different manipulation methods.

This research proposes an intelligent deepfake detection framework that uses a hybrid CNN and ResNet deep learning model to identify forged images and videos. The system analyzes visual inconsistencies such as facial distortions, lighting errors, texture abnormalities, blending artifacts, and unnatural expressions, which are difficult for humans to detect. The proposed model supports multiple media formats and aims to provide a scalable, reliable, real-time solution for applications such as social media monitoring, cybersecurity, and digital forensics.

The dataset consists of real and manipulated images/videos created using techniques such as Face Swap, Face Reenactment, Lip Syncing, and GAN-based synthesis. The data is preprocessed through resizing, normalization, frame extraction, and augmentation methods like rotation, cropping, and brightness adjustment. The dataset is divided into training, validation, and testing sets to evaluate model performance.
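The split into training, validation, and testing sets can be illustrated with a minimal sketch. The text does not state the split ratios or the splitting procedure, so the 70/15/15 proportions, the fixed seed, and the function name `stratified_split` are assumptions made for illustration only. The sketch splits per class so that each subset keeps a similar share of real and fake samples.

```python
import random

def stratified_split(labels, val_frac=0.15, test_frac=0.15, seed=0):
    """Return (train, val, test) index lists with a similar class ratio in each.

    The fractions and seed are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_val = round(len(indices) * val_frac)
        n_test = round(len(indices) * test_frac)
        val.extend(indices[:n_val])
        test.extend(indices[n_val:n_val + n_test])
        train.extend(indices[n_val + n_test:])
    return train, val, test

# Toy labels for demonstration; this is not the paper's dataset.
labels = ["real"] * 100 + ["fake"] * 100
train_idx, val_idx, test_idx = stratified_split(labels)
```

For video data, it is generally safer to split by source video rather than by individual frame, so that frames from one video do not appear in both the training and testing sets.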

The main objectives of the study are to develop an accurate deepfake detection system, reduce false positives and negatives, improve generalization across different datasets, support image and video analysis, and minimize human involvement in media verification. The system combines CNN-based feature extraction with ResNet residual learning to improve detection accuracy and computational efficiency.
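The residual learning idea can be sketched in a few lines. A residual block adds its input back to the output of its learned transformation, y = ReLU(F(x) + x), so the block only has to learn the difference from the identity mapping. The sketch below is a toy under stated assumptions: it uses plain Python lists and small dense weight matrices in place of convolutional layers, and the names `residual_block`, `W1`, and `W2` are hypothetical rather than taken from the authors' implementation.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def residual_block(x, W1, W2):
    """Compute y = relu(F(x) + x), where F(x) = W2 @ relu(W1 @ x)."""
    fx = matvec(W2, relu(matvec(W1, x)))
    # Skip connection: add the unmodified input to the transformed signal.
    return relu([f + xi for f, xi in zip(fx, x)])

x = [1.0, 2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]

# With all-zero weights F(x) is zero, so the block passes x through unchanged.
y = residual_block(x, zeros, zeros)
```

The pass-through behaviour with zero weights shows why stacking such blocks does not have to degrade the signal: the shortcut gives activations and gradients a direct path through the block.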

Previous research has explored methods such as Error-Level Analysis, machine learning classifiers, transfer learning, CNNs, RNNs, GAN-based detectors, and localization techniques. However, existing approaches still suffer from limitations including poor generalization, vulnerability to new deepfake generation methods, high computational requirements, and limited real-time capability.

The proposed methodology includes four main stages:

1. Image/Video Upload and Training: collecting real and fake media samples and preparing them for learning.
2. Input Processing: resizing images, extracting video frames, and normalizing data.
3. Feature Extraction: using CNN and ResNet architectures to detect spatial and temporal manipulation patterns.
4. Forgery Detection: classifying media as authentic or fake using extracted deep features.
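For video input, the frame extraction and final classification stages can be sketched as follows. The text does not say how frames are selected or how per-frame predictions are combined into a video-level decision, so the evenly spaced sampling, the mean aggregation, the 0.5 threshold, and both function names are assumptions for illustration. The per-frame scores here are made-up stand-ins for the output of a trained model.

```python
def sample_frame_indices(total_frames, num_samples):
    """Pick evenly spaced frame indices from a video with total_frames frames."""
    if total_frames <= 0 or num_samples <= 0:
        return []
    num_samples = min(num_samples, total_frames)
    step = total_frames / num_samples
    return [int(i * step) for i in range(num_samples)]

def classify_video(frame_scores, threshold=0.5):
    """Average per-frame fake probabilities and compare against a threshold."""
    if not frame_scores:
        raise ValueError("no frame scores given")
    mean_score = sum(frame_scores) / len(frame_scores)
    label = "fake" if mean_score >= threshold else "authentic"
    return label, mean_score

# Choose 5 frames from a hypothetical 300-frame clip.
indices = sample_frame_indices(total_frames=300, num_samples=5)

# Hypothetical per-frame fake probabilities for those frames.
label, score = classify_video([0.9, 0.8, 0.7, 0.95, 0.85])
```

Averaging is only one option; taking the maximum score or a majority vote over frames are common alternatives when manipulation affects only part of a video.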

The implementation uses Python with the deep learning frameworks TensorFlow and Keras, supported by OpenCV, NumPy, Pandas, and Scikit-learn. The model is trained with the Adam optimizer and evaluated using accuracy, precision, recall, F1-score, AUC, and the confusion matrix.
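The evaluation metrics listed above all derive from the confusion matrix or from the ranking of prediction scores. The following sketch computes them in plain Python so the definitions are explicit; it assumes label 1 means fake (the positive class) and 0 means real, and the toy labels and scores are invented for illustration. In practice a library such as Scikit-learn would typically be used instead.

```python
def confusion_counts(y_true, y_pred):
    """Count (tp, fp, tn, fn), treating 1 as fake (positive) and 0 as real."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1
        elif p == 0 and t == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

def roc_auc(y_true, scores):
    """AUC as the fraction of (fake, real) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data: four fake samples followed by four real ones.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
results = classification_metrics(tp, fp, tn, fn)
auc = roc_auc(y_true, scores)
```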

Experimental results show that the proposed hybrid CNN-ResNet model significantly improves deepfake detection performance compared to existing methods. The model achieved approximately 96.85% accuracy, 95.72% precision, 96.10% recall, and 95.91% F1-score, demonstrating better detection capability, reduced false predictions, and faster inference suitable for real-time applications.
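As a quick consistency check on the reported figures, the F1-score should equal the harmonic mean of precision and recall. The short calculation below uses only the percentages quoted above and confirms that they agree with the reported F1-score to two decimal places.

```python
# Percentages as reported in the text.
precision, recall = 95.72, 96.10
reported_f1 = 95.91

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
difference = abs(f1 - reported_f1)
```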

Conclusion

The results (Figs. 3 to 7) show that the proposed deep learning framework is a viable approach to detecting deepfake images and videos. By integrating Convolutional Neural Networks with a ResNet architecture, the system learns features that expose alterations such as inconsistent facial expressions, unnatural blending, and distorted lighting and textures. The result is an accurate, efficient, and fast classifier of genuine and fake media content. Furthermore, the residual connections helped improve training speed and reduce computational cost compared to traditional techniques. In contrast to existing authentication systems, the proposed system is designed to address their limitations in scalability, robustness, and efficiency. Moreover, by automating the authentication process, the proposed system can help combat problems associated with synthetic media. The findings from this study help establish trust in digital communication systems. Overall, this project demonstrates the value of deep learning models in countering image and video forgery.

Copyright

Copyright © 2026 Amuthavalli G, Dr. I. Shahanaz Begum, Poonggazhal TTK. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET82491

Publish Date : 2026-05-13

ISSN : 2321-9653

Publisher Name : IJRASET
