In recent months, advances in freely available deep-learning-based software tools have made it easy to create convincing face swaps in videos, often leaving minimal traces of manipulation. Such videos are commonly known as "DeepFake" (DF) videos. Although digital videos have been manipulated with visual effects for decades, recent progress in deep learning has significantly increased both the realism of fake content and the ease with which it can be generated. Detecting DF videos, however, remains a significant challenge: while AI tools make generating DF content simple, training algorithms to identify it is not straightforward. To address this problem, we employ a Convolutional Neural Network (CNN) together with a Recurrent Neural Network (RNN). Our system uses a CNN to extract frame-level features, which are then used to train an RNN that learns to classify whether a video has been manipulated. The system detects temporal inconsistencies between frames introduced by the tools used in DF creation. To evaluate the effectiveness of our approach, we tested it against a large set of fake videos collected from a standard dataset. The results demonstrate that our system is competitive despite its simple architecture.
Introduction
The rapid growth of smartphone technology, high-speed internet, and deep learning has enabled widespread creation and sharing of digital videos, but it has also led to the rise of DeepFake content, which can spread misinformation and pose serious social risks. To address this, we propose a deep-learning-based method that detects fake videos by identifying artifacts introduced during the DeepFake generation process by Generative Adversarial Networks (GANs).
The method focuses on detecting inconsistencies in face regions caused by fixed-resolution synthesis and affine transformations. It analyzes videos by splitting them into frames, extracting spatial features using a ResNeXt CNN, and capturing temporal inconsistencies with an LSTM-based RNN.
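The frame-splitting step above can be illustrated with a minimal sketch that selects a fixed-length, evenly spaced sequence of frame indices from a video, so that every clip yields the same number of frames for the LSTM regardless of its duration. The function name, sequence length, and padding strategy are illustrative assumptions, not details taken from the paper:

```python
def sample_frame_indices(total_frames, seq_len=20):
    """Pick seq_len evenly spaced frame indices from a video.

    Videos shorter than seq_len are padded by repeating the last
    frame index (an assumed policy, for illustration only).
    """
    if total_frames < seq_len:
        return list(range(total_frames)) + [total_frames - 1] * (seq_len - total_frames)
    step = total_frames / seq_len  # spacing between sampled frames
    return [int(i * step) for i in range(seq_len)]
```

In a real pipeline these indices would be used to seek into the decoded video (e.g., with a video-reading library) before cropping the face region from each selected frame.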
The system includes a user-friendly web platform where users can upload videos to check authenticity, with potential expansion into browser plugins or social media integration. The dataset combines real and fake videos from multiple sources, and preprocessing involves frame extraction, face detection, and dataset standardization.
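The dataset-standardization step described above (combining real and fake videos from multiple sources into one labeled, balanced corpus) can be sketched as follows. The helper name, 0/1 label convention, and 80/20 split are assumptions made for illustration, not specifics from the paper:

```python
import random

def build_balanced_split(real_paths, fake_paths, train_frac=0.8, seed=0):
    """Label, balance, shuffle, and split video paths into train/test sets.

    Truncates the larger class so real (label 0) and fake (label 1)
    videos appear in equal numbers, avoiding class imbalance.
    """
    n = min(len(real_paths), len(fake_paths))
    samples = [(p, 0) for p in real_paths[:n]] + [(p, 1) for p in fake_paths[:n]]
    random.Random(seed).shuffle(samples)  # deterministic shuffle for reproducibility
    cut = int(train_frac * len(samples))
    return samples[:cut], samples[cut:]
```

Each returned list holds `(video_path, label)` pairs ready for the frame-extraction and face-detection stages of preprocessing.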
The model architecture combines ResNeXt for feature extraction and LSTM for sequence analysis, enabling accurate detection of DeepFake videos. The output provides both classification (real or fake) and a confidence score, helping users assess reliability.
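The ResNeXt-plus-LSTM architecture can be sketched in PyTorch as below. To keep the example self-contained, a tiny stand-in CNN replaces the pretrained ResNeXt backbone the system actually uses; the class name, feature dimensions, and last-time-step readout are all assumptions for illustration:

```python
import torch
import torch.nn as nn

class DeepFakeDetector(nn.Module):
    """Sketch: per-frame CNN features -> LSTM -> real/fake logits + confidence."""

    def __init__(self, feat_dim=128, hidden=64, num_classes=2):
        super().__init__()
        # Stand-in backbone; the described system uses a ResNeXt CNN here.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x):  # x: (batch, seq, 3, H, W)
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1)).view(b, t, -1)  # per-frame features
        out, _ = self.lstm(feats)                         # temporal modeling
        logits = self.head(out[:, -1])                    # classify from last step
        conf = torch.softmax(logits, dim=1)               # per-class confidence
        return logits, conf
```

The softmax over the logits yields the confidence score reported to the user alongside the real/fake decision.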
Overall, the system offers an effective, scalable solution for detecting DeepFake content, improving digital media integrity and promoting a safer online environment.
Conclusion
We introduced a neural-network-based approach for classifying videos as deepfakes or genuine recordings. Our method not only categorizes the videos but also provides a confidence score associated with the model's assessment. The method is inspired by the techniques used to create deepfakes, particularly those generated by Generative Adversarial Networks (GANs) with the assistance of autoencoders. Our approach detects deepfakes at the frame level using a ResNeXt Convolutional Neural Network (CNN) and extends to video-level classification using a Recurrent Neural Network (RNN) with Long Short-Term Memory (LSTM). By leveraging these techniques, the proposed method can identify whether a video is a deepfake or real, based on the parameters outlined in the associated research paper. We are confident that our method will yield a high level of accuracy when applied to real-time data. The combination of frame-level detection and video-level classification, together with these deep learning components, makes our approach a robust solution for discerning authentic from manipulated videos, with implications for improving the accuracy and reliability of real-time video content analysis and contributing to the broader field of video forensics and authentication.