Authors: Kimaya Kulkarni, Sahil Khanolkar, Yash Walke, Rahul Sonkamble
DOI Link: https://doi.org/10.22214/ijraset.2022.44110
Free access to large-scale public databases, together with rapid progress in deep learning techniques, in particular Generative Adversarial Networks, has led to the creation of very realistic fake content, with corresponding implications for society in this era of fake news. This survey provides a thorough review of techniques for detecting DeepFake manipulations.
Deepfake (a blend of "deep learning" and "fake") is a technique that superimposes face images of a target person onto a video of a source person, producing a video in which the target person appears to do or say what the source person does. Fake images and videos containing facial information are generated by digital manipulation, and with DeepFake methods they have become a greater public concern than ever before.
The term originated with a Reddit user named "deepfakes", who claimed in late 2017 to have developed a machine-learning algorithm that helped him transpose celebrity faces into pornographic videos. Deepfakes are being used to swap the faces of celebrities or targeted politicians onto bodies in pornographic images and videos.
Deepfakes can thus be used to incite political or religious tensions between countries, to deceive the public and influence election results, or to create chaos in financial markets through fake news. They may even be used to generate fake satellite images of the Earth containing objects that do not exist in the real world in order to confuse military analysts, for example a fake bridge across a river where no such bridge exists. This could mislead troops who have been directed to cross the bridge in a real battle.
There are also positive uses of deepfakes, such as creating voices for people who have lost theirs or updating episodes of movies without reshooting them. However, malicious applications of deepfakes outnumber the positive ones. Creating manipulated images and videos has become much simpler, as it requires as little as an identity photo or a short video of the target individual. Less and less effort is required to produce stunningly convincing tampered footage, and recent advances can even create a deepfake from a single still image. Deepfakes are therefore a threat not only to public figures but also to ordinary people. This paper examines various methods for detecting deepfakes. Deepfake detection is typically treated as a binary classification problem in which classifiers distinguish between authentic and tampered videos.
II. LITERATURE REVIEW
Motivated by the ongoing success of digital face manipulations, particularly DeepFakes, this survey reviews detection techniques developed over time. In general, most current face manipulations appear easy to detect in controlled scenarios, i.e., when fake detectors are evaluated under the same conditions for which they were trained, and most of the benchmarks included in this survey report very low error rates in manipulation detection. This scenario may not be very realistic, however, because fake images and videos are commonly shared on large social networks and undergo a wide range of variations such as compression, resizing, and noise.
In addition, current detection methods focus primarily on the drawbacks of deepfake generation pipelines, i.e., they look for weaknesses in the generation methods and exploit them. This kind of knowledge is not always available in adversarial environments, where attackers generally try not to reveal their deepfake creation technologies. Furthermore, facial manipulation techniques are constantly being refined. These factors motivate further research into the ability of fake detectors to generalise to unseen conditions.
Another research direction is to integrate detection methods into distribution platforms such as social media in order to limit the widespread impact of deepfakes. On these platforms, a screening or filtering mechanism based on effective detection methods could be implemented. Videos and photographs are also widely used as evidence in police investigations and legal proceedings, where digital media forensics experts with a background in computing or law enforcement and experience in collecting, examining, and analysing digital information may present them as evidence in a court of law.
Deepfakes can be used by intelligence services attempting to influence decisions made by influential figures such as politicians, with consequences for national and international security. In response to this alarming issue, the research community has concentrated on developing deepfake detection algorithms, with numerous results published. Using detection methods to identify deepfakes is critical, but understanding the true intent of those who publish deepfakes is even more critical. This requires user judgement based on the social context in which a deepfake is discovered, for example who distributed it and what they said about it. A study of the social context of deepfakes to assist users in making such judgements is therefore worthwhile. Machine learning and AI algorithms have been used to help determine the authenticity of digital media and have produced accurate and reliable results.
III. PROPOSED METHODOLOGY
We use Kaggle's Deepfake Detection Challenge dataset, which contains 3000 videos from randomly collected sources. The dataset is split into 70 percent for training and 30 percent for testing.
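A minimal sketch of the 70/30 split, assuming it is done at the video level with a seeded shuffle. The file names, the `split_videos` helper, and the seed are hypothetical and only illustrate the proportions stated above.

```python
import random


def split_videos(video_ids, train_fraction=0.7, seed=0):
    """Shuffle video identifiers and split them into train and test lists."""
    ids = list(video_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    cut = int(round(len(ids) * train_fraction))
    return ids[:cut], ids[cut:]


# Hypothetical file names standing in for the 3000 videos.
video_ids = [f"video_{i:04d}.mp4" for i in range(3000)]
train_ids, test_ids = split_videos(video_ids)
print(len(train_ids), len(test_ids))  # 2100 900
```

Splitting by video rather than by frame keeps frames from the same video out of both sets at once, which would otherwise inflate test accuracy.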
Dataset preprocessing consists of splitting each video into frames, detecting the face in each frame, cropping the frame to the detected face, and building a new face-cropped dataset. The remaining frames are ignored during preprocessing.
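The cropping step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: frames are plain nested lists of pixels, `detect_face` is a hypothetical callable returning an `(x, y, w, h)` box or `None`, the margin around the face is an arbitrary choice, and "remaining frames" is read here as frames in which no face is found.

```python
def padded_box(box, frame_width, frame_height, margin=0.2):
    """Expand a face box (x, y, w, h) by a relative margin, clamped to the frame."""
    x, y, w, h = box
    pad_x, pad_y = int(w * margin), int(h * margin)
    left = max(0, x - pad_x)
    top = max(0, y - pad_y)
    right = min(frame_width, x + w + pad_x)
    bottom = min(frame_height, y + h + pad_y)
    return left, top, right, bottom


def crop_frame(frame, box, margin=0.2):
    """Crop a frame (a list of pixel rows) to the padded face box."""
    height, width = len(frame), len(frame[0])
    left, top, right, bottom = padded_box(box, width, height, margin)
    return [row[left:right] for row in frame[top:bottom]]


def build_face_dataset(frames, detect_face):
    """Crop every frame in which the (hypothetical) detector finds a face."""
    cropped = []
    for frame in frames:
        box = detect_face(frame)  # assumed to return (x, y, w, h) or None
        if box is not None:       # assumption: frames without a face are skipped
            cropped.append(crop_frame(frame, box))
    return cropped


# Toy 6x8 "frame" whose pixels are (row, col) tuples so the crop is easy to inspect.
frame = [[(r, c) for c in range(8)] for r in range(6)]
face = crop_frame(frame, (2, 1, 4, 3), margin=0.25)
print(len(face), len(face[0]))  # 3 6
```

In practice the frames would come from a video decoder and the box from a real face detector; only the box arithmetic is shown here.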
The model consists of a ResNeXt50 network (resnext50_32x4d) followed by an LSTM layer. The data loader loads the preprocessed face-cropped videos and divides them into train and test sets. Frames from the processed videos are then passed to the model in mini-batches for training and testing.
VII. FEATURE EXTRACTION WITH RESNEXT50
ResNeXt50 is used to extract accurate frame-level features. The CNN is then fine-tuned by adding extra layers and selecting a suitable learning rate so that gradient descent converges. The 2048-dimensional feature vectors produced after the last pooling layer are used as the sequential input to the LSTM.
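To make the data flow concrete, the sketch below traces tensor shapes through a CNN-plus-LSTM video classifier. Only the 2048-dimensional feature size comes from the text; the batch size, sequence length, frame resolution, and the `trace_shapes` helper are assumptions chosen for illustration.

```python
FEATURE_DIM = 2048  # feature size after the last pooling layer, as stated in the text


def trace_shapes(batch_size, seq_len, channels, height, width, num_classes=2):
    """Return the tensor shape at each stage of a CNN + LSTM video classifier."""
    return [
        ("input clip batch", (batch_size, seq_len, channels, height, width)),
        # The CNN sees individual frames, so batch and time are merged.
        ("frames flattened for the CNN", (batch_size * seq_len, channels, height, width)),
        ("pooled CNN features", (batch_size * seq_len, FEATURE_DIM)),
        # Time is restored so the LSTM receives one feature vector per frame.
        ("sequences restored for the LSTM", (batch_size, seq_len, FEATURE_DIM)),
        ("class scores (real vs fake)", (batch_size, num_classes)),
    ]


# Assumed values: 4 videos per mini-batch, 20 frames each, 112x112 RGB crops.
for stage, shape in trace_shapes(4, 20, 3, 112, 112):
    print(f"{stage:34s} {shape}")
```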
VIII. LSTM FOR SEQUENTIAL PROCESSING
Assume we take the ResNeXt CNN feature vectors of the input frames as input and train a 2-node output layer that gives the probability that the sequence is part of a deepfake video. The main challenge is to design a model that can recursively process a sequence in a meaningful way. For this we propose an LSTM layer with 2048 units and a dropout probability of 0.4. The LSTM processes frames sequentially so that the video can be analysed temporally by comparing the frame at time t with the frame at time t-n, where n is the number of frames preceding t.
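For reference, the standard LSTM recurrence is given below. This is the textbook formulation, not a detail taken from the paper: here $x_t$ stands for the 2048-dimensional CNN feature vector of frame $t$, $h_t$ and $c_t$ are the hidden and cell states, $\sigma$ is the logistic sigmoid, and $\odot$ is element-wise multiplication. The weight names $W$, $U$, $b$ are notation introduced for this sketch, and dropout is not shown.

```latex
\begin{aligned}
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \\
g_t &= \tanh\left(W_g x_t + U_g h_{t-1} + b_g\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```

Because $c_t$ carries information forward from $c_{t-1}$, the state at frame $t$ can depend on frames $t-1, \dots, t-n$, which is what allows inconsistencies between frames to influence the prediction.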
When a new video is uploaded, it goes through the same preprocessing step to obtain the face-cropped video. The data is then passed directly to the trained model, which predicts whether the video is real or fake.
The model's output will indicate whether the video is real or fake based on the model's confidence.
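One common way to turn two output scores into a label and a confidence is a softmax, sketched below. That the model uses a softmax, the label ordering in `LABELS`, and the example logits are all assumptions made for illustration.

```python
import math

LABELS = ("REAL", "FAKE")  # assumed ordering of the two output nodes


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Return the most likely label and its probability as the confidence."""
    probs = softmax(logits)
    index = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[index], probs[index]


# Hypothetical scores from the 2-node output layer for one video.
label, confidence = predict_label([0.3, 2.1])
print(label, round(confidence, 3))  # FAKE 0.858
```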
Audio-altered deepfakes are not detected by the current module, but this could be addressed in future work.
Copyright © 2022 Kimaya Kulkarni, Sahil Khanolkar, Yash Walke, Rahul Sonkamble. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET44110
Publish Date : 2022-06-11
ISSN : 2321-9653
Publisher Name : IJRASET