Authors: Achal Dattu Khaperkar, Durvanshi Kishor Khapekar, Pooja Wasudev Lanjewar, V. Jayshree Ashish Naidu, Prof. Aachal Wani
On average, more than 7.9 deaths per 10,000 people worldwide result from human violence each year. Most violent incidents happen suddenly or in remote locations, and the delay before information reaches the authorities severely hampers efforts to stop these crimes. This work addresses the problem with automated detection. Moving-object detection from CCTV footage is one of the most effective computer vision techniques for this purpose. CCTV cameras are now installed on almost every street, and their footage is highly useful for solving crimes. Computer vision uses deep learning approaches to anticipate and identify actions and attributes in videos. At present, police arrive at the scene of a violent incident and review the CCTV footage before carrying out further investigation. The purpose of this study is to identify violent
Technological progress in video and image processing has been remarkable, driven by the importance of locating content for many applications, including identifying actions and the objects people use, such as knives or firearms. Recognising activities from video streams has received growing attention in recent years, mainly because of the increase in human aggression in daily life. Surveillance footage is typically searched manually. Although violent incidents may be rare, millions of cameras are installed around the world because risks can arise anywhere. This paper therefore reviews the present state of human-violence detection systems and the methods and deep learning models used in them. With such data, the technology can detect everyday occurrences of human violence by recognising the objects people carry, their motions, and their actions. Violent behaviour is detected automatically using object detection techniques. The approach involves multiple phases, including object detection, action detection, and video classification. Our goal is to develop a system that can identify violent acts even when other people are present.
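The final phase, video classification, has to turn frame-level outputs into a decision about the footage. One simple way to do this, sketched here as an assumption rather than as the authors' method, is a sliding-window vote over per-frame violence probabilities: an alert is raised once enough recent frames exceed a threshold. The function name `classify_stream` and the window size, threshold, and vote count are hypothetical illustrative choices.

```python
from collections import deque


def classify_stream(frame_scores, window=5, threshold=0.5, min_votes=3):
    """Turn per-frame violence probabilities into per-frame alert decisions.

    frame_scores: iterable of floats in [0, 1], one per frame, assumed to come
    from some frame-level classifier (hypothetical here).
    A frame "votes" for violence if its score >= threshold; the decision at
    each step is True once at least min_votes of the last `window` frames vote.
    """
    recent = deque(maxlen=window)  # keeps only the most recent `window` votes
    decisions = []
    for score in frame_scores:
        recent.append(score >= threshold)
        decisions.append(sum(recent) >= min_votes)
    return decisions
```

Requiring several agreeing frames trades a short detection delay for fewer false alarms caused by a single misclassified frame.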
The GoogLeNet Inception-v3 image classification model and the YOLOv7 object and face detection model are used with transfer learning to identify human violence and the objects in the footage. Inception-v3 is a pre-trained model that improves on the basic architecture of the earlier Inception-v1 and Inception-v2 models. It is trained on the ImageNet dataset, and its learned representations can be retained while the top layers are retrained. For object detection, YOLOv7 achieves high accuracy with a low error rate. It detects the 80 object classes of the MS COCO label set and is more accurate than its predecessor, YOLOv4.
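The introduction names knives and firearms as objects of interest. The following is a minimal sketch of keeping only the relevant detections, assuming a detector such as YOLOv7 has already produced outputs that were converted into simple (label, confidence, box) records. The `Detection` type, the class set, and the 0.25 confidence threshold are hypothetical. Note that "person" and "knife" are among the 80 COCO classes but firearms are not, so detecting guns would require training on additional labelled data.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str         # class name reported by the detector
    confidence: float  # detector score in [0, 1]
    box: tuple         # (x1, y1, x2, y2) in pixels


# Hypothetical set of classes treated as relevant to violence detection.
CLASSES_OF_INTEREST = {"person", "knife"}


def filter_detections(detections, classes=CLASSES_OF_INTEREST, min_conf=0.25):
    """Keep detections whose label is of interest and whose score is high enough."""
    return [d for d in detections if d.label in classes and d.confidence >= min_conf]
```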
II. LITERATURE SURVEY
"Deep Learning for Violence Detection: A Comparative Study" - Hassanpour et al.
This study compares different deep learning techniques for violence detection, including CNNs, RNNs, and hybrid models. It offers a perspective on the effectiveness of the various approaches and can guide model selection.
"Violence Detection in Videos" - Mohammadi et al.
This survey examines techniques tailored specifically to violence detection in videos. It explores several modalities, including visual and audio cues, which could inspire a multi-modal approach to enhancing our YOLOv7-based system.
"An Efficient Violence Detection Algorithm Based on Deep Learning" - Mhiri et al.
This paper presents an efficient approach to violence detection using deep learning. It discusses optimisation strategies that can be beneficial when integrating YOLOv7 into our violence detection pipeline.
"Real-Time Violence Detection in Video" - Amer et al.
This work investigates real-time violence detection methods and could provide valuable insights into achieving the time-sensitive aspect of our project using YOLOv7.
A. Data Collection and Preparation
B. Machine Learning Model
Hyperparameter tuning: configure anchor box dimensions, class labels (violence and non-violence), learning rates, and optimisation algorithms to optimise model performance.
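The tuning step above can be sketched as an exhaustive grid search. This is a minimal illustration, not the authors' procedure: `grid_search` is a hypothetical helper, and the `evaluate` callable, which would train and validate the model for one configuration and return a score such as validation mAP, is assumed to be supplied by the caller.

```python
import itertools


def grid_search(space, evaluate):
    """Try every combination in `space` and return (best_config, best_score).

    space: dict mapping hyperparameter name -> list of candidate values.
    evaluate: callable taking a config dict and returning a validation score
    where higher is better (assumed to be provided by the training code).
    """
    names = sorted(space)  # fixed order so results are deterministic
    best_config, best_score = None, float("-inf")
    for values in itertools.product(*(space[n] for n in names)):
        config = dict(zip(names, values))
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

Grid search is simple but grows multiplicatively with each added hyperparameter; for expensive detector training, random search or a smaller grid is often used instead.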
C. Data Partitioning and Model Training
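The paper does not state how the data were partitioned. The following is a minimal sketch of a reproducible split, assuming a hypothetical 70/20/10 train/validation/test ratio and a fixed random seed. `split_dataset` is an illustrative name.

```python
import random


def split_dataset(items, train=0.7, val=0.2, seed=42):
    """Shuffle items and split them into (train, val, test) lists.

    The remaining fraction (1 - train - val) goes to the test set.
    A fixed seed keeps the split identical across runs.
    """
    if not 0 < train + val <= 1:
        raise ValueError("train + val must be in (0, 1]")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local RNG, global state untouched
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```

For video data, the items should be whole videos or clips rather than individual frames. Otherwise nearly identical neighbouring frames from the same video can end up in both the training and test sets, which inflates test scores.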
D. Evaluation and Performance Metrics
E. Real-Time Inference and Multi-Modal Integration
F. Performance Analysis and Comparison (Optional)
A. Transfer Learning
Transfer learning reduces the heavy computational cost of training by reusing knowledge from pre-trained models. It is therefore common practice to start from pre-trained deep learning models when tackling challenging problems. Transfer learning is also widely used in natural language processing, where text is the input. One expected benefit is a higher start: the initial skill of a model initialised from the source model should be higher than that of a model trained from scratch.
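In the feature-extraction form of transfer learning, a pre-trained network is frozen and only a small new classifier head is trained on its outputs. The toy sketch below shows that structure in plain Python. `frozen_backbone` is a hypothetical, fixed hand-written function that stands in for a pre-trained model such as Inception-v3 with its top layers removed. The two-value inputs, learning rate, and epoch count are illustrative assumptions.

```python
import math


def frozen_backbone(x):
    """Stand-in for a pre-trained feature extractor whose weights never change.

    Hypothetical: a real system would run e.g. Inception-v3 without its top.
    """
    return [x[0] + x[1], x[0] - x[1], 1.0]  # last entry acts as a bias feature


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_head(samples, labels, lr=0.5, epochs=200):
    """Train only a logistic-regression head on top of frozen features."""
    feats = [frozen_backbone(s) for s in samples]  # computed once, never updated
    w = [0.0] * len(feats[0])
    for _ in range(epochs):
        for f, y in zip(feats, labels):
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)))
            # Gradient of the log loss w.r.t. head weight i is (p - y) * f_i.
            w = [wi - lr * (p - y) * fi for wi, fi in zip(w, f)]
    return w


def predict(w, x):
    """Probability of the positive class for input x."""
    return sigmoid(sum(wi * fi for wi, fi in zip(w, frozen_backbone(x))))
```

Because only the head's few weights are learned, training is cheap. This is the computational saving the paragraph above describes.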
B. YOLOv7 (You Only Look Once)
YOLOv7 is a single-stage object detector. The image passes through the neural network once, and the network directly predicts object bounding boxes and class labels, which makes both training and inference fast. The network has three main parts: the backbone, the neck, and the head.
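A single-pass detector typically outputs many overlapping candidate boxes for the same object. These are pruned after the forward pass by non-maximum suppression (NMS). The sketch below is a plain-Python, class-agnostic version for illustration only. The 0.45 IoU threshold is an assumed value, and YOLOv7's own implementation differs in detail (it is vectorised and can suppress per class).

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.45):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```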
C. Confusion Matrix
After building the model and obtaining results, we need to determine whether the model performs well. For this we use a confusion matrix, which tabulates correct and incorrect predictions for each class and from which accuracy and the other metrics of the trained model can be computed.
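A confusion matrix counts how often the model predicted each class (columns) for each true class (rows). Normalising each row by its total gives per-class rates, which is the "with normalisation" view referred to in the results. The following is a minimal sketch for the two labels named in the methodology. The function names are illustrative.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count predictions: rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def normalize_rows(matrix):
    """Divide each row by its total so entries become per-class rates."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([c / total if total else 0.0 for c in row])
    return out
```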
V. EXPERIMENTAL RESULTS
Recognising whether human aggression has occurred is central to designing a technique for automatic detection in surveillance video, and this makes deep learning models a natural choice for the task. Developing a model that recognises and detects human aggression is therefore the crucial step. The model is first trained using a convolutional neural network (CNN) with fully connected layers and long short-term memory (LSTM) layers. A CNN is also used to examine regional motion in the video.
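The paper uses a CNN to analyse regional motion. As a simpler, non-learned illustration of what a regional motion signal measures, the sketch below swaps in classical frame differencing. Pixels whose grayscale value changes by more than a threshold between consecutive frames are marked as moving, and the fraction of moving pixels is reported for each cell of a coarse grid. The frame representation (2D lists of 0 to 255 integers), the threshold of 25, and the 2 x 2 grid are hypothetical choices.

```python
def motion_mask(prev_frame, frame, threshold=25):
    """Mark pixels whose grayscale intensity changed by more than threshold."""
    return [[abs(a - b) > threshold for a, b in zip(r0, r1)]
            for r0, r1 in zip(prev_frame, frame)]


def regional_motion(mask, rows=2, cols=2):
    """Fraction of moving pixels in each cell of a rows x cols grid."""
    h, w = len(mask), len(mask[0])
    result = []
    for r in range(rows):
        line = []
        for c in range(cols):
            cell = [mask[y][x]
                    for y in range(r * h // rows, (r + 1) * h // rows)
                    for x in range(c * w // cols, (c + 1) * w // cols)]
            line.append(sum(cell) / len(cell) if cell else 0.0)
        result.append(line)
    return result
```

A learned model can tell fighting apart from ordinary movement, which raw pixel differences cannot. The sketch only shows where in the frame motion is concentrated.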
The model is trained on the training data using the YOLOv7 architecture together with custom neural network layers, and predictions are then obtained on the test data.
Once the predictions have been obtained, they are evaluated with classification metrics. Typical measures are accuracy, the confusion matrix, F-score, recall, and precision. The confusion matrix, both with and without normalisation, was used to determine the model's accuracy. As explained above, each classification model is constructed and then evaluated on the test data. The final accuracy obtained is 74%.
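The metrics listed above follow directly from the four cells of a binary confusion matrix, with "violence" treated as the positive class. Precision is TP / (TP + FP), recall is TP / (TP + FN), and the F-score (F1) is their harmonic mean, 2PR / (P + R). The following is a minimal sketch. The counts in the example afterwards are illustrative only and are not the paper's results.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 for the positive ('violence') class."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

For example, with 40 true positives, 10 false positives, 20 false negatives, and 30 true negatives, accuracy is 0.70, precision is 0.80, recall is about 0.667, and F1 is 8/11, about 0.727. Reporting precision and recall alongside accuracy matters when violent clips are rare, because a model that always predicts "non-violence" can still score high accuracy.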
The validation data are used in the prediction step during testing so that the predicted items can be verified. The trained weights and test outputs are saved automatically in the run folder, where they can be inspected to see how well the weights predict the items of interest. The figure shows the bounding boxes and class labels predicted for the test images, with each detected person's face marked with its class name.
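YOLO-family tools commonly store labels and predictions as plain-text lines in the YOLO label format. Each line holds a class index followed by the box centre x, centre y, width, and height, each normalised by the image size, optionally followed by a confidence score. To draw bounding boxes like those in the figure, each line must be converted back to pixel corner coordinates. The sketch assumes that format. `yolo_line_to_box` is a hypothetical helper, and the two-class name list in the example is an assumption based on the labels named in the methodology.

```python
def yolo_line_to_box(line, img_w, img_h, class_names):
    """Convert one YOLO-format line to (class_name, (x1, y1, x2, y2), conf).

    Assumed format: "<class_id> <x_center> <y_center> <width> <height> [conf]"
    with coordinates normalised to [0, 1]; conf is None when absent.
    """
    parts = line.split()
    cls = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:5])
    # Convert centre/size (normalised) to corner coordinates in pixels.
    x1, x2 = (xc - w / 2) * img_w, (xc + w / 2) * img_w
    y1, y2 = (yc - h / 2) * img_h, (yc + h / 2) * img_h
    conf = float(parts[5]) if len(parts) > 5 else None
    return class_names[cls], (x1, y1, x2, y2), conf
```

For instance, the line "0 0.5 0.5 0.25 0.5 0.9" on a 640 x 480 image maps to class 0 with corners (240, 120) and (400, 360) and a confidence of 0.9.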
In this project, we addressed human violence detection using the YOLOv7 (You Only Look Once, version 7) object detection framework. Our goal was to develop an effective and efficient system capable of detecting instances of human violence in real-time video streams or images, and the work yielded several useful insights. We adapted the YOLOv7 architecture, known for its strong object detection performance, to the specific challenge of violence detection. By training the model on a curated dataset containing diverse instances of violent and non-violent interactions, we fine-tuned it for violence recognition.
Our experiments showed that the YOLOv7-based violence detection system achieved good precision and recall. It was able to identify various forms of violent behaviour, such as physical altercations, aggressive gestures, and related actions. Furthermore, the real-time inference capability of the YOLOv7 architecture is crucial for timely intervention and response.
While the project yielded promising outcomes, there is room for improvement. Fine-tuning the model on larger and more diverse datasets could improve its ability to generalise across different contexts and scenarios. Additionally, further exploration of transfer learning and multi-modal learning might lead to more robust violence detection systems.
In conclusion, our project highlights the efficacy of the YOLOv7 framework for the critical task of human violence detection. By harnessing deep learning and object detection, we have taken a significant step towards technology that could contribute to safer environments and proactive security measures.
As technology continues to evolve, we anticipate that our work will serve as a foundation for future advancements in the field of violence detection and prevention.
Copyright © 2023 Achal Dattu Khaperkar, Durvanshi Kishor Khapekar, Pooja Wasudev Lanjewar, V. Jayshree Ashish Naidu, Prof. Aachal Wani. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.