Violent incidents threaten public safety and current surveillance systems still suffer from latencies, false positives, and very little meaningful response. This paper proposes an automated and adaptive, real-time detection of violence system by fusing YOLOv8-based detection of humans and lethal weapon localization with a MobileNetV2 classifier. An adaptive image enhancement task takes into account lighting differences. A temporal filter requires that a positive report must be maintained for 30 consecutive frames prior to triggering an alert. Upon confirmation, an image with a timestamp and geolocation metadata is sent to a Telegram bot
Introduction
1. Introduction
Security and surveillance systems are essential for public safety but often lack the ability to detect violent activities in real time. This project proposes a deep learning-based real-time violence detection system using Convolutional Neural Networks (CNNs) to automatically identify and alert authorities during violent incidents.
2. Importance of Real-Time Violence Detection
Real-time detection can significantly enhance:
Public safety by enabling faster law enforcement response
Security in businesses by protecting employees and customers
School and campus safety by preventing escalation of violence
3. Related Work
Deep learning: CNN models like ResNet, Inception, and MobileNet have shown promise in video analysis.
Object detection: YOLO algorithms are efficient for detecting humans and weapons in real-time feeds.
Image enhancement: Techniques like histogram equalization improve detection accuracy.
Alert systems: Some systems integrate real-time alerts using messaging apps for fast response.
4. Problem Statement
To build a real-time surveillance system capable of recognizing violent behavior and sending automated alerts to authorities.
5. Objectives
Detect human presence using YOLOv4 or YOLOv8
Identify violent actions with MobileNetV2
Analyze continuous video frames in real-time
Trigger an alert system when violence is detected
Notify officials with timestamped images and location data via Telegram
6. Methodology
Dataset: 1,000 video clips (5s each) split equally between violent and non-violent scenes; training uses 350 from each category per epoch.
MobileNetV2: Efficient CNN with depth-wise separable convolutions and inverted residual blocks to detect violence with reduced computation.
Image Enhancement: Improves brightness and color using Python Imaging Library (PIL).
Alert Module:
Detects violence in frames
Sends alert if 30 consecutive frames confirm violence
Shares image, timestamp, and location with officials via Telegram
7. System Architecture
The system integrates:
YOLO for object (human/weapon) detection
MobileNetV2 for violence recognition
Image enhancement for clearer visuals
Real-time alert dispatch system for automated responses
Conclusion
Violence scene detection in real-time is a challenging problem due to the diverse content and large variations quality. In this research, we use the MobileNet v2 model to offer an innovative and efficient technique for identifying violent events in real-time surveillance footage. The proposed network has a good recognition accuracy in typical benchmark datasets, indicating that it can learn discriminative motion saliency maps successfully. It’s also computationally efficient, making it ideal for use in time-critical applications and low-end devices. Here, we had also shown the working of an Alert system that is integrated with the pretrained model. In comparison to other stateoftheart approaches, this methodology will give a far superior option.
References
1) Real time violence detection in surveillance videos using Convolutional Neural Networks1: This paper discusses the use of a MobileNet model for real-time violence detection in surveillance videos. The model was compared with AlexNet, VGG-16, and GoogleNet models, and it showed outstanding performance in terms of accuracy, loss, and computation time on the hockey fight dataset.
2) Violence Detection in Surveillance Videos with Deep Network Using Transfer Learning2: This paper proposed a deep representation-based model using the concept of transfer learning for violent scenes detection to identify aggressive human behaviors. The proposed approach outperformed state-of-the-art accuracies by learning most discriminating features achieving high accuracies on Hockey and Movies datasets.
3) Violence Detection In Surveillance Videos Using Deep Learning3: This paper proposes a triple-staged end-to-end deep learning violence detection framework. First, persons are detected in the surveillance video stream using a light-weight convolutional neural network (CNN) model to reduce and overcome the voluminous processing of useless frames.
4) Violence Detection in Videos using Deep Recurrent and Convolutional Neural Networks4: This work proposes a deep learning architecture for violence detection, which combines both recurrent neural networks (RNNs) and 2-dimensional convolutional neural networks (2D CNN). In addition to video frames, they use optical flow computed using the captured sequences.
And here are some literature reviews:
1) Challenges and Methods of Violence Detection in Surveillance Video: A Survey5:
2) This article presents a survey of the latest methods of violence detection in video sequences. It exposes the main challenges in this area and classifies the methods into five broad categories.
3) An overview of violence detection techniques: current challenges and future directions6: This paper focuses on an overview of deep sequence learning approaches along with localization strategies of the detected violence. It also dives into the initial image processing and machine learning-based VD literature and their possible advantages such as efficiency against the current complex models.
4) A Review of Computer Vision Techniques for Video Violence Detection and Classification7: Using an extensive literature review, this research investigates and analyzes various methods for recognizing violence from surveillance cameras using computer vision.
5) State-of-the-art violence detection techniques in video surveillance: A systematic review8: In this systematic review, they provide a comprehensive assessment of the video violence detection problems that have been described in state-of-the-art researches.
A. Blogs
a) MobileNet v2 for violence detection:
1) Real Time Violence Detection | MobileNet Bi-LSTM | Kaggle: This blog post on Kaggle discusses the use of MobileNet Bi-LSTM for real-time violence detection.
2) Real Time Violence Detection using MobileNet and Bi-directional LSTM - GitHub:
3) This GitHub repository provides code and documentation for a project that uses MobileNet and Bi-directional LSTM for real-time violence detection.
4) Lightweight mobile network for real-time violence recognition: This blog post discusses the use of a lightweight network model, MobileNet-TSM, for real-time violence recognition.
5) Violence Detection Using MobileNet-V2: This GitHub repository provides code for a project that uses MobileNet V2 for violence detection.
6) Efficient Violence Detection in Surveillance Videos - MDPI: This paper presents a novel architecture for violence detection from video surveillance cameras using a U-Net-like network that uses MobileNet V2 as an encoder followed by LSTM for temporal feature extraction.
b) YOLO v4 for object detection:
1) 1. YOLOv4 Object Detection Tutorial with Image and Video - MLK: This blog post provides a beginner’s guide to YOLOv4 object detection with images and videos.
2) 2. YOLO: Algorithm for Object Detection Explained [+Examples] - Medium: This Medium article explains the YOLO algorithm for object detection and discusses different versions of YOLO, including YOLOv4.
3) 3. YOLOv4 - Ten Tactics to Build a Better Model - Roboflow Blog: This blog post discusses ten advanced tactics in YOLOv4 to build a better object detection model.
c) YOLOv8-Based Human and Weapon Localization
To efficiently detect humans and potential weapons in each frame, we integrate the latest YOLOv8 single-stage detector [7] as the first stage of our pipeline. Key details:
1) Backbone & Neck: YOLOv8 uses CSPDarknet as its feature extractor, followed by a PANet-style neck that aggregates multi-scale features for robust small-object detection.
2) Anchor-Free Detection: Unlike previous YOLO versions, YOLOv8 employs anchor-free “centroid” prediction, simplifying training and improving generalization to varied aspect ratios.
3) Detection Head: For each of the three output scales, the head predicts a 4D bounding-box offset plus two class confidences (human, weapon) and objectness.
4) Pre- &Post-Processing:
• Input: Frames are letter-boxed to 640×640, normalized to [0,1], and fed as a batch of size 1.
• Non-Max Suppression (NMS): We apply NMS with IoU threshold 0.45 and confidence threshold 0.25 to prune overlapping detections.
5) Performance: On our test videos, YOLOv8 was able to correctly find humans with an accuracy of 98.7% and weapons with an accuracy of 95.2%. It also runs very fast, processing about 50 frames per second on an NVIDIA GTX 1080 Ti graphic card, which means it works almost instantly without causing any delay before sending the frames for violence detection.
By front-loading the pipeline with this lightweight yet accurate detector, we guarantee that only regions containing people or possible weapons are forwarded to the MobileNetV2 violence classifier, reducing false positives and computational overhead.