Maintaining safety in public spaces such as schools and ensuring fairness in examination halls are paramount concerns in modern education. Traditional surveillance systems depend heavily on manual monitoring, which is prone to human error and operator fatigue and causes critical delays in identifying sporadic events. This paper presents the implementation of a comprehensive, multi-scenario automated surveillance system that applies deep learning and computer vision techniques to continuously analyze closed-circuit television (CCTV) feeds. The approach is organized into two routing pipelines: a campus safety mode that uses YOLOv7 for rapid human detection combined with a MobileNet-BiLSTM classifier for spatiotemporal violence recognition, and an academic integrity mode that uses an improved SE-YOLOv8 model coupled with a ResNet 3D Convolutional Neural Network (CNN) to detect subtle cheating behaviors. By employing Squeeze Aggregated Excitation (SaE) modules and Deep Keyframe Detection, the proposed framework reduces computational overhead. We analyze benchmark datasets, evaluation metrics, and overall performance, and show that this hybrid framework achieves 91% accuracy in violence detection and 96.0% mean Average Precision at an IoU threshold of 0.5 (mAP50) in cheating recognition. The deep learning backend is further integrated with a full-stack administrative dashboard and an automated alerting mechanism, bridging the gap between research-grade computer vision models and the deployment needs of educational institutions.
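Deep Keyframe Detection [6] uses deep networks to learn which frames of an action video are most informative. The sketch below swaps in a much simpler, non-learned stand-in purely to illustrate why keyframe selection cuts computation: assuming frames arrive as flat lists of grayscale intensities, a frame is kept only when its mean absolute difference from the last kept frame exceeds a threshold. The names `select_keyframes` and `mean_abs_diff` and the threshold value are hypothetical; only the kept frames would be passed on to the detectors.

```python
from typing import List, Sequence


def mean_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_keyframes(frames: List[Sequence[float]], threshold: float = 10.0) -> List[int]:
    """Return indices of frames that differ enough from the last kept frame.

    Hypothetical, non-learned stand-in for deep keyframe detection: the first
    frame is always kept; later frames are kept only when the scene changed.
    """
    if not frames:
        return []
    kept = [0]
    for i in range(1, len(frames)):
        if mean_abs_diff(frames[i], frames[kept[-1]]) > threshold:
            kept.append(i)
    return kept


if __name__ == "__main__":
    static = [50.0] * 16                 # unchanged scene
    moved = [50.0] * 8 + [200.0] * 8     # half of the pixels change sharply
    video = [static, static, static, moved, moved, static]
    print(select_keyframes(video))       # → [0, 3, 5]
```

In this toy video only three of six frames reach the downstream models; a learned selector replaces the fixed threshold with a model of which frames carry the action.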
Introduction
This paper presents a dual-model, AI-based surveillance system designed to improve security monitoring in educational institutions by overcoming the limitations of traditional CCTV systems.
The central problem is that conventional surveillance relies heavily on human operators, who often miss critical events because of fatigue, limited attention, and the difficulty of watching many video streams at once. As a result, incidents such as violence or academic dishonesty are detected late or not at all.
To address this, we propose a deep learning-based intelligent surveillance framework that shifts from passive recording to real-time automated threat detection. The system combines optimized YOLO-based object detectors with temporal classifiers, so that it analyzes both spatial information (what is present in a frame) and temporal information (how actions evolve over time).
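The spatial-then-temporal flow can be sketched as a sliding window gated by a detector. This is a minimal sketch, not the paper's implementation: it assumes a per-frame detector already supplies the number of people in each frame and that a clip-level classifier is available as a callable. The gating rule (at least two people in the newest frame), the window length, and the names `monitor` and `toy_motion_classifier` are hypothetical placeholders for the YOLOv7 and MobileNet-BiLSTM models.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List, Sequence, Tuple

Frame = Sequence[float]


def monitor(
    frames: Iterable[Frame],
    person_counts: Iterable[int],
    classify_clip: Callable[[List[Frame]], float],
    window: int = 4,
    min_people: int = 2,
    threshold: float = 0.5,
) -> List[Tuple[int, float]]:
    """Return (frame_index, score) for every window the temporal model flags.

    Spatial gate (hypothetical rule): the temporal model runs only when the
    detector reports at least `min_people` people in the newest frame.
    """
    buffer: Deque[Frame] = deque(maxlen=window)  # keeps only the latest `window` frames
    alerts: List[Tuple[int, float]] = []
    for idx, (frame, people) in enumerate(zip(frames, person_counts)):
        buffer.append(frame)
        if len(buffer) < window or people < min_people:
            continue  # too little temporal context, or nobody to interact with
        score = classify_clip(list(buffer))
        if score >= threshold:
            alerts.append((idx, score))
    return alerts


def toy_motion_classifier(clip: List[Frame]) -> float:
    """Placeholder temporal model: fraction of consecutive frame pairs that differ."""
    changes = sum(1 for a, b in zip(clip, clip[1:]) if a != b)
    return changes / (len(clip) - 1)


if __name__ == "__main__":
    frames = [[0.0], [0.0], [0.0], [0.0], [1.0], [0.0], [1.0], [0.0]]
    people = [1, 1, 1, 1, 3, 3, 3, 3]
    for idx, score in monitor(frames, people, toy_motion_classifier):
        print(idx, round(score, 2))  # → 5 0.67 / 6 1.0 / 7 1.0
```

The design point is that the cheap spatial check decides whether the expensive temporal model runs at all, which is what keeps a hybrid pipeline fast on multiple streams.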
Earlier methods such as motion detection and optical flow were simple but unreliable in complex environments. CNN-based approaches [3], [4], [9] improved spatial accuracy but lacked an understanding of motion, while LSTM-based and 3D CNN models [8] captured temporal patterns at a high computational cost. Consequently, hybrid designs that pair fast detectors such as YOLOv7 with lightweight classifiers [1], [7] are considered the most practical solution.
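To illustrate why early motion-based methods are considered fragile, the following sketch implements plain frame differencing, assuming grayscale frames given as 2D lists of intensities. The pixel and area thresholds are arbitrary illustrative values. Because the method only counts changed pixels, a global brightness change such as lights switching on triggers it just as readily as a real event.

```python
from typing import List

Frame = List[List[int]]


def motion_ratio(prev: Frame, curr: Frame, pixel_threshold: int = 25) -> float:
    """Fraction of pixels whose absolute intensity change exceeds pixel_threshold."""
    changed = total = 0
    for row_prev, row_curr in zip(prev, curr):
        for p, c in zip(row_prev, row_curr):
            total += 1
            if abs(c - p) > pixel_threshold:
                changed += 1
    return changed / total if total else 0.0


def motion_detected(prev: Frame, curr: Frame, area_threshold: float = 0.1) -> bool:
    """Flag motion when more than area_threshold of the frame has changed."""
    return motion_ratio(prev, curr) > area_threshold


if __name__ == "__main__":
    dark = [[20] * 4 for _ in range(4)]
    lit = [[120] * 4 for _ in range(4)]   # lights switched on, no action at all
    person = [row[:] for row in dark]
    person[1][1] = person[1][2] = 200     # a small moving object
    print(motion_detected(dark, person))  # True: 2/16 = 0.125 of pixels changed
    print(motion_detected(dark, lit))     # True as well: a false alarm
```

The detector cannot tell what changed, only how much, which is the gap that learned spatial and temporal models are meant to close.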
For violence detection, YOLO-based systems provide fast object detection but require temporal models to recognize actions [1]. Cheating detection in examination halls demands more elaborate models because seating is crowded and the relevant behaviors are subtle [2], [10]; such systems often combine 2D and 3D networks and add MLP-based enhancements for better feature learning [2], [5].
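One such MLP-based enhancement is the Squeeze Aggregated Excitation (SaE) module of SENetV2 [5], which replaces the single squeeze-excitation bottleneck with several parallel dense branches. The NumPy sketch below follows our reading of that design: global average pooling, parallel dense branches with ReLU, concatenation, a dense layer back to one weight per channel, and a sigmoid that rescales the channels. The branch count, reduction ratio, and random untrained weights are assumptions for illustration, and the class name `SaEBlock` is hypothetical.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


class SaEBlock:
    """Squeeze-aggregated-excitation channel attention (sketch, untrained weights).

    Input feature map shape: (channels, height, width).
    """

    def __init__(self, channels: int, cardinality: int = 4, reduction: int = 4, seed: int = 0):
        rng = np.random.default_rng(seed)
        hidden = max(1, channels // reduction)
        # One small dense layer per branch: the aggregated "MLP" part.
        self.branches = [rng.standard_normal((hidden, channels)) * 0.1 for _ in range(cardinality)]
        # Dense layer mapping the concatenated branch outputs back to one weight per channel.
        self.expand = rng.standard_normal((channels, hidden * cardinality)) * 0.1

    def __call__(self, x: np.ndarray) -> np.ndarray:
        squeezed = x.mean(axis=(1, 2))                                      # squeeze: (C,)
        aggregated = np.concatenate([relu(w @ squeezed) for w in self.branches])
        weights = sigmoid(self.expand @ aggregated)                         # excitation: (C,) in (0, 1)
        return x * weights[:, None, None]                                   # rescale each channel


if __name__ == "__main__":
    feature_map = np.random.default_rng(1).standard_normal((16, 8, 8))
    out = SaEBlock(channels=16)(feature_map)
    print(out.shape)  # → (16, 8, 8)
```

The output keeps the input shape, so the block can be dropped into a detector backbone; in a trained model the branch weights would learn which channels to emphasize.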
Conclusion
Surveillance systems in educational institutions must move from passive recording to active, intelligent monitoring to ensure student safety and the integrity of academic processes. This paper presented the implementation of a dual-pipeline, multi-scenario framework that categorizes and analyzes video feeds efficiently. By using Deep Keyframe Detection to minimize computational waste, followed by a YOLOv7 and MobileNet-BiLSTM combination, the system achieved 91% accuracy in violence detection while maintaining fast frame processing. In the academic integrity pipeline, the MLP-enhanced SE-YOLOv8 combined with a ResNet 3D CNN mitigated temporal confusion and achieved 96.0% mAP50 for subtle cheating recognition.
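The mAP50 figure is a detection metric: a predicted box counts as a true positive only if its intersection over union (IoU) with a not-yet-matched ground-truth box is at least 0.5, and average precision (AP) is the area under the resulting precision-recall curve; mAP50 averages AP over classes. The sketch below computes AP for one class in one image using confidence-ordered greedy matching and all-point interpolation. Evaluation toolkits differ in interpolation and in how images are pooled, so this is illustrative rather than a reproduction of the paper's evaluation.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds: List[Tuple[float, Box]], gts: List[Box], iou_thr: float = 0.5) -> float:
    """AP for one class: greedy matching at iou_thr, all-point interpolation."""
    if not gts:
        return 0.0
    matched = [False] * len(gts)
    hits = []
    for _, box in sorted(preds, key=lambda p: p[0], reverse=True):
        best, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if not matched[j] and iou(box, gt) > best:
                best, best_j = iou(box, gt), j
        if best >= iou_thr:
            matched[best_j] = True
            hits.append(1)
        else:
            hits.append(0)
    # Precision and recall after each ranked prediction.
    precisions, recalls, tp = [], [], 0
    for k, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / k)
        recalls.append(tp / len(gts))
    # Area under the precision envelope, summed over recall increments.
    ap, prev_recall = 0.0, 0.0
    for i in range(len(precisions)):
        ap += (recalls[i] - prev_recall) * max(precisions[i:])
        prev_recall = recalls[i]
    return ap


if __name__ == "__main__":
    gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
    preds = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (21, 21, 31, 31))]
    print(round(average_precision(preds, gts), 3))  # → 0.833
```

Here the middle prediction is a false positive, so precision dips to 0.5 before the third prediction recovers full recall, giving AP = 0.5 + 0.5 × 2/3 = 5/6.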
Integrating these deep learning models with a full-stack administrative dashboard and an automated Telegram alerting mechanism turns this research into a practical, deployable product. The unified approach balances computational efficiency with high detection accuracy, providing an automated solution that substantially reduces reliance on manual monitoring in educational environments.
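The alerting step is described only at a high level. As a hedged sketch, the code below assumes alerts are delivered through the public Telegram Bot API `sendMessage` method with form-encoded `chat_id` and `text` fields; the message wording, the 30-second cooldown, and the names `build_alert_request`, `AlertThrottle`, and `send` are hypothetical. The throttle addresses a practical issue for frame-level detectors: one incident can fire on many consecutive frames, and without de-duplication operators would receive a flood of identical messages.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Dict, Optional, Tuple

TELEGRAM_API = "https://api.telegram.org/bot{token}/sendMessage"


def build_alert_request(token: str, chat_id: str, camera: str, event: str,
                        confidence: float) -> urllib.request.Request:
    """Build (but do not send) a Telegram Bot API sendMessage request."""
    text = f"[ALERT] {event} detected on {camera} (confidence {confidence:.0%})"
    body = urllib.parse.urlencode({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(TELEGRAM_API.format(token=token), data=body, method="POST")


class AlertThrottle:
    """Suppress repeated alerts for the same camera and event within cooldown_s seconds."""

    def __init__(self, cooldown_s: float = 30.0):
        self.cooldown_s = cooldown_s
        self._last: Dict[Tuple[str, str], float] = {}

    def should_send(self, camera: str, event: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        key = (camera, event)
        last = self._last.get(key)
        if last is not None and now - last < self.cooldown_s:
            return False  # same incident is still within its cooldown window
        self._last[key] = now
        return True


def send(request: urllib.request.Request, timeout: float = 5.0) -> dict:
    """Send the request and decode Telegram's JSON reply (needs network and a real bot token)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

In use, a detector callback would call `throttle.should_send(camera, event)` and, only when it returns True, pass `build_alert_request(...)` to `send`; the bot token and chat ID would come from the deployment's configuration.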
References
[1] S. Senthilkumar, S. Kolte, G. Agarwal, and A. Shirish, "Real Time Violence Detection System using YOLOv7 and Deep Learning Techniques," in 2025 3rd International Conference on Intelligent Data Communication Technologies and Internet of Things (IDCIoT), IEEE, 2025, pp. 1447-1454.
[2] J. Lu, J. Wang, N. Song, Z. Luo, W. Zhang, and Y. Wang, "Cheating Recognition in Examination Halls Based on Improved YOLOv8," in 2024 International Conference on Artificial Intelligence of Things and Systems (AIoTSys), IEEE, 2024.
[3] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779-788.
[4] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770-778.
[5] M. Narayanan, "SENetV2: Aggregated dense layer for channelwise and global representations," arXiv preprint arXiv:2311.10807, 2023.
[6] X. Yan, S. Z. Gilani, H. Qin, M. Feng, L. Zhang, and A. Mian, "Deep keyframe detection in human action videos," arXiv preprint arXiv:1804.10021, 2018.
[7] R. Kumar, A. Gupta, and D. Rajeswari, "Violence Detection System using MobileNetV2," in 2024 3rd International Conference on Applied Artificial Intelligence and Computing (ICAAIC), IEEE, 2024, pp. 1555-1560.
[8] L. A. Siddique, R. Junhai, T. Reza, S. S. Khan, and T. Rahman, "Analysis of Real-Time Hostile Activity Detection from Spatiotemporal Features Using Time Distributed Deep CNNs, RNNs and Attention-Based Mechanisms," arXiv preprint arXiv:2302.11027, 2023.
[9] L. Rahmawati, S. Rustad, A. Marjuni, M. A. Soeleman, and P. N. Andono, "Foggy-Based Object Detection In Video Using Faster R-CNN, YOLOv3, and SSD," in 2023 International Seminar on Application for Technology of Information and Communication (iSemantic), IEEE, 2023, pp. 412-416.
[10] M. Malhotra and I. Chhabra, "Automatic invigilation using computer vision," in International Conference on Integrated Intelligent Computing Communication & Security, Atlantis Press, 2021, pp. 130-136.