Real-world security installations need to catch several different kinds of incidents simultaneously, yet the dominant research trend produces single-event models that are awkward to combine into one working system. Mismatched interfaces, separate calibration pipelines, and incompatible alert formats all create friction that individual benchmark papers simply do not address. PADS, the Proactive Anomaly Detection System introduced in this paper, handles four security-critical event categories inside one unified processing chain with no need for GPU hardware or domain-specific retraining. The four targets are unauthorised entry into restricted zones, prolonged presence inside monitored areas, unattended personal luggage, and fire or arson activity. Video frames are routed simultaneously to two modules: a Surveillance Module that builds and maintains stateful person tracks over time, and an Arson Module that analyses a short rolling window of recent frames for fire signatures. Tracking combines YOLOv8 detections with OSNet appearance embeddings through a two-stage matcher that uses cosine similarity in the first stage and falls back to spatial IoU in the second. A trajectory-span filter eliminates false tracklets generated by detector noise on static scene elements. For fire, the system moves away from CLIP as the sole mechanism and instead fuses HSV colour-space masking with temporal per-pixel variance to exploit the flickering property of real flames, something static orange objects cannot replicate. CLIP remains available as a secondary confidence enhancer only. Tests on ABODA abandoned-object footage, publicly sourced fire clips, and custom intrusion and loitering recordings show reliable detection across all four categories, alert latency under one second for zone events, and near-zero false alarms on footage containing orange safety gear where earlier colour-only approaches produce unacceptable error rates.
Introduction
This paper presents PADS, a real-time, CCTV-based Proactive Anomaly Detection System that unifies four security tasks (fire detection, intrusion detection, loitering detection, and abandoned-object detection) in a single pipeline, with the aim of improving reliability over existing fragmented systems.
Urban CCTV networks generate massive volumes of video, but human monitoring is unreliable because of attention fatigue, and most existing computer vision systems handle only one task at a time (e.g., fire detection or intrusion detection). A prior unified system, PASS-CCTV [6], showed strong performance but suffered from three practical weaknesses: false fire alarms caused by colour confusion, tracking failures during occlusion, and incorrect abandoned-object detection. PADS is designed to address these limitations.
Key contributions
PADS introduces three main improvements:
A dual-signal fire detection method that combines HSV colour filtering with temporal flicker analysis to distinguish real flames from static orange objects.
A five-frame lookback mechanism for abandoned-object ownership tracking that prevents loss of owner association during brief occlusions.
A recalibrated trajectory-based filter (ATM) that better distinguishes real people from static false detections such as poles or signs.
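The five-frame lookback in the second contribution can be pictured with a minimal sketch. The class name `OwnershipLookback`, the owner-distance rule, and the `max_owner_dist` value are hypothetical; only the five-frame window comes from the text. Each frame, the helper stores the visible person positions, and when a bag's owner is missing from the current frame, the most recent sighting inside the window is used instead of immediately declaring the bag unattended.

```python
import math
from collections import deque


class OwnershipLookback:
    """Hypothetical helper: remembers which people were visible in recent frames."""

    def __init__(self, lookback=5, max_owner_dist=150.0):
        # Each entry is {track_id: (x, y)} for one frame; old frames fall off.
        self.history = deque(maxlen=lookback)
        self.max_owner_dist = max_owner_dist  # hypothetical pixel distance

    def update(self, people):
        """Record the person positions visible in the current frame."""
        self.history.append(dict(people))

    def is_attended(self, owner_id, bag_xy):
        """True if the owner's latest sighting in the window is near the bag."""
        for frame in reversed(self.history):  # most recent frame first
            if owner_id in frame:
                return math.dist(frame[owner_id], bag_xy) <= self.max_owner_dist
        return False  # owner unseen for the whole lookback window
```

Under this sketch, an occlusion of up to four consecutive frames leaves the bag attended; an abandonment timer would only start once the owner has been unseen for the whole window or is last seen beyond the distance limit.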
System design
The system is modular, with two main components:
Surveillance Module: handles person and luggage detection, multi-object tracking (YOLOv8 + OSNet), zone-based intrusion and loitering detection, and abandonment tracking using improved association logic.
Arson Module: detects fire by combining colour segmentation with flicker analysis (temporal variance), and can optionally raise its confidence using CLIP-based text-image similarity.
Tracking is improved using a two-stage matching cascade:
Appearance-based matching (cosine similarity of embeddings via OSNet)
Spatial IoU-based fallback matching for occluded or degraded cases
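The two-stage cascade can be sketched as follows. This is an illustration under stated assumptions, not the PADS code: tracks and detections are plain Python tuples, boxes are `(x1, y1, x2, y2)`, the thresholds `app_min` and `iou_min` are hypothetical, and the Hungarian matching the system uses is replaced by an exhaustive search over pairings (`optimal_match`) that maximises the total score of above-threshold pairs. It plays the same role as Hungarian assignment but only scales to a handful of objects.

```python
import math
from itertools import permutations


def cosine(a, b):
    """Cosine similarity of two embedding vectors; 0.0 if either is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0  # a missing or degraded embedding gives no appearance evidence
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def iou(a, b):
    """Intersection over union of boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def optimal_match(score, rows, cols, min_score):
    """Exhaustive max-score assignment (stand-in for Hungarian matching on tiny inputs)."""
    if len(rows) <= len(cols):
        pairings = (list(zip(rows, p)) for p in permutations(cols, len(rows)))
    else:
        pairings = (list(zip(p, cols)) for p in permutations(rows, len(cols)))
    best_total, best_pairs = -1.0, []
    for pairing in pairings:
        kept = [(r, c) for r, c in pairing if score(r, c) >= min_score]
        total = sum(score(r, c) for r, c in kept)
        if total > best_total:
            best_total, best_pairs = total, kept
    return best_pairs


def cascade(tracks, detections, app_min=0.6, iou_min=0.3):
    """tracks: {track_id: (embedding, box)}; detections: [(embedding, box), ...].

    Returns (track_id, detection_index) pairs from both stages.
    """
    track_ids = list(tracks)
    det_ids = list(range(len(detections)))
    # Stage 1: appearance similarity between stored and new embeddings
    stage1 = optimal_match(
        lambda t, d: cosine(tracks[t][0], detections[d][0]), track_ids, det_ids, app_min)
    used_t = {t for t, _ in stage1}
    used_d = {d for _, d in stage1}
    rest_t = [t for t in track_ids if t not in used_t]
    rest_d = [d for d in det_ids if d not in used_d]
    # Stage 2: spatial IoU fallback for pairs appearance could not resolve
    stage2 = optimal_match(
        lambda t, d: iou(tracks[t][1], detections[d][1]), rest_t, rest_d, iou_min)
    return stage1 + stage2
```

In the usage below, the second detection has a degraded (all-zero) embedding, so it is only recovered by the IoU stage.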
Zone-based detection uses polygon overlap for intrusion and timed thresholds for loitering, while abandoned-object detection relies on stable ownership tracking.
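Intrusion and loitering checks of this kind can be sketched with a box-to-polygon overlap ratio and a per-track dwell timer. The overlap rule (fraction of the person's box inside the zone), the `min_overlap` and `dwell_s` values, and the `LoiterMonitor` class are hypothetical choices for illustration. The clipping step is the standard Sutherland-Hodgman algorithm, and the sketch assumes a convex zone polygon for simplicity.

```python
def _clip_to_box(poly, x1, y1, x2, y2):
    """Sutherland-Hodgman: clip polygon `poly` to the box (x1, y1)-(x2, y2)."""
    def clip(pts, inside, cut):
        out = []
        for i, cur in enumerate(pts):
            prev = pts[i - 1]
            if inside(cur):
                if not inside(prev):
                    out.append(cut(prev, cur))
                out.append(cur)
            elif inside(prev):
                out.append(cut(prev, cur))
        return out

    def x_cut(xc):  # intersection with the vertical line x = xc
        return lambda p, q: (xc, p[1] + (q[1] - p[1]) * (xc - p[0]) / (q[0] - p[0]))

    def y_cut(yc):  # intersection with the horizontal line y = yc
        return lambda p, q: (p[0] + (q[0] - p[0]) * (yc - p[1]) / (q[1] - p[1]), yc)

    pts = list(poly)
    for inside, cut in (
        (lambda p: p[0] >= x1, x_cut(x1)),
        (lambda p: p[0] <= x2, x_cut(x2)),
        (lambda p: p[1] >= y1, y_cut(y1)),
        (lambda p: p[1] <= y2, y_cut(y2)),
    ):
        if not pts:
            break
        pts = clip(pts, inside, cut)
    return pts


def _area(pts):
    """Shoelace formula for polygon area."""
    return abs(sum(pts[i - 1][0] * p[1] - p[0] * pts[i - 1][1]
                   for i, p in enumerate(pts))) / 2.0


def zone_overlap(zone, box):
    """Fraction of an (x1, y1, x2, y2) box that lies inside the zone polygon."""
    x1, y1, x2, y2 = box
    box_area = (x2 - x1) * (y2 - y1)
    if box_area <= 0:
        return 0.0
    return _area(_clip_to_box(zone, x1, y1, x2, y2)) / box_area


def is_intrusion(zone, box, min_overlap=0.3):
    """Hypothetical intrusion rule: enough of the person's box is inside the zone."""
    return zone_overlap(zone, box) >= min_overlap


class LoiterMonitor:
    """Flags a track once it has stayed inside the zone for dwell_s seconds."""

    def __init__(self, zone, dwell_s=30.0, min_overlap=0.5):
        self.zone, self.dwell_s, self.min_overlap = zone, dwell_s, min_overlap
        self.entered_at = {}  # track_id -> time the current stay began

    def update(self, track_id, box, t):
        if zone_overlap(self.zone, box) < self.min_overlap:
            self.entered_at.pop(track_id, None)  # leaving the zone resets the timer
            return False
        start = self.entered_at.setdefault(track_id, t)
        return t - start >= self.dwell_s
```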
Fire detection approach
Instead of relying mainly on CLIP, which caused false positives on orange objects in CCTV footage, the system prioritises two signals:
HSV colour detection
Temporal flicker analysis across frames
Only sustained detections trigger alerts, reducing noise and false alarms.
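A per-pixel version of the dual-signal test can be sketched in pure Python. A pixel counts as fire only if it is flame-coloured in HSV space in the newest frame and its brightness (the V channel) varies across a rolling window of recent frames; an alert is raised only after several consecutive positive frames. All thresholds, the window length, and the class name are hypothetical. Hue is on `colorsys`'s 0-1 scale rather than OpenCV's 0-179 range, a deployed version would vectorise this with OpenCV (`cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)`) and NumPy, and the optional CLIP confidence step is omitted.

```python
import colorsys
from collections import deque


def to_hsv(pixel):
    """RGB pixel with 0-255 channels -> (h, s, v), each on colorsys's 0-1 scale."""
    r, g, b = (c / 255.0 for c in pixel)
    return colorsys.rgb_to_hsv(r, g, b)


class DualSignalFireDetector:
    """Hypothetical sketch: HSV colour mask AND temporal brightness variance."""

    def __init__(self, window=8, hue_max=50 / 360, sat_min=0.4, val_min=0.5,
                 var_min=0.01, min_fraction=0.05, sustain=3):
        self.frames = deque(maxlen=window)  # rolling window of recent frames
        self.hue_max, self.sat_min, self.val_min = hue_max, sat_min, val_min
        self.var_min, self.min_fraction, self.sustain = var_min, min_fraction, sustain
        self.streak = 0  # consecutive frames judged to contain fire

    def _flame_coloured(self, pixel):
        h, s, v = to_hsv(pixel)
        return h <= self.hue_max and s >= self.sat_min and v >= self.val_min

    def _frame_has_fire(self):
        latest = self.frames[-1]
        height, width = len(latest), len(latest[0])
        fire_pixels = 0
        for y in range(height):
            for x in range(width):
                if not self._flame_coloured(latest[y][x]):  # signal 1: colour mask
                    continue
                vals = [to_hsv(f[y][x])[2] for f in self.frames]
                mean = sum(vals) / len(vals)
                var = sum((v - mean) ** 2 for v in vals) / len(vals)
                if var >= self.var_min:  # signal 2: brightness flicker
                    fire_pixels += 1
        return fire_pixels / (height * width) >= self.min_fraction

    def update(self, frame):
        """Add one frame (rows of RGB tuples); return True once fire has persisted."""
        self.frames.append(frame)
        if len(self.frames) < self.frames.maxlen:
            return False  # not enough history to measure flicker yet
        self.streak = self.streak + 1 if self._frame_has_fire() else 0
        return self.streak >= self.sustain
```

In this sketch a static orange object, such as a safety vest, passes the colour test but has zero temporal variance, so it never contributes fire pixels; that is the intuition behind pairing the two signals.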
Methodology and implementation
The system is built in Python using YOLOv8 for detection, OSNet for re-identification, OpenCV for image processing, and Hungarian matching for tracking assignment. It is designed to run in real time on CPU-only hardware.
Overall idea
PADS improves CCTV-based security monitoring by:
Unifying multiple anomaly detection tasks
Strengthening tracking stability
Reducing false alarms (especially for fire and abandoned objects)
Making the system more practical for real-world deployment
Conclusion
PADS addresses a real gap in video surveillance technology: the difficulty of combining multiple single-event detection systems into a coherent, deployable installation. Rather than producing yet another isolated detector, this work builds a four-event pipeline from a shared tracking foundation, adding three targeted improvements over the PASS-CCTV architecture [6] that the prior survey [S1] identified as its practical weak points.
The dual-signal fire detector replaces CLIP-only fire detection with an approach grounded in the physical behaviour of flames, eliminating the false-alarm problem that made colour-only and CLIP-only methods unreliable in environments containing orange safety equipment. The five-frame ownership lookback window removes a systematic false abandonment alert that affected every scenario with momentary occlusion. The recalibrated ATM filter suppresses non-human tracklets without penalising persons who pause, a correction the original 80-pixel threshold failed to make.
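One way a trajectory-span filter can suppress static tracklets without penalising a person who pauses is to measure the span over the whole tracklet history rather than a short recent window: a person who walked in and then stopped keeps the span already accumulated, while a pole or sign never builds one. The exact ATM formula and the recalibrated threshold are not given here, so `trajectory_span`, `keep_tracklet`, `min_span`, and `min_frames` are hypothetical.

```python
import math


def trajectory_span(centres):
    """Diagonal of the bounding box enclosing every centre the tracklet has visited."""
    xs = [x for x, _ in centres]
    ys = [y for _, y in centres]
    return math.hypot(max(xs) - min(xs), max(ys) - min(ys)) if centres else 0.0


def keep_tracklet(centres, min_span, min_frames):
    """Hypothetical ATM-style rule: drop mature tracklets that never moved."""
    if len(centres) < min_frames:
        return True  # too young to judge; let it mature first
    return trajectory_span(centres) >= min_span
```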
The complete system runs on a consumer laptop without GPU acceleration, requires no task-specific training on new data, and produces structured alert logs suitable for integration with existing security infrastructure. Alert latency for zone events stays below one second. Fire detection completes within three to six seconds of sustained flame onset.
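A structured alert log could take the form of JSON Lines, one self-describing record per alert, which most log collectors can ingest. The field names and the `alert_line` helper are a hypothetical schema for illustration; only the four event categories come from the text.

```python
import json
from datetime import datetime, timezone

# The four event categories PADS reports.
EVENTS = {"intrusion", "loitering", "abandoned_object", "fire"}


def alert_line(event, camera_id, ts, track_ids=(), confidence=None):
    """Serialise one alert as a single JSON Lines record (hypothetical schema)."""
    if event not in EVENTS:
        raise ValueError(f"unknown event type: {event}")
    record = {
        "event": event,
        "camera_id": camera_id,
        # UNIX timestamp rendered as ISO 8601 in UTC
        "timestamp": datetime.fromtimestamp(ts, tz=timezone.utc).isoformat(),
        "track_ids": sorted(track_ids),
    }
    if confidence is not None:
        record["confidence"] = round(confidence, 3)
    return json.dumps(record, sort_keys=True)
```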
Several directions remain open. Applying dehazing and rain-streak removal before detection, following the findings of Zhang et al. [20], would extend reliable performance to adverse weather without retraining. Automatically calibrating loitering duration and abandonment distance thresholds from observed scene statistics would reduce per-site setup time. Maintaining person identities across non-overlapping camera views would extend coverage to larger physical spaces. And the modular architecture accommodates additional event categories, such as crowd density monitoring or detected aggression, without structural changes to the existing pipeline.
References
[1] W. Sultani, C. Chen, and M. Shah, "Real-World Anomaly Detection in Surveillance Videos," Proc. IEEE/CVF CVPR, Salt Lake City, UT, USA, Jun. 2018, pp. 6479-6488. DOI: 10.1109/CVPR.2018.00657.
[2] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever, "Learning Transferable Visual Models from Natural Language Supervision," Proc. ICML, 2021, pp. 8748-8763. arXiv:2103.00020.
[3] G. Pang, C. Shen, L. Cao, and A. V. D. Hengel, "Deep Learning for Anomaly Detection: A Review," ACM Computing Surveys, vol. 54, no. 2, pp. 1-38, Mar. 2021. DOI: 10.1145/3439950.
[4] B. Ramachandra, M. J. Jones, and R. R. Vatsavai, "A Survey of Single-Scene Video Anomaly Detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 5, pp. 2293-2312, 2020. DOI: 10.1109/TPAMI.2020.2994089.
[5] M. Abdalla, S. Javed, M. Al Radi, A. Ulhaq, and N. Werghi, "Video Anomaly Detection in 10 Years: A Survey and Outlook," Neural Computing and Applications, 2024. arXiv:2405.19387. DOI: 10.1007/s00521-025-11659-8.
[6] H. Jeon, H. Kim, D. Kim, and J. Kim, "PASS-CCTV: Proactive Anomaly Surveillance System for CCTV Footage Analysis in Adverse Environmental Conditions," Expert Systems with Applications, vol. 254, p. 124391, Nov. 2024. DOI: 10.1016/j.eswa.2024.124391.
[7] A. Bewley, Z. Ge, L. Ott, F. Ramos, and B. Upcroft, "Simple Online and Realtime Tracking," Proc. IEEE ICIP, Phoenix, AZ, USA, Sep. 2016, pp. 3464-3468. arXiv:1602.00763.
[8] N. Wojke, A. Bewley, and D. Paulus, "Simple Online and Realtime Tracking with a Deep Association Metric," Proc. IEEE ICIP, Beijing, China, Sep. 2017, pp. 3645-3649. arXiv:1703.07402.
[9] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," Proc. IEEE/CVF CVPR, Las Vegas, NV, USA, Jun. 2016, pp. 779-788. DOI: 10.1109/CVPR.2016.91.
[10] G. Jocher, A. Chaurasia, A. Stoken, J. Borovec, NanoCode012, Y. Kwon, K. Michael, TaoXie, J. Fang, imyhxy, L. Alabdulmohsin et al., "ultralytics/yolov5: v6.0," Zenodo, Nov. 2021. DOI: 10.5281/zenodo.3908559.
[11] K. Muhammad, J. Ahmad, I. Mehmood, S. Rho, and S. W. Baik, "Convolutional Neural Networks Based Fire Detection in Surveillance Videos," IEEE Access, vol. 6, pp. 18174-18183, Mar. 2018. DOI: 10.1109/ACCESS.2018.2812835.
[12] R. T. Ionescu, F. S. Khan, M.-I. Georgescu, and L. Shao, "Object-Centric Auto-Encoders and Dummy Anomalies for Abnormal Event Detection in Video," Proc. IEEE/CVF CVPR, Long Beach, CA, USA, Jun. 2019, pp. 7842-7851. DOI: 10.1109/CVPR.2019.00796.
[13] D. Gong, L. Liu, V. Le, B. Saha, M. R. Mansour, S. Venkatesh, and A. v. d. Hengel, "Memorizing Normality to Detect Anomaly: Memory-Augmented Deep Autoencoder for Unsupervised Anomaly Detection," Proc. IEEE/CVF ICCV, Seoul, South Korea, Oct. 2019, pp. 1705-1714. DOI: 10.1109/ICCV.2019.00180.
[14] K. Doshi and Y. Yilmaz, "Continual Learning for Anomaly Detection in Surveillance Videos," Proc. IEEE/CVF CVPR Workshops, Seattle, WA, USA, Jun. 2020, pp. 254-255. arXiv:2008.02787.
[15] A. M. Qasim, N. Abbas, A. Ali, and B. A. A. Al-Ghamdi, "Abandoned Object Detection and Classification Using Deep Embedded Vision," IEEE Access, vol. 12, pp. 30786-30798, Feb. 2024. DOI: 10.1109/ACCESS.2024.3369233.
[16] J. C. Nunez, M. Berge, and T. Moeslund, "Identifying Loitering Behavior with Trajectory Analysis," Proc. IEEE/CVF WACVW, Waikoloa, HI, USA, Jan. 2024. DOI: 10.1109/WACVW60836.2024.00035.
[17] R. Nayak, U. C. Pati, and S. K. Das, "A Survey on Deep Learning-Based Methods for Perimeter Intrusion Detection," MDPI Sensors, vol. 22, no. 9, p. 3601, 2022. DOI: 10.3390/s22093601.
[18] J. Liang, Y. Yang, B. Li, P. Duan, Y. Xu, and B. Shi, "Coherent Event Guided Low-Light Video Enhancement," Proc. IEEE/CVF ICCV, Paris, France, Oct. 2023, pp. 10615-10625.
[19] C. Li, C. Guo, W. Han, J. Gu, M.-M. Cheng, J. Cheng, and C. C. Loy, "Low-Light Image and Video Enhancement Using Deep Learning: A Survey," IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 12, pp. 9396-9416, Dec. 2021. arXiv:1805.10536.
[20] Y. Zhang, H. Chen, X. Li, K. Wang, and Y. Yu, "Benchmarking Deep Learning for Adverse Weather Object Detection," arXiv:2103.15114, 2021.
[21] S. Khan, H. Rahmani, S. A. A. Shah, and M. Bennamoun, "A Survey on Video Anomaly Detection," arXiv:2105.03858, May 2021.