Automated anomaly detection in surveillance video has received considerable research attention over the past decade, yet a recurring limitation across the literature is that most systems are designed and evaluated under clean, controlled conditions. In practice, outdoor cameras routinely operate in fog, rain, low light, and snow, and very few published methods have been tested under such conditions. This survey examines twenty-one papers published between 2016 and 2024 that are relevant to detecting four specific event types: arson, physical intrusion, loitering, and abandoned objects. The papers cover a range of techniques including multiple instance learning [1], YOLO-based detection [9, 10], multi-object tracking via SORT [7] and DeepSORT [8], memory-augmented autoencoders [13], and vision-language approaches built on CLIP [2]. Our main finding is that while individual event detectors perform well on standard benchmarks, no existing system handles all four event types reliably under adverse weather. We document the specific failure modes and identify the research gaps that need to be addressed.
Introduction
This survey reviews research on anomaly detection in surveillance video and examines how deep learning models, in particular CNNs, YOLO, and CLIP, have improved the detection of events such as intrusion, fire and arson, loitering, and abandoned objects. A major focus is the limitations of current systems: most are evaluated under ideal conditions and perform poorly in real outdoor environments affected by rain, fog, snow, and low light. A second key issue is that most systems are designed to detect a single event type, which makes it difficult to build unified, real-time surveillance solutions.
Among the approaches reviewed, PASS-CCTV [6] is the only system that integrates multiple event types and evaluates performance under adverse weather, although it still requires manual tuning and has limited scalability. The review also compares methods for tracking, fire detection, intrusion detection, loitering detection, and abandoned-object detection, and shows that performance depends heavily on reliable object tracking and on environmental conditions.
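Zone-based intrusion and loitering rules usually sit on top of a tracker's output, which is one reason tracking quality matters so much. The sketch below is hypothetical and is not taken from any reviewed paper. It assumes per-frame tracks arrive as (track_id, box) pairs from a SORT/DeepSORT-style tracker, takes the bottom-centre of each box as the ground-contact point, tests that point against a polygonal zone with ray casting, and flags a track as loitering once its continuous dwell inside the zone reaches a threshold. The zone, frame rate, threshold, and the names `ZoneMonitor`, `point_in_polygon`, and `foot_point` are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def point_in_polygon(p: Point, polygon: Sequence[Point]) -> bool:
    """Ray-casting test: count edge crossings of a horizontal ray from p."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def foot_point(box: Box) -> Point:
    """Bottom-centre of the box, a rough proxy for where the person stands."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2.0, y2)


@dataclass
class ZoneMonitor:
    zone: Sequence[Point]   # hypothetical restricted-area polygon
    fps: float              # assumed camera frame rate
    loiter_seconds: float   # hypothetical dwell threshold
    # Frames each track ID has spent continuously inside the zone.
    dwell: Dict[int, int] = field(default_factory=dict)

    def update(self, tracks: List[Tuple[int, Box]]) -> Tuple[List[int], List[int]]:
        """Process one frame; return (intruding IDs, loitering IDs)."""
        intruders, loiterers = [], []
        seen = set()
        for track_id, box in tracks:
            if point_in_polygon(foot_point(box), self.zone):
                seen.add(track_id)
                self.dwell[track_id] = self.dwell.get(track_id, 0) + 1
                intruders.append(track_id)
                if self.dwell[track_id] / self.fps >= self.loiter_seconds:
                    loiterers.append(track_id)
        # Any track not inside the zone this frame loses its dwell count.
        for track_id in list(self.dwell):
            if track_id not in seen:
                del self.dwell[track_id]
        return intruders, loiterers
```

Because dwell time is keyed on track ID, an identity switch (for example, when fog causes a missed detection and the tracker re-initializes the same person under a new ID) resets the counter, which is one concrete way degraded tracking turns into missed loitering events.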
Conclusion
This survey reviewed twenty-one papers covering deep learning-based anomaly detection for surveillance video, focusing on arson, intrusion, loitering, and abandoned objects under adverse weather conditions.
Across the reviewed work, genuine progress has been made on individual components. The MIL framework of Sultani et al. [1] established a viable path for weakly supervised training on real surveillance footage. SORT [7] and DeepSORT [8] provide practical multi-object trackers that support zone-based anomaly analysis. YOLO [9, 10] made detection fast enough for real-time use. MemAE [13] and object-centric autoencoders [12] improved the selectivity of reconstruction-based anomaly scoring. CLIP [2] enabled zero-shot event recognition that outperformed task-specific detectors on arson in [6]. The ConvLSTM pipeline of Qasim et al. [15] achieved strong accuracy on available abandoned-object benchmarks. These are solid individual contributions.
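The core of the MIL framework in [1] is a ranking objective over bags of video segments. Each video is split into segments, a scorer assigns every segment an anomaly score, and the highest-scoring segment of an anomalous video is pushed above the highest-scoring segment of a normal video by a margin, with temporal-smoothness and sparsity terms applied to the anomalous bag. The sketch below computes that loss for one pair of bags from precomputed scores. It omits the feature extractor and the scoring network, and the default `lambda1` and `lambda2` values are placeholders, not the values reported in [1].

```python
from typing import Sequence


def mil_ranking_loss(
    anomalous_scores: Sequence[float],
    normal_scores: Sequence[float],
    lambda1: float = 1e-4,  # placeholder weight for temporal smoothness
    lambda2: float = 1e-4,  # placeholder weight for sparsity
) -> float:
    """MIL ranking loss for one (anomalous bag, normal bag) pair.

    Each sequence holds per-segment anomaly scores in [0, 1] for one video;
    only video-level labels are needed to decide which bag is which.
    """
    if not anomalous_scores or not normal_scores:
        raise ValueError("both bags must contain at least one segment")
    # Hinge term: the top anomalous-bag segment should beat the top
    # normal-bag segment by a margin of 1.
    hinge = max(0.0, 1.0 - max(anomalous_scores) + max(normal_scores))
    # Temporal smoothness: adjacent segments should have similar scores.
    smooth = sum(
        (a - b) ** 2 for a, b in zip(anomalous_scores, anomalous_scores[1:])
    )
    # Sparsity: an anomaly should occupy only a few segments.
    sparse = sum(anomalous_scores)
    return hinge + lambda1 * smooth + lambda2 * sparse


# Usage: a confident, well-separated pair incurs no hinge penalty.
print(mil_ranking_loss([1.0, 0.0], [0.0], lambda1=0.5, lambda2=0.25))
```

Taking the maximum over each bag is what lets training proceed without segment-level annotation: the loss only constrains the most anomalous-looking segment in each video. In [1] the scores come from a small fully connected network applied to pretrained clip features, and the loss is averaged over mini-batches of bag pairs.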
The harder question is how these contributions hold up when they are integrated and tested under conditions that reflect actual outdoor deployment. PASS-CCTV [6] comes closest to an answer in the current literature. It shows that multi-event detection is technically feasible and that reasonable performance is achievable under some adverse conditions. However, the gap between its results on the validation set and on its abroad subset shows that generalization and weather-specific robustness still need work.
For the accompanying project, we plan to build on the PASS-CCTV architecture [6] and incorporate weather-aware preprocessing strategies informed by the findings of Zhang et al. [20] and the enhancement approaches of Liang et al. [18] and Li et al. [19]. We acknowledge that the available public datasets do not fully represent the target deployment conditions, which will constrain the strength of our experimental conclusions. Even so, demonstrating measurable improvement under controlled weather-variation scenarios, and clearly documenting what does and does not transfer across conditions, would represent a useful step toward a practically deployable system.
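As one hypothetical illustration of weather-aware preprocessing, frames can be routed to an enhancement step only when a cheap condition check fires. The sketch below assumes 8-bit grayscale frames stored as nested lists, flags low light when mean luminance falls below an arbitrary threshold, and then applies a fixed gamma curve through a lookup table. Gamma correction is a classical stand-in chosen for brevity; it is not the learned enhancement of [18] or [19], and the names `preprocess`, `dark_threshold`, and `gamma`, along with their default values, are illustrative assumptions rather than tuned settings.

```python
from typing import List

Frame = List[List[int]]  # grayscale frame of 8-bit intensities (0-255)


def mean_luminance(frame: Frame) -> float:
    """Average pixel intensity over the whole frame."""
    total = sum(sum(row) for row in frame)
    count = sum(len(row) for row in frame)
    return total / count


def gamma_lut(gamma: float) -> List[int]:
    """Lookup table for out = 255 * (in / 255) ** (1 / gamma).

    With gamma > 1 the curve brightens dark pixels more than bright ones.
    """
    return [round(255 * (v / 255) ** (1.0 / gamma)) for v in range(256)]


def preprocess(frame: Frame, dark_threshold: float = 60.0, gamma: float = 2.2) -> Frame:
    """Hypothetical router: brighten only frames that look low-light."""
    if mean_luminance(frame) >= dark_threshold:
        return frame  # well-lit frame: pass through unchanged
    lut = gamma_lut(gamma)
    return [[lut[v] for v in row] for row in frame]
```

Keeping a pass-through branch means well-lit frames reach the detector exactly as they would without the preprocessing stage, so any change in behaviour can be attributed to the frames that were actually enhanced.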
References
[1] W. Sultani, C. Chen, and M. Shah, "Real-World Anomaly Detection in Surveillance Videos," Proc. IEEE/CVF CVPR, Salt Lake City, UT, USA, Jun. 2018, pp. 6479-6488. DOI: 10.1109/CVPR.2018.00657.
[2] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever, "Learning Transferable Visual Models from Natural Language Supervision," Proc. ICML, 2021, pp. 8748-8763. arXiv:2103.00020.
[3] G. Pang, C. Shen, L. Cao, and A. van den Hengel, "Deep Learning for Anomaly Detection: A Review," ACM Computing Surveys, vol. 54, no. 2, pp. 1-38, Mar. 2021. DOI: 10.1145/3439950.
[4] B. Ramachandra, M. J. Jones, and R. R. Vatsavai, "A Survey of Single-Scene Video Anomaly Detection," IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 5, pp. 2293-2312, May 2022. DOI: 10.1109/TPAMI.2020.2994089.
[5] M. Abdalla, S. Javed, M. Al Radi, A. Ulhaq, and N. Werghi, "Video Anomaly Detection in 10 Years: A Survey and Outlook," Neural Computing and Applications, 2024. arXiv:2405.19387. DOI: 10.1007/s00521-025-11659-8.
[6] H. Jeon, H. Kim, D. Kim, and J. Kim, "PASS-CCTV: Proactive Anomaly Surveillance System for CCTV Footage Analysis in Adverse Environmental Conditions," Expert Systems with Applications, vol. 254, p. 124391, Nov. 2024. DOI: 10.1016/j.eswa.2024.124391.
[7] A. Bewley, Z. Ge, L. Ott, F. Ramos, and B. Upcroft, "Simple Online and Realtime Tracking," Proc. IEEE ICIP, Phoenix, AZ, USA, Sep. 2016, pp. 3464-3468. arXiv:1602.00763.
[8] N. Wojke, A. Bewley, and D. Paulus, "Simple Online and Realtime Tracking with a Deep Association Metric," Proc. IEEE ICIP, Beijing, China, Sep. 2017, pp. 3645-3649. arXiv:1703.07402.
[9] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," Proc. IEEE/CVF CVPR, Las Vegas, NV, USA, Jun. 2016, pp. 779-788. DOI: 10.1109/CVPR.2016.91.
[10] G. Jocher, A. Chaurasia, A. Stoken, J. Borovec, NanoCode012, Y. Kwon, K. Michael, TaoXie, J. Fang, imyhxy, L. Alabdulmohsin et al., "ultralytics/yolov5: v6.0," Zenodo, Nov. 2021. DOI: 10.5281/zenodo.3908559.
[11] K. Muhammad, J. Ahmad, I. Mehmood, S. Rho, and S. W. Baik, "Convolutional Neural Networks Based Fire Detection in Surveillance Videos," IEEE Access, vol. 6, pp. 18174-18183, Mar. 2018. DOI: 10.1109/ACCESS.2018.2812835.
[12] R. T. Ionescu, F. S. Khan, M.-I. Georgescu, and L. Shao, "Object-Centric Auto-Encoders and Dummy Anomalies for Abnormal Event Detection in Video," Proc. IEEE/CVF CVPR, Long Beach, CA, USA, Jun. 2019, pp. 7842-7851. DOI: 10.1109/CVPR.2019.00796.
[13] D. Gong, L. Liu, V. Le, B. Saha, M. R. Mansour, S. Venkatesh, and A. van den Hengel, "Memorizing Normality to Detect Anomaly: Memory-Augmented Deep Autoencoder for Unsupervised Anomaly Detection," Proc. IEEE/CVF ICCV, Seoul, South Korea, Oct. 2019, pp. 1705-1714. DOI: 10.1109/ICCV.2019.00180.
[14] K. Doshi and Y. Yilmaz, "Continual Learning for Anomaly Detection in Surveillance Videos," Proc. IEEE/CVF CVPR Workshops, Seattle, WA, USA, Jun. 2020, pp. 254-255. arXiv:2008.02787.
[15] A. M. Qasim, N. Abbas, A. Ali, and B. A. A. Al-Ghamdi, "Abandoned Object Detection and Classification Using Deep Embedded Vision," IEEE Access, vol. 12, pp. 30786-30798, Feb. 2024. DOI: 10.1109/ACCESS.2024.3369233.
[16] J. C. Nunez, M. Berge, and T. Moeslund, "Identifying Loitering Behavior with Trajectory Analysis," Proc. IEEE/CVF WACVW, Waikoloa, HI, USA, Jan. 2024. DOI: 10.1109/WACVW60836.2024.00035.
[17] R. Nayak, U. C. Pati, and S. K. Das, "A Survey on Deep Learning-Based Methods for Perimeter Intrusion Detection," Sensors, vol. 22, no. 9, p. 3601, 2022. DOI: 10.3390/s22093601.
[18] J. Liang, Y. Yang, B. Li, P. Duan, Y. Xu, and B. Shi, "Coherent Event Guided Low-Light Video Enhancement," Proc. IEEE/CVF ICCV, Paris, France, Oct. 2023, pp. 10615-10625.
[19] C. Li, C. Guo, L. Han, J. Jiang, M.-M. Cheng, J. Gu, and C. C. Loy, "Low-Light Image and Video Enhancement Using Deep Learning: A Survey," IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 12, pp. 9396-9416, Dec. 2022. arXiv:2104.10729.
[20] Y. Zhang, H. Chen, X. Li, K. Wang, and Y. Yu, "Benchmarking Deep Learning for Adverse Weather Object Detection," arXiv:2103.15114, 2021.
[21] S. Khan, H. Rahmani, S. A. A. Shah, and M. Bennamoun, "A Survey on Video Anomaly Detection," arXiv:2105.03858, May 2021.