Automated object detection has become a significant component of intelligent monitoring systems in recent years. Manual monitoring is time-consuming and error-prone, so an automated system is needed to recognize and analyze objects reliably. The object detection system proposed in this project is based on machine learning and computer vision. It targets four major categories: cars, JCB machines, crowds, and potholes, which makes it possible to monitor populated areas, construction sites, and road infrastructure. Annotated datasets were gathered from open-source platforms such as Roboflow and Kaggle and enlarged with augmentation methods such as flipping, rotation, and mosaic augmentation. The system uses YOLOv8, which improves on older detectors such as YOLOv3 and Faster R-CNN in both speed and precision. After training and testing, the models reached a mean Average Precision at an IoU threshold of 0.5 (mAP@0.5) of up to 87.2% on crowd detection and 83.8% on vehicle and machinery detection.
Introduction
This paper proposes a real-time object detection system for smart city and infrastructure monitoring based on the YOLOv8 deep learning model. Increasing urbanization has led to traffic congestion, more construction activity, and public safety concerns, while traditional CCTV-based monitoring is slow, manual, and does not scale to real-time use.
To address this, the study develops a YOLOv8-based multi-class detection system capable of identifying vehicles, crowds, potholes, and construction machinery (JCBs) in aerial and traffic images. Compared to older two-stage models such as Faster R-CNN, YOLO performs detection in a single pass, which makes it faster and better suited to real-time applications.
The system is trained using datasets from sources like Roboflow and Kaggle, which are cleaned, re-annotated, and augmented (flipping, rotation, brightness changes, mosaic/mixup). The dataset is split into training, validation, and testing sets, and YOLOv8 is trained using transfer learning from COCO-pretrained weights to improve efficiency and reduce training cost.
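The split and transfer-learning steps described above can be sketched as follows. This is a minimal sketch, not the authors' training script: it assumes the `ultralytics` Python package, the small `yolov8n.pt` COCO-pretrained checkpoint, a hypothetical `data.yaml` dataset file, and a 70/20/10 split, none of which the paper specifies. The augmentation values are illustrative as well.

```python
import random


def split_dataset(image_paths, train=0.7, val=0.2, seed=42):
    """Shuffle image paths and split them into train/val/test lists.

    The 70/20/10 ratio is an assumption; the paper does not state its split.
    """
    if not 0 < train < 1 or not 0 <= val < 1 or train + val >= 1:
        raise ValueError("train and val must be fractions that leave room for a test set")
    paths = sorted(image_paths)           # sort first so the seed fully determines the result
    random.Random(seed).shuffle(paths)
    n_train = round(len(paths) * train)
    n_val = round(len(paths) * val)
    return {
        "train": paths[:n_train],
        "val": paths[n_train:n_train + n_val],
        "test": paths[n_train + n_val:],
    }


def train_yolov8(data_yaml="data.yaml", weights="yolov8n.pt", epochs=100):
    """Fine-tune COCO-pretrained YOLOv8 weights on the four-class dataset.

    Requires the `ultralytics` package. `data_yaml` is a hypothetical dataset
    file that lists the train/val/test folders and the class names.
    """
    from ultralytics import YOLO  # imported lazily so split_dataset works without it

    model = YOLO(weights)          # start from pretrained weights (transfer learning)
    model.train(
        data=data_yaml,
        epochs=epochs,
        imgsz=640,
        fliplr=0.5,    # horizontal flip probability
        degrees=10.0,  # random rotation range in degrees
        hsv_v=0.4,     # brightness (HSV value) jitter
        mosaic=1.0,    # mosaic augmentation probability
        mixup=0.1,     # mixup augmentation probability
    )
    return model
```

Fixing the random seed keeps the three subsets reproducible between runs, so validation and test images never leak into training when the experiment is repeated.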
The methodology includes:
Data collection, annotation, and augmentation
Dataset preparation and splitting
YOLOv8 training with loss optimization (bounding box, classification, and distribution focal losses)
Validation to prevent overfitting
Testing using metrics like Precision, Recall, and mAP
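For the testing step, precision, recall, and AP at an IoU threshold of 0.5 can be computed roughly as in the sketch below. It is a simplified illustration for a single class with all boxes pooled together, and it matches each detection to the best still-unmatched ground-truth box. Evaluation toolkits differ in such details, so its output is not guaranteed to reproduce the reported figures. All function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def evaluate_class(detections, ground_truth, iou_thr=0.5):
    """Return (precision, recall, ap) for one class.

    detections: list of (confidence, box); ground_truth: list of boxes.
    """
    matched = set()
    hits = []  # True for a true positive, False for a false positive
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= iou_thr:
            matched.add(best_idx)
            hits.append(True)
        else:
            hits.append(False)

    tp = sum(hits)
    precision = tp / len(hits) if hits else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0

    # Precision-recall points after each detection, in confidence order.
    recalls, precisions, running_tp = [], [], 0
    for rank, hit in enumerate(hits, start=1):
        running_tp += hit
        recalls.append(running_tp / len(ground_truth) if ground_truth else 0.0)
        precisions.append(running_tp / rank)

    # Make precision non-increasing, then integrate it over recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return precision, recall, ap


def mean_average_precision(per_class_ap):
    """mAP is the mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)
```

Calling `evaluate_class` once per category (cars, JCB machines, crowds, potholes) and passing the four AP values to `mean_average_precision` gives a single mAP@0.5 figure.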
The final system achieves real-time, high-speed detection with good accuracy, making it suitable for traffic monitoring, construction site supervision, and smart city infrastructure management, while improving efficiency over traditional surveillance and older computer vision methods.
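Real-time operation is a throughput claim, so it helps to tie it to a per-frame time budget. The conclusion reports a peak of 150 FPS, which leaves roughly 6.7 ms per frame. The sketch below shows that arithmetic and one way such a rate could be measured. The helper names are hypothetical, and `process_frame` stands in for whatever inference call is being timed; the paper does not describe its own measurement procedure.

```python
import time


def frame_budget_ms(fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def measure_fps(process_frame, frames, warmup=5):
    """Average frames per second of `process_frame` over `frames`.

    The first `warmup` frames are run but not timed, because the first
    calls to a model are often slower than steady state.
    """
    frames = list(frames)
    for frame in frames[:warmup]:
        process_frame(frame)
    timed = frames[warmup:]
    if not timed:
        raise ValueError("need more frames than warmup iterations")
    start = time.perf_counter()
    for frame in timed:
        process_frame(frame)
    elapsed = time.perf_counter() - start
    return len(timed) / elapsed if elapsed > 0 else float("inf")
```

A measured rate depends heavily on hardware, image size, and batch size, so a figure obtained this way is only comparable to another one taken under the same conditions.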
Conclusion
The project produced an effective monitoring system built on the YOLOv8 architecture that detects vehicles, crowds, heavy machinery, and road damage. The model proved fast and well suited to localizing and detecting these objects across a wide range of real-world situations.
As the experimental results show, YOLOv8 is one of the most suitable models for this application and outperforms older architectures such as Faster R-CNN. The system reaches a maximum processing rate of 150 FPS, which makes it well suited to live applications in smart cities and highway surveillance. The model was most accurate on large, distinct objects such as cars and JCB machines. Its ability to detect smaller road faults such as potholes, even at lower confidence, is nonetheless vital for preventing infrastructure failures.
The system can be enhanced in several ways in the future. Expanding the dataset with a greater variety of images, such as those taken in heavy rain, in fog, or at night, would make the model more accurate in challenging conditions. The system could also be extended into a more sophisticated traffic analysis system that estimates vehicle speed and automatically identifies license plates. Lastly, an alert system could be added to give repair crews real-time information about the precise locations of potholes so that they can make prompt repairs and improve the roads.