Object detection remains an essential task in computer vision, with applications in robotics, autonomous vehicles, and surveillance. The YOLO (You Only Look Once) family of models is a popular choice for real-time object detection because of its balance between speed and accuracy. Using the PASCAL VOC 2012 dataset, this study presents a comparative examination of three YOLO variants: YOLOv8, YOLOv9, and YOLOv10. The key metrics considered are precision, recall, F1-score, and mean Average Precision (mAP). In addition, a loss-curve analysis is performed to evaluate each model's training effectiveness. The findings show that YOLOv9 achieves the best balance of precision and recall and excels at detecting smaller and more complex objects. Although YOLOv10 exhibits a modest shortfall in peak accuracy, its improved computational efficiency makes it the strongest option for real-time workloads. YOLOv8, while the fastest, struggles with small objects and complex scenes. By highlighting the strengths and limitations of each model, this study offers guidance on their use in a variety of real-world scenarios.
Introduction
Object detection is a vital area in computer vision with diverse applications, and the YOLO (You Only Look Once) family of models has been at the forefront due to its effective balance of speed and accuracy. This study focuses on comparing the recent YOLOv8, YOLOv9, and YOLOv10 models, which each target different real-world challenges:
YOLOv8 prioritizes fast detection suitable for real-time applications but is less accurate on smaller objects.
YOLOv9 improves detection accuracy, especially for small and complex objects, using advanced feature pyramids and attention mechanisms.
YOLOv10 balances computational efficiency and accuracy, making it ideal for resource-constrained environments.
The models were evaluated on the PASCAL VOC 2012 dataset using precision, recall, F1-score, and mean Average Precision (mAP), with loss curves used to analyze training efficiency and detection performance.
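The sketch below illustrates how such an evaluation could be reproduced with the Ultralytics Python API. It is a minimal example, not the authors' exact pipeline: the checkpoint names, the bundled "VOC.yaml" dataset configuration (which in Ultralytics combines the VOC 2007 and 2012 splits), and the training budget (epochs, image size) are assumptions and should be adapted to the target setup.

```python
# Minimal sketch: fine-tune and evaluate the three small variants on PASCAL VOC
# using the Ultralytics API, then report precision, recall, F1, and mAP.
from ultralytics import YOLO

for weights in ("yolov8n.pt", "yolov9t.pt", "yolov10n.pt"):  # assumed checkpoint names
    model = YOLO(weights)                                    # load pretrained checkpoint
    model.train(data="VOC.yaml", epochs=50, imgsz=640)       # assumed training budget
    metrics = model.val()                                    # evaluate on the validation split

    precision = metrics.box.mp                               # mean precision over classes
    recall = metrics.box.mr                                  # mean recall over classes
    f1 = 2 * precision * recall / (precision + recall + 1e-9)  # harmonic mean of P and R
    print(
        f"{weights}: P={precision:.3f} R={recall:.3f} F1={f1:.3f} "
        f"mAP@0.5={metrics.box.map50:.3f} mAP@0.5:0.95={metrics.box.map:.3f}"
    )
```

Loss curves for each run can then be read from the per-run results file that Ultralytics writes during training, which is how the training-efficiency comparison described above could be plotted.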
Key findings:
YOLOv9 offers the best overall balance between precision, recall, and computational efficiency, excelling at detecting challenging objects.
YOLOv10 achieves the highest mAP and fastest convergence, making it suitable for real-time, resource-limited use cases.
YOLOv8 is the fastest model but less effective on small object detection.
The study fills a research gap by providing a direct, standardized comparison of these recent YOLO versions and highlights their respective trade-offs, guiding their application in various domains.
Conclusion
This comparative analysis of YOLOv8n, YOLOv9t, and YOLOv10n on the PASCAL VOC 2012 dataset provides valuable insights into the strengths and limitations of each model:
YOLOv9t emerges as the best overall performer, particularly in scenarios requiring high accuracy and precision for smaller objects.
YOLOv10n offers the best balance between computational efficiency and detection accuracy, making it suitable for real-time applications where resources are limited.
YOLOv8n, while the fastest, is best suited for applications where rapid processing is critical and high accuracy on smaller objects is less of a concern.