Object detection is a crucial computer vision task with many applications, including robotics, autonomous vehicles, and surveillance. YOLOv8, the newest model in the YOLO (You Only Look Once) family, brings important architectural enhancements that improve model efficiency, speed, and detection accuracy. YOLOv8 is a lightweight, deep-learning-based object detection framework that is widely used in computer vision systems; its performance rests on its architectural building blocks, preprocessing pipeline, and data augmentation methods. The training process was carefully configured to fit the limited resources available while still applying modern deep learning techniques, and the model was then fine-tuned on a custom dataset prepared for the task. The results show encouraging precision, recall, and class-wise detection performance. Our findings indicate that YOLOv8 is appropriate for both cloud-based and embedded applications, since it maintains real-time inference speeds while achieving strong detection accuracy. These results demonstrate YOLOv8's potential as an effective tool for a variety of real-world object detection applications.
Introduction
Object detection identifies and locates objects within images or video frames and is vital in fields like smart shopping, medical imaging, and autonomous vehicles. Traditional methods like Haar cascades and HOG were limited in flexibility and accuracy. The advent of deep learning, particularly convolutional neural networks (CNNs), significantly improved detection by handling diverse object shapes, sizes, and positions.
The YOLO (You Only Look Once) family of models revolutionized object detection by framing it as a single regression problem, allowing real-time detection with a balance of speed and accuracy. YOLO evolved through multiple versions:
YOLOv1 (2016): Introduced single-stage real-time detection but struggled with small objects.
YOLOv2 (2017): Improved accuracy with batch normalization and anchor boxes.
YOLOv3 (2018): Added multi-scale detection and a deeper backbone network (Darknet-53).
YOLOv4 (2020): Optimized architecture with advanced activation functions and data augmentation.
YOLOv5 (2020) and later versions (YOLOv6, YOLOv7, YOLOv8): Focused on deployment ease, accuracy, speed, and computational efficiency, with YOLOv8 introducing anchor-free detection and improved backbone for better performance.
This study trained the YOLOv8n (nano) model, the lightest YOLOv8 variant, on the diverse and challenging COCO dataset. Data augmentation techniques (flipping, rotation, noise, brightness/contrast adjustment, scaling, and cropping) were applied to enhance robustness and generalization. Training used transfer learning and adaptive optimization (AdamW), and was configured for CPU-based environments with limited resources, as sketched below.
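To make this setup concrete, the following is a minimal training sketch using the Ultralytics Python API; the dataset YAML path, epoch count, and augmentation strengths shown here are illustrative assumptions rather than the exact configuration used in this study.

from ultralytics import YOLO

# Start from pretrained YOLOv8n weights (transfer learning).
model = YOLO("yolov8n.pt")

# Fine-tune with AdamW on CPU; hyperparameter values are illustrative.
model.train(
    data="coco.yaml",   # dataset config file (assumed path)
    epochs=5,           # short run, as in this study
    imgsz=640,          # input image size
    device="cpu",       # CPU-only environment
    optimizer="AdamW",  # adaptive optimization
    fliplr=0.5,         # horizontal flip probability
    degrees=10.0,       # random rotation range in degrees
    scale=0.5,          # random scaling gain
    hsv_v=0.4,          # brightness (value) jitter
    translate=0.1,      # random translation fraction
)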
Results showed the model effectively learned object features, with steady improvements in loss, precision, recall, and mean Average Precision (mAP). Evaluation curves, including the precision-recall and F1-score curves, confirmed balanced and strong detection performance.
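For reference, the reported metrics follow the standard definitions, where TP, FP, and FN denote true positives, false positives, and false negatives at a given confidence threshold:

\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}

mAP@0.5 averages the per-class area under the precision-recall curve at an IoU threshold of 0.5, while mAP@0.5:0.95 additionally averages over IoU thresholds from 0.50 to 0.95 in steps of 0.05.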
This work highlights YOLO’s continuous development and YOLOv8’s suitability for modern real-time object detection tasks across various applications.
Conclusion
The experiment shows that YOLOv8n can be successfully trained on the COCO dataset on a CPU with constrained resources. After only 5 epochs, the model's mAP@0.5 of 45.8% and mAP@0.5:0.95 of 31.9% demonstrate its capacity to learn despite these limitations. This establishes a useful baseline, even though accuracy remains modest due to the short training period and the CPU constraint. To increase accuracy, future studies should use GPU training with more epochs and hyperparameter tuning. Because of its modular design, pretrained weights, and speed, YOLOv8 is a strong option for real-time applications such as smart surveillance systems on embedded devices or outdoor navigation aids for blind and visually impaired users.
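To illustrate the deployment path mentioned above, a minimal real-time inference sketch with the Ultralytics API follows; the weights path and video source are assumptions for illustration, not artifacts of this experiment.

from ultralytics import YOLO

# Load the fine-tuned weights (assumed example path).
model = YOLO("runs/detect/train/weights/best.pt")

# Stream real-time inference from the default webcam (source=0).
for result in model.predict(source=0, stream=True, device="cpu"):
    # Each result carries detected boxes with class IDs and confidences.
    for box in result.boxes:
        print(int(box.cls), float(box.conf))

For embedded targets, the same model can also be exported to a portable format, e.g. model.export(format="onnx").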