In an object detection system, convolutional networks are used to recognize and localize objects precisely in digital images and videos. Such systems see real-time applications in security, medical diagnosis, surveillance, automation, sports, and many other areas. YOLO is one of the available deep learning models used to build fast object detection systems. This paper presents a brief study of deep learning for object detection.
Object detection, which combines computer vision and deep learning, provides advanced capabilities in various fields of automation. A wide range of machine learning applications use computer vision for object recognition tasks. These tasks enable automated robotic machines to carry out large amounts of work in little time, reducing human effort. Machine learning also has key applications in medicine, such as X-ray scanning and heart image classification, where a trained algorithm assists in the process of diagnosis. For real-time uses such as autonomous driving, accident prevention, and abnormal activity detection, however, a fast and efficient object detection system is needed to recognize and locate objects within a machine's field of view. In this paper, we discuss the implementation of such an object detection system using deep learning that runs in real time.
Humans perceive their environment with little effort and can easily identify what they are looking at. Human vision is fast enough to support complex tasks such as driving a vehicle. Machines, on the other hand, require considerable programming and training to perform a desired task. Even though machines cannot match the visual precision of humans, fast algorithms can be used to build practical object detection systems.
II. DEEP LEARNING IN OBJECT DETECTION
Deep learning plays a significant role in real-time object detection systems because it can handle the complexity and variability of object detection tasks. Convolutional Neural Networks (CNNs) are the most widely used deep learning architecture for object detection. During training, CNNs learn to extract meaningful features from images, and these features are then used to detect objects. Transfer learning, a technique that reuses pre-trained deep learning models to solve new tasks, is often used in real-time object detection to fine-tune pre-trained CNNs on new datasets or to adapt them to a new object detection task. Several deep learning-based object detection frameworks, such as YOLO (You Only Look Once) and SSD (Single Shot MultiBox Detector), use CNNs for object detection. These frameworks are designed to be fast and efficient, which makes them well suited to real-time object detection. Data augmentation is a technique that artificially generates new training data by applying transformations to existing data. It is often used in deep learning-based object detection to improve the robustness of the model and reduce overfitting.
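As an illustration of data augmentation for detection, the sketch below mirrors an image horizontally and adjusts its bounding boxes to match. It is a minimal example rather than the pipeline used in this work: the image is assumed to be a plain list of pixel rows, boxes are assumed to be (x_min, y_min, x_max, y_max) tuples in pixel-edge coordinates, and the function names are hypothetical.

```python
def hflip_image(image):
    """Mirror an image, given as a list of pixel rows, left to right."""
    return [list(reversed(row)) for row in image]


def hflip_box(box, image_width):
    """Mirror an (x_min, y_min, x_max, y_max) box inside an image of the given width."""
    x_min, y_min, x_max, y_max = box
    # The right edge becomes the new left edge, measured from the opposite side.
    return (image_width - x_max, y_min, image_width - x_min, y_max)


def augment(image, boxes):
    """Return a flipped copy of the image together with the matching flipped boxes."""
    width = len(image[0])
    return hflip_image(image), [hflip_box(b, width) for b in boxes]


# Toy 2x4 "image" with one box covering the left-most column (illustrative values).
image = [[0, 1, 2, 3],
         [4, 5, 6, 7]]
boxes = [(0, 0, 1, 2)]

flipped_image, flipped_boxes = augment(image, boxes)
print(flipped_image)  # [[3, 2, 1, 0], [7, 6, 5, 4]]
print(flipped_boxes)  # [(3, 0, 4, 2)]
```

The important point for detection is that every geometric transformation applied to the image must also be applied to its annotations; otherwise the augmented labels no longer match the objects.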
Overall, deep learning has revolutionized real-time object detection by enabling faster and more accurate detection of objects in images and videos. It has made it possible to develop sophisticated object detection systems that can be used in a variety of applications, such as surveillance, autonomous vehicles, and robotics.
III. PROPOSED METHODOLOGY
Object detection is the task of locating and classifying objects within an image. YOLO is an object detection architecture that predicts bounding boxes using a grid-based approach. It divides the input image into an N×N grid, and each grid cell predicts candidate bounding boxes together with a confidence score and class probabilities. Boxes with a high probability of containing an object are kept and labelled with the most likely class. More than one box may have a high probability for the same object, so several boxes may be generated around one object. This problem is solved by a process called non-maximum suppression (NMS), which keeps the highest-scoring box and discards the other boxes that overlap it heavily. The OpenCV image processing library is used to produce the outputs of the object detection system.
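The suppression step can be sketched in a few lines of plain Python. This is a minimal greedy version under stated assumptions, not the exact routine used in this system: boxes are assumed to be (x_min, y_min, x_max, y_max) tuples, overlap is measured by intersection over union (IoU), and the 0.5 threshold and sample values are illustrative only.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Return indices of the boxes to keep, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two near-duplicate boxes around one object, plus one separate object.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]

kept = non_max_suppression(boxes, scores)
print(kept)  # [0, 2]
```

In practice NMS is usually applied per class, so that overlapping objects of different classes do not suppress each other.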
YOLO is simple and fast, and it does not require the multi-stage pipelines used by region-based techniques. The model is available in several versions; we use version 3 (YOLOv3). The YOLO (You Only Look Once) object detection algorithm uses a convolutional neural network (CNN) architecture to detect objects in images. The original YOLO used a deep CNN with 24 convolutional layers followed by 2 fully connected layers, whereas YOLOv3 uses the Darknet-53 backbone, a fully convolutional network with 53 convolutional layers, and makes predictions at three scales. These deeper networks detect objects with higher accuracy while still running in real time. Overall, YOLO has demonstrated competitive performance on a variety of object detection benchmarks, combining high accuracy with fast inference times. It has been widely adopted in computer vision applications such as self-driving cars, robotics, and surveillance systems. The model is trained on the MS COCO dataset, which contains more than 100K images with annotations, and it can detect up to 80 object classes.
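For YOLOv3 specifically, the grid-based output can be made concrete with a little arithmetic. The sketch below assumes the commonly used 416×416 input size, the three detection scales of YOLOv3 with strides 32, 16, and 8, three anchor boxes per scale, and the 80 COCO classes; the function names are hypothetical.

```python
def yolov3_output_shapes(input_size=416, num_classes=80,
                         anchors_per_scale=3, strides=(32, 16, 8)):
    """Return the (rows, cols, channels) shape of each YOLOv3 detection output."""
    # Each box carries 4 coordinates, 1 objectness score, and the class scores.
    values_per_box = 5 + num_classes
    shapes = []
    for stride in strides:
        grid = input_size // stride
        shapes.append((grid, grid, anchors_per_scale * values_per_box))
    return shapes


def total_predicted_boxes(shapes, anchors_per_scale=3):
    """Count the candidate boxes produced over all scales before filtering."""
    return sum(rows * cols * anchors_per_scale for rows, cols, _ in shapes)


shapes = yolov3_output_shapes()
total_boxes = total_predicted_boxes(shapes)
print(shapes)       # [(13, 13, 255), (26, 26, 255), (52, 52, 255)]
print(total_boxes)  # 10647
```

Under these assumptions the network proposes over ten thousand candidate boxes per image, which is why confidence thresholding and non-maximum suppression are needed to reduce them to a handful of detections.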
Trained on the COCO dataset, the YOLO model achieved an accuracy of 81.2%, which makes it a suitable model for real-time applications. The model detects objects and labels them according to their predicted classes. The mean Average Precision (mAP) is about 0.5, which is adequate for a real-time detection model. Here are some output examples of the model.
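For context on the mAP figure quoted above, the following sketch shows one common way such a number is computed. It is not the evaluation script used for this work. It assumes that the detections of each class have already been sorted by descending confidence and marked as true or false positives (for example, a detection counts as a true positive when its IoU with an unmatched ground-truth box is at least 0.5), and it uses all-point interpolation of the precision-recall curve. The class names and flags are invented for illustration.

```python
def average_precision(is_true_positive, num_ground_truth):
    """Area under the interpolated precision-recall curve for one class.

    is_true_positive: flags for detections sorted by descending confidence.
    num_ground_truth: number of annotated objects of this class.
    """
    if num_ground_truth == 0:
        return 0.0
    tp = fp = 0
    precisions, recalls = [], []
    for flag in is_true_positive:
        if flag:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)
    # Interpolate: precision at each point is the best precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(per_class):
    """Average the per-class AP values; per_class maps name -> (flags, num_ground_truth)."""
    aps = [average_precision(flags, n) for flags, n in per_class.values()]
    return sum(aps) / len(aps) if aps else 0.0


# Hypothetical evaluation results for two classes.
per_class = {
    "person": ([True, False, True], 2),
    "car": ([True, True], 2),
}
map_value = mean_average_precision(per_class)
print(round(map_value, 4))  # 0.9167
```

Benchmarks differ in the IoU thresholds and interpolation scheme they use, so mAP values are only comparable when computed under the same protocol.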