Combining YOLO and R-CNN for Robust Object Detection

Authors: Khushi Singh, Miss. Ankita Dubey (Asst Prof), Somil Singh, Anand Kumar Verma, Abhishek Pathak

DOI Link: https://doi.org/10.22214/ijraset.2025.71512

Abstract

This research focuses on the development of a real-time object detection system using deep learning techniques. The system is designed to accurately identify and localize multiple objects within video frames in real time, making it suitable for applications such as surveillance, autonomous vehicles, and smart environments. Real-time object detection is a complex and fundamental area of computer vision. It is increasingly used in face recognition, tracking systems, robotics, augmented reality, security surveillance, and many other applications such as live-streaming filters (Snapchat, Instagram). The goal is to identify objects, which is done with the help of YOLO by locating each object with a bounding box [1]. YOLO (You Only Look Once) is an approach to object detection that outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains such as artwork. In recent years, the integration of real-time object detection with edge computing and IoT has gained traction. This combination allows smart devices to make on-the-spot decisions without relying heavily on cloud services, enhancing privacy and reducing latency [2]. This paper briefly describes the development of the YOLO algorithm, summarizes the methods of target recognition and feature selection, and adds to the literature on YOLO and other object detection methods.

Introduction

Object detection is a key computer vision technology focused on quickly locating and identifying objects within images or video streams using bounding boxes. It involves two main tasks: classification (identifying what the object is) and localization (determining where it is). Real-time object detection relies on advanced deep learning models and powerful hardware, with popular approaches including YOLO (You Only Look Once), R-CNN, and Single Shot Detector (SSD).
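The two tasks can be made concrete with a small data structure. The following is a minimal sketch, not taken from any particular framework: the names `BoundingBox`, `Detection`, and `filter_by_confidence`, the corner-based box format, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box in pixel coordinates (hypothetical corner format)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    @property
    def width(self) -> float:
        return self.x_max - self.x_min

    @property
    def height(self) -> float:
        return self.y_max - self.y_min

    @property
    def area(self) -> float:
        # Clamp at zero so a degenerate box never reports negative area.
        return max(0.0, self.width) * max(0.0, self.height)


@dataclass(frozen=True)
class Detection:
    """One detected object: a classification result plus a localization result."""
    label: str         # classification: what the object is
    confidence: float  # score in [0, 1] for that label
    box: BoundingBox   # localization: where the object is


def filter_by_confidence(detections, threshold=0.5):
    """Keep only detections whose score is at or above the threshold."""
    return [d for d in detections if d.confidence >= threshold]
```

A detector's raw output would typically be converted into a list of such records and then filtered before display, for example `filter_by_confidence(detections, threshold=0.5)`.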

YOLO is particularly noted for its speed and efficiency, processing images in one pass and enabling real-time applications like surveillance, autonomous driving, and robotics. YOLO has evolved through multiple versions (YOLOv1 to YOLOv10), each improving accuracy, speed, and adaptability for edge devices.

R-CNN introduced a two-stage detection process by generating region proposals and then classifying each, achieving high accuracy but with greater computational cost. In contrast, single-stage detectors like YOLO and SSD predict object classes and bounding boxes in one forward pass, making them faster and more suitable for real-time use.
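The difference in control flow between the two families can be sketched with stand-in functions. This is only an illustration under stated assumptions: `propose_regions`, `classify_region`, and `predict_all` are hypothetical callables standing in for trained networks, and the call counter is a rough proxy for computational cost, not a benchmark.

```python
def two_stage_detect(image, propose_regions, classify_region):
    """R-CNN-style flow: propose candidate regions, then classify each one."""
    detections = []
    classifier_calls = 0
    for region in propose_regions(image):
        label, score = classify_region(image, region)
        classifier_calls += 1  # one classification per proposal
        if label is not None:
            detections.append((label, score, region))
    return detections, classifier_calls


def single_stage_detect(image, predict_all):
    """YOLO/SSD-style flow: one forward pass returns all classes and boxes."""
    return predict_all(image), 1


# Toy stand-ins (hypothetical) so the sketch can be run end to end.
def toy_propose_regions(image):
    return [(0, 0, 10, 10), (5, 5, 20, 20), (30, 30, 40, 40)]


def toy_classify_region(image, region):
    # Pretend only the second candidate region contains an object.
    if region == (5, 5, 20, 20):
        return "cat", 0.8
    return None, 0.0


def toy_predict_all(image):
    return [("cat", 0.75, (5, 5, 20, 20))]
```

With these stand-ins, `two_stage_detect` performs one classification per proposal while `single_stage_detect` makes a single call, which mirrors the speed argument above.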

Real-time object detection systems typically preprocess video frames, run detection models, and annotate detected objects instantly. Frameworks like TensorFlow, PyTorch, and OpenCV facilitate building these systems. This technology is crucial across many industries, enabling smarter, faster, and automated visual recognition tasks in dynamic environments.
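The preprocess, detect, and annotate steps can be sketched in plain Python. This is a minimal sketch under loud assumptions: frames are small nested lists of 8-bit grayscale values rather than real video, `stub_detector` is a hypothetical placeholder for a trained model, and annotation produces text overlays instead of drawing on the image. A real system would use a framework such as those named above.

```python
def preprocess(frame):
    """Scale 8-bit pixel values to the range [0, 1]."""
    return [[pixel / 255.0 for pixel in row] for row in frame]


def stub_detector(frame):
    """Hypothetical stand-in for a trained model.

    Reports one "bright" box enclosing all pixels above 0.5, if any exist.
    """
    coords = [(x, y)
              for y, row in enumerate(frame)
              for x, value in enumerate(row)
              if value > 0.5]
    if not coords:
        return []
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return [{"label": "bright", "score": 1.0,
             "box": (min(xs), min(ys), max(xs), max(ys))}]


def annotate(frame_index, detections):
    """Turn detections into text overlays for one frame."""
    return [f"frame {frame_index}: {d['label']} {d['score']:.2f} at {d['box']}"
            for d in detections]


def run_pipeline(frames, detector=stub_detector):
    """Process frames one at a time, yielding annotations as each is ready."""
    for index, frame in enumerate(frames):
        detections = detector(preprocess(frame))
        yield annotate(index, detections)
```

Because `run_pipeline` is a generator, each frame's annotations become available as soon as that frame is processed, which is the property a live video loop needs.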

Conclusion

This paper reviews the YOLO versions and the workings of CNN and R-CNN, and draws the following remarks. It first covers how CNNs are used in object detection, then turns to R-CNN: its steps, aim, and function. The YOLO versions differ considerably from one another. YOLO stands out as the most practical solution for real-time object detection, making it well suited to applications such as autonomous driving, surveillance, and robotics, and it is more widely adopted in modern applications. Future advancements may see a convergence of these approaches, combining the accuracy of region-based methods with the speed of single-shot detectors to meet the ever-increasing demands of intelligent systems.

References

[1] V. S. K. N Mittal, “Object detection and classification using Yolo,” academia, 2019.
[2] S. D. R. G. J Redmon, “You only look once: Unified, real-time object detection,” foundation, 2016.
[3] Y. J. W. C. Y. H. Y. Z. X Xu, “DAMO-YOLO: A Report on Real-Time Object Detection Design,” arxiv, 2023.
[4] C. R. K. R. A. Viswanatha V, “Real Time Object Detection System with YOLO and CNN Models: A Review,” cs.CV], 2022.
[5] H. L. CY Wang, “YOLOv1 to YOLOv10: The fastest and most accurate real-time object detection systems,” nowpublishers, 2024.
[6] K. M. SV Kothiya, “SV Kothiya, KB Mistree,” researchgate, 2015.
[7] K. H. R. G. J. S. Shaoqing Ren, “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks,” neurips, 2015.
[8] M. I.-K. -. I. A. S Sambolek, “Automatic person detection in search and rescue operations using deep CNN detectors,” ieeexplore, 2012.
[9] L. K. HDI Upulie, “Real-time object detection using YOLO: a review,” researchgate, 2021.
[10] A. R. RK Chandana, “Real time object detection system with YOLO and CNN models,” arxiv, 2022.
[11] Y. W. H. S. R. F. J. X. T. H. Bowen Cheng, “Revisiting RCNN: On Awakening the Classification Power of Faster RCNN,” thecvf, 2018.
[12] G. Z. D. S. W Zhang, “Real-time accurate object detection using multiple resolutions,” ieeexplore, 2007.
[13] A. P. B. B. A Talele, “Detection of real time objects using TensorFlow and OpenCV,” asianssr, 2019.
[14] K. S. G Khekare, “Real time object detection with speech recognition using tensorflow lite,” researchgate, 2022.
[15] C. P. H. T. J. P. R. V. D Bhatt, “CNN variants for computer vision: History, architecture, application, challenges and future scope,” mdpi, 2021.

Copyright

Copyright © 2025 Khushi Singh, Miss. Ankita Dubey (Asst Prof), Somil Singh, Anand Kumar Verma, Abhishek Pathak. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET71512

Publish Date : 2025-05-23

ISSN : 2321-9653

Publisher Name : IJRASET
