Object detection is a crucial task in computer vision, with applications in fields such as autonomous vehicles, surveillance, and augmented reality. This study explores the implementation of object detection using OpenCV, a popular computer vision library. The approach employs a pre-trained deep learning model, the Single Shot MultiBox Detector (SSD), for real-time object detection. The visualization component of the system uses OpenCV's drawing functions to overlay bounding boxes and labels onto the original frames, providing visually informative output. The implementation maintains real-time performance, making it suitable for applications that demand swift object detection. In addition to real-time video processing, the system demonstrates adaptability to image-based object detection and accommodates a wide range of object classes, making it versatile for diverse use cases. The codebase is modular and well-documented, encouraging extensibility and customization for specific application requirements.
Introduction
Object detection is a key computer vision task that has advanced through powerful models, libraries, and datasets. This study integrates OpenCV, a versatile open-source computer vision library, with the Single Shot MultiBox Detector (SSD) model and the COCO dataset to create an accurate, efficient real-time object detection system.
Key Components:
OpenCV: Provides essential tools for image/video processing and model integration.
SSD Model: A deep learning architecture that detects objects in a single pass, offering high accuracy and real-time speed by predicting bounding boxes and class labels simultaneously.
COCO Dataset: A large, diverse dataset containing images annotated with 80 common object categories, enhancing model robustness and real-world applicability (a loading sketch for these components follows this list).
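As an illustration of how these components fit together at load time, the minimal sketch below reads a COCO-trained SSD-MobileNet model and its class names through OpenCV's dnn module. The file names (frozen_inference_graph.pb, ssd_mobilenet_coco.pbtxt, coco_labels.txt) are placeholders, not files prescribed by this study.

```python
import cv2

# Placeholder paths: a frozen SSD-MobileNet graph trained on COCO, its matching
# OpenCV config file, and a text file listing the COCO class names (one per
# line, index-aligned with the model's class ids).
MODEL_PB = "frozen_inference_graph.pb"
CONFIG_PBTXT = "ssd_mobilenet_coco.pbtxt"
LABELS_TXT = "coco_labels.txt"

# OpenCV's dnn module loads the detector directly; no deep learning framework
# installation is required at inference time.
net = cv2.dnn.readNetFromTensorflow(MODEL_PB, CONFIG_PBTXT)

with open(LABELS_TXT) as f:
    class_names = [line.strip() for line in f if line.strip()]
```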
Literature Insights:
The reviewed literature covers various object detection and tracking techniques, including advanced 3D sensing, convolutional and recurrent neural networks, and feature-based tracking algorithms. Challenges such as object motion, occlusions, and illumination changes are addressed in prior research, guiding improvements in detection accuracy and efficiency.
Methodology:
Real-time video frames are captured from a webcam or from video files using OpenCV.
The pre-trained MobileNet-SSD model performs detection and classification on each frame.
Detected objects are labeled with their COCO categories and enclosed in bounding boxes in real time.
Post-processing filters overlapping detections to refine the results (a sketch of the full pipeline follows this list).
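The sketch below walks through this pipeline end to end, assuming the model and label list loaded in the earlier sketch. The confidence threshold (0.5), NMS threshold (0.4), 300x300 input size, and mean/scale values are illustrative defaults for this model family rather than values fixed by the study.

```python
import cv2
import numpy as np

CONF_THRESHOLD = 0.5   # minimum confidence for a detection to be kept
NMS_THRESHOLD = 0.4    # IoU threshold for non-maximum suppression

# Assumed loaded as in the earlier sketch (placeholder file names).
net = cv2.dnn.readNetFromTensorflow("frozen_inference_graph.pb",
                                    "ssd_mobilenet_coco.pbtxt")
with open("coco_labels.txt") as f:
    class_names = [line.strip() for line in f if line.strip()]

cap = cv2.VideoCapture(0)              # webcam; pass a file path for video input
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    h, w = frame.shape[:2]

    # Resize and normalize the frame into the 300x300 input blob expected by
    # SSD-MobileNet, then run a single forward pass.
    blob = cv2.dnn.blobFromImage(frame, 1.0 / 127.5, (300, 300),
                                 (127.5, 127.5, 127.5), swapRB=True, crop=False)
    net.setInput(blob)
    detections = net.forward()         # shape (1, 1, N, 7): id, class, score, box

    boxes, scores, class_ids = [], [], []
    for det in detections[0, 0]:
        score = float(det[2])
        if score < CONF_THRESHOLD:
            continue
        # Box coordinates are normalized; scale them back to pixel units.
        x1, y1, x2, y2 = (det[3:7] * np.array([w, h, w, h])).astype(int)
        boxes.append([int(x1), int(y1), int(x2 - x1), int(y2 - y1)])  # x, y, w, h
        scores.append(score)
        class_ids.append(int(det[1]))

    # Post-processing: non-maximum suppression removes overlapping boxes that
    # cover the same object, keeping only the highest-scoring one.
    keep = cv2.dnn.NMSBoxes(boxes, scores, CONF_THRESHOLD, NMS_THRESHOLD)
    for i in np.array(keep).flatten():
        x, y, bw, bh = boxes[i]
        cid = class_ids[i]
        name = class_names[cid] if cid < len(class_names) else str(cid)
        label = f"{name}: {scores[i]:.2f}"
        cv2.rectangle(frame, (x, y), (x + bw, y + bh), (0, 255, 0), 2)
        cv2.putText(frame, label, (x, max(y - 5, 15)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

    cv2.imshow("Detections", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press 'q' to quit
        break

cap.release()
cv2.destroyAllWindows()
```

The suppression step is what yields the clean, single-box-per-object overlays described in the results: among heavily overlapping candidate boxes for the same object, only the highest-scoring one is drawn.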
Results:
The system successfully detects and labels diverse objects in real time, providing clean, user-friendly visual outputs. The combination of OpenCV, SSD, and COCO delivers high accuracy and speed across various scenarios.
References
[1] Gupta, B., "Study on Object Detection using OpenCV - Python", International Journal of Computer Applications (0975 – 8887), Vol. 162, No. 8, March 2017.
[2] Socher, R., Huval, B., Bhat, B., Manning, C. D., and Ng, A. Y., "Convolutional-Recursive Deep Learning for 3D Object Classification", International Conference on Computational Intelligence and Communication Networks, 2018.
[3] Tiwari, M., and Singhai, R., "A Review of Detection and Tracking of Object from Image and Video Sequences", International Journal of Computational Intelligence Research, Vol. 13, No. 5, 2017.
[4] Liu, W., Anguelov, D., Erhan, D., Szegedy, C., Reed, S., et al., "SSD: Single Shot MultiBox Detector", in European Conference on Computer Vision (ECCV), 2016.
[5] Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Fei-Fei, L., et al., "ImageNet Large Scale Visual Recognition Challenge", International Journal of Computer Vision, 115(3), 211–252, 2015.
[6] Cadena, C., Dick, A., and Reid, I., "A fast, modular scene understanding system using context-aware object detection", in IEEE International Conference on Robotics and Automation (ICRA), Seattle, WA, 2015.
[7] Azzopardi, G., and Petkov, N., "Ventral-stream-like shape representation: from pixel intensity values to trainable object-selective COSFIRE models", Front. Comput. Neurosci., 8:80, 2014. doi:10.3389/fncom.2014.00080.
[8] Azzopardi, G., and Petkov, N., "Trainable COSFIRE filters for keypoint detection and pattern recognition", IEEE Trans. Pattern Anal. Mach. Intell., 35, 490–503, 2013. doi:10.1109/TPAMI.2012.106.
[9] Azizpour, H., and Laptev, I., "Object detection using strongly-supervised deformable part models", in Computer Vision – ECCV 2012, Florence: Springer, 836–849, 2012.
[10] Dalal, N., and Triggs, B., "Histograms of oriented gradients for human detection", in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), Vol. 1, San Diego, CA, 886–893, 2005. doi:10.1109/CVPR.2005.177.
[11] Agarwal, S., Awan, A., and Roth, D., "Learning to detect objects in images via a sparse, part-based representation", IEEE Trans. Pattern Anal. Mach. Intell., 26, 1475–1490, 2004. doi:10.1109/TPAMI.2004.108.