The rapid evolution of deep learning techniques has greatly enhanced the capabilities of object detection systems. This paper investigates the integration of MobileNetV3 with the Single Shot MultiBox Detector (SSD) for efficient and accurate object detection. MobileNetV3, built on depthwise separable convolutions, significantly reduces computational complexity while maintaining high accuracy. By combining this lightweight backbone with the SSD framework, which performs object localization and classification in a single forward pass, we achieve an effective balance between speed and accuracy. Our study benchmarks MobileNetV3-SSD against established object detection models on several datasets, highlighting its strengths in real-time applications. We demonstrate that MobileNetV3-SSD offers notable improvements in processing time and resource utilization without compromising detection quality. These findings underline MobileNetV3-SSD’s suitability for deployment in environments with limited computational power, such as mobile devices and edge computing platforms.
Introduction
Object detection is a core challenge in computer vision, involving identifying and localizing objects within images or video frames. It plays a vital role in fields such as autonomous driving, surveillance, and interactive systems. Although deep learning has greatly advanced detection accuracy, achieving real-time performance on mobile or embedded devices remains difficult due to computational constraints.
To address this, this study integrates MobileNetV3 with the Single Shot MultiBox Detector (SSD) to create a lightweight, high-performance object detection framework. MobileNetV3, enhanced through Neural Architecture Search (NAS) and Squeeze-and-Excitation (SE) modules, serves as an efficient backbone for feature extraction, while SSD enables multi-scale detection of objects in a single forward pass. The pipeline comprises image preprocessing, feature extraction, bounding box prediction, and Non-Maximum Suppression (NMS) to refine detections.
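The final refinement step of the pipeline, greedy NMS, can be sketched as follows. This is a minimal NumPy illustration, not the paper's exact implementation; the corner-coordinate box format and the 0.5 IoU threshold are assumptions for the example.

```python
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, format [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: repeatedly keep the highest-scoring remaining box
    and suppress all boxes that overlap it above iou_thresh."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= iou_thresh]
    return keep
```

In practice a per-class score threshold is applied before NMS, and detection libraries ship optimized versions of this routine; the loop above only makes the selection logic explicit.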
The model is further optimized through quantization and pruning for deployment via TensorFlow Lite, achieving real-time inference on edge devices. Experimental results show accurate detection of diverse objects—such as cars, bicycles, and birds—even under challenging lighting and resolution conditions, confirming the system’s effectiveness for resource-constrained, real-time applications.
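To make the quantization step concrete, the following is a conceptual sketch of symmetric per-tensor int8 weight quantization, written in plain NumPy. It is not the TensorFlow Lite implementation, only an illustration of the idea that each 32-bit weight is replaced by an 8-bit integer plus a shared scale.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor int8 quantization: w ≈ scale * q."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# Example: a toy weight tensor shrinks from 32-bit floats to 8-bit ints,
# with per-weight reconstruction error bounded by half a quantization step.
w = np.array([[0.42, -1.27], [0.05, 0.90]], dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
```

In the actual deployment pipeline this is handled by TensorFlow Lite's post-training quantization (e.g. setting `converter.optimizations = [tf.lite.Optimize.DEFAULT]` on a `tf.lite.TFLiteConverter`), which also quantizes activations using a calibration dataset rather than weights alone.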
Conclusion
In this study, we presented an object detection approach based on the MobileNetV3-SSD architecture. The model achieves a strong balance between speed and accuracy, making it well suited for deployment in real-time applications. Experimental results confirmed that MobileNetV3-SSD delivers robust detection performance even on devices with limited computational capacity. Future work may involve integrating attention mechanisms and feature pyramid networks to further improve detection accuracy for small and overlapping objects.