Effective SSD-Based Multiple Object Detection in Videos

Authors: Jehu Sheran, Sahana NS, ChanduShree M, Harshavardhan V, K. B Bini

DOI Link: https://doi.org/10.22214/ijraset.2025.70738

Abstract

This paper analyzes the objects present in videos of any format. The accompanying web application is simple to use: the user uploads a video in any format, and the application processes it and returns an output video in which detected objects are marked with bounding boxes. Detection is performed with the Single Shot MultiBox Detector (SSD) algorithm, chosen because it offers higher accuracy than YOLO. SSD has so far been applied only to object detection in still images, not to videos of arbitrary format. This project introduces a framework for video object detection built on SSD. The technique works for surveillance footage and for videos of any format.

Introduction

The project aims to improve video object detection by applying the Single Shot MultiBox Detector (SSD) algorithm, known for its high accuracy and speed in predicting multiple bounding boxes in a single image, to video frames. While YOLO and R-CNN algorithms are widely used for object detection, SSD offers a better balance of speed and accuracy, especially for real-time applications.

Since SSD is designed for object detection on single still images rather than on video, the approach extracts each frame from a video, applies SSD to each frame individually, and then recombines the annotated frames into an output video that shows bounding boxes and class labels for the detected objects.
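The frame-splitting and recombination steps can be sketched with FFmpeg. The paper names FFmpeg only for format conversion, so using it to split and rebuild frames is an assumption here, as are the hypothetical helper names `extract_frames_cmd` and `reassemble_cmd`, the PNG file pattern, and the H.264 (`libx264`) output codec. The functions only build argument lists; they do not run FFmpeg.

```python
FPS = 24  # frame rate used by the proposed system


def extract_frames_cmd(video: str, frame_dir: str, fps: int = FPS) -> list[str]:
    """Build ffmpeg arguments that decode `video` (any format FFmpeg reads)
    and write one numbered PNG per frame, resampled to `fps`."""
    # The fps filter drops or duplicates frames to reach the target rate.
    return ["ffmpeg", "-i", video, "-vf", f"fps={fps}",
            f"{frame_dir}/frame_%05d.png"]


def reassemble_cmd(frame_dir: str, output: str, fps: int = FPS) -> list[str]:
    """Build ffmpeg arguments that encode the numbered (annotated) PNG frames
    back into a video at `fps` frames per second."""
    return ["ffmpeg", "-framerate", str(fps), "-i", f"{frame_dir}/frame_%05d.png",
            # Hypothetical choice: H.264 with yuv420p for broad player compatibility.
            "-c:v", "libx264", "-pix_fmt", "yuv420p", output]
```

To execute a command, pass the list to `subprocess.run(cmd, check=True)`; building argument lists instead of shell strings avoids quoting problems with file names that contain spaces. Between the two steps, a detector would read each PNG from the frame directory, draw its boxes, and write the annotated frame back under the same numbering.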

The literature review highlights SSD's efficiency compared to Faster R-CNN, its use of multiple feature maps at different resolutions to detect objects of various sizes, and its suitability for real-time processing. TensorFlow and FFmpeg are integral to the implementation: TensorFlow provides pre-trained SSD models and supports frame-by-frame processing, while FFmpeg converts between video formats to ensure compatibility.
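To make the multiple-feature-map idea concrete, the sketch below computes SSD default-box sizes using the scale rule from the original SSD paper: for m feature maps, s_k = s_min + (s_max - s_min)(k - 1)/(m - 1), with s_min = 0.2 and s_max = 0.9, and a box of aspect ratio a has width s_k * sqrt(a) and height s_k / sqrt(a), relative to the input image. The function names are illustrative, and m = 6 is the number of feature maps used by SSD300.

```python
import math


def default_box_scales(m: int = 6, s_min: float = 0.2, s_max: float = 0.9) -> list[float]:
    """Scale s_k of the default boxes on feature map k = 1..m,
    spaced linearly from s_min (finest map) to s_max (coarsest map)."""
    return [s_min + (s_max - s_min) * (k - 1) / (m - 1) for k in range(1, m + 1)]


def default_box_shapes(scale: float, next_scale: float,
                       aspect_ratios=(1.0, 2.0, 3.0, 1 / 2, 1 / 3)) -> list[tuple[float, float]]:
    """(width, height) of each default box on one feature map, as fractions
    of the input image size."""
    # Width s*sqrt(a), height s/sqrt(a): every box has area s^2; a sets its shape.
    shapes = [(scale * math.sqrt(a), scale / math.sqrt(a)) for a in aspect_ratios]
    # Extra square box of scale sqrt(s_k * s_{k+1}) for aspect ratio 1.
    extra = math.sqrt(scale * next_scale)
    shapes.append((extra, extra))
    return shapes
```

For example, `default_box_scales()` gives scales of roughly 0.2, 0.34, 0.48, 0.62, 0.76, and 0.9, so the early, high-resolution maps handle small objects and the late, coarse maps handle large ones. The paper does not define s_{m+1} for the extra box on the last map, so the value passed as `next_scale` there is a judgment call.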

The proposed system breaks each video into frames (24 per second), runs SSD object detection on each frame, and then reconstructs the video with the detected objects marked. Non-maximum suppression (NMS) removes duplicate detections, that is, overlapping boxes predicted for the same object. The approach addresses gaps in the accuracy and efficiency of current video-based detection and enables adaptable real-time object detection across various video formats.
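Non-maximum suppression can be sketched in a few lines of plain Python. This is the standard greedy variant: boxes are visited from highest to lowest confidence, and a box is kept only if its intersection-over-union (IoU) with every already-kept box is at most a threshold. The default of 0.45 follows the SSD paper; the function names and the (x1, y1, x2, y2) box layout are assumptions. SSD applies NMS separately for each class, so this function would be called once per class label on each frame.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width/height of the overlap, clamped to zero when the boxes are disjoint.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.45):
    """Greedy NMS for one class: return indices of kept boxes, best first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if no higher-scoring kept box overlaps it too much.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

For two heavily overlapping boxes and one separate box, `nms([(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)], [0.9, 0.8, 0.7])` keeps indices 0 and 2, because the first two boxes have an IoU of about 0.68.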

Conclusion

This project successfully demonstrates the application of the SSD algorithm to video object detection, addressing the challenge of adapting an image-based detection model to video content. By processing each frame of the video individually with SSD, the system achieves accurate object detection with bounding boxes, extending the utility of SSD to video applications. The integration of FFmpeg ensures compatibility across various video formats. This approach fills a gap in existing object detection methodologies and lays a foundation for future video-based object detection systems, offering the balance between speed and accuracy that practical deployment requires in domains such as surveillance and multimedia processing.

Copyright

Copyright © 2025 Jehu Sheran, Sahana NS, ChanduShree M, Harshavardhan V, K. B Bini. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET70738

Publish Date : 2025-05-10

ISSN : 2321-9653

Publisher Name : IJRASET