This project presents a Video Summarization Tool that processes and summarizes videos by identifying and emphasizing objects of interest. Implemented with Python, OpenCV, and Tkinter, the tool provides a simple graphical user interface for video processing. The core functionality is built on the YOLOv3-tiny object detection model, chosen for its fast inference. The model, loaded through OpenCV's DNN module, detects objects in video frames and outlines them with bounding rectangles. The application condenses videos by skipping and resizing frames, storing the processed frames in an output video file. It reads several video formats and writes its output with the MP4 codec. A notable feature is the Tkinter-based GUI, which lets users load input videos and view the summarized output. The interface provides video-selection and output buttons, making it usable for people with little technical knowledge.
Introduction
Video summarization is the process of condensing lengthy videos into shorter versions without losing key information. With the surge in video content online and in surveillance, manual review is impractical. Machine learning (ML) techniques automate this task by analyzing visual, audio, and contextual cues to identify important segments.
Traditional methods rely on manual feature extraction, such as keyframe extraction and shot boundary detection, while ML-based methods use models like CNNs, RNNs, and transformers to learn from large labeled datasets and improve accuracy.
Applications of video summarization include content recommendation, video indexing, surveillance, and media editing.
Problem Definition:
The project aims to develop a video summarization tool that uses the YOLOv3-tiny model for real-time object detection to identify and highlight primary objects in video frames.
Methodology:
Load the pre-trained YOLOv3-tiny model.
Process input video frame-by-frame, extracting properties like resolution and FPS.
Perform object detection on each frame, filtering results by confidence and drawing bounding boxes.
Save the processed frames to an output video file and display them in real-time.
Release resources after processing.
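The steps above can be sketched in Python with OpenCV. The box-decoding and confidence-filtering helpers below follow the standard YOLO output layout (normalized center/size plus an objectness score at index 4); the file names `yolov3-tiny.cfg` and `yolov3-tiny.weights`, the 0.5 threshold, and the frame-skip value are illustrative assumptions, not the project's exact configuration:

```python
def decode_detection(det, frame_w, frame_h):
    """Convert one YOLO row (normalized cx, cy, w, h, ...) to a pixel rect (x, y, w, h)."""
    cx, cy = det[0] * frame_w, det[1] * frame_h
    bw, bh = det[2] * frame_w, det[3] * frame_h
    return int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)

def filter_by_confidence(detections, threshold=0.5):
    """Keep detections whose objectness score (index 4) meets the threshold."""
    return [d for d in detections if d[4] >= threshold]

def summarize(input_path, output_path, frame_skip=5, conf_threshold=0.5):
    """Sketch of the pipeline: load model, detect per sampled frame, draw boxes, write out."""
    import cv2  # imported here so the pure helpers above need no OpenCV install

    # Step 1: load the pre-trained YOLOv3-tiny model via OpenCV's DNN module.
    net = cv2.dnn.readNetFromDarknet("yolov3-tiny.cfg", "yolov3-tiny.weights")

    # Step 2: open the input video and read its properties (resolution, FPS).
    cap = cv2.VideoCapture(input_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    out = cv2.VideoWriter(output_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % frame_skip == 0:  # sample frames rather than processing every one
            # Step 3: run detection, filter by confidence, draw bounding boxes.
            blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                                         swapRB=True, crop=False)
            net.setInput(blob)
            for output in net.forward(net.getUnconnectedOutLayersNames()):
                for det in filter_by_confidence(output, conf_threshold):
                    x, y, bw, bh = decode_detection(det, w, h)
                    cv2.rectangle(frame, (x, y), (x + bw, y + bh), (0, 255, 0), 2)
            # Step 4: save the processed frame to the output video.
            out.write(frame)
        idx += 1

    # Step 5: release resources after processing.
    cap.release()
    out.release()
```

The helpers are kept separate from the OpenCV I/O so the detection-decoding logic can be reasoned about (and tested) independently of model weights and video files.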
Results and Evaluation:
The developed tool successfully summarizes videos by detecting and marking objects in real-time with the YOLOv3-tiny model, balancing speed and accuracy. The GUI is user-friendly, enabling easy video upload, processing, and output viewing. The tool meets functional requirements and provides an efficient, accurate means to summarize and analyze video content based on detected objects.
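A minimal Tkinter front end of the kind described above might look like the following sketch. The widget layout, the supported-extension set, and the `process_video` callback are illustrative assumptions, not the project's actual code:

```python
import os

# Assumed set of readable container formats; adjust to match the backend.
SUPPORTED = {".mp4", ".avi", ".mov", ".mkv"}

def is_supported(path):
    """Return True if the file extension is one the tool is assumed to read."""
    return os.path.splitext(path)[1].lower() in SUPPORTED

def build_gui(process_video):
    """Build a minimal window: a label for the chosen file and two buttons."""
    import tkinter as tk  # imported lazily so is_supported() works headless
    from tkinter import filedialog

    root = tk.Tk()
    root.title("Video Summarization Tool")
    selected = tk.StringVar(value="No video selected")

    def choose():
        path = filedialog.askopenfilename(
            filetypes=[("Video files", "*.mp4 *.avi *.mov *.mkv")])
        if path and is_supported(path):
            selected.set(path)

    tk.Label(root, textvariable=selected).pack(padx=10, pady=5)
    tk.Button(root, text="Select Video", command=choose).pack(pady=5)
    tk.Button(root, text="Summarize",
              command=lambda: process_video(selected.get())).pack(pady=5)
    return root

if __name__ == "__main__":
    # process_video would be the summarization pipeline; print is a stand-in.
    build_gui(print).mainloop()
```

Keeping the processing callback as a parameter separates the GUI from the detection pipeline, which matches the report's emphasis on ease of use for non-technical users.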
Conclusion
The Video Summarization Tool addresses the challenge of summarizing and processing videos by identifying and highlighting target objects. Built with Python, OpenCV, and Tkinter, and using the YOLOv3-tiny model for fast, accurate detection, the tool highlights objects of interest in real time. It combines a structured video-processing pipeline with an easy-to-use graphical interface for uploading videos, processing them, and displaying the summarized output. Support for multiple video formats, processing optimizations such as frame skipping and resizing, and instant feedback through a real-time display enhance its efficiency and usability. Comprehensive testing, including unit, integration, and performance tests, supports the stability and reliability of the tool. Overall, the Video Summarization Tool offers a practical solution for applications such as surveillance, video analysis, and content abstraction, making it an effective aid for users who want to summarize video content accurately.