EDUVID: An AI-Powered Educational Video Generation System

Authors: S Veerendra Swamy, S M Saranya, Sushma Kulkarni, Vaishnavi Prabhakar, G. Chandrakala

DOI Link: https://doi.org/10.22214/ijraset.2026.82212

Abstract

Students often struggle to follow long, complex ideas in traditional textbooks, and online videos do not always match the exact content of the textbook or syllabus. Many learners therefore spend extra time searching for suitable explanations, which causes confusion and makes learning less efficient. This paper introduces EDUVID, an AI-powered system that generates simple, engaging educational videos from selected textbook pages. Users upload a textbook, choose specific pages, and receive a video that matches their needs. EDUVID uses optical character recognition (OCR) to extract text, natural language processing (NLP) to summarize content, text-to-speech (TTS) for audio narration, and AI-generated slides for visual explanations. A rule-based chatbot guides users through uploading files and selecting pages, which keeps the platform accessible to users who are not tech-savvy. Experimental results show that EDUVID reduces content processing time from several minutes of manual work to a few seconds, improving learning efficiency and reducing user effort through automated video creation.

Introduction

This paper describes the design and development of EDUVID, an AI-powered system for automatically generating educational videos from textbook content.

The main problem addressed is that traditional learning methods and existing online resources are often time-consuming, generic, and not aligned with specific syllabus content. Students struggle to quickly understand complex material, while manual video creation is also labor-intensive and not scalable.

EDUVID addresses this with an end-to-end automated pipeline: users upload a textbook PDF, select specific pages, and receive a generated educational video. The system uses OCR to extract text, NLP techniques to summarize content, text-to-speech (TTS) to generate narration, and AI-based visuals to build slide-style explanations. A rule-based chatbot guides users through the process, which keeps the system easy to use.
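The page-selection step of a rule-based chatbot can be illustrated with a small sketch. The paper does not publish its chatbot rules, so the function names `parse_page_selection` and `chatbot_reply`, the accepted reply format, and the response wording below are all assumptions made for illustration.

```python
import re
from typing import List


def parse_page_selection(reply: str, total_pages: int) -> List[int]:
    """Turn a reply such as 'pages 3-5, 8' into sorted 1-based page numbers.

    Hypothetical helper: the reply format is an assumption, not the paper's.
    """
    pages = set()
    # Each match is a single number ("8") or a range ("3-5").
    for start, end in re.findall(r"(\d+)(?:\s*-\s*(\d+))?", reply):
        first = int(start)
        last = int(end) if end else first
        if first > last:
            first, last = last, first  # tolerate reversed ranges like "5-3"
        # Clamp to the pages that actually exist in the uploaded document.
        first = max(first, 1)
        last = min(last, total_pages)
        pages.update(range(first, last + 1))
    return sorted(pages)


def chatbot_reply(reply: str, total_pages: int) -> str:
    """Rule-based response: confirm a valid selection or ask again."""
    pages = parse_page_selection(reply, total_pages)
    if not pages:
        return f"Please choose pages between 1 and {total_pages}, for example '3-5, 8'."
    return "Generating a video for pages: " + ", ".join(str(p) for p in pages)


print(parse_page_selection("pages 3-5, 8", 10))  # [3, 4, 5, 8]
```

Clamping the range to the document length, rather than rejecting the whole reply, is one possible design choice; a stricter chatbot could instead ask the user to correct out-of-range pages.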

The literature review shows that while existing AI and e-learning systems improve accessibility and personalization, they typically lack full integration of textbook processing and automated video generation. EDUVID fills this gap by combining document extraction, summarization, and multimedia generation in one workflow.

The system architecture is modular, consisting of frontend input, backend processing, OCR, NLP summarization, TTS, visual generation, and video rendering. The workflow includes uploading PDFs, selecting pages, extracting text, summarizing it, converting it into speech, generating visuals, and producing a final video with download options.
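The modular workflow described above can be sketched as a chain of stages that pass a shared job object along. This is only an illustrative skeleton: the `Job` fields, the stage function names, and the output file names are hypothetical, and each stage is a stub standing in for the real OCR, summarization, TTS, slide, and rendering components.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Job:
    """State handed from one stage to the next (all field names are hypothetical)."""
    pdf_path: str
    pages: List[int]
    text: str = ""
    summary: str = ""
    audio_path: str = ""
    slide_paths: List[str] = field(default_factory=list)
    video_path: str = ""
    completed: List[str] = field(default_factory=list)


Stage = Callable[[Job], Job]


def extract_text(job: Job) -> Job:
    # Stub for the OCR stage: a real system would run OCR on each selected page.
    job.text = " ".join(f"[text of page {p}]" for p in job.pages)
    return job


def summarize(job: Job) -> Job:
    # Stub for NLP summarization: simply truncates the extracted text.
    job.summary = job.text[:80]
    return job


def synthesize_speech(job: Job) -> Job:
    # Stub for TTS: records where the narration file would be written.
    job.audio_path = "narration.wav"
    return job


def build_slides(job: Job) -> Job:
    # Stub for visual generation: one slide per selected page.
    job.slide_paths = [f"slide_{i}.png" for i, _ in enumerate(job.pages, start=1)]
    return job


def render_video(job: Job) -> Job:
    # Stub for rendering: a real system would combine slides and narration.
    job.video_path = "lesson.mp4"
    return job


DEFAULT_STAGES: List[Stage] = [
    extract_text, summarize, synthesize_speech, build_slides, render_video,
]


def run_pipeline(job: Job, stages: List[Stage]) -> Job:
    """Run the stages in order and record which ones completed."""
    for stage in stages:
        job = stage(job)
        job.completed.append(stage.__name__)
    return job


result = run_pipeline(Job(pdf_path="book.pdf", pages=[3, 4]), DEFAULT_STAGES)
print(result.completed)
```

Keeping every stage behind the same `Job -> Job` signature means a stage can be swapped or tested in isolation, which is the usual motivation for a modular design of this kind.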

The implementation uses Python and Flask together with OCR libraries (such as Tesseract), NLP tools, text-to-speech engines, and video processing libraries (OpenCV and MoviePy). Because the system is purely software-based and requires no specialized hardware, it is lightweight and easy to deploy.
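The paper does not name the specific NLP tool used for summarization. As a stand-in, the sketch below shows a simple frequency-based extractive summarizer written with the Python standard library only. The `summarize` function, the `STOPWORDS` list, and the scoring rule are assumptions for illustration and should not be read as the authors' method.

```python
import re
from collections import Counter

# A tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "by", "for", "from", "in", "into",
    "is", "it", "of", "on", "or", "that", "the", "this", "to", "was", "with",
}


def _content_words(text: str) -> list:
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def summarize(text: str, max_sentences: int = 2) -> str:
    """Keep the sentences whose words are most frequent in the whole text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(_content_words(text))

    def score(sentence: str) -> float:
        tokens = _content_words(sentence)
        # Average word frequency, so long sentences are not favored by length alone.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original sentence order
    return " ".join(sentences[i] for i in keep)


page_text = (
    "Photosynthesis converts light into chemical energy. "
    "Plants use photosynthesis to make glucose. "
    "The weather was nice yesterday."
)
print(summarize(page_text, max_sentences=2))
```

In this example the two sentences that share the repeated term score highest, so the off-topic third sentence is dropped. The output of such a summarizer would then be passed to the narration and slide stages.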

Conclusion

This paper presented EDUVID, an AI-powered educational video generation system developed to provide simple and effective learning support by converting textbook content into engaging educational videos. The proposed framework integrates document upload, selective page extraction, OCR-based text extraction, NLP-driven summarization, Text-to-Speech narration, and automated visual generation to create a reliable and accessible learning solution. By incorporating multiple processing stages, including text extraction, content simplification, audio narration, and visual slide creation, the system ensures that learners can understand complex topics through structured multimedia explanations. The framework further enhances usability by allowing users to generate videos from selected textbook pages without requiring advanced technical skills.

The EDUVID system plays a significant role in enabling personalized and focused learning. By allowing learners to choose specific pages from textbooks, the system generates targeted educational videos that match the exact syllabus or topics required. Through automated content processing and narration, users can access learning material without spending time searching for suitable external videos. The system operates as a standalone educational support platform and does not depend on manual content creation, making it suitable for use in resource-constrained academic environments.

The design and development of EDUVID followed a structured engineering approach involving system architecture modeling, modular implementation, and comprehensive testing. Extensive unit, integration, and system-level testing demonstrated that the framework consistently performs as expected across various document sizes and usage scenarios. Performance analysis indicated that the system significantly reduces learning preparation time compared to manual methods while maintaining stable processing and reliable video generation. Additionally, the lightweight software-based design ensures low deployment cost and ease of use, making the framework practical for everyday academic use.

Copyright

Copyright © 2026 S Veerendra Swamy, S M Saranya, Sushma Kulkarni, Vaishnavi Prabhakar, G. Chandrakala. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper ID: IJRASET82212

Publish Date: 2026-05-09

ISSN: 2321-9653

Publisher Name: IJRASET
