The volume of educational content on video platforms continues to multiply, yet extracting useful information from lengthy recordings remains a time-intensive process for learners. Viewers often confront three primary problems: extended video durations, high information density, and a scattered presentation of ideas. Together, these factors slow the learning process. We present WidViz, a desktop application that uses artificial intelligence to convert a YouTube educational clip into a structured learning module. The software combines OpenAI's Whisper engine, run offline, with the Mistral-7B language model served through Ollama to produce concise summaries and quiz items. An Electron front-end connects to a Flask back-end and a MySQL database, providing a package that delivers summarization, quiz generation, note storage, document export, progress tracking, and secure login. WidViz builds a private study space that does not require a live internet connection, helping users grasp long-form content more effectively. Initial tests show higher engagement, lower mental strain, and faster mastery compared to traditional, passive viewing.
Introduction
WidViz is a desktop-native AI tool designed to improve video-based learning by addressing the inefficiencies of traditional platforms like YouTube. Learners often struggle to find relevant videos, extract usable notes, protect their privacy, and study offline. WidViz addresses these issues by running entirely on the user’s machine. It offers offline transcription via Whisper, AI-generated summaries and quizzes via Mistral-7B, and integrated study tools including annotations, goal tracking, and document export, all with cross-platform support.
The system follows a modular architecture with four layers: content acquisition, preprocessing, local AI processing, and interaction. Audio is extracted from each video, transcribed locally, summarized, and converted into quizzes, with all data stored on the user’s device to ensure privacy. Pilot studies show that students save significant time and retain information better, with qualitative feedback highlighting offline capability, reduced distractions, and fast access to key concepts.
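The transcription, summarization, and quiz steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the WidViz implementation: it assumes the audio has already been extracted to a local file, that the openai-whisper package is installed, and that Ollama is serving a model under the tag "mistral" on its default local port. The names build_module, parse_quiz, and LearningModule, as well as the prompt wording, are hypothetical.

```python
import json
import urllib.request
from dataclasses import dataclass, field

# Assumption: Ollama's default local endpoint; adjust if the server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def whisper_transcribe(audio_path, model_size="base"):
    """Transcribe a local audio file with the openai-whisper package."""
    import whisper  # imported lazily so the rest of the sketch runs without it

    model = whisper.load_model(model_size)
    return model.transcribe(audio_path)["text"].strip()


def ollama_generate(prompt, model="mistral"):
    """Send one non-streaming prompt to a local Ollama server and return its reply."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


@dataclass
class LearningModule:
    """Hypothetical container for the artifacts produced from one video."""
    transcript: str
    summary: str
    quiz: list = field(default_factory=list)


def parse_quiz(raw):
    """Keep only well-formed quiz items; return [] if the reply is not valid JSON."""
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    required = ("question", "options", "answer")
    return [
        item for item in items
        if isinstance(item, dict) and all(key in item for key in required)
    ]


def build_module(audio_path, transcribe=whisper_transcribe,
                 generate=ollama_generate, num_questions=3):
    """Run transcription, then summarization, then quiz generation, all locally."""
    transcript = transcribe(audio_path)
    summary = generate(
        "Summarize the following lecture transcript in a few sentences.\n\n" + transcript
    ).strip()
    quiz_prompt = (
        f"Write {num_questions} multiple-choice questions about the transcript below. "
        "Reply with only a JSON list of objects with the keys "
        "question, options, and answer.\n\n" + transcript
    )
    quiz = parse_quiz(generate(quiz_prompt))
    return LearningModule(transcript=transcript, summary=summary, quiz=quiz)
```

With Whisper installed and Ollama running, calling build_module("lecture.wav") would exercise the full local path. Passing the transcription and generation steps in as callables keeps the pipeline testable without either model, and the defensive JSON parsing reflects that a 7B model will not always return well-formed output.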
Limitations include dependency on YouTube’s API, hardware constraints for older machines, inability to process visual content, and reduced transcription accuracy for non-English videos or heavy accents. Despite these trade-offs, WidViz demonstrates the potential of a privacy-preserving, offline-first AI learning tool that accelerates comprehension and active recall.
Conclusion
We built WidViz to prove a simple point: you do not need a massive cloud server to run useful AI tools. By connecting the Whisper engine directly to Mistral-7B on a standard laptop, we managed to turn passive video watching into an active study session without sending any data to the internet.
Our results show that the trade-off is worth it. While local processing is slower than the cloud, its benefits (total privacy, zero cost, and offline access) make it a viable option for students. We demonstrated that "privacy-first" does not mean "dumb." You can have smart summaries and privacy at the same time.