Access to video-based digital content remains a persistent challenge for the more than 2.2 billion people worldwide who live with a visual impairment. Although video has become the principal medium for education and knowledge dissemination, no unified pipeline currently exists that automatically transforms spoken video content into tactile Braille output. This paper introduces an AI-based multilingual video summarization system with text-to-Braille conversion, designed specifically for visually impaired users. The framework extracts audio from the input video using FFmpeg, transcribes the spoken content into English text with OpenAI Whisper Large-v2, and generates a concise abstractive summary through a hybrid pipeline that combines TextRank extractive pre-filtering with T5 transformer-based abstractive generation. The summary is then encoded into both Grade 1 (uncontracted) and Grade 2 (contracted) Braille conforming to the Unified English Braille (UEB) standard, and is exported in BRF format for use with refreshable Braille displays and Braille embossers. Multilingual support for English, Hindi, Marathi, and Spanish is provided through neural machine translation. The entire system is deployed as a WCAG 2.1 AA-compliant web application so that visually impaired users can operate it independently. Evaluation on real-world educational video content yields a ROUGE-1 score of 0.71, a BERTScore F1 of 0.84, an ASR word error rate of 4.8%, and a Braille character error rate of 1.3% for Grade 1 and 2.7% for Grade 2, supporting the system's effectiveness across all pipeline stages.
Introduction
This paper addresses the accessibility gap that visually impaired users face in accessing the rapidly growing volume of video-based educational content.
Although tools such as screen readers, captioning systems, and Braille converters exist, they operate separately, and none provides a complete end-to-end path from video to Braille. To close this gap, the paper proposes an integrated AI system that converts video content into Braille output automatically.
The proposed pipeline works in four main stages:
Video processing and audio extraction
Speech-to-text transcription using OpenAI Whisper
Text summarization using a hybrid TextRank + T5 model
Braille conversion supporting both Grade 1 and Grade 2 (UEB standard)
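The extractive half of stage 3 can be illustrated with a minimal, dependency-free sketch of TextRank-style sentence ranking. It is an illustration under stated assumptions, not the system's actual implementation: the function names (`split_sentences`, `similarity`, `rank_sentences`, `prefilter`), the naive sentence splitter, the damping factor of 0.85, the fixed iteration count, and the `keep` parameter are all hypothetical choices. The similarity measure follows the word-overlap idea of TextRank but uses distinct words per sentence for simplicity.

```python
import math
import re


def split_sentences(text):
    """Naive splitter (assumption): break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def similarity(words_a, words_b):
    """Shared distinct words, normalised by the log of each sentence's word count."""
    if not words_a or not words_b:
        return 0.0
    norm = math.log(len(words_a)) + math.log(len(words_b))
    if norm <= 0:  # both sentences contain a single distinct word
        return 0.0
    return len(words_a & words_b) / norm


def rank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences with weighted PageRank over the similarity graph."""
    words = [set(re.findall(r"[a-z0-9']+", s.lower())) for s in sentences]
    n = len(sentences)
    weight = [[similarity(words[i], words[j]) if i != j else 0.0
               for j in range(n)] for i in range(n)]
    totals = [sum(row) for row in weight]  # outgoing edge weight per sentence
    scores = [1.0] * n
    for _ in range(iterations):  # fixed number of power-iteration steps
        scores = [
            (1 - damping) + damping * sum(
                weight[j][i] / totals[j] * scores[j]
                for j in range(n) if totals[j] > 0)
            for i in range(n)
        ]
    return scores


def prefilter(text, keep=3):
    """Keep the `keep` highest-scoring sentences, restored to document order."""
    sentences = split_sentences(text)
    scores = rank_sentences(sentences)
    best = sorted(range(len(sentences)),
                  key=lambda i: scores[i], reverse=True)[:keep]
    return " ".join(sentences[i] for i in sorted(best))
```

In a hybrid pipeline of the kind described, the string returned by `prefilter` would be the shortened transcript handed to the abstractive model. Restoring document order before joining keeps the input coherent for that second step.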
The system also includes multilingual translation into Hindi, Marathi, and Spanish and provides a WCAG 2.1 AA-compliant interface for accessibility.
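To make the Braille conversion stage more concrete, the following sketch encodes text as uncontracted (Grade 1) Braille using Unicode Braille pattern characters. It is a simplified illustration, not the system's encoder: the names `LETTER_DOTS`, `cell`, and `to_grade1` are hypothetical, only letters, digits, space, comma, and period are covered, and several UEB rules are deliberately omitted (for example, the grade 1 indicator required when a letter a-j directly follows a number, and decimal points inside numbers). Grade 2 contractions and BRF export are outside its scope.

```python
# Dot numbers for a-j; k-t add dot 3; u, v, x, y, z add dots 3 and 6.
LETTER_DOTS = {
    "a": (1,), "b": (1, 2), "c": (1, 4), "d": (1, 4, 5), "e": (1, 5),
    "f": (1, 2, 4), "g": (1, 2, 4, 5), "h": (1, 2, 5), "i": (2, 4),
    "j": (2, 4, 5),
}
for base, letter in zip("abcdefghij", "klmnopqrst"):
    LETTER_DOTS[letter] = LETTER_DOTS[base] + (3,)
for base, letter in zip("abcde", "uvxyz"):
    LETTER_DOTS[letter] = LETTER_DOTS[base] + (3, 6)
LETTER_DOTS["w"] = (2, 4, 5, 6)  # w does not follow the row pattern

PUNCTUATION_DOTS = {" ": (), ",": (2,), ".": (2, 5, 6)}
DIGIT_LETTERS = dict(zip("1234567890", "abcdefghij"))  # digits reuse a-j
CAPITAL = (6,)            # capital indicator precedes an uppercase letter
NUMERIC = (3, 4, 5, 6)    # numeric indicator precedes a run of digits


def cell(dots):
    """Map dot numbers to the Unicode Braille pattern (dot n sets bit n-1)."""
    return chr(0x2800 + sum(1 << (d - 1) for d in dots))


def to_grade1(text):
    """Encode text cell by cell; unsupported characters pass through unchanged."""
    cells = []
    in_number = False
    for ch in text:
        if ch in DIGIT_LETTERS:
            if not in_number:  # one numeric indicator per digit run
                cells.append(cell(NUMERIC))
                in_number = True
            cells.append(cell(LETTER_DOTS[DIGIT_LETTERS[ch]]))
            continue
        in_number = False
        if ch.lower() in LETTER_DOTS:
            if ch.isupper():
                cells.append(cell(CAPITAL))
            cells.append(cell(LETTER_DOTS[ch.lower()]))
        elif ch in PUNCTUATION_DOTS:
            cells.append(cell(PUNCTUATION_DOTS[ch]))
        else:
            cells.append(ch)
    return "".join(cells)
```

A call such as `to_grade1("Video 12")` returns a string of Braille cells that can be printed or inspected directly. A production encoder would normally rely on a full table-driven translator rather than a hand-written mapping like this one.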
The literature review shows that although prior work has advanced video captioning, NLP summarization, ASR, and Braille translation individually, none combines these technologies into a single unified system.
Conclusion
This paper presented the design, implementation, and evaluation of an AI-based multilingual video summarization system with text-to-Braille conversion for visually impaired users. The proposed pipeline unifies four previously isolated technology components (Whisper ASR, hybrid TextRank+T5 summarization, neural machine translation, and UEB Grade 1 and Grade 2 Braille encoding) into a single automated and accessible framework. Empirical evaluation confirmed strong performance across all measured dimensions without reliance on any external proprietary training dataset, and the WCAG 2.1 AA-compliant interface ensures that the system is itself operable by the users it is designed to serve.
The system makes three distinct contributions to the field of accessible AI: it introduces the first published pipeline combining transformer-based NLP summarization with dual-grade UEB Braille encoding; it supports four languages within a single framework without requiring separate pipeline instantiations; and it demonstrates that near-production-quality Braille-accessible video summarization is achievable with lightweight, open-source components deployed without GPU infrastructure.
Future development will pursue four extensions: incorporating OCR to extract on-screen text, diagrams, and slide content from video frames; extending multilingual Braille support to Bharati Braille for Devanagari-based Indian scripts; deploying a mobile application interface for portable real-time use; and conducting formal user studies with visually impaired participants to evaluate Grade 1 versus Grade 2 preference and overall system usability in authentic learning environments.