Abstract
The AI-Powered Assistive Vision App is a mobile solution designed for blind and visually impaired individuals. According to the World Health Organization (WHO), approximately 285 million people worldwide live with visual impairment, of whom 39 million are blind. These individuals face profound daily challenges, including spatial navigation, access to printed information, digital communication, and emergency response. The app enables users to navigate their environment and accomplish daily tasks independently using only voice and gesture interaction, eliminating any need to see or visually interact with the screen. A key innovation is the Gesture Draw Interface, in which users draw alphabet letters on the touchscreen to activate specific modules instantly; for example, drawing the letter 'L' launches the Location module, which fetches the user's GPS coordinates and sends their live location to saved emergency contacts via SMS. This gesture-shortcut layer complements voice commands and provides a rapid alternative when audio interaction is impractical. The Image-to-Text module uses on-device Optical Character Recognition (OCR) to capture text from printed or handwritten sources and reads it aloud in natural speech, making written content accessible without sighted assistance. The Contacts and Calling module lets users call any saved contact entirely by voice, with the AI assistant confirming the contact name before initiating the call. The app is developed with Flutter for cross-platform Android and iOS deployment, supported by Node.js and MongoDB for backend services, and integrates on-device machine learning models so that core functionality works fully offline, ensuring reliability in environments with poor network connectivity.
Introduction
The literature highlights the transformative role of AI, deep learning, and multimodal assistive systems in improving accessibility for visually impaired users. Key areas include voice-driven navigation, OCR-based text-to-speech, location sharing, and gesture-based interfaces. Existing apps such as Microsoft Seeing AI, Google Lookout, Be My Eyes, and BlindSquare each address individual functions, but none offers a unified, offline-capable, AI-driven solution that integrates voice commands, gesture shortcuts, OCR, location sharing, and contact calling.
The AI-Powered Assistive Vision App fills this gap through five modules; illustrative code sketches of the key flows follow the list:
AI Voice Assistant – offline and online speech recognition with module-specific intent routing and FastSpeech 2 text-to-speech (TTS).
Gesture Draw Interface – users draw alphabet letters on the touchscreen to activate modules, with haptic feedback.
Image-to-Text (OCR) – captures printed or handwritten text via the camera and reads it aloud, supporting multiple languages offline.
Location Sharing – sends GPS coordinates via SMS using voice, gesture, or button triggers, fully functional offline.
Contacts & Calling – voice-directed contact search and calling.
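For concreteness, a minimal TypeScript sketch of module-specific intent routing follows, assuming simple keyword matching. The paper does not disclose its actual classifier; the keyword lists and the routeIntent helper are illustrative only.

```typescript
// Hypothetical keyword-based intent router: maps a speech transcript
// to one of the app's modules. This only illustrates the dispatch
// shape, not the app's real intent-recognition model.

type Module = "location" | "ocr" | "call";

const INTENT_KEYWORDS: Record<Module, string[]> = {
  location: ["share my location", "where am i", "send location"],
  ocr: ["read this", "read the text", "scan"],
  call: ["call", "dial"],
};

// Return the first module whose keyword appears in the transcript,
// or null so the assistant can ask the user to repeat the command.
function routeIntent(transcript: string): Module | null {
  const text = transcript.toLowerCase();
  for (const module of Object.keys(INTENT_KEYWORDS) as Module[]) {
    if (INTENT_KEYWORDS[module].some((k) => text.includes(k))) return module;
  }
  return null;
}

console.log(routeIntent("Please call Amina"));   // -> "call"
console.log(routeIntent("What's the weather?")); // -> null
```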
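The gesture shortcut can be sketched the same way. The dispatch below assumes an upstream stroke recognizer that outputs a letter; only the 'L'-to-Location mapping comes from the paper, while getGps(), sendSms(), and the other letter assignments are hypothetical stand-ins for native plugins.

```typescript
// Illustrative gesture-to-module dispatch. The 'L' handler mirrors the
// Location module: fetch GPS coordinates, then send an SMS to each
// saved emergency contact.

interface Gps { lat: number; lon: number; }

async function getGps(): Promise<Gps> {
  return { lat: 40.7128, lon: -74.006 }; // stub: a real app queries the platform GPS
}

function sendSms(to: string, body: string): void {
  console.log(`SMS to ${to}: ${body}`); // stub: a real app uses the platform SMS API
}

const emergencyContacts = ["+15551230000"]; // loaded from local storage in the app

async function shareLocation(): Promise<void> {
  const { lat, lon } = await getGps();
  const body = `Emergency: my location is https://maps.google.com/?q=${lat},${lon}`;
  for (const number of emergencyContacts) sendSms(number, body);
}

// Map recognized letters to module launchers; 'L' -> Location per the paper,
// the remaining assignments are assumed for illustration.
const gestureHandlers: Record<string, () => void> = {
  L: () => { void shareLocation(); },
  C: () => { console.log("open Contacts & Calling"); },
  R: () => { console.log("start Image-to-Text (OCR)"); },
};

function onLetterRecognized(letter: string): void {
  gestureHandlers[letter.toUpperCase()]?.(); // unmapped letters are ignored
}

onLetterRecognized("L"); // drawing 'L' dispatches the location SMS
```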
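The Image-to-Text flow reduces to capture, recognize, and speak. In the sketch below, recognizeText() stands in for the bundled on-device OCR model and speak() for the FastSpeech 2 synthesizer; both names and the confidence threshold are assumptions, not APIs named by the paper.

```typescript
// Minimal capture -> OCR -> TTS pipeline with a retry prompt for
// low-confidence recognitions.

interface OcrResult { text: string; confidence: number; }

async function recognizeText(imageBytes: Uint8Array): Promise<OcrResult> {
  // stub: a real implementation runs the bundled OCR model offline
  return { text: "Gate B12 boards at 4:30 PM", confidence: 0.94 };
}

function speak(text: string): void {
  console.log(`TTS: ${text}`); // stub for the on-device synthesizer
}

async function readAloud(imageBytes: Uint8Array): Promise<void> {
  const { text, confidence } = await recognizeText(imageBytes);
  if (!text || confidence < 0.5) {
    speak("I could not read that. Please hold the camera steady and try again.");
    return;
  }
  speak(text); // low-confidence results prompt a retry instead of guessing
}

void readAloud(new Uint8Array());
```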
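Contact calling hinges on matching a spoken name against saved contacts and confirming aloud before dialing. The token-overlap similarity and the 0.5 threshold below are assumptions; the paper does not describe its matching algorithm.

```typescript
// Sketch of voice-directed contact search with spoken confirmation.

interface Contact { name: string; number: string; }

const contacts: Contact[] = [
  { name: "Amina", number: "+15550001111" },
  { name: "Dr. Rao", number: "+15550002222" },
];

// Fraction of query tokens that appear in the contact name.
function similarity(query: string, name: string): number {
  const tokens = query.toLowerCase().split(/\s+/).filter(Boolean);
  const target = name.toLowerCase();
  return tokens.filter((t) => target.includes(t)).length / tokens.length;
}

function findContact(spokenName: string): Contact | null {
  let best: Contact | null = null;
  let bestScore = 0;
  for (const c of contacts) {
    const s = similarity(spokenName, c.name);
    if (s > bestScore) { best = c; bestScore = s; }
  }
  return bestScore >= 0.5 ? best : null;
}

const match = findContact("Rao");
if (match) {
  // The assistant would say "Calling Dr. Rao, is that right?" and dial
  // only after the user confirms, as the abstract describes.
  console.log(`Confirm, then dial ${match.number}`);
}
```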
The app uses a three-tier architecture (Flutter frontend, Node.js backend, MongoDB cloud database), with on-device AI models providing offline functionality. An evaluation with 25 visually impaired participants showed 91.7% intent-recognition accuracy in quiet environments, robust OCR performance, accurate gesture recognition, and reliable location dispatch, demonstrating an integrated, efficient, and accessible solution.
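For the backend tier, a minimal sketch of a Node.js (Express) route persisting emergency contacts to MongoDB via Mongoose is shown below. The route path, schema fields, and connection URI are illustrative assumptions; the paper does not publish its backend API.

```typescript
// Assumed backend endpoint: sync a user's emergency contacts to the
// cloud so they survive reinstallation. Core features still run
// offline from the on-device copy.

import express from "express";
import mongoose from "mongoose";

const ContactSchema = new mongoose.Schema({
  userId: { type: String, required: true, index: true },
  name: { type: String, required: true },
  number: { type: String, required: true },
});
const Contact = mongoose.model("Contact", ContactSchema);

const app = express();
app.use(express.json());

app.post("/contacts", async (req, res) => {
  const { userId, name, number } = req.body;
  if (!userId || !name || !number) {
    return res.status(400).json({ error: "userId, name, and number are required" });
  }
  const saved = await Contact.create({ userId, name, number });
  res.status(201).json(saved);
});

async function main() {
  await mongoose.connect("mongodb://localhost:27017/assistive-vision"); // illustrative URI
  app.listen(3000, () => console.log("backend listening on :3000"));
}
void main();
```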