Abstract
This project introduces an AI-based real-time object detection and voice assistance system designed to support visually impaired individuals in navigating their surroundings independently and safely. The solution combines a mobile application with a portable camera module, enabling continuous environmental awareness without relying solely on the smartphone's built-in camera. By utilizing advanced deep learning algorithms such as YOLO or SSD, the system detects and identifies objects in real time. Once an object is recognized, the application uses a Text-to-Speech (TTS) engine to provide immediate voice feedback, informing the user of the object's identity. The integration of a portable camera enhances flexibility and usability, allowing for hands-free operation and more accurate object capture. The system is optimized for performance on mobile devices, ensuring low latency, high accuracy, and energy efficiency. This project aims to deliver a cost-effective, accessible, and practical assistive technology solution that improves autonomy, mobility, and quality of life for visually impaired users.
Introduction
Overview
The AI-Driven Vision Assistance mobile application is a smartphone-based assistive technology designed to support visually impaired users by enhancing their environmental awareness. It offers real-time object detection, text reading, and facial recognition, all conveyed through intuitive voice feedback. By leveraging deep learning and mobile-optimized computer vision algorithms, the system fosters user independence and provides an affordable, scalable alternative to expensive wearables and specialized tools.
Problem Statement
Despite advancements in AI and computer vision, most current assistive technologies for the visually impaired:
Depend on costly or bulky hardware
Lack real-time responsiveness
Are not optimized for mobile use
Do not offer natural, intuitive feedback
Thus, there is a critical need for a low-cost, mobile-based, fully offline-capable system that can perform accurate object recognition and deliver information through voice output, increasing autonomy and everyday usability.
Objectives
Build a real-time mobile app for object and text recognition using smartphone cameras.
Integrate efficient detection models such as YOLO or SSD, optimized for mobile inference (a minimal on-device sketch follows this list).
Use Text-to-Speech (TTS) for natural audio feedback.
Ensure low-latency, high-speed performance.
Design for accessibility, ease of use, and independence.
Test under various lighting and environmental conditions.
Ensure compatibility with widely available Android devices, avoiding specialized hardware.
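To make the detection objective concrete, the following is a minimal Kotlin sketch, not the project's final implementation, of running an SSD/EfficientDet-style detector fully on-device with the TensorFlow Lite Task Library. The asset name detector.tflite, the 0.5 score threshold, and the three-result limit are illustrative assumptions.

```kotlin
import android.content.Context
import android.graphics.Bitmap
import org.tensorflow.lite.support.image.TensorImage
import org.tensorflow.lite.task.vision.detector.ObjectDetector

// Minimal on-device detector wrapper (sketch).
// Assumes an SSD/EfficientDet-style TFLite model named "detector.tflite" is
// bundled in the app's assets, and that the module declares the
// org.tensorflow:tensorflow-lite-task-vision dependency.
class OnDeviceDetector(context: Context) {

    private val detector: ObjectDetector = ObjectDetector.createFromFileAndOptions(
        context,
        "detector.tflite",                        // assumed asset name
        ObjectDetector.ObjectDetectorOptions.builder()
            .setMaxResults(3)                     // keep spoken output short
            .setScoreThreshold(0.5f)              // assumed confidence cut-off
            .build()
    )

    // Returns the labels of the most confident detections in one camera frame.
    fun detectLabels(frame: Bitmap): List<String> =
        detector.detect(TensorImage.fromBitmap(frame))
            .mapNotNull { it.categories.firstOrNull()?.label }
}
```

In the complete system, the labels returned here would be handed to the TTS layer so the user hears them with minimal delay.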
Literature Survey Summary
A comparative review of 10 existing studies and systems reveals the following:
Common Approaches:
Wearable solutions using Raspberry Pi, Jetson TX2, or custom sensors
Object recognition, text reading, and navigation support via camera and audio feedback
Use of speech interfaces, cloud APIs, and even large vision-language models (VLMs)
Limitations Identified:
Heavy hardware dependence (glasses, smart sticks, backpacks)
High cost, poor portability, and user discomfort
Inconsistent performance under poor lighting or dynamic environments
Internet reliance for model inference or speech services
Lack of support for offline functionality and natural user interaction
Comparison to Proposed System:
The mobile app described in this project addresses these limitations by being:
Fully mobile-based
Offline-capable
Lightweight and portable
Cost-effective
More socially acceptable due to non-intrusive design
Research Gaps Identified
Over-reliance on Specialized Hardware: Most systems use custom or wearable setups that reduce mainstream accessibility.
Inadequate Real-Time Performance: Computationally heavy models such as GPT-4 or YOLOv5 can lag on mobile devices.
Lack of Offline Usability: Many apps depend on cloud-based APIs for essential functions (see the offline feedback sketch after this list).
Neglect of User Comfort & Social Stigma: Wearables are less socially acceptable and less user-friendly than mobile devices.
Overcomplexity for Daily Tasks: Many systems aim for advanced scene analysis but fail at basic object or text recognition in everyday settings.
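As one illustration of how the offline-usability gap can be avoided, the sketch below uses Android's built-in TextToSpeech API, which synthesizes speech on-device once a voice for the selected locale has been downloaded, so no cloud call is needed at announcement time. This is a hedged sketch: the SpeechFeedback class, the "ahead" phrasing, and the US English locale are illustrative choices, not the project's final design.

```kotlin
import android.content.Context
import android.speech.tts.TextToSpeech
import java.util.Locale

// Minimal offline voice-feedback wrapper (sketch).
// Assumes an offline voice for the chosen locale is already installed,
// so speech is synthesized entirely on-device.
class SpeechFeedback(context: Context) : TextToSpeech.OnInitListener {

    private val tts = TextToSpeech(context, this)
    private var ready = false

    override fun onInit(status: Int) {
        if (status == TextToSpeech.SUCCESS) {
            tts.setLanguage(Locale.US)   // assumed locale for illustration
            ready = true
        }
    }

    // Speaks a detected label, e.g. announce("chair") -> "chair ahead".
    fun announce(label: String) {
        if (ready) {
            tts.speak("$label ahead", TextToSpeech.QUEUE_FLUSH, null, label)
        }
    }

    // Release the engine when the screen or service is destroyed.
    fun shutdown() = tts.shutdown()
}
```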
Conclusion
The review of existing literature highlights significant advancements in the domains of computer vision, deep learning, and assistive technologies aimed at aiding visually impaired individuals.
Various object detection models such as YOLO, SSD, and Faster R-CNN have demonstrated high accuracy and efficiency in real-time environments. Additionally, text-to-speech (TTS) systems and mobile integration of these models have enabled practical and portable solutions for visual assistance. Despite this progress, many existing solutions either lack real-time performance on mobile devices or fail to provide accurate contextual awareness in dynamic environments. Furthermore, accessibility, affordability, and user-friendliness remain critical challenges in the deployment of such technologies at scale. This project seeks to bridge these gaps by developing a mobile application that leverages the smartphone camera to detect and identify objects in real time and provide audio feedback through voice commands. By integrating robust object detection algorithms with efficient TTS engines, the proposed solution aims to offer an effective, user-friendly, and real-time visual aid tool for the visually impaired community.
References
[1] R. C. Joshi, N. Singh, A. K. Sharma, R. Burget, and M. K. Dutta, "AI-SenseVision: A Low-Cost Artificial-Intelligence-Based Robust and Real-Time Assistance for Visually Impaired People," IEEE Transactions on Human-Machine Systems, vol. 54, no. 3, pp. 325–336, Jun. 2024.
[2] M. A. Khan, P. Paul, M. Rashid, M. Hossain, and M. A. R. Ahad, "An AI-Based Visual Aid With Integrated Reading Assistant for the Completely Blind," IEEE Transactions on Human-Machine Systems, vol. 50, no. 6, pp. 507–516, Dec. 2020.
[3] C. Dheeraj, K. S. Reddy, and R. Rajalakshmi, "An Assistive Vision for a Visually Challenged Person Using Alexa," Proceedings of the International Conference on Computer Science and Engineering, Sathyabama Institute of Science and Technology, Chennai, India.
[4] M. S. Qureshi, I. U. Khan, S. M. B. Qureshi, F. M. Khan, and S. Aleshaiker, "Empowering the Blind: AI-Assisted Solutions for Visually Impaired People," Proceedings of the International Conference on Computer Engineering and Technology, 2023.
[5] S. R. Katke and U. Pacharaney, "Smart Solutions for Visual Impairment by AI-Based Assistive Devices," Faculty of Engineering and Technology, DMIHER (DU), India, 2024.
[6] Y. S. Afridi, M. Sher, and I. Jr, "Visually: Assisting the Visually Impaired People Through AI-Assisted Mobility," International Journal of Scientific and Technology Research, vol. 13, no. 4, pp. 115–122, May 2024.
[7] R. Xiang, Y. Zhao, Y. Zhang, J. Li, M. Liao, and Y. Li, "Visually Impaired Assistance with Large Models," IEEE Smart World Congress (SWC), 2024.
[8] A. M. Norkhalid, M. A. Faudzi, A. A. Ghapar, and F. A. Rahim, "Mobile Assistance for Visually Impaired People – Speech Interface System (SIS)," 2020 8th International Conference on Information Technology and Multimedia (ICIMU), pp. 329–334, 2020.
[9] A. M. Norkhalid and M. A. Faudzi, "Wearable Vision Assistance System Based on Binocular Sensors for Visually Impaired," IEEE Sensors Journal, 2022.
[10] A. Aladrén, G. López-Nicolás, L. Puig, and J. J. Guerrero, "Navigation Assistance for the Visually Impaired Using RGB-D Sensor With Range Expansion," IEEE Systems Journal, vol. 10, no. 3, pp. 922–933, Sep. 2016.