Visual impairment is a significant disability that restricts an individual’s ability to perceive, navigate, and interact with the surrounding environment independently. The rapid advancement of deep learning and computer vision has led to a growing body of research on AI-assisted perception systems for visually impaired users. However, existing studies predominantly rely on single-model object detection pipelines, which exhibit an inherent trade-off between detection accuracy and inference speed. This survey reviews ten recent works in the domain of assistive visual perception systems (compared in Table I), with a focus on object detection architectures, multimodal integration strategies, and edge deployment approaches. The reviewed works encompass a variety of methods, including YOLO-based detectors (YOLOv5 through YOLOv11), transformer-based detectors such as RF-DETR Nano, Optical Character Recognition (OCR), Speech-to-Text (STT), and Text-to-Speech (TTS) modules, as well as dedicated edge AI accelerators such as the Google Coral Edge TPU and augmented reality platforms such as the Vuzix Blade 2. The survey examines the strengths and limitations of each approach and identifies a recurring research gap in the adaptive, context-aware selection of detection models based on real-time system metrics. The analysis reveals that combining complementary detection architectures such as YOLOv8 and RF-DETR Nano under dynamic switching logic represents a promising direction for balancing detection accuracy and inference latency in real-world assistive applications. This survey aims to consolidate current knowledge, highlight technological trends, and guide future research toward more adaptive, efficient, and inclusive assistive vision solutions.
Introduction
This survey reviews AI-based assistive technologies for visually impaired individuals, focusing on real-time object detection systems that support navigation, object recognition, and reading text in the environment. The need for such systems is growing worldwide because of the prevalence of vision impairment, and recent advances in deep learning, embedded hardware (such as the Raspberry Pi and the Google Coral Edge TPU), and wearable devices such as smart glasses have made them increasingly practical.
Most existing solutions rely on YOLO-based object detection combined with OCR, depth sensing, and audio feedback to create multimodal assistive tools. These systems are typically deployed on edge devices and aim to provide real-time environmental understanding. While newer models and hardware improve accuracy and speed, a persistent challenge remains the trade-off between computational cost and real-time performance.
The survey analyzes ten recent studies (2024–2026) and finds that although many systems achieve high detection accuracy and useful multimodal functionality, they commonly suffer from limitations such as reliance on a single detection model, lack of adaptive model switching, poor performance in low-light or complex scenes, limited battery life, and insufficient real-world user testing.
A key gap identified is the absence of context-aware, adaptive systems that can dynamically switch between lightweight and high-accuracy models (e.g., YOLO variants and transformer-based detectors like RF-DETR Nano) based on environmental and computational conditions.
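To make the notion of adaptive model switching concrete, the following is a minimal illustrative sketch and not an implementation drawn from any of the reviewed works. It assumes two detectors labelled "fast" (for example a lightweight YOLO variant) and "accurate" (for example a transformer-based detector). The metric names, the thresholds, the assumed cost ratio between the two models, and the hold-frame hysteresis are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    """Hypothetical per-frame system metrics (names are illustrative)."""
    latency_ms: float   # measured inference time of the currently active model
    battery_pct: float  # remaining battery, 0-100
    brightness: float   # mean scene luminance, 0.0 (dark) to 1.0 (bright)


class ModelSwitcher:
    """Selects 'fast' or 'accurate'; switches only after a sustained preference."""

    def __init__(self, latency_budget_ms=100.0, low_battery_pct=20.0,
                 dark_threshold=0.25, cost_ratio=3.0, hold_frames=3):
        self.latency_budget_ms = latency_budget_ms
        self.low_battery_pct = low_battery_pct
        self.dark_threshold = dark_threshold
        self.cost_ratio = cost_ratio    # assumed slowdown of 'accurate' vs 'fast'
        self.hold_frames = hold_frames  # consecutive frames required to switch
        self.active = "fast"
        self._streak = 0

    def _preferred(self, m: Metrics) -> str:
        # Low battery always forces the lightweight model.
        if m.battery_pct < self.low_battery_pct:
            return "fast"
        # Estimate the latency the accurate model would have (or already has).
        if self.active == "accurate":
            projected = m.latency_ms
        else:
            projected = m.latency_ms * self.cost_ratio
        if projected > self.latency_budget_ms:
            return "fast"
        # Within budget: use the accurate model only for hard (dark) scenes.
        return "accurate" if m.brightness < self.dark_threshold else "fast"

    def update(self, m: Metrics) -> str:
        """Feed one frame's metrics and return the model to use next."""
        if self._preferred(m) == self.active:
            self._streak = 0
            return self.active
        self._streak += 1
        if self._streak >= self.hold_frames:
            # With two models, a differing preference means "the other one".
            self.active = "accurate" if self.active == "fast" else "fast"
            self._streak = 0
        return self.active


if __name__ == "__main__":
    switcher = ModelSwitcher()
    dark = Metrics(latency_ms=20.0, battery_pct=80.0, brightness=0.1)
    print([switcher.update(dark) for _ in range(3)])
```

The hold-frame counter is one simple way to avoid rapid oscillation between models when a metric hovers near a threshold. A deployed controller would need thresholds calibrated on the target hardware and would likely consider further signals, such as scene complexity or detection confidence.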
Conclusion
This survey has reviewed ten recent works in the domain of AI-based assistive visual perception systems for visually impaired users, spanning the period 2024–2026. The reviewed works collectively demonstrate significant progress in the application of deep learning, edge computing, and multimodal interaction to assistive technology. Several important trends are observed from the literature.
First, YOLO-family models, from YOLOv5 to YOLOv11, have become the dominant choice for real-time object detection in assistive systems, offering a scalable trade-off between accuracy and speed. The systematic three-way comparison of YOLOv8 variants by Kumari and Hammady [10] indicates that YOLOv8-S is the most suitable deployment choice for resource-constrained wearable hardware, achieving an mAP of 0.877 at an inference time of 18–21 ms per frame on a quad-core ARM platform. Second, multimodal integration, which combines detection with OCR, TTS, STT, and depth sensing, is increasingly recognized as essential for delivering a complete and usable assistive experience [1], [2], [10]. Third, edge deployment on affordable single-board computers such as the Raspberry Pi has been validated as a technically feasible and cost-effective approach, with detection accuracy approaching 99% on such hardware [6]. Fourth, dedicated AI accelerators such as the Google Coral Edge TPU [9] and commercial AR wearables such as the Vuzix Blade 2 [10] are emerging as viable hardware platforms that extend beyond the Raspberry Pi ecosystem, enabling sub-100 ms inference and richer user interfaces. Fifth, the dual-processor architecture of Sharma et al. [9], which separates time-critical on-glasses inference from belt-mounted higher-level processing, provides a practical and aesthetically non-intrusive deployment pattern that achieves 90% obstacle detection accuracy at sub-100 ms latency for a device weighing under 60 g. The hardware and software components underpinning these systems are consolidated in Table II.
However, the comparative analysis in Table I reveals a consistent and critical limitation: no reviewed system incorporates adaptive, context-aware model selection that dynamically responds to real-time computational and environmental conditions. All reviewed systems fix a single detection model at design time, which precludes any runtime optimization of the accuracy-latency trade-off. Addressing this gap through hybrid detection architectures, such as a dynamic YOLOv8 / RF-DETR Nano switching controller inspired by the dual-processor pattern of Sharma et al. [9] and benchmarked against the empirical baselines of Kumari and Hammady [10], represents the most significant open research direction identified by this survey.
References
[1] V. Moram, S. Zahruddin, and S. Kumar, “Multifunctional assistive smart glasses for visually impaired,” SN Computer Science, vol. 6, no. 2, pp. 1–12, Mar. 2025. DOI: 10.1007/s42979-025-03456-x
[2] K. Ruparelia, P. Parikh, and P. Shah, “Integrated assistive system using YOLO-based detection, depth estimation, and OCR,” American Journal of Computer Science and Technology, vol. 8, no. 1, pp. 45–58, Jan. 2025.
[3] D. C. S. Falcone, A. Brown, F. Salim, et al., “Eye-Assist: Real-time visual assistance system for the visually impaired,” in Proc. IEEE 16th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON), New York, NY, USA, 2025, pp. 112–118.
[4] P. A. Bailke, A. Gorave, O. Bhojane, et al., “VisionGuardian: Real-time multi-task detection system for visually impaired users,” in Lecture Notes in Networks and Systems, vol. 1012, Springer, Singapore, 2025, pp. 233–245.
[5] R. Kharat, S. Thepade, A. Kothawade, et al., “AI-powered smart glasses for outdoor navigation using computer vision,” in Lecture Notes in Networks and Systems, vol. 1085, Springer, Singapore, 2026, pp. 317–328.
[6] A. Noor, H. Almukhalfi, A. Souza, and T. H. Noor, “Real-time indoor object detection using YOLOv11 on Raspberry Pi 4,” Computer Modeling in Engineering & Sciences, vol. 142, no. 1, pp. 211–228, 2025. DOI: 10.32604/cmes.2025.058312
[7] M. I. Badawi, A. J. Al-Nagar, R. S. Mansour, et al., “Smart bionic vision assistive system for visually impaired individuals,” Biomedical Journal of Scientific & Technical Research, vol. 58, no. 2, pp. 47843–47851, Jun. 2024.
[8] N. Varghese, S. Agrawal, and M. K. Gupta, “Real-time object detection and navigation assistance using SSD for the visually impaired,” in Proc. 2024 International Conference on Advances in Computing and Communications (ICACC), Kochi, India, 2024, pp. 1–6.
[9] P. Sharma, A. S. Babu, M. Sadasivan, K. T. A. Robert, M. T. Vasumathi, and A. K. Ashok Kumar, “Smart glasses for the blind: A real-time AI-driven wearable system for autonomous navigation,” in Proc. 2025 International Conference on Computing and Communications (COMPUTINGCON), Pune, India, Sep. 2025, pp. 1–7. DOI: 10.1109/COMPUTINGCON64838.2025.11379829
[10] P. Kumari and R. Hammady, “Assisting blind people with AI and audio using smart glasses: system design with YOLOv8 variants comparisons,” Multimedia Systems, vol. 32, no. 73, Jan. 2026. DOI: 10.1007/s00530-025-02139-z