Recognizing everyday products is critical for blind people, so we propose a camera-based method to identify products encountered in their daily routine. To separate an object of interest from the unnecessary background, the system locates the item in the captured photo and informs the user about the recognized objects. The goal of the current project is to build an object detector, for visually impaired people as well as other commercial purposes, that recognizes objects at a specific distance. Older object recognition techniques required large amounts of training data, which is time-consuming and complicated, and detection is employed in a wide variety of situations. Traditional object detection methods rely on large datasets, and training on them takes a long time; training on tiny or easily overlooked objects is harder still. The human mind and visual system detect objects more accurately and quickly in real time, forming conscious judgments when detecting obstacles. With the abundance of available data, more advanced technologies, and better algorithms, classifying and detecting multiple objects within the same frame has become simple and accurate. The project's main objective is to develop and implement a real-time object recognition system using a live camera, with Python as the front end. The experimental results show that the proposed system improves product identification accuracy.
Introduction
Globally, visual impairment is a growing concern: around 285 million people are affected, of whom 39 million are completely blind and 246 million have low vision. Visually impaired people face significant challenges in mobility, navigation, recognizing objects, identifying currency, and performing daily tasks independently. Issues such as similar-looking currency notes (e.g., 50 BDT and 200 BDT) and difficulty identifying staircases or washrooms contribute to their dependence on others.
To address these issues, the Blind Assistant System has been proposed. This system integrates deep learning, computer vision, speech synthesis, OCR, and Convolutional Neural Networks (CNNs) to empower visually impaired users by identifying objects, reading printed text aloud, navigating environments, and recognizing products and people in real-time. It aims to enhance safety, autonomy, and quality of life.
Related Works Overview:
Several researchers have contributed to the development of assistive technologies:
Fahad Ashiq et al. proposed a real-time object recognition system using MobileNet, offering audio feedback and live tracking via a web interface.
Wenguan Wang et al. conducted a comprehensive survey on salient object detection (SOD) using deep learning and evaluated robustness and generalization across datasets.
Mehul Mahrishi et al. explored video indexing using YOLOv4 and SSIM, focusing on educational content and keyword extraction.
Usman Masud et al. developed a portable obstacle detection system with Raspberry Pi and ultrasonic sensors, achieving about 91% accuracy.
Junhyung Kang et al. reviewed object detection in aerial/satellite imagery, emphasizing real-world applications such as disaster monitoring.
Supriya V. Mahadevkar et al. reviewed various machine learning styles in computer vision, highlighting their applicability and challenges.
Xufei Wang and Jeongyoung Song improved bounding box localization in YOLOv4 through a new loss function (ICIoU, based on complete intersection over union), enhancing object detection accuracy.
Sadia Zafar et al. analyzed existing assistive devices for visually impaired people, concluding that no single device offers all necessary features and encouraging the development of a more holistic solution.
Kalyani Kadam et al. proposed a Mask R-CNN with MobileNet V1 model for detecting image splicing forgeries, offering better performance than traditional ResNet variants.
Wenfeng Zheng et al. focused on enhancing image segmentation and visual reasoning using semantic representation and improved U-Net architecture.
Existing System:
Current blind-assistance systems involve:
Use of dual cameras mounted on glasses for depth perception.
GPS for grouping objects by location.
Ultrasonic sensors for obstacle detection.
Object recognition using SURF (Speeded-Up Robust Features).
Conversion of images to sound (via MATLAB), allowing blind users to "hear" their surroundings.
However, these systems have limitations in flexibility, recognition precision, and adaptability to dynamic environments.
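The image-to-sound conversion mentioned above (prototyped in MATLAB in the existing systems) can be sketched in a few lines of Python. This is a minimal illustration of the general column-scanning idea, not any particular system's implementation: each image column becomes a time slice, a pixel's row maps to pitch, and its brightness maps to loudness, so the user can "hear" the rough layout of a scene.

```python
import numpy as np

def image_to_sound(image, duration=1.0, sample_rate=8000,
                   f_low=200.0, f_high=2000.0):
    """Map a grayscale image (values in [0, 1]) to a mono waveform.

    Columns are scanned left to right over `duration` seconds; each
    row gets a fixed sine frequency (top row = highest pitch), and a
    pixel's brightness scales that sine's amplitude in its time slice.
    """
    rows, cols = image.shape
    samples_per_col = int(duration * sample_rate / cols)
    freqs = np.linspace(f_high, f_low, rows)          # top row -> high pitch
    t = np.arange(samples_per_col) / sample_rate
    tones = np.sin(2 * np.pi * np.outer(freqs, t))    # (rows, samples_per_col)
    # One brightness-weighted tone mix per column, concatenated in scan order.
    audio = np.concatenate([image[:, c] @ tones for c in range(cols)])
    peak = np.max(np.abs(audio))
    return audio / peak if peak > 0 else audio        # normalize to [-1, 1]

# A single bright pixel yields a pure tone in its column's time slice.
img = np.zeros((4, 2))
img[0, 1] = 1.0
wave = image_to_sound(img, duration=0.5)
```

The resulting array could be written to a WAV file or played through a sound device; the frequency range and scan duration here are illustrative choices.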
Proposed System:
The proposed system enhances the existing model by:
Image Capture using webcams to continuously record and preprocess video frames.
Object Detection via a two-step process:
Top-down hypothesis generation using an improved Shape Context feature for robustness.
Verification step using CNNs for accurate object recognition.
Additionally, OCR is used to read text from images, which is then converted into speech. The system aims to offer a real-time, intuitive solution for helping blind individuals with navigation, object recognition, and text reading.
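The preprocessing stage of the pipeline above can be sketched as a small, library-agnostic function. This is a minimal NumPy illustration under assumed conventions (in practice frames would come from a webcam, e.g. via OpenCV's `VideoCapture`, and the target size depends on the detector): convert an RGB frame to grayscale, stretch its contrast, and downsample it before handing it to the recognition step.

```python
import numpy as np

def preprocess_frame(frame, target=(64, 64)):
    """Prepare one RGB video frame for the object detector.

    frame: uint8 array of shape (H, W, 3).
    Returns a float32 grayscale array of shape `target`, values in [0, 1].
    """
    # Luminance-weighted grayscale conversion (ITU-R BT.601 weights).
    gray = frame.astype(np.float32) @ np.array([0.299, 0.587, 0.114],
                                               dtype=np.float32)
    # Contrast stretch to the full [0, 1] range.
    lo, hi = gray.min(), gray.max()
    gray = (gray - lo) / (hi - lo) if hi > lo else np.zeros_like(gray)
    # Nearest-neighbour downsampling to the detector's input size.
    h, w = gray.shape
    rows = np.arange(target[0]) * h // target[0]
    cols = np.arange(target[1]) * w // target[1]
    return gray[np.ix_(rows, cols)]

# Example: a synthetic 480x640 RGB frame standing in for a webcam capture.
frame = np.random.randint(0, 256, (480, 640, 3), dtype=np.uint8)
small = preprocess_frame(frame)
```

Running this on every captured frame keeps the detector's input size and value range constant regardless of the camera's resolution.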
Conclusion
In this project we present a vision system for blind people that operates on object images and video scenes. The system employs machine learning for object recognition, enabling it to identify objects under various conditions. Detection is concerned with locating objects within an image or video, and a machine-learning-based object detection API makes it simple to create and use a detection model. Blind people have little information about the self-velocity and direction of objects, both of which are necessary for travel, and existing navigation systems are expensive and out of reach for most blind people. The primary goal of this project is therefore to assist blind people. The method effectively distinguishes the target object from the background and from other objects in the camera's field of view. We propose a new object localization method, based on mathematical models of edge alignment and edge distributions, to retrieve object regions from complex backgrounds. From the captured images, correspondence maps estimate the overall structural features of the object, and block patterns are defined to encode the feature-point layout of an enhanced image as a feature vector. The system is based on real-time camera analysis and interpretation, and it can assist blind people in recognizing objects in their surroundings.
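The block-pattern encoding described above can be illustrated with a short sketch. This is one plausible realization under assumed conventions, not the exact method: divide the image's feature-point map into a fixed grid of blocks and use the normalized per-block point counts as the feature vector.

```python
import numpy as np

def block_pattern_vector(points_mask, grid=(4, 4)):
    """Encode a binary feature-point map as a fixed-length vector.

    points_mask: 2D bool/0-1 array marking detected feature points.
    Returns a float vector of length grid[0] * grid[1], where each
    entry is the fraction of the image's feature points that fall
    inside the corresponding block.
    """
    h, w = points_mask.shape
    gy, gx = grid
    counts = np.zeros(grid, dtype=np.float64)
    ys, xs = np.nonzero(points_mask)
    for y, x in zip(ys, xs):
        counts[min(y * gy // h, gy - 1), min(x * gx // w, gx - 1)] += 1
    total = counts.sum()
    return (counts / total if total > 0 else counts).ravel()

# Example: all feature points clustered in the top-left corner of the image.
mask = np.zeros((32, 32), dtype=bool)
mask[:8, :8] = True
vec = block_pattern_vector(mask)
```

Because the vector length depends only on the grid, images of different sizes map to comparable descriptors, which is what makes such layouts usable as inputs to a classifier.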
References
[1] Ashiq, Fahad, et al. "CNN-based object recognition and tracking system to assist visually impaired people." IEEE Access 10 (2022): 14819-14834.
[2] Wang, Wenguan, et al. "Salient object detection in the deep learning era: An in-depth survey." IEEE Transactions on Pattern Analysis and Machine Intelligence 44.6 (2021): 3239-3259.
[3] Mahrishi, Mehul, et al. "Video index point detection and extraction framework using custom YoloV4 Darknet object detection model." IEEE Access 9 (2021): 143378-143391.
[4] Masud, Usman, et al. "Smart assistive system for visually impaired people obstruction avoidance through object detection and classification." IEEE Access 10 (2022): 13428-13441.
[5] Kang, Junhyung, et al. "A survey of deep learning-based object detection methods and datasets for overhead imagery." IEEE Access 10 (2022): 20118-20134.
[6] Mahadevkar, Supriya V., et al. "A review on machine learning styles in computer vision—techniques and future directions." IEEE Access 10 (2022): 107293-107329.
[7] Wang, Xufei, and Jeongyoung Song. "ICIoU: Improved loss based on complete intersection over union for bounding box regression." IEEE Access 9 (2021): 105686-105695.
[8] Zafar, Sadia, et al. "Assistive devices analysis for visually impaired persons: A review on taxonomy." IEEE Access 10 (2022): 13354-13366.
[9] Kadam, Kalyani, et al. "Detection and localization of multiple image splicing using MobileNet V1." IEEE Access 9 (2021): 162499-162519.
[10] Zheng, Wenfeng, et al. "Improving visual reasoning through semantic representation." IEEE Access 9 (2021): 91476-91486.
[11] Li, Zhichun, et al. "Enhancing revisitation in touchscreen reading for visually impaired people with semantic navigation design." Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 6.3 (2022): 1-22.
[12] Afif, Mouna, et al. "Recognizing signs and doors for indoor wayfinding for blind and visually impaired persons." 2020 5th International Conference on Advanced Technologies for Signal and Image Processing (ATSIP). IEEE, 2020.
[13] Bhole, Swapnil, and Aniket Dhok. "Deep learning based object detection and recognition framework for the visually-impaired." 2020 Fourth International Conference on Computing Methodologies and Communication (ICCMC). IEEE, 2020.
[14] Dahiya, Dhruv, Hardik Gupta, and Malay Kishore Dutta. "A deep learning based real time assistive framework for visually impaired." 2020 International Conference on Contemporary Computing and Applications (IC3A). IEEE, 2020.
[15] Chang, Wan-Jung, et al. "MedGlasses: A wearable smart-glasses-based drug pill recognition system using deep learning for visually impaired chronic patients." IEEE Access 8 (2020): 17013-17024.
[16] Chang, Wan-Jung, et al. "A deep learning based wearable medicines recognition system for visually impaired people." 2019 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS). IEEE, 2019.
[17] Sufri, N. A. J., et al. "Vision based system for banknote recognition using different machine learning and deep learning approach." 2019 IEEE 10th Control and System Graduate Research Colloquium (ICSGRC). IEEE, 2019.
[18] Svaigen, Alisson Renan, Lailla M. Siqueira Bine, and Linnyer Beatrys Ruiz Aylon. "An assistive haptic system towards visually impaired computer science learning." Proceedings of the 11th PErvasive Technologies Related to Assistive Environments Conference. 2018.
[19] Tange, Yutaka, Tomohiro Konishi, and Hideaki Katayama. "Development of vertical obstacle detection system for visually impaired individuals." Proceedings of the 7th ACIS International Conference on Applied Computing and Information Technology. 2019.
[20] Rahman, Sami ur, Sana Ullah, and Sehat Ullah. "A mobile camera based navigation system for visually impaired people." Proceedings of the 7th International Conference on Communications and Broadband Networking. 2019.