Object detection is becoming a crucial part of many modern technologies, ranging from intelligent surveillance to driverless cars. In this study, we investigate its role in an intelligent assistive framework for people with visual impairments. The proposed system combines audio synthesis, multilingual translation, and image captioning to provide contextual scene narration and real-time obstacle detection. A webcam captures environmental visuals, which are processed by the BLIP Transformer for scene description and the YOLOv8 deep learning model for object and traffic sign recognition. Multilingual audio feedback is generated using gTTS, while a voice-controlled interface ensures hands-free interaction. To improve efficiency, a cool-down mechanism prevents repetitive alerts. In contrast to previous approaches that addressed only obstacle localization, this integrated strategy improves safety and independence for visually impaired users by providing semantic comprehension and language-inclusive narration.
Introduction
Purpose:
The system is designed to enhance mobility and safety for visually impaired individuals by combining computer vision and voice technologies. It enables real-time obstacle detection, scene understanding, and multilingual auditory feedback, promoting independent and safe navigation.
Key Features:
Real-Time Scene Understanding:
BLIP Transformer generates semantic image captions describing the environment.
Captions are translated into multiple languages (e.g., Hindi, Tamil, French) and converted to speech via gTTS for audio feedback (a minimal sketch of this caption-translation-speech path follows the Key Features list).
Object and Traffic Sign Detection:
Uses YOLOv8 for detecting objects like cars, pedestrians, and traffic signs.
Issues instant voice alerts to signal critical obstacles or dangers (see the detection and cool-down sketch after this list).
Hands-Free Operation:
Voice-controlled interface allows users to change languages and control the system using speech.
Cool-down mechanism prevents repetitive alerts for the same object, improving usability.
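As an illustration of the scene-narration path described above, the following minimal Python sketch chains BLIP captioning, translation, and gTTS synthesis. The paper specifies BLIP and gTTS; the translation layer is not named, so deep_translator's GoogleTranslator is used here as a stand-in, and the model checkpoint, language code, and output file name are illustrative assumptions rather than the authors' exact configuration.

# Sketch: caption -> translate -> synthesize speech (translation backend assumed).
from transformers import BlipProcessor, BlipForConditionalGeneration
from deep_translator import GoogleTranslator   # stand-in translation library
from gtts import gTTS
from PIL import Image

blip_processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
blip_model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

def narrate_scene(image_path: str, lang: str = "hi") -> str:
    """Caption an image with BLIP, translate the caption, and save spoken audio."""
    image = Image.open(image_path).convert("RGB")
    inputs = blip_processor(images=image, return_tensors="pt")
    caption_ids = blip_model.generate(**inputs, max_new_tokens=30)
    caption = blip_processor.decode(caption_ids[0], skip_special_tokens=True)
    translated = GoogleTranslator(source="en", target=lang).translate(caption)
    gTTS(text=translated, lang=lang).save("narration.mp3")   # played back via Pygame
    return translated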
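The object detection and cool-down behaviour can be sketched in a similar way. The ultralytics package is the standard YOLOv8 distribution; the lightweight yolov8n.pt checkpoint, the 10-second suppression window, and the alert wording are assumptions used only to make the example concrete.

# Sketch: YOLOv8 detection with a per-class cool-down filter (parameters assumed).
import time
from ultralytics import YOLO

yolo_model = YOLO("yolov8n.pt")   # lightweight variant suited to low-cost hardware
last_alert = {}                   # class name -> timestamp of the last spoken alert
COOLDOWN_S = 10                   # assumed suppression window in seconds

def detect_and_alert(frame):
    """Run YOLOv8 on a frame and return alert messages not muted by the cool-down."""
    alerts = []
    results = yolo_model(frame, verbose=False)[0]
    for box in results.boxes:
        name = results.names[int(box.cls[0])]
        now = time.time()
        if now - last_alert.get(name, 0) >= COOLDOWN_S:
            last_alert[name] = now
            alerts.append(f"{name} ahead")
    return alerts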
Contributions and Innovations:
Integration of image captioning and object detection in a unified framework.
Multilingual narration for inclusivity.
Voice-based hands-free control for accessibility.
Redundancy management via cool-down mechanism to avoid alert fatigue.
Applicability beyond assistive tech: useful for driver assistance, smart surveillance, autonomous mobility, etc.
Methodology Overview:
Data Acquisition: Live video captured via webcam.
Preprocessing: Frames normalized and enhanced.
Scene Captioning: BLIP Transformer generates captions → translated → converted to speech using gTTS.
Object Detection: YOLOv8 identifies and classifies objects.
Voice Alerts: Generated using Pygame and gTTS; a cool-down limits repetition (see the integration sketch after this overview).
Voice Interaction: Enables control and language selection via speech (see the speech-recognition sketch after this overview).
Output Layer: Audio delivery via earphones or speaker.
Deployment: Designed for low-cost devices (e.g., Raspberry Pi, Jetson Nano), ensuring accessibility and real-world feasibility.
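Putting the stages of this overview together, the following skeleton loop illustrates one way the acquisition, preprocessing, detection, captioning, and audio-output stages could be wired up. It reuses the narrate_scene and detect_and_alert helpers from the earlier sketches (assumed to live in the same module); the frame size, the captioning interval, and the use of English for short alert phrases are assumptions rather than details taken from the paper.

# Sketch: end-to-end loop (reuses narrate_scene and detect_and_alert from above).
import cv2
import pygame
from gtts import gTTS

current_lang = "en"   # updated by the voice interface (see the next sketch)

def speak(path: str = "narration.mp3"):
    """Play a synthesized gTTS file through Pygame's mixer and wait until it finishes."""
    pygame.mixer.init()
    pygame.mixer.music.load(path)
    pygame.mixer.music.play()
    while pygame.mixer.music.get_busy():
        pygame.time.wait(100)

cap = cv2.VideoCapture(0)                      # data acquisition: live webcam feed
frame_idx = 0
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.resize(frame, (640, 480))      # preprocessing: fixed frame size
    for message in detect_and_alert(frame):    # obstacle alerts, cool-down applied
        gTTS(text=message, lang="en").save("narration.mp3")
        speak()
    if frame_idx % 150 == 0:                   # periodic scene captioning
        cv2.imwrite("frame.jpg", frame)
        narrate_scene("frame.jpg", lang=current_lang)
        speak()
    frame_idx += 1
cap.release()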
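The hands-free language switching mentioned in the voice-interaction step can be sketched with the SpeechRecognition package, a common choice that the paper does not confirm; the command vocabulary and language codes below are likewise assumptions. The returned code could be assigned to current_lang in the loop above between iterations; note that the Google Web Speech backend used here requires internet access, matching the internet-dependency limitation noted in the Conclusion.

# Sketch: voice-controlled language selection (library and vocabulary assumed).
import speech_recognition as sr

LANGUAGES = {"english": "en", "hindi": "hi", "tamil": "ta", "french": "fr"}

def listen_for_language(current: str = "en") -> str:
    """Listen for a command such as 'switch to Hindi' and return the matching language code."""
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        audio = recognizer.listen(source, phrase_time_limit=4)
    try:
        command = recognizer.recognize_google(audio).lower()
    except (sr.UnknownValueError, sr.RequestError):
        return current             # keep the current language if nothing was understood
    for name, code in LANGUAGES.items():
        if name in command:
            return code
    return current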
Motivation & Background:
Over 285 million people globally are visually impaired, with traditional aids (e.g., canes, guide dogs) offering limited contextual awareness.
Prior assistive technologies using ultrasonic sensors, Raspberry Pi + OpenCV, and early deep learning models had limitations in:
Multilingual support
Real-time processing
Accurate scene understanding
Hands-free usability
Comparative Advantage Over Past Work:
Unlike earlier works that focused on simple object detection, this system emphasizes descriptive environmental narration.
Improves latency, interaction, and multilingual support compared with past systems that relied on cloud translation or were limited to facial/object detection.
Integrates speech recognition, scene understanding, and real-time alerts in a compact, deployable format.
Conclusion
The proposed Smart Assistive System for Visually Impaired People effectively combines voice, computer vision, and natural language processing technologies to deliver auditory feedback that provides real-time environmental awareness. By employing BLIP for image captioning, YOLOv8 for object and traffic sign detection, and gTTS for multilingual narration, the system ensures both accessibility and safety in dynamic environments. The inclusion of voice-based language selection makes the system user-friendly and hands-free, catering to the diverse linguistic needs of users. This project not only has the potential to increase the mobility and independence of visually impaired people but also has broader applications in areas such as autonomous vehicles, public safety, urban monitoring, and smart city development. Despite challenges such as hardware requirements, internet dependency, and dataset limitations, the system provides a scalable foundation for future improvements, including offline support, broader object detection capabilities, and wearable device integration. Overall, this project highlights how AI-driven assistive technologies can transform lives by bridging the gap between visual information and auditory perception, promoting inclusivity and safer navigation in real-world environments.