Individuals who are deaf or have speech disabilities often find it difficult to communicate with people who do not know any form of sign language, so a system that enables communication through hand gestures is essential for socializing and for accessible communication. To bridge this barrier, a real-time sign language interpretation system has been developed that draws on recent advances in embedded technology and machine learning. A camera module paired with a microcontroller captures the user's hand movements in real time, and a pre-trained quantized MobileNet model converts the recognized signs into text or voice, allowing accurate interpretation of sign language. The system offers a new alternative for individuals who are deaf or have speech impairments when communicating with people who do not understand sign language. Because it requires no wearable electronics or external sensors, the solution is affordable and mobile, and its compact design allows it to be used in a variety of environments such as hospitals, public areas, schools, and government offices. The design also scales well: it can be integrated with smartphones or other IoT communication platforms in the future while retaining its real-time processing capability.
Introduction
The study focuses on developing an assistive communication system for individuals with hearing or speech disabilities using machine learning, computer vision, and embedded technologies. The proposed machine learning–based vision-to-speech (MLVTS) system captures hand gestures via a camera module, preprocesses the images, and uses a MobileNet CNN on an ESP32 microcontroller to recognize gestures. Recognized gestures are converted into text or speech output, enabling real-time communication with people unfamiliar with sign language. The system includes display, voice, and Bluetooth modules for versatile output and wireless connectivity, and is designed to be portable, cost-effective, and deployable in real-world environments such as hospitals, schools, and public service centers.
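For illustration, the recognition loop on the ESP32 can be sketched with the esp32-camera driver and TensorFlow Lite for Microcontrollers, a common way to run a quantized MobileNet on this chip. The model array g_model, the tensor-arena size, the 96x96 grayscale frame format, and the label set are assumptions made for the sketch rather than details taken from the implementation.

#include <Arduino.h>
#include "esp_camera.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"

extern const unsigned char g_model[];      // quantized MobileNet compiled in as a C array (assumed name)

constexpr size_t kArenaSize = 150 * 1024;  // interpreter scratch memory; size is an assumption
static uint8_t tensor_arena[kArenaSize];
static tflite::MicroInterpreter* interpreter = nullptr;
static const char* kLabels[] = {"hello", "yes", "no"};  // placeholder gesture labels

void setup() {
  Serial.begin(115200);
  // The camera is assumed to be initialized elsewhere (esp_camera_init) for
  // 96x96 grayscale frames, so each frame already matches the model input.

  static tflite::MicroMutableOpResolver<5> resolver;  // register only the ops MobileNet needs
  resolver.AddConv2D();
  resolver.AddDepthwiseConv2D();
  resolver.AddAveragePool2D();
  resolver.AddFullyConnected();
  resolver.AddSoftmax();

  static tflite::MicroInterpreter static_interpreter(
      tflite::GetModel(g_model), resolver, tensor_arena, kArenaSize);
  interpreter = &static_interpreter;
  interpreter->AllocateTensors();
}

void loop() {
  camera_fb_t* fb = esp_camera_fb_get();   // grab one frame from the camera module
  if (!fb) return;

  TfLiteTensor* input = interpreter->input(0);
  if (fb->len >= input->bytes) {
    // int8-quantized input: shift unsigned pixels into the signed range.
    for (size_t i = 0; i < input->bytes; ++i)
      input->data.int8[i] = (int8_t)(fb->buf[i] - 128);
  }
  esp_camera_fb_return(fb);

  if (interpreter->Invoke() != kTfLiteOk) return;

  // Report the highest-scoring gesture as text; a TTS or display module
  // would consume the same label for voice or visual output.
  TfLiteTensor* out = interpreter->output(0);
  int best = 0;
  for (int i = 1; i < out->dims->data[1]; ++i)
    if (out->data.int8[i] > out->data.int8[best]) best = i;
  Serial.println(kLabels[best]);
}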
The methodology involves image acquisition, preprocessing (denoising and feature extraction), gesture recognition using deep learning, and output generation. Tools include the Arduino IDE and Embedded C for programming the microcontroller, ensuring real-time processing and integration with sensors and output devices. The system aims to enhance social inclusion, independence, and accessibility for individuals with communication disabilities.
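To make the preprocessing stage concrete, the sketch below pairs a 3x3 mean filter for denoising with a nearest-neighbour resize to the network's input resolution; the 96x96 target and the buffer layout are illustrative assumptions, not values reported by the study.

#include <stdint.h>

// 3x3 mean filter: suppresses sensor noise before recognition.
// src and dst are w x h grayscale images; border pixels are copied unchanged.
void denoise(const uint8_t* src, uint8_t* dst, int w, int h) {
  for (int y = 0; y < h; ++y)
    for (int x = 0; x < w; ++x) {
      if (x == 0 || y == 0 || x == w - 1 || y == h - 1) {
        dst[y * w + x] = src[y * w + x];
        continue;
      }
      int sum = 0;
      for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
          sum += src[(y + dy) * w + (x + dx)];
      dst[y * w + x] = (uint8_t)(sum / 9);
    }
}

// Nearest-neighbour resize to the model input (e.g. 96x96 for MobileNet).
void resizeNearest(const uint8_t* src, int sw, int sh,
                   uint8_t* dst, int dw, int dh) {
  for (int y = 0; y < dh; ++y)
    for (int x = 0; x < dw; ++x)
      dst[y * dw + x] = src[(y * sh / dh) * sw + (x * sw / dw)];
}

In practice the filtered, resized frame is what the quantized network consumes, so keeping both steps integer-only avoids floating-point work on the microcontroller.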
Conclusion
The proposed sign language recognition system, powered by an ESP32 microcontroller, is intended to provide affordable, effective, real-time communication for people who are hearing and/or speech impaired. Using the ESP32 and a camera module, the system interprets the movements of the hands and fingers as text or voice. This technology improves accessibility and participation for individuals with hearing loss and for the general public alike. The lightweight, affordable nature of the system makes it an attractive option for daily use, and as an advancement in assistive technology it promotes inclusion and independence.