Sign language is a fundamental means of communication for people with hearing and speech impairments, enabling them to express thoughts and feelings through gestures and facial expressions. Communication barriers arise when interacting with people who do not understand sign language, restricting access to opportunities and social networks. Artificial intelligence that converts sign language gestures into spoken language can bridge this divide, creating a smooth interpretation process that allows signers and non-signers to interact naturally.
1. Introduction
Importance of Indian Sign Language (ISL)
Indian Sign Language (ISL) is a vital communication medium for the deaf and hard-of-hearing communities, relying on hand gestures, facial expressions, and body language. However, a lack of public knowledge about ISL leads to social exclusion and limited inclusivity. Innovative solutions using artificial intelligence (AI) and machine learning (ML) aim to bridge this communication gap and promote greater accessibility.
2. Literature Survey Overview
Kumar et al. [1]
Developed a system using image processing and ML to translate sign language into speech.
Techniques like support vector machines (SVMs) and neural networks were employed (see the sketch after this list).
Focused on real-time gesture recognition, but faced challenges with lighting conditions and dataset quality.
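The following minimal sketch illustrates the kind of SVM-based gesture classification described in [1]. The feature vectors (random placeholders standing in for hand-landmark or HOG descriptors), the five gesture classes, and the scikit-learn usage are assumptions for illustration, not the authors' exact pipeline.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

# Placeholder features: 21 hand landmarks x (x, y, z) per sample, 5 hypothetical gesture classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 63))
y = rng.integers(0, 5, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# RBF-kernel SVM as a stand-in for the classifier family used in [1];
# a real system would tune C and gamma against its own gesture dataset.
clf = SVC(kernel="rbf", C=10.0, gamma="scale")
clf.fit(X_train, y_train)
print("held-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))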
Lee et al. [2]
Proposed a real-time sign recognition app that learns from user interaction.
Aimed at bridging the communication gap between signers and non-signers in digital environments.
The system adapts to different signing styles over time but is computationally demanding.
Patel et al. [3]
Focused on translating ISL into spoken language using Recurrent Neural Networks (RNNs) (a minimal sketch follows this list).
Emphasized handling ISL's complex grammar and regional nuances.
Proposed a progressive learning approach using large datasets for better context understanding.
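A rough sketch of the encoder-decoder RNN pattern implied by the translation direction in [3] is given below. The GRU layers, vocabulary size, feature dimension, and greedy decoding loop are illustrative assumptions, not the paper's actual architecture; a real system would train this on paired sign/sentence data.

import torch
import torch.nn as nn

class SignTranslator(nn.Module):
    def __init__(self, feat_dim=63, hidden=128, vocab=1000):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden, batch_first=True)  # encodes the gesture sequence
        self.embed = nn.Embedding(vocab, hidden)
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)    # emits spoken-language tokens
        self.out = nn.Linear(hidden, vocab)

    def forward(self, signs, max_len=10, bos_id=1):
        # signs: (batch, frames, feat_dim) -> predicted word IDs: (batch, max_len)
        _, h = self.encoder(signs)
        token = torch.full((signs.size(0), 1), bos_id, dtype=torch.long)
        words = []
        for _ in range(max_len):
            dec_out, h = self.decoder(self.embed(token), h)
            token = self.out(dec_out).argmax(-1)   # greedy decoding, for simplicity
            words.append(token)
        return torch.cat(words, dim=1)

print(SignTranslator()(torch.randn(2, 40, 63)).shape)  # -> torch.Size([2, 10])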
Zhang et al. [4]
Highlighted the complexity of sign language (hand, face, body cues).
Developed a system using Convolutional Neural Networks (CNNs) to extract features from video frames (see the sketch after this list).
Stressed the importance of preprocessing and real-time adaptation for better accuracy and inclusivity.
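The sketch below shows per-frame CNN feature extraction of the kind described in [4]. The torchvision ResNet-18 backbone is a stand-in assumption, since the paper does not specify its network; the extracted vectors would feed a downstream sign classifier.

import torch
import torchvision.models as models

# Untrained ResNet-18 backbone as a stand-in spatial feature extractor.
backbone = models.resnet18(weights=None)
backbone.fc = torch.nn.Identity()        # drop the classifier head, keep 512-d features
backbone.eval()

frames = torch.randn(30, 3, 224, 224)    # 30 preprocessed RGB video frames
with torch.no_grad():
    features = backbone(frames)          # shape (30, 512): one feature vector per frame
print(features.shape)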
Lee et al. [5]
Created a gesture recognition system converting ISL to speech/text.
Focused on usability and accessibility, especially for people with varying tech skills.
Recognized the need for continuous feedback, iterative improvement, and addressed issues like user proficiency and symbol variation.
3. Technological Enhancements and Frameworks
The proposed framework uses deep learning for real-time ISL recognition and translation.
Utilizes the YOLO (You Only Look Once) model for fast and accurate gesture detection (see the sketch after this list).
Supports fluid communication between ISL users and non-signers.
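A minimal sketch of how such a YOLO-based detector could be wired into a live video loop is shown below. The ultralytics package, the hypothetical weights file "isl_gestures.pt", and the OpenCV capture/display loop are assumptions for illustration, not details taken from the proposed framework itself.

import cv2
from ultralytics import YOLO

model = YOLO("isl_gestures.pt")   # hypothetical custom-trained ISL gesture detector
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame, verbose=False)[0]          # run detection on one frame
    for box in results.boxes:
        cls_name = model.names[int(box.cls)]           # predicted gesture label
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, cls_name, (x1, y1 - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    cv2.imshow("ISL detection", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()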
Zhang et al. [6]
Demonstrated how combining CNNs and LSTM (Long Short-Term Memory) improves continuous gesture recognition (a minimal sketch follows this list).
Emphasized spatial and temporal modeling for accurate sign interpretation.
Adapted YOLO for real-time hand gesture detection.
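The sketch below shows the CNN+LSTM pattern in its simplest form: a small CNN encodes each frame spatially, and an LSTM models the sequence temporally. The layer sizes and class count are illustrative assumptions, not the architecture reported in [6].

import torch
import torch.nn as nn

class CNNLSTM(nn.Module):
    def __init__(self, num_classes=20):
        super().__init__()
        self.cnn = nn.Sequential(                       # per-frame spatial encoder
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())      # -> 32-dim vector per frame
        self.lstm = nn.LSTM(32, 64, batch_first=True)   # temporal model over frames
        self.head = nn.Linear(64, num_classes)

    def forward(self, clips):                 # clips: (batch, time, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1)).view(b, t, -1)  # CNN on every frame
        out, _ = self.lstm(feats)                              # LSTM over the sequence
        return self.head(out[:, -1])                           # classify from last step

print(CNNLSTM()(torch.randn(2, 16, 3, 64, 64)).shape)   # -> torch.Size([2, 20])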
González et al. [7]
Developed a framework integrating gesture recognition with text-to-speech synthesis (see the sketch after this list).
Highlighted the importance of multimodal systems (visual + audio input) for improved accuracy and user experience.
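A minimal sketch of coupling a recognized gesture label to spoken output, in the spirit of such a multimodal pipeline, is given below. pyttsx3 is an assumed choice of offline text-to-speech engine; [7] does not name a specific library.

import pyttsx3

def speak_gesture(label: str) -> None:
    # Convert a recognized ISL gesture label to audible speech.
    engine = pyttsx3.init()
    engine.setProperty("rate", 150)   # slightly slower speaking rate for clarity
    engine.say(label)
    engine.runAndWait()

# e.g., the output of a gesture classifier
speak_gesture("hello")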
Tajima et al. [8]
Addressed real-world challenges like background noise, style variation, and obstructions in gesture recognition.
Called for robust, synchronized systems integrating gesture detection and speech output for seamless communication.
4. Conclusion
The surveyed sign language translation technologies collectively reflect substantial strides toward improving communication for people who are deaf or hard of hearing. They confirm the efficacy of machine learning and computer vision approaches in recognizing a wide range of sign language gestures accurately. Using techniques such as convolutional neural networks (CNNs), these systems capture not only the hand gestures themselves but also the subtleties of facial expressions and body posture that contribute to the semantics of sign language.
A notable finding across the studies is that real-time processing gives these systems the potential to translate during live conversation. Real-time interactivity is one of the most important elements of intelligible communication: it reduces the delays that break the natural flow of dialogue. The models are also flexible, learning new gestures over time to improve robustness and accuracy in specific contexts. Accommodating the signing styles and variations of individual users, for instance, supports inclusivity and makes these technologies accessible to a broader range of people. User testing results were encouraging: participants reported that the systems were easy to use and beneficial, and user groups said the technologies significantly improved their ability to communicate with non-signers, enabling better social interaction and mutual understanding. The studies also noted difficulties, such as sensitivity to environmental conditions like lighting and background noise, and other factors that hinder gesture recognition. Another common challenge was the preparation of high-quality training datasets, reinforcing that further improvements in system performance depend heavily on well-annotated and well-segmented data.
Overall, these findings suggest a paradigm shift in applying sign language translation technologies to bridge communication barriers.
References
[1] Zhang, Y., & Huang, T. S. (2008). Human Activity Recognition Based on a Hierarchical Hidden Markov Model. Human-centric Computing and Information Sciences, 38(1), 1-12. DOI: 10.1186/2192-1962-1-38.
[2] Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.
[3] Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep Learning. MIT Press.
[4] Pustokhina, I., & Pustokhin, D. (2016). Real-Time Gesture Recognition for Human-Robot Interaction: A Review. In 2016 IEEE International Conference on Robotics and Automation (ICRA), 21-26. DOI: 10.1109/ICRA.2016.7487644.
[5] Wu, J., Zhang, H., Liu, X., & Wang, Y. (2019). A Deep Learning Approach for Real-time Hand Gesture Recognition. In 2019 15th International Conference on Signal Processing (ICSP), 1-6. DOI: 10.1109/ICSP.2019.8840150.
[6] Vinciarelli, A., Dybkjaer, L., & M. A. (2008). Incorporating Social Signals into Gesture Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(10), 1741-1754. DOI: 10.1109/TPAMI.2008.141.
[7] Sharma, P., & Singh, G. (2020). Real-Time Indian Sign Language Recognition using Deep Learning. Journal of Ambient Intelligence and Humanized Computing, 11(7), 2941-2951. DOI: 10.1007/s12652-020-02341-0.
[8] Vail, D., Huber, L., & Nitz, E. (2015). Gesture Recognition with YOLO for Robotics Applications. Journal of Robotics, 2015, 1-12. DOI: 10.1155/2015/290371.
[9] Shantharam, S. (2018). Gesture Recognition for Indian Sign Language using Deep Learning Techniques. Master's Thesis, Indian Institute of Technology, Delhi.
[10] Kumar, A. (2020). Design and Development of an Indian Sign Language Recognition System. PhD Thesis, National Institute of Technology, Trichy.
[11] What is YOLO (You Only Look Once)? (2021). Towards Data Science.
[12] Introduction to Speech Recognition: A Beginner's Guide. (2023). Speech Technology.
[13] Deep Learning for Computer Vision: A Brief Overview. (2020). Medium.