Advanced technologies, such as MediaPipe, PyTorch, YOLOv5, Nvidia's GPU-accelerated CNN libraries, and the CUDA toolkit, were used to develop the real-time sign language translator. These technologies were used to build an accurate, lightweight CNN model that achieved a success rate of 95.6%. By accelerating the CNN model's processing with Nvidia's libraries and the CUDA toolkit, sign language in digital video was translated in real time with minimal latency. The solution is packaged as a virtual camera that, through OBS software, can translate sign language into subtitles on any video conferencing platform, making it useful in real-world scenarios where fast, efficient communication between deaf or hard-of-hearing individuals and hearing individuals is needed. Overall, a real-time sign language translator can have a tremendous impact on communication and accessibility for deaf and hard-of-hearing people.
Introduction
Purpose & Motivation:
Communication is fundamental, yet language barriers—especially between hearing and deaf communities—persist. Over 70 million deaf or hard-of-hearing people rely on sign language, but interactions with non-signers are often limited. This project proposes a real-time sign language translation system to bridge this gap using YOLOv5, PyTorch, and Nvidia technologies.
System Overview:
Uses webcams to capture sign language gestures.
Applies YOLOv5 for object detection and PyTorch for model training (see the detection-loop sketch after this list).
Incorporates MediaPipe and OpenCV for real-time hand tracking (also sketched after this list).
OBS software routes the translated video (with subtitles) to platforms like Google Meet and Microsoft Teams.
The model leverages Nvidia CUDA & CNNs for fast, GPU-accelerated inference.
A Flask-based web interface makes the tool accessible online (a minimal streaming sketch follows this list).
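To make the data flow above concrete, the following is a minimal sketch of the capture-and-translate loop: OpenCV reads webcam frames, a custom-trained YOLOv5 model loaded through torch.hub predicts the sign (on the GPU when CUDA is available), the predicted label is drawn as a subtitle, and the frame is pushed to a virtual camera with pyvirtualcam so that a virtual-camera consumer such as Google Meet or Microsoft Teams can pick it up. The weights file sign_yolov5.pt is a hypothetical placeholder; this is an illustrative sketch, not the project's exact implementation.

```python
import cv2
import torch
import pyvirtualcam

# Load a custom-trained YOLOv5 model (the weights path is a placeholder).
model = torch.hub.load('ultralytics/yolov5', 'custom', path='sign_yolov5.pt')
model.to('cuda' if torch.cuda.is_available() else 'cpu')

cap = cv2.VideoCapture(0)          # default webcam
width, height, fps = 640, 480, 30

with pyvirtualcam.Camera(width=width, height=height, fps=fps) as cam:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        frame = cv2.resize(frame, (width, height))
        # YOLOv5 expects RGB input.
        results = model(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        detections = results.pandas().xyxy[0]
        if len(detections) > 0:
            # Use the highest-confidence detection as the recognised sign.
            label = detections.sort_values('confidence', ascending=False).iloc[0]['name']
            cv2.putText(frame, label, (20, height - 20),
                        cv2.FONT_HERSHEY_SIMPLEX, 1.0, (255, 255, 255), 2)
        # The virtual camera expects RGB frames; OBS or any conferencing app
        # can then consume this stream as an ordinary webcam.
        cam.send(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        cam.sleep_until_next_frame()

cap.release()
```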
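Hand tracking with MediaPipe and OpenCV can be sketched as below: each frame is passed to the MediaPipe Hands solution, which returns up to two sets of 21 hand landmarks that are drawn back onto the frame. The confidence thresholds are illustrative defaults rather than values taken from the project.

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands
mp_draw = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)
with mp_hands.Hands(max_num_hands=2,
                    min_detection_confidence=0.5,
                    min_tracking_confidence=0.5) as hands:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB input.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            for landmarks in results.multi_hand_landmarks:
                # Draw the 21 landmarks and their connections on the frame.
                mp_draw.draw_landmarks(frame, landmarks, mp_hands.HAND_CONNECTIONS)
        cv2.imshow('Hand tracking', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

cap.release()
cv2.destroyAllWindows()
```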
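The Flask web interface could expose the annotated video as a simple MJPEG stream, roughly as follows. The /video route and the raw-webcam source are assumptions for illustration; in the full system the frames would come from the recognition pipeline with subtitles already drawn.

```python
from flask import Flask, Response
import cv2

app = Flask(__name__)
cap = cv2.VideoCapture(0)

def frames():
    # Yield JPEG-encoded frames in multipart format for live streaming.
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        ok, buf = cv2.imencode('.jpg', frame)
        if not ok:
            continue
        yield (b'--frame\r\n'
               b'Content-Type: image/jpeg\r\n\r\n' + buf.tobytes() + b'\r\n')

@app.route('/video')
def video():
    # multipart/x-mixed-replace lets the browser display a live MJPEG stream.
    return Response(frames(),
                    mimetype='multipart/x-mixed-replace; boundary=frame')

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)
```

Saved as app.py and run with python app.py, the stream would then be viewable at http://localhost:5000/video.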
Literature Review Highlights:
Several recent works explored sign language recognition using AI:
Hybrid models (DenseNet201 + MediaPipe) show improved gesture recognition (a structural sketch follows this list).
Reported accuracy reached 95% for sign-to-text translation.
Limitations included tracking failures when hands were out of frame or occluded.
GPU usage (vs CPU) significantly improved model responsiveness (see the timing sketch after this list).
Future work aims to improve occlusion handling and expand the gesture dataset for broader coverage.
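One way to read the hybrid DenseNet201 + MediaPipe approach is as a classifier that concatenates pooled DenseNet image features with the flattened MediaPipe hand landmarks. The sketch below illustrates that structure; the class name, layer sizes, and 26-class output are assumptions for illustration, not details from the cited works.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class HybridSignClassifier(nn.Module):
    """Illustrative hybrid: DenseNet201 image features + MediaPipe landmarks."""

    def __init__(self, num_classes=26, num_landmark_features=63):
        super().__init__()
        backbone = models.densenet201(weights=models.DenseNet201_Weights.DEFAULT)
        self.features = backbone.features      # convolutional trunk
        self.pool = nn.AdaptiveAvgPool2d(1)    # global average pooling
        # DenseNet201 yields 1920-dimensional pooled features.
        self.classifier = nn.Sequential(
            nn.Linear(1920 + num_landmark_features, 512),
            nn.ReLU(),
            nn.Linear(512, num_classes),
        )

    def forward(self, image, landmarks):
        x = self.pool(self.features(image)).flatten(1)  # (B, 1920)
        x = torch.cat([x, landmarks], dim=1)            # append 21 x,y,z landmarks
        return self.classifier(x)

# Forward pass with random data, just to show the expected shapes.
model = HybridSignClassifier()
image = torch.rand(1, 3, 224, 224)   # normalised RGB frame
landmarks = torch.rand(1, 63)        # flattened MediaPipe hand landmarks
print(model(image, landmarks).shape)  # torch.Size([1, 26])
```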
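The GPU-versus-CPU responsiveness observation can be checked with a small timing harness such as the one below. It uses the public yolov5s checkpoint as a stand-in for the project's custom model, and the run counts and input size are arbitrary choices for illustration.

```python
import time
import torch

# Public yolov5s checkpoint as a stand-in for the custom sign-language model.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
dummy = torch.rand(1, 3, 640, 640)   # one normalised 640x640 frame

def bench(device, runs=50):
    m = model.to(device)
    x = dummy.to(device)
    with torch.no_grad():
        for _ in range(5):                    # warm-up iterations
            m(x)
        if device == 'cuda':
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(runs):
            m(x)
        if device == 'cuda':
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs * 1000   # ms per frame

print(f"CPU: {bench('cpu'):.1f} ms/frame")
if torch.cuda.is_available():
    print(f"GPU: {bench('cuda'):.1f} ms/frame")
```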
Conclusion
Hand gestures also hold immense potential for human-computer interaction. Vision-based hand gesture recognition methods offer a variety of benefits over older, device-based approaches. Yet recognizing hand gestures remains a difficult problem, and this work is only a small step toward the desired outcomes in sign language recognition. In this paper, a vision-based system was introduced that interprets American Sign Language hand gestures and converts them to speech or text and vice versa. The proposed solution was tested under real-time conditions and showed that the classification models detected all trained gestures in a user-independent manner, which is one of the main requirements for this kind of system.

Combined with machine learning algorithms, the selected hand features proved very effective and can be used in any real-time sign language recognition system. In future work, the system will be further improved and experiments will be carried out on complete language datasets. Finally, the proposed solution is a good starting point for developing any vision-based sign language recognition user interface. Because sign language grammar is flexible, the system can be adapted to teach new gestures in new languages.