Closing the communication gap between the Deaf and Hard-of-Hearing (DHH) community and hearing people has remained a key area of research in recent years. This survey discusses recent deep learning methods applied in Sign Language Recognition (SLR) and Sign Language Translation (SLT) systems. The paper examines the transition from traditional vision-based methods to neural models, including Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM) networks, and Transformer architectures such as the Sign Language Transformer. Hybrid models that integrate CNN and LSTM have achieved accuracies above 90% on both static and dynamic gestures. Recent YOLO and MediaPipe frameworks enable real-time hand-gesture detection with ordinary webcams. The paper also compares datasets, performance metrics, and preprocessing techniques used for several regional sign languages, including Indian, Arabic, and American Sign Language. Despite these advances, challenges remain in continuous sign recognition, signer variability, and limited dataset diversity. This work recommends directions for developing fully end-to-end, real-time, multilingual sign language interpretation systems for inclusive human-computer interaction.
Introduction
Sign language is vital for communication within the Deaf and Hard-of-Hearing (DHH) community, but barriers exist between signers and non-signers in education, healthcare, and public services. Sign Language Recognition (SLR) and Sign Language Translation (SLT) aim to bridge this gap using AI, leveraging deep learning, computer vision, and natural language processing. Modern approaches use CNNs, LSTMs, and Transformer-based models to capture spatial and temporal features for accurate recognition and real-time translation.
Data Acquisition & Preprocessing:
Visual data is collected via cameras or sensors like Kinect, with keypoints extracted using libraries such as MediaPipe. Preprocessing includes frame extraction, ROI segmentation, normalization, and background subtraction. Data augmentation ensures robust performance under varying lighting and signer conditions. Standard datasets like ASL Alphabet, PHOENIX-2014T, and custom regional datasets are commonly used.
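To make the keypoint-based pipeline concrete, the following is a minimal extraction sketch using MediaPipe Hands and OpenCV; the two-hand landmark layout, frame cap, and confidence threshold are illustrative assumptions rather than settings taken from the surveyed papers.

```python
import cv2
import mediapipe as mp
import numpy as np

mp_hands = mp.solutions.hands

def extract_hand_keypoints(video_path, max_frames=64):
    """Return per-frame hand landmarks as an array of shape (frames, 2 * 21 * 3)."""
    cap = cv2.VideoCapture(video_path)
    keypoints = []
    with mp_hands.Hands(static_image_mode=False,
                        max_num_hands=2,
                        min_detection_confidence=0.5) as hands:
        while cap.isOpened() and len(keypoints) < max_frames:
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB input; OpenCV reads frames in BGR order.
            results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            frame_vec = np.zeros(2 * 21 * 3)  # two hands x 21 landmarks x (x, y, z)
            if results.multi_hand_landmarks:
                for h, hand in enumerate(results.multi_hand_landmarks[:2]):
                    coords = [(lm.x, lm.y, lm.z) for lm in hand.landmark]
                    frame_vec[h * 63:(h + 1) * 63] = np.array(coords).flatten()
            keypoints.append(frame_vec)
    cap.release()
    return np.stack(keypoints) if keypoints else np.empty((0, 126))
```

The resulting landmark sequences can then be normalized and passed to the temporal models discussed in the next section.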
Literature & Models:
Recent studies highlight hybrid and Transformer-based models as the state-of-the-art:
CNN–LSTM hybrids for 3D skeletal gesture recognition and for Arabic and Indian sign languages achieved accuracies above 90% (a minimal architectural sketch of such a hybrid appears after this list).
Transformers enable end-to-end recognition and translation, outperforming traditional RNNs and achieving BLEU scores over 21 on PHOENIX-2014T.
YOLOv11/v8 + MediaPipe and transfer learning approaches (ResNet50, MobileNetV2) allow high-precision, real-time recognition for ASL and Turkish Sign Language.
Comparative analyses confirm Transformer and hybrid architectures outperform CNN-only models in both isolated and continuous gesture recognition.
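As a concrete reference point for the hybrid architectures surveyed above, the following PyTorch sketch shows one way a per-frame CNN encoder can feed an LSTM over a gesture clip; the layer sizes, class count, and input resolution are placeholder values, not a reproduction of any specific model from the cited studies.

```python
import torch
import torch.nn as nn

class CNNLSTMSignClassifier(nn.Module):
    """Illustrative CNN-LSTM hybrid: a per-frame CNN encodes spatial features,
    and an LSTM models the temporal dynamics of the gesture clip."""
    def __init__(self, num_classes, hidden_size=256):
        super().__init__()
        # Small spatial encoder; the surveyed works typically use ResNet50 or
        # MobileNetV2 backbones instead.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.lstm = nn.LSTM(input_size=64, hidden_size=hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, clips):
        # clips: (batch, frames, channels, height, width)
        b, t, c, h, w = clips.shape
        feats = self.cnn(clips.reshape(b * t, c, h, w)).reshape(b, t, -1)
        _, (h_n, _) = self.lstm(feats)
        return self.classifier(h_n[-1])  # logits over the sign vocabulary

# Example: a batch of 2 clips, 16 frames each, 112x112 RGB.
logits = CNNLSTMSignClassifier(num_classes=30)(torch.randn(2, 16, 3, 112, 112))
```

For continuous recognition, the clip-level classifier is typically replaced by a per-timestep output layer trained with a CTC loss rather than a single prediction per clip.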
Advancements & Evaluation:
The field has shifted from CNN-only models to hybrid CNN–LSTM and Transformer-based architectures, which handle spatial and temporal dependencies more effectively. Metrics such as accuracy, BLEU, and word error rate (WER) are standard for evaluation. Real-time inference on consumer-grade hardware is now feasible. Regional datasets are emerging, but PHOENIX-2014T remains a key benchmark for SLT.
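Since WER is the standard metric for continuous sign recognition, a short reference implementation may help make the evaluation setup concrete; this is a plain Levenshtein-alignment version over whitespace-separated glosses, and the example sentences are invented for illustration.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / number of reference words,
    computed with a standard Levenshtein alignment over words (glosses)."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j]: edit distance between the first i reference words and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,             # deletion
                          d[i][j - 1] + 1,             # insertion
                          d[i - 1][j - 1] + sub_cost)  # substitution or match
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("MY NAME JOHN", "MY NAME IS JOHN"))  # 1/3: one insertion over three reference glosses
```

BLEU for translation quality is usually computed with an off-the-shelf package such as sacrebleu rather than reimplemented.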
Conclusion
Sign language translation and recognition have been transformed by deep learning, greatly increasing accessibility for the DHH community. Models ranging from CNN-based classifiers to end-to-end Transformer networks have demonstrated accuracies above 90% across multiple sign languages. Notwithstanding these successes, issues such as signer variability, illumination changes, and limited dataset diversity persist. For more realistic translation, future studies should concentrate on multilingual, real-time SLT systems that incorporate lip movement, facial expression, and manual gesture. With further development, AI-powered sign language systems can become effective instruments for inclusive communication.
References
[1] Narayan et al., GestureNet: Real-Time Sign Language Recognition Using a Hybrid Neural Network, IJFMR, 2024.
[2] A. Bayegizova et al., Effectiveness of the Use of Algorithms and Methods of Artificial Technologies for Sign Language Recognition, EEJET, 2022.
[3] N. C. Camgoz et al., Sign Language Transformers, arXiv:2003.13830, 2020.
[4] Y. S. N. Rao et al., Dynamic Sign Language Recognition and Translation, JTAIT, 2024.
[5] T. H. Noor et al., Real-Time Arabic Sign Language Recognition, Sensors, 2024.
[6] B. Alsharif et al., Real-Time American Sign Language Interpretation, Sensors, 2025.
[7] M. Khan et al., Deep Learning Technology to Recognize ASL Alphabet, Sensors, 2023.
[8] A. Yilmaz et al., Real-Time Sign Language Recognition Based on YOLO Algorithm, NCA, 2024.
[9] P. Singh et al., Sign Language Recognition and Translation: A Multi-Modal Approach, ACL Anthology, 2023.
[10] R. Patel et al., Deep Learning for Sign Language Recognition: A Comparative Review, Paradigm, 2024.