Because most of the general public does not know sign language, communication between hearing-impaired people and the wider community remains difficult. Existing systems focus mainly on recognizing static gestures and cannot interpret the continuous, dynamic gestures used in everyday communication. This paper proposes a real-time continuous sign language translation system based on deep learning and computer vision. The system uses MediaPipe to extract hand landmark keypoints and a Long Short-Term Memory (LSTM) network to capture temporal dependencies across frames. Gesture sequences are processed in real time to produce meaningful text output, which is then converted into audible speech to support efficient communication. In contrast to image-based techniques, the system operates on landmark vectors rather than raw images, which lowers computational complexity.
Introduction
This paper addresses the communication barriers between hearing-impaired individuals and the general public that arise from the public's limited understanding of sign language. Existing systems mainly recognize static gestures and fail to interpret the continuous sign language used in natural, real-world communication.
To solve this, the study proposes a real-time continuous sign language translation system using deep learning and computer vision. The system combines MediaPipe for extracting hand landmark keypoints and an LSTM network to model temporal dependencies in gesture sequences, enabling accurate interpretation of continuous movements.
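The flow described above can be sketched as a sliding window over per-frame landmark vectors. This is a minimal illustration under stated assumptions, not the implementation used in this work: the window length of 30 frames, the stride of 10 frames, the 63-value frame layout (21 hand landmarks with x, y, z each, as MediaPipe Hands reports for one hand), and the `classify_window` stand-in for the trained LSTM are all hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Iterable, Iterator, List, Sequence

FRAME_SIZE = 21 * 3  # 21 hand landmarks x (x, y, z); a single hand is assumed
WINDOW_LEN = 30      # assumed number of frames per gesture window
STRIDE = 10          # assumed: emit a window every 10 new frames once the buffer is full

Frame = Sequence[float]
Window = List[Frame]


def stream_windows(frames: Iterable[Frame],
                   window_len: int = WINDOW_LEN,
                   stride: int = STRIDE) -> Iterator[Window]:
    """Yield overlapping fixed-length windows from a continuous frame stream."""
    buffer: Deque[Frame] = deque(maxlen=window_len)  # oldest frames drop off automatically
    since_last = 0
    for frame in frames:
        if len(frame) != FRAME_SIZE:
            raise ValueError(f"expected {FRAME_SIZE} values, got {len(frame)}")
        buffer.append(frame)
        since_last += 1
        if len(buffer) == window_len and since_last >= stride:
            since_last = 0
            yield list(buffer)


def translate(frames: Iterable[Frame],
              classify_window: Callable[[Window], str]) -> List[str]:
    """Apply a sequence classifier (hypothetical stand-in for the LSTM) to each window."""
    return [classify_window(window) for window in stream_windows(frames)]
```

In a real pipeline, `classify_window` would wrap the trained LSTM and the frames would come from the landmark extractor; both are left abstract here. Overlapping windows let a gesture that straddles a window boundary still appear whole in a later window, at the cost of classifying some frames more than once.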
The recognized gestures are converted into text in real time and further translated into speech to support effective communication. Unlike traditional image-based approaches, the system uses a vector-based representation of hand landmarks, which reduces computational complexity and improves efficiency.
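To make the vector-based representation and the text output step concrete, the sketch below flattens one hand's landmarks into a feature vector, compares its size with a raw image input, and collapses per-window predictions into text. The 21-landmark count matches what MediaPipe Hands reports for one hand; the wrist-relative normalization, the 224 x 224 RGB image size, the `min_run` threshold, and the function names are assumptions made for illustration and are not taken from this paper.

```python
from typing import List, Sequence, Tuple

Landmark = Tuple[float, float, float]  # (x, y, z) for one keypoint

NUM_LANDMARKS = 21                 # keypoints per hand reported by MediaPipe Hands
VECTOR_VALUES = NUM_LANDMARKS * 3  # 63 values per hand per frame
IMAGE_VALUES = 224 * 224 * 3       # assumed image input size: 150528 values per frame


def landmarks_to_vector(landmarks: Sequence[Landmark]) -> List[float]:
    """Flatten landmarks into a 63-value vector expressed relative to the wrist (index 0)."""
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}")
    wx, wy, wz = landmarks[0]
    vector: List[float] = []
    for x, y, z in landmarks:
        vector.extend((x - wx, y - wy, z - wz))
    return vector


def collapse_predictions(labels: Sequence[str], min_run: int = 3) -> str:
    """Emit a label once, at the moment it has been predicted min_run times in a row."""
    words: List[str] = []
    current = None
    run = 0
    for label in labels:
        if label == current:
            run += 1
        else:
            current, run = label, 1
        if run == min_run:  # fires once per run, so a held sign is not repeated
            words.append(label)
    return " ".join(words)
```

Under these assumptions, each frame contributes 63 numbers to the sequence model instead of 150,528, which is the source of the reduced computational cost. The string returned by `collapse_predictions` is what would be handed to a text-to-speech engine; requiring several consecutive identical predictions is one simple way to suppress single-window misclassifications.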
Conclusion
This paper presents a practical real-time continuous sign language translation system based on MediaPipe landmark extraction and LSTM temporal modeling. By capturing temporal dependencies across gesture sequences, the architecture addresses key limitations of static, frame-based systems. The model achieves high accuracy with low latency, runs efficiently on a CPU, and supports offline-capable assistive communication. Integrated text and speech output enhances practical usability. Overall, the modular design and lightweight feature representation make the system a scalable and effective assistive solution for bridging communication gaps, suitable for both academic development and real-world extension.
Future work includes sentence-level recognition with grammar correction, Transformer-based temporal modeling, multimodal fusion (hand + face + pose), larger Indian Sign Language (ISL) datasets with dialect diversity, on-device optimization for mobile deployment, and two-way communication interfaces (speech/text to sign avatar). For broader future projects, promising directions include federated personalization, healthcare emergency sign interfaces, educational accessibility platforms, and standardized real-time ISL evaluation benchmarks.