Abstract
Indian Sign Language (ISL) serves as the primary communication medium for India's deaf and hard-of-hearing community, which numbers in the millions. A severe scarcity of qualified human interpreters creates significant communication barriers, limiting access to education, healthcare, and social integration. Machine Learning (ML), and deep learning in particular, has emerged as a critical technology for bridging this gap. Traditional ML approaches provided foundational solutions but were limited in handling the high variability of visual data. With the advent of deep learning, Convolutional Neural Networks (CNNs) significantly improved the accuracy of static sign recognition, while hybrid spatiotemporal models, such as CNN-Long Short-Term Memory (LSTM) networks, began to address dynamic gestures. More recently, Transformer-based architectures have shown state-of-the-art performance on complex, continuous sign-to-text translation. This survey presents a comprehensive analysis of the evolution of ISL interpretation systems, from traditional ML classifiers to advanced deep learning frameworks. We discuss the strengths and limitations of these techniques and provide a detailed review of the critical components of such systems, including architectures, publicly available datasets, and evaluation metrics. We highlight persistent challenges, including dataset scarcity, signer dependency, regional linguistic variation, and the crucial, often-overlooked role of non-manual features. The study concludes by outlining open research directions, including generative data augmentation, privacy-preserving federated learning, and the integration of large language models, aimed at advancing the practical, scalable, and equitable deployment of ISL interpretation systems.
Introduction
This survey provides a comprehensive overview of Indian Sign Language (ISL) recognition, emphasizing its importance for inclusive communication and the barriers created by a shortage of interpreters and the high cost of traditional interpretation. It highlights the urgent need for automated, real-time ISL recognition systems tailored to India's unique linguistic and cultural context, since models trained on American Sign Language (ASL) datasets often perform poorly on ISL.
We trace the evolution of ISL recognition from early rule-based and handcrafted-feature methods to modern deep learning approaches. Traditional techniques such as template matching, skin-color segmentation, Hidden Markov Models (HMMs), and Dynamic Time Warping (DTW) laid the groundwork but struggled with scalability, environmental variability, and signer independence. The adoption of deep learning, particularly CNNs for spatial feature extraction and LSTMs and Transformers for temporal modeling, significantly improved recognition accuracy, especially for continuous signing. Hybrid CNN-LSTM and Transformer-based architectures, along with 3D CNNs, have further strengthened robustness and contextual understanding.
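Of the traditional techniques above, DTW remains a useful baseline for isolated-sign matching: it aligns two gesture sequences of different lengths and scores their similarity, enabling nearest-template classification. The sketch below illustrates the idea; the templates, labels, and 1-D "features" are invented for the example (real systems would use per-frame hand-shape or trajectory descriptors), not drawn from any ISL dataset.

```python
import numpy as np

def dtw_distance(seq_a, seq_b):
    """Dynamic Time Warping distance between two variable-length
    sequences of per-frame feature vectors, each of shape (T, D)."""
    t1, t2 = len(seq_a), len(seq_b)
    # cost[i, j] = cheapest cumulative alignment of seq_a[:i] with seq_b[:j]
    cost = np.full((t1 + 1, t2 + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, t1 + 1):
        for j in range(1, t2 + 1):
            d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])  # local frame distance
            # extend the cheapest of the three admissible alignment moves
            cost[i, j] = d + min(cost[i - 1, j],      # skip a query frame
                                 cost[i, j - 1],      # skip a template frame
                                 cost[i - 1, j - 1])  # match both frames
    return float(cost[t1, t2])

# Toy nearest-template classifier over two invented sign templates.
templates = {
    "sign_A": np.array([[0.0], [1.0], [2.0], [1.0]]),
    "sign_B": np.array([[2.0], [2.0], [0.0]]),
}
query = np.array([[0.1], [0.9], [1.9], [2.1], [1.0]])  # a slower rendition of sign_A
prediction = min(templates, key=lambda name: dtw_distance(query, templates[name]))
```

Because the warping path absorbs differences in signing speed, the slower five-frame query still matches the four-frame template, which is exactly the property that made DTW attractive before learned temporal models.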
The survey also emphasizes the role of landmark detection and multimodal data integration. Frameworks such as MediaPipe and OpenPose enable real-time extraction of hand, pose, and facial landmarks, which, when combined with RGB data, improve accuracy across diverse backgrounds and signers. Emerging techniques such as graph neural networks and attention mechanisms further enhance the modeling of complex gesture relationships.
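Landmark coordinates produced by such frameworks are typically normalized before being fed to a recognizer, so that features are invariant to the signer's position and distance from the camera. The exact scheme varies by system; the centroid-and-scale normalization below is one common, assumed choice, not a step prescribed by any particular framework.

```python
import numpy as np

def normalize_landmarks(landmarks):
    """Make an (N, 3) array of (x, y, z) landmark coordinates
    translation- and scale-invariant.

    N might be, e.g., 21 for a single detected hand.
    """
    landmarks = np.asarray(landmarks, dtype=float)
    centered = landmarks - landmarks.mean(axis=0)   # remove translation
    scale = np.linalg.norm(centered, axis=1).max()  # farthest point from centroid
    if scale > 0:
        centered /= scale                           # remove scale
    return centered
```

After this step, the same hand shape yields (nearly) identical features whether the signer stands close to the camera or far from it, which helps the signer-independent generalization discussed later.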
A major focus is placed on dataset availability and its challenges. ISL is a low-resource language, with few public datasets compared with other sign languages. We distinguish isolated-sign datasets from continuous-sign datasets, and note that recent large-scale public releases are enabling the shift toward sentence-level recognition and translation with advanced models. Overall, we conclude that combining deep learning, multimodal data, robust datasets, and community-centered design is essential for building accurate, scalable, and socially responsible ISL recognition systems that promote accessibility and inclusion.
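Sentence-level continuous recognition is commonly scored with word error rate (WER), the word-level Levenshtein distance between the reference gloss or sentence and the system output, divided by the reference length. A minimal self-contained sketch:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length,
    computed by dynamic-programming edit distance over words."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                      # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(substitution,
                           dp[i - 1][j] + 1,   # deletion
                           dp[i][j - 1] + 1)   # insertion
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```

For translation into full natural-language sentences, WER is usually reported alongside n-gram overlap metrics such as BLEU, but the edit-distance core above is the standard recognition metric.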
Conclusion
This paper has presented a comprehensive survey of the state of Indian Sign Language interpretation using machine learning. We have charted the field's rapid and necessary evolution: beginning with traditional machine learning classifiers applied to isolated signs, progressing to the widespread use of deep Convolutional Neural Networks (CNNs) that achieved high accuracy on static sign classification, and arriving at the complex spatiotemporal architectures, such as CNN-LSTM hybrids and state-of-the-art Transformers, required for dynamic, continuous sentence-level translation.
This progress has been driven by two key enablers: (1) the development of lightweight, real-time pose estimation pipelines, dominated by tools like MediaPipe, and (2) the recent, critical release of large-scale, continuous public datasets, such as ISLTranslate and iSign, which have provided the necessary data to train complex translation models.
Despite these successes, significant and fundamental challenges remain. The field is still critically hampered by a scarcity of diverse, well-annotated data. The linguistic complexities of regional dialects and the technical hurdle of signer-independent generalization—where models fail to work for new, unseen users—remain largely unsolved.
The most critical open research direction, however, is the integration of Non-Manual Features (NMFs). The vast majority of current research focuses narrowly on hand gestures, ignoring the facial expressions, head movements, and mouth shapes that convey essential grammatical and emotional meaning. Future systems must evolve from simple hand-trackers into holistic, multimodal frameworks that interpret the face, body, and hands in unison. By addressing these challenges, and by leveraging emerging trends such as privacy-preserving federated learning, generative data augmentation, and the linguistic power of large language models, the research community can move closer to delivering a robust, practical, and truly equitable interpretation tool that serves the needs of the deaf and hard-of-hearing (DHH) community in India.
References
[1] A. Othman, "Sign Language Varieties Around the World," in Sign Language Processing: From Gesture to Meaning, Springer, 2024, pp. 41–56.
[2] S. Renjith and R. Manazhy, "Sign language: A systematic review on classification and recognition," Multimedia Tools and Applications, vol. 83, no. 31, pp. 77077–77127, Feb. 2024.
[3] N. Aloysius and M. Geetha, "Understanding vision-based continuous sign language recognition," Multimedia Tools and Applications, vol. 79, nos. 31–32, pp. 22177–22209, Aug. 2020.
[4] N. Aloysius and M. Geetha, "A review on deep convolutional neural networks," in Proc. Int. Conf. Commun. Signal Process., Apr. 2017, pp. 588–592.
[5] Q. Zhu, J. Li, F. Yuan, and Q. Gan, "Multiscale temporal network for continuous sign language recognition," J. Electron. Imag., vol. 33, no. 2, Apr. 2024, Art. no. 023059.
[6] L. Hu, L. Gao, Z. Liu, and W. Feng, "Scalable frame resolution for efficient continuous sign language recognition," Pattern Recognit., vol. 145, Jan. 2024, Art. no. 109903.
[7] R. Zuo and B. Mak, "Improving continuous sign language recognition with consistency constraints and signer removal," ACM Trans. Multimedia Comput., Commun., Appl., vol. 20, no. 6, pp. 1–25, Jun. 2024.
[8] N. Aloysius, M. Geetha, and P. Nedungadi, "Continuous sign language recognition with adapted conformer via unsupervised pretraining," arXiv preprint arXiv:2405.12018, 2024.
[9] M. Geetha, N. Aloysius, D. A. Somasundaran, A. Raghunath, and P. Nedungadi, "Toward real-time recognition of continuous Indian sign language: A multi-modal approach using RGB and pose," IEEE Access, vol. 13, 2025, Art. no. 3554618.
[10] K. Goyal, "Indian Sign Language Recognition Using Mediapipe Holistic," arXiv preprint arXiv:2304.10256, 2023.
[11] A. H. Mohammedali, H. H. Abbas, and H. I. Shakadi, "Real-time sign language recognition system," Int. J. Health Sci., vol. 6, pp. 10384–10407, 2022.
[12] K. Shenoy et al., "Real-time Indian sign language recognition," in 2018 IEEE 9th International Conference on Computing, Communication and Networking Technologies (ICCCNT), 2018, pp. 1–6.
[13] R. S. Sri Lakshmi et al., "Sign language recognition system using convolutional neural network and computer vision," 2020.
[14] V. Puranik, V. Gawande, J. Gujarathi, A. Patani, and T. Rane, "Video-based sign language recognition using recurrent neural networks," in 2022 2nd Asian Conference on Innovation in Technology (ASIANCON), IEEE, 2022, pp. 1–6.
[15] S. Shagun, V. Singh, and U. Tiwary, "Indian Sign Language recognition system using SURF with SVM and CNN," Array, vol. 14, 2022, Art. no. 100141.
[16] A. Kasapbaşı, A. E. A. Elbushra, O. Al-Hardanee, and A. Yilmaz, "DeepASLR: A CNN based human computer interface for American Sign Language recognition," Comput. Methods Programs Biomed. Update, vol. 2, 2022, Art. no. 100048.
[17] B. Sundar and T. Bagyammal, "American Sign Language Recognition for Alphabets Using MediaPipe and LSTM," Procedia Comput. Sci., vol. 215, 2022, pp. 642–651.
[19] A. Vaswani et al., "Attention is all you need," in Advances in Neural Information Processing Systems, vol. 30, 2017, pp. 5998–6008.
[20] A. Choudhury, A. K. Talukdar, M. K. Bhuyan, and K. K. Sarma, "Movement epenthesis detection for continuous sign language recognition," J. Intell. Syst., vol. 26, no. 3, pp. 471–481, 2017.
[21] M. G. Ghosh, D. Ghosh, and P. Bora, "Continuous hand gesture segmentation and co-articulation detection," in Computer Vision, Graphics, and Image Processing, Springer, 2006, pp. 564–575.
[22] G. Wu and Y. Yang, "Deep learning approaches for sign language recognition: A survey," IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 3, pp. 693–711, Mar. 2020.
[23] S. Kumar and R. R. Singla, "Sign language recognition using deep learning," Int. J. Comput. Vis., vol. 128, no. 7, pp. 1617–1637, 2020.
[24] T. Zhang, J. Gao, and L. Li, "Sign language recognition with multi-modal fusion and deep neural networks," Pattern Recognition Letters, vol. 151, 2022, pp. 112–119.
[25] H. Mnassri, R. Bchir, M. A. Zayane, and T. Ladhari, "Sign Language Detection Based on Artificial Intelligence from Images," in Proc. IEEE International Conference on Artificial Intelligence and Green Energy (ICAIGE), 2024.
[26] G. Jessica Ruslim, N. Salim, I. Edbert, and D. Suhartono, "Sign Language Detection to Enhance Online Communication," 2024.
[27] IJRASET, "Sign Language Recognition System using Machine Learning Techniques," 2025.