This work presents a real-time Indian Sign Language (ISL) to text and speech translation system that enhances communication for people with hearing and speech disabilities. The system uses MediaPipe to extract 3D hand landmark coordinates, which are saved in a CSV dataset. A feed-forward neural network, specifically a Multi-Layer Perceptron (MLP) built in TensorFlow/Keras, is trained to recognize the 26 alphabetic signs (A–Z) from 126 input features describing the x, y, z coordinates of 21 landmarks per hand. Two hidden layers with dropout mitigate overfitting, and the model reaches a test accuracy of about 96%. After training, the model is converted to TensorFlow Lite for lightweight integration into mobile environments. A Flask-based server, exposed through Cloudflare tunnels, accepts real-time landmark data from the mobile app, runs inference with the TFLite model, and returns the predicted result. The app displays recognized letters as they arrive, assembles them into words and sentences, and reads them aloud using a Text-to-Speech (TTS) engine. In addition, Swaram API integration supports multilingual audio output, making the system accessible to users across languages. This end-to-end solution offers a scalable, efficient, and accessible approach to ISL recognition and speech translation.
Introduction
This project addresses communication barriers faced by people with speech or hearing disabilities by developing a real-time Indian Sign Language (ISL) recognition system. Using computer vision and machine learning, it translates ISL alphabet gestures (A-Z) into text and speech, with multilingual support tailored for India’s diverse languages.
The system uses Google’s MediaPipe Hands to detect 21 hand landmarks per hand, capturing spatial coordinates to form the input features. A multi-layer perceptron (MLP) neural network model classifies these gestures with about 96% accuracy. The trained model is optimized for mobile use by converting it to TensorFlow Lite format.
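The feature-extraction step described above can be sketched as follows. The helper name `landmarks_to_features` is illustrative rather than the paper's actual code; the hand objects are assumed to follow MediaPipe's `multi_hand_landmarks` structure, where each detected hand exposes a `.landmark` list of 21 points with `x`, `y`, and `z` attributes:

```python
# Illustrative sketch of flattening MediaPipe hand landmarks into the
# 126-dimensional feature vector (2 hands x 21 points x 3 coordinates);
# names here are assumptions, not the project's actual code.
import numpy as np

def landmarks_to_features(multi_hand_landmarks, max_hands=2):
    """Flatten up to two detected hands into a fixed 126-float vector,
    zero-padding the slots of any hand that is not visible."""
    features = np.zeros(max_hands * 21 * 3, dtype=np.float32)
    for h, hand in enumerate(multi_hand_landmarks[:max_hands]):
        for i, lm in enumerate(hand.landmark):
            base = (h * 21 + i) * 3
            features[base:base + 3] = (lm.x, lm.y, lm.z)
    return features
```

In this scheme, one such vector would be written as a CSV row during data collection and sent as the model input during live inference.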
A Flask backend processes the hand landmark data sent from an Android app, predicts the corresponding sign, and returns the result. The app displays the recognized text, assembles it into sentences, and converts it to audible speech via built-in TTS and the Swaram API, enabling multilingual audio output.
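The server-side prediction step can be sketched as below, assuming the app POSTs a JSON body containing the 126 landmark floats. The Flask routing and TFLite interpreter wiring are omitted; `infer_fn` stands in for the call into the TFLite model, and all names are illustrative:

```python
# Minimal sketch of the backend's predict-and-decode logic. The inference
# backend is injected as `infer_fn` (an assumption standing in for the
# TFLite interpreter call), which maps 126 floats to 26 class scores.
import string

LABELS = list(string.ascii_uppercase)  # 26 classes, A-Z

def predict_letter(landmarks, infer_fn):
    """Validate the 126-float feature vector, run inference, and decode
    the highest-scoring class index to its letter."""
    if len(landmarks) != 126:
        raise ValueError("expected 126 landmark coordinates")
    probs = infer_fn(landmarks)  # class scores, one per letter
    best = max(range(len(probs)), key=probs.__getitem__)
    return {"letter": LABELS[best], "confidence": float(probs[best])}
```

A Flask route would then simply parse the request JSON, call this function, and return the dictionary as the JSON response.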
The project builds on previous sign language recognition research by integrating deep learning, real-time mobile compatibility, and multilingual text-to-speech, promoting inclusiveness and accessibility. The modular design ensures scalability, with future plans to extend recognition to full words and sentences, use sequence models, and incorporate wearable devices.
Conclusion
This work introduces an end-to-end system for real-time Indian Sign Language (ISL) recognition and translation into both text and speech, narrowing the communication gap between hearing-impaired people and the general public. The system integrates computer vision, deep learning, and mobile deployment to deliver a practical, scalable assistive-technology solution.
At the core of the system is a Multi-Layer Perceptron (MLP) model trained on MediaPipe-extracted hand landmark data. Using 126 landmark coordinates per gesture, the model classified the 26 alphabetic signs of ISL with high accuracy (~96.07%). Thoughtful architectural choices, including ReLU activations, dropout regularization, and a sparse categorical crossentropy loss, helped the model generalize robustly to unseen data. Converting the trained Keras model to TensorFlow Lite format enabled real-time inference on mobile devices with minimal latency, making the system efficient and accessible.
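A minimal sketch of this architecture is shown below. The ReLU activations, dropout, and sparse categorical crossentropy loss follow the description above; the hidden-layer widths (128 and 64) and dropout rate (0.3) are assumptions made for illustration, not values reported by the paper:

```python
# Sketch of the MLP classifier and its TFLite conversion. Hidden-layer
# sizes and the dropout rate are illustrative assumptions.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(126,)),                 # 2 hands x 21 landmarks x (x, y, z)
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(26, activation="softmax"),  # one class per letter A-Z
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

def to_tflite(keras_model):
    """Convert the trained Keras model to a TensorFlow Lite flatbuffer
    for on-device inference."""
    converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
    return converter.convert()
```

The sparse categorical crossentropy loss lets the training labels stay as integer class indices (0–25) rather than one-hot vectors.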
The deployment pipeline relied on a Flask backend served over Cloudflare tunnels for smooth communication between the Android frontend and the model backend. The app tracked hand gestures with the phone's camera, processed them through MediaPipe, transmitted the landmarks to the server, and displayed the returned predictions on screen immediately. The system also assembles single-letter predictions into words and plays them back as audio through both Android's default Text-to-Speech (TTS) engine and the Swaram API for multilingual speech output, ensuring that users from India's diverse linguistic backgrounds can benefit from the system.
System usability was also verified under varying lighting conditions and hand postures and found to be stable and consistent. In addition, the multilingual TTS integration adds valuable regional-language coverage, including Hindi, Tamil, Telugu, and Kannada.
Socially, the system offers a low-cost and convenient means of supporting hearing-impaired and speech-impaired individuals. By enabling real-time communication without interpreters or special hardware, it can be especially valuable in schools, public services, and everyday life.
Although the present model is restricted to recognizing individual letters, it provides a solid foundation for future development. Extending the dataset to cover ISL gestures for complete words, dynamic signs, and grammar would significantly enhance sentence-level translation. Adding temporal models such as LSTMs or Transformers could enable continuous sign-stream recognition. Improved hand tracking under difficult conditions and stronger error-correction mechanisms would further raise the system's performance.
In summary, this project delivers a technically feasible and socially impactful method of ISL detection and translation. The combination of lightweight deployment, deep learning, and multilingual capability makes it a strong candidate among low-cost AI-based communication tools. With further research and development, the system can grow into a full sign language interpreter, providing greater accessibility and empowerment for the deaf community in India.