Over six million deaf and hard-of-hearing individuals in India rely on Indian Sign Language (ISL) as their primary mode of communication. Despite this, they continue to face significant barriers in accessing healthcare, education, and public services, largely because real-time multilingual translation tools remain unavailable at scale; existing systems typically produce static text output in English alone. This paper presents a fully web-based ISL multilingual chatbot that addresses this gap end-to-end. The system employs a MobileNetV2 convolutional neural network trained on a dataset of hand gesture images spanning the A–Z and 0–9 signs, achieving high accuracy in real-time webcam-based recognition. The recognized gesture text is passed directly to Cohere's Command language model, which generates natural, context-aware conversational responses. These responses can then be translated into multiple Indian regional languages, including Hindi, Tamil, Telugu, Kannada, Malayalam, Marathi, and Bengali, and delivered as both text and synthesized speech. The complete system is deployed with a FastAPI backend and a React frontend, with all components designed for low latency and real-world usability.
Introduction
This work addresses the lack of accessible AI communication systems for Indian Sign Language (ISL) users by enabling them to interact with a conversational agent through hand gestures. It leverages deep learning (MobileNetV2 with transfer learning) for real-time gesture recognition from a standard webcam, converting recognized signs into text. The system integrates a Cohere conversational backend to generate meaningful responses, which are then translated into major Indian languages and delivered as both text and speech. The architecture consists of a React-based frontend for webcam capture and display, a FastAPI backend providing gesture prediction, chat, translation, and text-to-speech endpoints, and a fine-tuned MobileNetV2 recognition model. The workflow spans hand tracking, gesture sequence analysis, AI inference, and multilingual output, providing a seamless, web-based, inclusive communication platform for deaf and hard-of-hearing individuals in India.
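The gesture sequence analysis step mentioned above can be sketched as a simple frame-stability filter: a letter is accepted only after the classifier has reported it for several consecutive webcam frames. The function name, the threshold value, and the emit-once policy below are illustrative assumptions, not the paper's exact implementation:

```python
def stabilize(frame_predictions, min_frames=8):
    """Collapse noisy per-frame labels into a typed character sequence.

    A character is emitted only once its label has persisted for
    `min_frames` consecutive frames, suppressing the flicker that
    occurs while the hand transitions between signs.
    """
    output = []
    current, run = None, 0
    for label in frame_predictions:
        if label == current:
            run += 1
        else:
            current, run = label, 1
        if run == min_frames:  # stable long enough: accept exactly once
            output.append(label)
    return "".join(output)
```

For example, with the default threshold, a stream of 8 frames of "H", 2 stray frames of "X", and 9 frames of "I" yields "HI": the stray "X" run never reaches the threshold and is discarded.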
Conclusion
This paper has described the design, implementation, and deployment of a real-time Indian Sign Language chatbot that connects gesture recognition directly to a large language model backend. The system is built entirely from open and freely available components (MobileNetV2 for visual recognition, the Cohere Command model for conversation, deep-translator for regional language translation, and pyttsx3 for voice output) and runs in a standard web browser without specialized hardware.
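As a concrete illustration of the translation component, each supported regional language maps onto the ISO 639-1 code that deep-translator's GoogleTranslator accepts. The dictionary below covers the languages named in this paper; the helper function and its English fallback are assumptions for the sake of the sketch:

```python
# ISO 639-1 codes for the Indian languages supported by the chatbot;
# these are the codes deep-translator's GoogleTranslator expects.
LANGUAGE_CODES = {
    "hindi": "hi", "tamil": "ta", "telugu": "te", "kannada": "kn",
    "malayalam": "ml", "marathi": "mr", "bengali": "bn", "english": "en",
}

def resolve_language(name, default="en"):
    """Map a user-facing language name to its translation code.

    Unrecognized names fall back to English (an assumed policy) so a
    typo in the language selector never breaks the response pipeline.
    """
    return LANGUAGE_CODES.get(name.strip().lower(), default)
```

The resolved code would then be passed to `GoogleTranslator(source="auto", target=code).translate(text)` before the result is handed to the text-to-speech engine.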
A significant portion of this work involved resolving practical engineering challenges that arise when deploying machine learning systems in real production environments: NumPy version conflicts, Keras 3 format changes, TensorFlow GPU precision issues, API SDK deprecations, and the gap between validation accuracy and real-world recognition performance. These challenges are rarely discussed in research papers but represent the majority of the work required to move from a trained model to a usable system.
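Several of these conflicts reduce to careful dependency pinning. The fragment below sketches the kind of constraints involved; the exact versions are assumptions for illustration, not the project's tested pins:

```text
# NumPy 2.x broke binary compatibility with wheels built against 1.x,
# and Keras 3 changed the default saved-model format, so both are pinned.
numpy<2.0
tensorflow==2.15.*        # last line bundling a Keras 2-compatible API
cohere>=5.0               # post-deprecation SDK
deep-translator>=1.11
pyttsx3>=2.90
fastapi>=0.110
uvicorn>=0.29
```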
The result is a functional prototype that demonstrates the feasibility of the core concept: a deaf or hard-of-hearing individual can open a browser, sign a message using ISL gestures, and receive an intelligent AI response in their preferred Indian language with voice output. Extending this prototype into a robust, widely deployable application — with dynamic gesture support, a broader ISL vocabulary, and mobile access — represents a clear and achievable path toward genuinely inclusive AI communication.
References
[1] A. Howard et al., \"MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications,\" arXiv:1704.04861, 2017.
[2] M. Sandler et al., \"MobileNetV2: Inverted Residuals and Linear Bottlenecks,\" in Proc. IEEE CVPR, 2018.
[3] Google LLC, \"MediaPipe Hands: On-device Real-time Hand Tracking,\" arXiv:2006.10214, 2020.
[4] S. Gupta, R. Sharma, and A. Kumar, \"Indian Sign Language Detection for Real-Time Translation Using CNN with MediaPipe,\" arXiv preprint arXiv:2507.20414,Jul. 2025. [Online]. Available: https://arxiv.org/abs/2507.20414 [web:65]
[5] S. Hochreiter and J. Schmidhuber, \"Long Short-Term Memory,\" Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[6] TensorFlow Development Team, \"TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems,\" 2015. [Online]. Available: https://www.tensorflow.org
[7] S. Ramachandran, \"Indian Sign Language Dataset,\" Kaggle, 2021. [Online]. Available: https://www.kaggle.com
[8] FastAPI Documentation, Sebastián Ramírez, 2024. [Online]. Available: https://fastapi.tiangolo.com
[9] React Documentation, Meta Open Source, 2024. [Online]. Available: https://react.dev
[10] deep-translator Library, Nidhal Baccouri, 2023. [Online]. Available: https://github.com/nidhaloff/deep-translator
[11] R. Kadwade, A. Tangade, and N. Pakhare, \"Indian Sign Language Recognition System,\" International Journal of Engineering Research&Technology,vol.12,no.5,pp.789-798,May2023. [Online]. Available: https://www.ijert.org/indian-sign-language-recognition-system [web:63]
[12] S. Kanade et al., \"Indian Sign Language recognition system using SURF with polynomial classifier,\" Software
Impacts, vol.12, pp.100.112,Apr.2022.[Online].Available:https://www.sciencedirect.com/science/article/pii/S2590005622000121 [web:60]
[13] V. Narayanan et al., \"A Comprehensive Approach to Indian Sign Language Recognition Using Sequential LSTM with MediaPipe Holistic,\" EAI Endorsed Transactions on AI and Robotics, vol.2,no.1,pp.1-12, May 2025. [Online]. Available: https://publications.eai.eu/index.php/airo/article/view/8693 [web:69]
[14] IIT Kanpur CS365 Team, \"Indian Sign Language Character Recognition,\" CS365 Project Report, Dept. of Computer Science,Indian Institute of Technology Kanpur,2015.[Online]. Available:https://cse.iitk.ac.in/users/cs365/2015/_submissions/vinsam/report.pdf [web:66]