Abstract
Indian Sign Language (ISL) serves as a primary mode of communication for Deaf and hard-of-hearing communities in India. Despite its societal importance, however, ISL remains largely unsupported by mainstream technological platforms, limiting inclusive communication. This research introduces a real-time ISL recognition and translation system that converts hand gestures into corresponding text and speech outputs, enabling phrase-level communication rather than isolated character interpretation. The architecture uses a modular pipeline: MediaPipe for accurate hand landmark detection, a Convolutional Neural Network (CNN) for gesture classification, a phrase-mapping module that translates recognized gestures into meaningful expressions, and a text-to-speech (TTS) system that turns the generated text into audible speech. Unlike previous systems restricted to static signs, our approach supports semantically rich, multi-word phrases, enhancing natural communication flow. The model was trained on a specially constructed dataset of ten frequently used ISL phrases; to improve generalization, 150 samples per class were captured under varied lighting and background conditions. The final system achieved 95% classification accuracy, operated at 60 frames per second, and maintained latency below 100 milliseconds. Usability testing with multiple users confirmed the system's robustness, responsiveness, and accessibility. The findings demonstrate the viability of deploying deep learning-based ISL recognition systems in authentic environments, including public areas, healthcare facilities, and educational institutions.
Introduction
Communication barriers persist for Deaf and hard-of-hearing individuals, especially in societies dominated by spoken language. Indian Sign Language (ISL) is vital for bridging this gap, but existing technological solutions mostly recognize static signs or isolated words, lacking real-time, fluid phrase recognition and speech output.
This research presents a real-time ISL recognition system that translates dynamic hand gestures into full phrases with both text and speech outputs. Using computer vision (MediaPipe), deep learning (CNN), and text-to-speech (TTS) technologies, the system captures live video, extracts hand landmarks, classifies gestures into one of 10 predefined phrases, and provides immediate audio feedback. The model was trained on a dataset of 1,500 gesture samples, achieving 95% accuracy with low latency (<100 ms) and smooth performance (~60 FPS).
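To make the pipeline concrete, the following is a minimal sketch of the landmark-extraction and classification stages, assuming the MediaPipe Hands Python API and a Keras-style CNN. The model file name, the choice of a flattened 21-landmark (x, y, z) vector as classifier input, and the placeholder phrase labels are our illustrative assumptions, not details specified by this paper.

```python
# Minimal sketch of the landmark-extraction and classification stages.
# Assumptions (not specified in the paper): the classifier consumes a
# flattened 21-landmark (x, y, z) vector, is stored as a Keras model
# named "isl_phrase_cnn.h5", and the phrase labels are placeholders.
import cv2
import numpy as np
import mediapipe as mp
from tensorflow.keras.models import load_model

# Placeholder labels for the ten trained phrases.
PHRASES = [f"phrase_{i}" for i in range(10)]

hands = mp.solutions.hands.Hands(
    static_image_mode=False,       # video mode: track hands across frames
    max_num_hands=1,
    min_detection_confidence=0.5,
)
model = load_model("isl_phrase_cnn.h5")  # hypothetical trained CNN

def classify_frame(frame_bgr):
    """Return (phrase, confidence) for one frame, or None if no hand is found."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)  # MediaPipe expects RGB
    result = hands.process(rgb)
    if not result.multi_hand_landmarks:
        return None
    landmarks = result.multi_hand_landmarks[0].landmark  # 21 hand landmarks
    # Flatten normalized (x, y, z) coordinates into a 63-value feature vector.
    feats = np.array([[p.x, p.y, p.z] for p in landmarks], dtype=np.float32)
    probs = model.predict(feats.ravel()[np.newaxis, :], verbose=0)[0]
    idx = int(np.argmax(probs))
    return PHRASES[idx], float(probs[idx])
```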
Key innovations include phrase-level interpretation rather than isolated signs, real-time responsiveness, and speech synthesis, addressing major gaps in current ISL recognition technologies. Limitations involve a relatively small dataset and occasional errors in poor lighting or occluded conditions. The system’s modular design and user-friendly interface make it practical for enhancing inclusivity and communication in everyday settings.
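A corresponding capture-and-speak loop could look like the sketch below, which reuses the hypothetical classify_frame() helper above and pyttsx3, one common offline Python TTS engine. The 0.9 confidence threshold and the repeat-suppression check are illustrative choices for keeping audio feedback from re-firing on every frame, not the authors' exact design.

```python
# Sketch of the real-time capture loop with speech output, reusing the
# hypothetical classify_frame() helper above. The confidence threshold
# and repeat-suppression logic are illustrative, not the authors' design.
import cv2
import pyttsx3

engine = pyttsx3.init()
cap = cv2.VideoCapture(0)   # default webcam
last_spoken = None

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    prediction = classify_frame(frame)
    if prediction is not None:
        phrase, conf = prediction
        # Overlay the recognized phrase as on-screen text.
        cv2.putText(frame, f"{phrase} ({conf:.2f})", (10, 30),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        # Speak only confident, newly recognized phrases so the same
        # output is not repeated on every frame.
        if conf > 0.9 and phrase != last_spoken:
            engine.say(phrase)
            engine.runAndWait()  # blocking call; see note below
            last_spoken = phrase
    cv2.imshow("ISL Translator", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```

Because engine.runAndWait() blocks, a deployment aiming for the reported ~60 FPS would likely move speech synthesis to a worker thread or queue so the video loop keeps running while a phrase is spoken.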
Conclusion
This research presents a real-time ISL gesture recognition system that translates signed phrases into text and speech. By leveraging MediaPipe for hand tracking, CNNs for classification, and TTS for output, the system provides an inclusive communication platform for Deaf individuals.
Achieving 95% accuracy and 60 FPS performance, the prototype demonstrates strong potential for deployment in accessibility-focused environments. Future work will focus on increasing gesture diversity, improving model adaptability, and expanding deployment platforms.