Communication barriers between hearing-impaired and non-signing individuals remain a major challenge in India because few hearing people know Indian Sign Language (ISL). Although many gesture recognition systems exist, most are unidirectional, depend on expensive hardware such as sensor-equipped gloves, or require cloud-based processing that limits their use in rural areas. This paper proposes a real-time, fully offline, two-way ISL translation system built on a low-cost Raspberry Pi 5. The solution combines MediaPipe-based hand landmark tracking with CNN-based gesture recognition to convert ISL signs to text, and uses the offline VOSK speech recognition engine to convert spoken audio to text. A 7-inch LED screen provides immediate feedback to both deaf or hard-of-hearing users and hearing users. The system achieved 92% accuracy on static ISL alphabet signs, 95% accuracy on numeric gestures, and latency below 1 second, demonstrating its feasibility as a portable and accessible communication aid.
Introduction
This study presents a portable, low-cost, offline bidirectional Indian Sign Language (ISL) translation system designed to bridge communication gaps between deaf and hearing individuals. More than five million people in India use ISL, but most hearing people cannot understand it, creating barriers in education, healthcare, and public services. Existing translation systems often support only one-way communication, depend on expensive wearable devices, focus on non-ISL sign languages, or require internet connectivity.
The proposed system addresses these limitations by enabling two-way communication using affordable hardware and offline processing. It consists of two main modules:
ISL-to-Text/Speech Module
Uses a camera and MediaPipe to detect 21 hand landmarks in real time.
A lightweight Convolutional Neural Network (CNN) recognizes ISL alphabets, numbers, and common gestures.
Recognized gestures are converted into text and, optionally, into speech.
Speech-to-Text Module
Uses a USB microphone and the VOSK offline speech recognition engine.
Converts spoken language into text without requiring internet access.
Displays the recognized text on-screen for ISL users.
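For the ISL-to-Text/Speech module above, the paper does not say how the 21 MediaPipe landmarks are prepared for the CNN or how per-frame predictions are turned into text. The sketch below shows one common approach, assuming the CNN consumes landmark coordinates rather than raw images: each hand is translated so that the wrist (MediaPipe landmark 0) is the origin and scaled by its largest wrist-to-landmark distance, and a letter is committed only after the same label has been predicted for several consecutive frames. The names `landmarks_to_features` and `StableLabelBuffer` and the default of 8 frames are hypothetical illustrations, not the authors' settings.

```python
import math
from typing import List, Optional, Sequence, Tuple

Landmark = Tuple[float, float, float]  # (x, y, z) as reported by MediaPipe Hands

NUM_LANDMARKS = 21  # MediaPipe Hands reports 21 landmarks per hand
WRIST = 0           # index 0 is the wrist in MediaPipe's hand model


def landmarks_to_features(landmarks: Sequence[Landmark]) -> List[float]:
    """Flatten 21 landmarks into 63 values invariant to hand position and scale (assumed preprocessing)."""
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}")
    wx, wy, wz = landmarks[WRIST]
    # Translate so the wrist is the origin (position invariance).
    rel = [(x - wx, y - wy, z - wz) for x, y, z in landmarks]
    # Divide by the largest wrist-to-landmark distance (hand size / camera distance invariance).
    scale = max(math.sqrt(dx * dx + dy * dy + dz * dz) for dx, dy, dz in rel)
    if scale == 0.0:
        scale = 1.0  # degenerate input: all landmarks coincide
    return [c / scale for point in rel for c in point]


class StableLabelBuffer:
    """Commit a label once it has been predicted for `min_frames` consecutive frames (hypothetical helper)."""

    def __init__(self, min_frames: int = 8) -> None:
        self.min_frames = min_frames
        self._current: Optional[str] = None
        self._count = 0
        self._emitted = False

    def update(self, label: Optional[str]) -> Optional[str]:
        """Feed one per-frame prediction (None means no hand); return the label when it becomes stable."""
        if label != self._current:
            # A different label (or no hand) starts a new run.
            self._current, self._count, self._emitted = label, 0, False
        self._count += 1
        if label is not None and not self._emitted and self._count >= self.min_frames:
            self._emitted = True  # emit only once per continuous hold of the sign
            return label
        return None
```

In a live loop, each frame's landmarks would come from `results.multi_hand_landmarks[0].landmark` after calling `process()` on a `mediapipe.solutions.hands.Hands` instance, converted to `(lm.x, lm.y, lm.z)` tuples; the CNN's predicted label (or `None` when no hand is detected) is then passed to `StableLabelBuffer.update`. Requiring a stable run of frames trades a little latency for fewer spurious letters while the hand moves between signs.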
The system runs entirely on a Raspberry Pi 5, making it portable, energy-efficient, and suitable for deployment in rural and low-connectivity environments. Software components include Python, OpenCV, MediaPipe, CNN-based gesture recognition, VOSK speech recognition, and pyttsx3 text-to-speech.
Key Features
Fully offline operation
Low-cost and portable design
No wearable gloves or sensors required
Real-time bidirectional communication
Supports ISL alphabets, numbers, and common gestures
Suitable for schools, healthcare centers, and public service settings
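Real-time bidirectional communication implies that the sign and speech pipelines share the Raspberry Pi at the same time, but the paper does not describe the concurrency model. One plausible arrangement, sketched here under that assumption, runs each direction in its own thread and funnels recognized text into one queue drained by the display loop, so a slow speech decode does not stall the camera loop. The function names and the `(direction, text)` message format are hypothetical; the two sources stand in for the MediaPipe/CNN and VOSK pipelines.

```python
import queue
import threading
from typing import Callable, Iterable, Tuple

Message = Tuple[str, str]  # (direction, text), e.g. ("sign", "A") or ("speech", "hello")


def _worker(direction: str, produce: Callable[[], Iterable[str]], out: queue.Queue) -> None:
    """Run one translation direction and post each recognized text to the shared queue."""
    for text in produce():
        out.put((direction, text))


def run_bidirectional(sign_source: Callable[[], Iterable[str]],
                      speech_source: Callable[[], Iterable[str]],
                      render: Callable[[str, str], None]) -> None:
    """Run both directions concurrently; the calling thread acts as the display loop."""
    out: queue.Queue = queue.Queue()
    threads = [
        threading.Thread(target=_worker, args=("sign", sign_source, out), daemon=True),
        threading.Thread(target=_worker, args=("speech", speech_source, out), daemon=True),
    ]
    for t in threads:
        t.start()
    # Keep draining until both producers have finished and nothing is left to show.
    while any(t.is_alive() for t in threads) or not out.empty():
        try:
            direction, text = out.get(timeout=0.1)
        except queue.Empty:
            continue
        render(direction, text)  # on the device: draw on the screen and/or speak via TTS
    for t in threads:
        t.join()
```

Each direction keeps its own order because it has a single producer and the queue is FIFO; messages from the two directions interleave in arrival order.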
Results
Testing on a Raspberry Pi 5 showed strong performance:
ISL Alphabet Recognition: 92% accuracy, 0.21 s latency
ISL Numeric Recognition: 95% accuracy, 0.18 s latency
Speech-to-Text: 90% accuracy, 0.4–1 s latency
MediaPipe performed consistently across different skin tones and backgrounds, though accuracy decreased by about 6% in low-light conditions. The system operated reliably for extended periods without overheating.
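The results above report accuracy and per-prediction latency without describing the measurement procedure. A minimal harness that could produce figures of this kind is sketched below; it assumes a labelled test set and a `predict` callable (both hypothetical), times each call with `time.perf_counter`, and returns accuracy together with mean latency in seconds. It is not the authors' evaluation code.

```python
import time
from statistics import mean
from typing import Any, Callable, Iterable, Tuple


def evaluate(predict: Callable[[Any], str],
             samples: Iterable[Tuple[Any, str]]) -> Tuple[float, float]:
    """Return (accuracy, mean latency in seconds) of `predict` over (input, true_label) pairs."""
    correct = 0
    latencies = []
    for x, label in samples:
        start = time.perf_counter()
        predicted = predict(x)
        latencies.append(time.perf_counter() - start)  # wall-clock time of one prediction
        correct += predicted == label
    if not latencies:
        raise ValueError("no samples to evaluate")
    return correct / len(latencies), mean(latencies)
```

Running it separately on well-lit and low-light subsets would make the lighting effect directly comparable.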
Conclusion
This paper presents a real-time, offline, two-way Indian Sign Language (ISL) translation system built on a Raspberry Pi 5, designed to bridge the communication gap between deaf and hearing people. Using MediaPipe for hand tracking, a CNN for gesture recognition, and VOSK for offline speech recognition, the system performs sign-to-text and speech-to-text translation without wearable sensors or an Internet connection. Future work includes recognizing dynamic gestures with LSTM or Transformer models, supporting continuous sign sequences, training on a larger vocabulary, making speech-to-text more robust to noise, and deploying the system as an Android mobile app. Overall, the system provides an affordable, scalable solution suitable for classrooms, hospitals, government offices, and rural communities across India.