Abstract
Communication between hearing-impaired individuals and people who do not understand sign language remains a major challenge. Sign language is an effective means of communication for deaf and mute individuals, but most of the general public is not familiar with it; this gap limits social interaction, educational opportunities, and access to services for the hearing-impaired community. This paper presents a Sign Language to Text and Speech Recognition System that converts hand gestures into readable text and audible speech using computer vision and machine learning techniques. The system captures hand gestures through a camera, preprocesses the frames, classifies the gestures with trained machine learning models, and renders the result as text and synthesized speech. By providing real-time gesture recognition and translation, the proposed system aims to bridge the communication gap between deaf-mute individuals and the general public in a user-friendly, accurate, and efficient manner.
Introduction
1. Background and Problem
Communication is essential, but hearing- and speech-impaired individuals rely on sign language, which most people cannot understand.
This creates barriers in education, healthcare, public services, and employment.
Technological solutions are needed to automatically interpret sign language and convert it into text and speech for easier communication.
2. Proposed Solution
A real-time system using computer vision and machine learning to detect hand gestures and convert them into:
Text output
Synthesized speech
Key components of the system:
Image Acquisition and Preprocessing – a webcam captures video frames of hand gestures, which are filtered and resized for the classifier (a minimal sketch appears after this list).
Gesture Recognition – computer vision libraries (e.g., OpenCV) and deep learning models (e.g., CNNs) perform feature extraction and classification with high recognition accuracy.
Output Generation – recognized gestures are converted to text and synthesized speech in real time.
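As a concrete illustration, the acquisition and preprocessing step could look like the minimal Python sketch below. It assumes OpenCV (cv2) is installed; the 64x64 input size and the grayscale/blur pipeline are illustrative assumptions, not values taken from the paper.

    import cv2  # OpenCV: image acquisition and preprocessing

    def grab_preprocessed_frame(size=(64, 64)):
        """Read one webcam frame and prepare it for a gesture classifier."""
        cap = cv2.VideoCapture(0)                       # default webcam
        ok, frame = cap.read()
        cap.release()
        if not ok:
            raise RuntimeError("could not read a frame from the webcam")
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # discard color information
        blurred = cv2.GaussianBlur(gray, (5, 5), 0)     # suppress sensor noise
        resized = cv2.resize(blurred, size)             # match the model's input size
        return resized.astype("float32") / 255.0        # scale pixels to [0, 1]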
3. Implementation
Frontend: captures user input through a webcam.
Backend: processes frames using Python, OpenCV, and trained machine learning models.
Text-to-Speech Conversion: uses libraries such as pyttsx3 to generate audible output.
Real-time operation allows simultaneous visual (text) and auditory (speech) communication; an end-to-end sketch follows this list.
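The following minimal sketch wires these pieces together. The model file gesture_cnn.h5, the label list, and the 64x64 input shape are hypothetical placeholders standing in for whatever the trained model actually uses; the OpenCV, NumPy, Keras, and pyttsx3 calls themselves are standard.

    import cv2
    import numpy as np
    import pyttsx3                                    # offline text-to-speech
    from tensorflow.keras.models import load_model    # assumes a Keras CNN

    model = load_model("gesture_cnn.h5")              # hypothetical trained model
    labels = ["hello", "thanks", "yes", "no"]         # hypothetical gesture vocabulary
    engine = pyttsx3.init()                           # platform TTS driver

    cap = cv2.VideoCapture(0)                         # frontend: webcam input
    last_word = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        x = cv2.resize(gray, (64, 64)).astype("float32") / 255.0
        pred = model.predict(x.reshape(1, 64, 64, 1), verbose=0)
        word = labels[int(np.argmax(pred))]           # classification -> text
        cv2.putText(frame, word, (10, 30), cv2.FONT_HERSHEY_SIMPLEX,
                    1, (0, 255, 0), 2)                # visual (text) output
        cv2.imshow("Sign Language Translator", frame)
        if word != last_word:                         # speak only on a new prediction
            engine.say(word)                          # auditory (speech) output
            engine.runAndWait()
            last_word = word
        if cv2.waitKey(1) & 0xFF == ord("q"):         # press 'q' to quit
            break
    cap.release()
    cv2.destroyAllWindows()

In practice, a confidence threshold on the prediction would keep the system from speaking on uncertain frames.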
6. Results & Discussion
System achieves 90–95% accuracy for trained gestures.
Performs well under controlled lighting and clear gestures.
Limitations include:
Rapid gestures
Complex or cluttered backgrounds
Poor lighting
Future improvements:
Larger, more diverse datasets
Advanced deep learning models
Overall, the system effectively bridges communication gaps, enabling hearing-impaired individuals to interact with people who do not understand sign language.
5. Advantages
Real-time gesture recognition
Converts gestures into both text and speech
Improves accessibility and independence for hearing-impaired users
Reduces reliance on human interpreters
Provides a user-friendly, technology-driven solution for daily communication needs
Conclusion
The Sign Language to Text and Speech Recognition System provides an effective solution for improving communication between hearing-impaired individuals and the general public. The system uses computer vision and machine learning techniques to recognize hand gestures captured through a camera and convert them into meaningful text and speech output, enabling real-time interpretation of sign language. Through preprocessing, gesture detection, feature extraction, and classification, it can accurately recognize a range of sign language gestures, and the use of Python with libraries such as OpenCV improves the efficiency and accuracy of recognition. The generated text and speech output allow users who do not understand sign language to easily interpret the intended message.

In the future, the system can be enhanced by increasing the dataset size, improving recognition accuracy with advanced deep learning models, and supporting a larger vocabulary of sign gestures. Integrating mobile and real-time applications can further expand its usability and accessibility. Supporting continuous sign language recognition, rather than only isolated gestures, would allow more natural and fluent communication, and multilingual text-to-speech functionality could help users communicate across languages, making the system more versatile and globally applicable.