This paper presents a novel real-time sign language detection system designed to enhance communication between the deaf and hard-of-hearing community and non-signers. Utilizing standard web cameras, the system captures and analyzes hand and facial gestures, employing advanced computer vision and deep learning techniques to recognize sign language gestures. Key markers corresponding to specific signs are identified and translated into voice output and on-screen text, providing a dual-output feature that fosters inclusivity and accessibility. By enabling real-time interpretation through voice and visual representation, this technology bridges communication gaps, making interactions more seamless for sign language users and those unfamiliar with it.
The proposed system is adaptable for integration into webcams and other camera-equipped devices, offering potential applications across various sectors, including education and healthcare, ultimately improving understanding and interaction for sign language users.
Introduction
Purpose & Importance:
Sign language is a crucial communication method for the deaf and hard-of-hearing community, yet a significant communication gap persists between signers and non-signers. To bridge this divide, a real-time sign language detection system is proposed that uses only standard webcams and advanced AI techniques.
System Overview:
The system uses computer vision and deep learning (e.g., CNNs) to detect and recognize hand and facial gestures.
It offers dual outputs: visual (text on screen) and auditory (voice via text-to-speech), promoting inclusivity and accessibility.
It’s designed to work with commonly available devices (laptops, tablets) without the need for specialized hardware.
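A minimal sketch of the dual-output idea described above, assuming OpenCV for the on-screen text and pyttsx3 for the spoken output; the label string is a hypothetical placeholder for whatever the recognizer emits, not the paper's actual interface:

```python
import cv2
import pyttsx3

def present_sign(frame, label, engine):
    """Overlay the recognized sign as on-screen text and speak it aloud."""
    cv2.putText(frame, label, (10, 40), cv2.FONT_HERSHEY_SIMPLEX,
                1.2, (0, 255, 0), 2, cv2.LINE_AA)
    engine.say(label)
    engine.runAndWait()  # blocking; a real-time loop would queue speech instead

if __name__ == "__main__":
    tts = pyttsx3.init()
    cap = cv2.VideoCapture(0)               # standard webcam, no specialized hardware
    ok, frame = cap.read()
    if ok:
        present_sign(frame, "HELLO", tts)   # "HELLO" stands in for a recognized sign
        cv2.imshow("Sign output", frame)
        cv2.waitKey(2000)
    cap.release()
    cv2.destroyAllWindows()
```

Because both outputs are driven from the same recognized label, the on-screen text and the synthesized speech stay in sync by construction.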
Literature Survey:
A review of prior research and systems [1]-[10] that use CNNs, LSTMs, YOLO, Kinect sensors, Vision Transformers, and other deep learning models for gesture and sign language recognition.
These works demonstrate varied approaches, including spatial-temporal recognition, zero-shot learning, and electromyography signals, showing strong interest and progress in this field.
Problem Definition:
Sign language remains largely inaccessible to non-signers, especially in essential settings like healthcare or education.
Current systems often require specialized hardware or are not truly real-time.
The goal is to develop a practical, real-time, webcam-based system that translates sign language into both speech and text.
Methodology:
Conduct research, gather diverse datasets, preprocess inputs, and develop a CNN-based model for gesture recognition.
Facial gesture recognition is included to provide context and improve accuracy.
Key markers are identified using optical flow and keypoint detection.
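A sketch, under assumptions, of how the keypoint-detection step could be implemented with MediaPipe and how a CNN classifier could be defined in Keras. The layer sizes, input shape, and function names are illustrative rather than the paper's exact architecture; dense optical flow (e.g. cv2.calcOpticalFlowFarneback) could additionally track marker motion between frames.

```python
import cv2
import mediapipe as mp
import numpy as np
import tensorflow as tf

mp_hands = mp.solutions.hands

def extract_keypoints(frame_bgr, hands):
    """Return a (21, 3) array of hand landmarks, or None if no hand is detected."""
    results = hands.process(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB))
    if not results.multi_hand_landmarks:
        return None
    landmarks = results.multi_hand_landmarks[0].landmark
    return np.array([[p.x, p.y, p.z] for p in landmarks], dtype=np.float32)

def build_gesture_cnn(num_classes):
    """Small CNN over 64x64 grayscale hand crops (illustrative layer sizes)."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(64, 64, 1)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(64, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])

if __name__ == "__main__":
    model = build_gesture_cnn(num_classes=26)   # e.g. one class per alphabet sign
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    model.summary()
```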
Functional Requirements:
Real-time gesture recognition with both text and voice output.
User customization, multi-language support, gesture training, error handling, and interaction logging.
Non-Functional Requirements:
Low latency, scalability, intuitive UI, OS/browser compatibility, data privacy, and updatability.
Modules:
Gesture Detection – Tracks hand and face movements via webcam.
Gesture Recognition – Uses AI to interpret detected gestures.
Text and Voice Output – Displays and speaks the recognized signs.
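A sketch of how the three modules could fit together in a single loop, assuming a trained model saved as gesture_cnn.h5 that takes a flattened 63-value landmark vector and an illustrative CLASSES label list; these names and the input format are assumptions for illustration, not the paper's actual artifacts:

```python
import cv2
import mediapipe as mp
import numpy as np
import pyttsx3
import tensorflow as tf

CLASSES = ["hello", "thanks", "yes", "no"]            # illustrative label set
model = tf.keras.models.load_model("gesture_cnn.h5")  # assumed model file
tts = pyttsx3.init()

cap = cv2.VideoCapture(0)
with mp.solutions.hands.Hands(max_num_hands=1) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # Gesture Detection: locate hand landmarks in the webcam frame.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            lm = results.multi_hand_landmarks[0].landmark
            feats = np.array([[p.x, p.y, p.z] for p in lm],
                             dtype=np.float32).reshape(1, -1)
            # Gesture Recognition: interpret the detected gesture with the model.
            probs = model.predict(feats, verbose=0)[0]
            label = CLASSES[int(np.argmax(probs))]
            # Text and Voice Output: display and speak the recognized sign.
            cv2.putText(frame, label, (10, 40), cv2.FONT_HERSHEY_SIMPLEX,
                        1.0, (0, 255, 0), 2)
            tts.say(label)
            tts.runAndWait()
        cv2.imshow("Sign Language Detection", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
cap.release()
cv2.destroyAllWindows()
```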
Technology Stack:
Python 3.10.6
OpenCV 4.8.0
MediaPipe 0.10.5
TensorFlow 2.13.0
pyttsx3 for text-to-speech
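A small, optional check (not part of the paper) that the installed environment matches the stack listed above, using only the standard library's importlib.metadata; the package names are the usual PyPI distribution names:

```python
import sys
from importlib.metadata import version, PackageNotFoundError

PINNED = {
    "opencv-python": "4.8.0",
    "mediapipe": "0.10.5",
    "tensorflow": "2.13.0",
    "pyttsx3": None,                     # no version is pinned in the paper
}

print(f"Python {sys.version.split()[0]} (paper uses 3.10.6)")
for package, expected in PINNED.items():
    try:
        installed = version(package)
    except PackageNotFoundError:
        print(f"{package}: not installed")
        continue
    note = "" if expected is None or installed.startswith(expected) else f" (expected {expected})"
    print(f"{package}: {installed}{note}")
```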
Results:
Results include user registration and login, a dashboard, and charts showing system performance.
Compared to existing systems, the proposed model shows:
Faster and more accurate real-time translation.
Broader compatibility using standard webcams.
Dual output (text and voice).
Simplified setup and ease of use.
Conclusion
In conclusion, the proposed real-time sign language detection system significantly bridges communication gaps between the deaf and hard-of-hearing community and non-signers. By leveraging standard web cameras and advanced computer vision techniques, this system not only enhances accessibility in vital sectors such as education and healthcare but also promotes broader societal awareness of sign language. The dual-output feature, providing both voice interpretation and on-screen text, ensures effective real-time communication, enabling users to interact seamlessly regardless of their familiarity with sign language.
References
[1] R. Mehta, \"Hand Gesture Recognition Using Convolutional Neural Networks (CNNs),\" Project, 2023.
[2] K. S. Deshmukh, \"Real-Time Sign Language Detection Using Kinect Sensors,\" Project, 2023.
[3] M. N. Rao, \"Deep Learning-Based Sign Language Recognition Using LSTM Networks,\" Project, 2023.
[4] A. T. Bhatia, \"Sign Language Recognition Using YOLO and OpenCV,\" Project, 2023.
[5] A. Elhagry and R. G. Elrayes, \"Video-based Egyptian Sign Language Recognition using CNN and CNN-LSTM Models,\" International Journal of Machine Learning and Computing, vol. 12, no. 4, pp. 123-135, 2023.
[6] R. Rastgoo, K. Kiani, and S. Escalera, \"Zero-Shot Learning for Sign Language Recognition Using RGB-D Videos,\" IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 45, no. 5, pp. 1125-1137, May 2023.
[7] M. Montazerin, S. Zabihi, and E. Rahimian, \"Hand Gesture Recognition using Vision Transformers (ViT) and Surface Electromyography (HD-sEMG) Signals,\" IEEE Transactions on Biomedical Engineering, vol. 70, no. 3, pp. 175-183, Mar. 2023.
[8] S. Sava? and A. Ergüzen, \"A Two-Stage Approach for Hand Gesture Recognition Using Transfer Learning and Deep Ensemble Learning,\" IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 6, pp. 2509-2516, June 2023.
[9] M. Islam, M. Aloraini, and S. Aladhadh, \"Stacked Encoded Deep Learning Framework for Sign Language Recognition,\" IEEE Access, vol. 11, pp. 2441-2449, 2023.
[10] H. Hu, W. Zhao, W. Zhou, and H. Li, \"SignBERT+: A Self-Supervised Pre-Training Framework for Sign Language Recognition,\" IEEE Transactions on Multimedia, vol. 25, no. 3, pp. 789-797, Mar. 2023.