Abstract
Real-Time Sign Language Recognition in Digital Meetings is a solution designed to bridge the communication gap for individuals with hearing or speech impairments, fostering inclusivity and accessibility in digital communication. By integrating deep learning, WebRTC, and text-to-speech synthesis, the system converts sign language gestures into audible speech in real time, enabling effective interaction between verbal and non-verbal participants. Deep learning models recognize and interpret sign language gestures; WebRTC provides the low-latency, real-time communication needed for virtual meetings and online collaboration platforms; and text-to-speech synthesis transforms recognized gestures into clear, audible output for fluid, natural communication. Beyond improving accessibility for individuals with hearing or speech challenges, this approach creates an inclusive environment in which all participants can engage and collaborate. By addressing the communication barriers of digital meetings, the technology can make virtual interactions more equitable and universally accessible for diverse users.
Introduction
Overview
The project addresses a critical accessibility gap in video conferencing by enabling real-time conversion of sign language gestures into audible speech, thus improving inclusivity for users with hearing or speech impairments. This is especially relevant in the wake of the widespread adoption of digital meeting platforms like Zoom, Microsoft Teams, and Google Meet post-COVID-19.
Key Technologies and Goals
WebRTC: Ensures seamless, low-latency video and audio communication between users.
Machine Learning Gesture Recognition: A trained model detects hand gestures from video frames with high accuracy, even under varying conditions.
Text-to-Speech (TTS): Converts recognized gestures into spoken audio for communication with hearing participants.
WebSocket: Facilitates real-time, bidirectional communication between front-end and server for data transfer.
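These technologies can be tied together by a simple message protocol on the WebSocket channel: the client wraps each captured frame in an envelope, and the server replies with a prediction. The sketch below shows one possible JSON format; the field names (`type`, `seq`, `data`, `label`, `confidence`) are illustrative assumptions, not part of the original design.

```python
import base64
import json

def encode_frame_message(frame_bytes: bytes, seq: int) -> str:
    """Wrap a JPEG-encoded video frame in a JSON envelope for WebSocket transport.

    Binary frames are base64-encoded so they survive a text-mode channel;
    the sequence number lets the server detect dropped or reordered frames.
    """
    return json.dumps({
        "type": "frame",
        "seq": seq,
        "data": base64.b64encode(frame_bytes).decode("ascii"),
    })

def decode_prediction_message(raw: str) -> tuple:
    """Parse a server prediction message into (label, confidence)."""
    msg = json.loads(raw)
    return msg["label"], msg["confidence"]

# Round-trip example (fake frame bytes stand in for a real JPEG):
envelope = encode_frame_message(b"\xff\xd8fake-jpeg", seq=1)
decoded = json.loads(envelope)
print(decoded["type"], decoded["seq"])
```

In a real deployment the envelope would be sent over a WebSocket connection (e.g. via the browser's WebSocket API on the client and an async server library on the backend); only the message shape is sketched here.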
Main Objectives
Enable real-time video/audio streaming using WebRTC.
Build a machine learning model to accurately recognize sign language gestures.
Implement Text-to-Speech for translating recognized signs into natural-sounding speech.
Promote digital inclusivity for users with hearing or speech impairments.
Related Work Highlights
Hybrid Deep Learning Models (e.g., LSTM + YOLOv6): Achieved high accuracy in recognizing both static and dynamic signs.
Challenges: Include variability in signer movements, lighting, gesture complexity, and real-time system adaptability.
Proposed System Architecture
Data Capture: Captures video via user’s camera and streams it to the backend using WebSocket.
Backend Processing:
Gesture Detection Model analyzes video frames.
TTS Engine converts detected gestures to audio.
Streaming Feedback: Audio and visual responses are returned to users through WebRTC for real-time interaction.
Loop & Automation: Operates continuously for the duration of the meeting, alerting users to issues like "No Sign Detected."
Scalability: Optimized with server clustering and load balancing for multiple users.
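As a rough illustration of the backend processing stage, the sketch below chains a gesture classifier and a TTS call behind a single function. The classifier and the `speak` callback are stubs injected for testing; in the actual system they would be the trained detection model and a real TTS engine, and the 0.5 confidence threshold is an assumed value, not one taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Prediction:
    label: Optional[str]   # None when no sign is detected in the frame
    confidence: float

def process_frame(frame, classify: Callable, speak: Callable) -> str:
    """One pass of the backend pipeline: detect a gesture, then synthesize speech.

    `classify` and `speak` are injected so the real model and TTS engine can be
    swapped in; low-confidence or empty results trigger the client alert.
    """
    pred = classify(frame)
    if pred.label is None or pred.confidence < 0.5:
        return "No Sign Detected"     # alert returned to the client
    speak(pred.label)                 # TTS engine converts the label to audio
    return pred.label

# Stub classifier and a list standing in for the audio output channel:
def fake_classifier(frame) -> Prediction:
    return Prediction("Hello", 0.92) if frame else Prediction(None, 0.0)

spoken = []
print(process_frame(b"frame-bytes", fake_classifier, spoken.append))  # Hello
print(process_frame(None, fake_classifier, spoken.append))            # No Sign Detected
```

Injecting the model and TTS engine as callables keeps the pipeline testable and lets the server cluster swap implementations without changing the request path.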
Workflow Summary
User joins a meeting → camera activates → frames captured.
Frames sent to server → gestures recognized → audio output returned.
If no gesture is detected, the user is notified and asked to retry.
System runs continuously until the meeting ends.
System Modeling
Sequence Diagrams: Visualize the flow from video capture to speech output.
UML Use Case Diagram:
Actors: User, Prediction Model.
Use cases: Start meeting, capture/process frame, detect sign, convert to speech, notify, and output audio.
Results & Evaluation
The system performed reliably across diverse scenarios:
✓ Accurate Detection of dynamic gestures like “Hello”
✓ Robust Performance under network latency
✓ Failsafe Handling when camera input is missing
✓ Graceful Response to unknown or unclear gestures
✓ Continued Functionality with low-quality visual input
Conclusion
This project effectively bridges communication barriers in digital meetings, enabling real-time, inclusive interaction between sign language users and verbal participants. The system combines AI, real-time streaming, and speech synthesis to make virtual communication more equitable, setting a precedent for accessibility in digital platforms.
References
[1] Romala Sri Lakshmi Murali, L. D. Ramayya, V. Anil Santosh: “Sign Language Recognition System Using Convolutional Neural Network and Computer Vision,” December 2022, ISSN (Online) 2582-1431, pp. 137-142.
[2] Ahmed Mateen Buttar, Usama Ahmad, Abdu H. Gumaei, Adel Assiri, Muhammad Azeem Akbar, Bader Fahad Alkhamees: “Deep Learning in Sign Language Recognition: A Hybrid Approach for the Recognition of Static and Dynamic Signs,” August 2023, DOI 10.17148/IJARCCE.2020.9607, ISSN (Online) 2278-1021.
[3] Roshnee Matlani, Roshan Dadlani, Sharv Dumbre, Shruti Mishra, Abha Tewari: “Real-Time Sign Language Recognition Using Machine Learning and Neural Network,” January 2022, ISSN 2582-7421.