This paper presents a comprehensive real-time sign language translator that bridges the communication gap between hearing-impaired individuals and the hearing population. The proposed system integrates two major functionalities: (1) real-time detection of alphabets and numbers using a YOLOv8-based object detection model, and (2) sentence-level gesture recognition along with speech-to-text translation using deep learning and natural language processing techniques. The system uses YOLOv8, OpenCV, and MediaPipe for visual detection and classification of hand signs, and Flask for a responsive web-based interface. In the first module, the system recognizes 36 static signs: the alphabets A–Z and the digits 0–9. In the second module, 32 sentence gestures are recognized in real time and translated into meaningful sentences. Additionally, the system captures voice input through a microphone, converts it to text via speech recognition, and translates it into Tamil or Hindi. The platform supports live detection, screenshot capture, video recording, and multilingual output display, offering an inclusive and practical solution for accessible communication.
Introduction
This paper presents a low-cost, real-time, hardware-free system designed to bridge the communication gap between the hearing/speech impaired and the general public using AI and machine learning. The system recognizes sign language gestures and converts them into text and speech in English, Tamil, and Hindi, with a web-based interface for practical deployment in education, social interaction, and assistive technologies.
System Overview
The solution is composed of two core modules:
Static Gesture Recognition: Uses YOLOv8 to detect hand signs for alphabets (A–Z) and numbers (0–9).
Sentence Gesture Recognition & Voice Input: Recognizes 32 predefined sentence-level gestures and also supports speech-to-text functionality using the Google Speech API.
The system is implemented using Python, Flask, OpenCV, MediaPipe, and Ultralytics YOLOv8, and runs on a user-friendly web platform.
Literature Review Highlights
Deep learning techniques like CNNs, GoogLeNet, LSTMs, and HMMs have been used in past sign language systems.
Emphasis has been placed on improving accuracy, speed, and real-time usability.
Time-series models such as LSTMs are well suited to recognizing dynamic gestures.
Methodology
A. Datasets
Alphabet & Number Dataset: 36 gesture classes (A–Z, 0–9)
Sentence Dataset: 32 commonly used phrases (e.g., "Hello", "Thank You")
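For the static module, the 36 classes above could be declared in an Ultralytics-style dataset file. The paths and layout below are illustrative assumptions, not the paper's actual dataset:

```yaml
# Illustrative Ultralytics YOLOv8 dataset config (paths are hypothetical)
path: datasets/sign_static   # dataset root
train: images/train          # training images, relative to root
val: images/val              # validation images, relative to root
names:
  0: A
  1: B
  # ... classes 2-25 cover C-Z ...
  26: "0"
  # ... classes 27-35 cover "1"-"9" ...
```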
B. System Architecture
Client Side: Web interface with webcam/microphone input
Server Side: Handles detection, voice recognition, translation, and UI updates
C. Gesture Recognition
Live input is processed by YOLOv8, which outputs detected letters, numbers, or sentence gestures.
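The per-frame detection loop can be sketched as below. This is a minimal illustration, not the paper's code: the model path `best.pt` is a hypothetical trained checkpoint, and the class ordering (A–Z then 0–9) is assumed from the dataset description.

```python
# Sketch of per-frame gesture detection with Ultralytics YOLOv8 and OpenCV.
# Assumes the 36 classes are indexed A-Z (0-25) then digits 0-9 (26-35).
LABELS = [chr(ord("A") + i) for i in range(26)] + [str(d) for d in range(10)]

def top_label(class_ids, confidences, threshold=0.5):
    """Return the highest-confidence label above the threshold, or None."""
    best = None
    best_conf = threshold
    for cid, conf in zip(class_ids, confidences):
        if conf >= best_conf:
            best, best_conf = LABELS[cid], conf
    return best

def run_live(model_path="best.pt"):
    # Third-party imports kept local; assumes ultralytics and opencv-python.
    import cv2
    from ultralytics import YOLO

    model = YOLO(model_path)
    cap = cv2.VideoCapture(0)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        result = model(frame, verbose=False)[0]
        label = top_label(result.boxes.cls.int().tolist(),
                          result.boxes.conf.tolist())
        if label:
            cv2.putText(frame, label, (10, 30),
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
        cv2.imshow("detection", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break
    cap.release()
    cv2.destroyAllWindows()
```

Keeping the label-selection logic in a pure function makes the confidence threshold easy to tune against the confusions noted later (e.g., 'M' vs 'N').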
D. Speech Recognition
Converts spoken input to English text, then filters and translates it.
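The speech step can be sketched with the SpeechRecognition package's Google Web Speech backend, matching the Google Speech API mentioned in the system overview. The filler-word filter below is an illustrative assumption about what "filters" means here:

```python
# Sketch of speech-to-text with a simple transcript filter.
FILLERS = {"um", "uh", "like"}  # assumed filter list, for illustration

def clean_transcript(text):
    """Lowercase the transcript and drop common filler words."""
    return " ".join(w for w in text.lower().split() if w not in FILLERS)

def listen_once():
    # Assumes the SpeechRecognition package and a working microphone.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # helps in noisy rooms
        audio = recognizer.listen(source)
    try:
        return clean_transcript(recognizer.recognize_google(audio))
    except sr.UnknownValueError:
        return ""  # speech was unintelligible
```

The ambient-noise calibration is one plausible mitigation for the accuracy drop in noisy environments reported in the results.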
E. Output Integration
Real-time display of detected or transcribed content, translated into Tamil or Hindi.
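The translation step might look like the following. The language table reflects the paper's English/Tamil/Hindi support; the use of `deep_translator`'s `GoogleTranslator` as the backend is an assumption, not the paper's stated implementation.

```python
# Sketch of translating detected or transcribed English text.
TARGETS = {"english": "en", "tamil": "ta", "hindi": "hi"}

def target_code(language):
    """Map a display-language name to an ISO 639-1 code (default: English)."""
    return TARGETS.get(language.lower(), "en")

def translate(text, language):
    # Assumed backend; requires the deep_translator package and network access.
    from deep_translator import GoogleTranslator

    code = target_code(language)
    if code == "en":
        return text  # no translation needed
    return GoogleTranslator(source="en", target=code).translate(text)
```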
F. Deployment
Runs locally using Flask with intuitive buttons for live detection, screenshots, and translation.
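A minimal Flask entry point for the web platform could be sketched as follows. The route names, the payload shape, and the sample Tamil string are illustrative assumptions, not the paper's exact code:

```python
# Sketch of the local Flask deployment serving detection status to the UI.
def make_payload(label, translation, language):
    """Shape the JSON body returned to the web interface."""
    return {"detected": label, "translated": translation, "language": language}

def create_app():
    # Assumes Flask is installed.
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/")
    def index():
        return "Sign language translator running"

    @app.route("/status")
    def status():
        # In the full system this would report the latest detection.
        return jsonify(make_payload("HELLO", "வணக்கம்", "tamil"))

    return app

# To run locally: create_app().run(debug=True)
```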
Results & Analysis
System Setup: Intel Core i5 CPU, 8 GB RAM, Windows 10
Gesture Recognition:
Accuracy: 95%
Inference time: 0.08–0.12 seconds/frame
Confusions noted in similar gestures (e.g., ‘M’ vs ‘N’)
Speech-to-Text:
Accuracy: 90–95% in quiet environments
WER: 5–10%
Accuracy drops to 85% in noisy areas
System Performance:
Real-time frame rate: 12–18 FPS (on CPU)
Translation delay: <1.5 seconds
Multilingual support: English, Tamil, Hindi
User feedback: Interface rated intuitive and responsive
Challenges
Gesture variation and signing speed can affect accuracy.
Low lighting or cluttered backgrounds hinder detection.
Real-time speed is limited on systems without GPU support.
Future Enhancements
Dynamic Gesture Recognition using RNN/LSTM for continuous signs.
Mobile App Integration for portability via Flutter or React Native.
Expanded Vocabulary with regional gestures and grammar rules.
Offline Mode & Edge Computing for use in low-connectivity areas.
More Language Support beyond Tamil and Hindi.
User Feedback Loop for adaptive learning and improved accuracy.
Conclusion
This paper proposed a real-time sign language translator combining gesture and speech input to bridge communication barriers. Using YOLOv8 for gesture detection and speech recognition for voice input, the system delivers accurate, multilingual translation through a web-based interface. It achieves real-time performance without external hardware and supports alphabets, numbers, and sentence gestures. With high usability and promising results, the system offers an effective step toward inclusive communication for the hearing-impaired.
References
[1] A. I. Singh, B. Mathai, S. Silas, and J. B. Princess, “Real-Time Sign Language Translator for Deaf and Mute,” in Proc. Int. Conf. on Electronics, Robotics and Computer Science (ICERCS), 2023, doi: 10.1109/ICERCS57948.2023.10433971.
[2] S. Thakar, S. Shah, B. Shah, and A. V. Nimkar, “Sign Language to Text Conversion in Real Time using Transfer Learning,” arXiv preprint arXiv:2211.14446 [cs.CV], 2022. [Online]. Available: https://arxiv.org/abs/2211.14446
[3] P. K. Saw, N. Nancy, S. Gupta, A. Raj, S. Chauhan, and K. Agrawal, “Gesture Recognition in Sign Language Translation: A Deep Learning Approach,” in Proc. ICIC3S, 2024, doi: 10.1109/ICIC3S61846.2024.10603225.
[4] E. B. Setiawan, A. Darmawan, and B. Herdiana, “Static Sign Language Translator Using Hand Gesture and Speech Recognition,” JMSI, vol. 10, no. 2, 2024, doi: 10.46754/jmsi.2024.10.002.
[5] M. Papatsimouli et al., “Real Time Sign Language Translation Systems: A Review Study,” in Proc. MOCAST, 2022, doi: 10.1109/MOCAST54814.2022.9837666.
[6] S. Dhulipala, F. F. Adedoyin, and A. Bruno, “Sign and Human Action Detection Using Deep Learning,” J. Imaging, vol. 8, no. 7, p. 192, 2022, doi: 10.3390/jimaging8070192.
[7] S. Mhatre, S. Joshi, and H. B. Kulkarni, “Sign Language Detection using LSTM,” in Proc. IEEE CCET, Bhopal, India, 2022, pp. 1–6, doi: 10.1109/CCET56606.2022.10080705.
[8] M. S. Amin and S. T. H. Rizvi, “Sign Gesture Classification and Recognition Using Machine Learning,” Cybernetics and Systems, 2023, doi: 10.1080/01969722.2022.2067634.
[9] J. Gangrade and J. Bharti, “Vision-based Hand Gesture Recognition for Indian Sign Language Using CNN,” IETE J. of Research, 2023, doi: 10.1080/03772063.2020.1838342.
[10] M. Al-Hammadi et al., “Spatial Attention-Based 3D Graph Convolutional Neural Network for Sign Language Recognition,” Sensors, vol. 22, no. 12, p. 4558, 2022, doi: 10.3390/s22124558.