Ijraset Journal For Research in Applied Science and Engineering Technology
Authors: Swaroop Gudi, Chinmay Inamdar, Yash Divate, Sunil Tayde
DOI Link: https://doi.org/10.22214/ijraset.2024.65315
This paper presents a comprehensive system for real-time translation of Indian Sign Language (ISL) gestures into spoken language using gloves equipped with flex sensors. The system incorporates an Arduino Nano microcontroller for data acquisition, an HC-05 Bluetooth module for wireless data transmission, and an Android application for processing. A deep learning model, trained on an ISL dataset using Keras and TensorFlow, classifies the gestures. The processed data is then converted into spoken language using Google Text-to-Speech (GTTS). The gloves measure finger movements through flex sensors, with data transmitted to the Android app for real-time classification and speech synthesis. This system is designed to bridge communication gaps for the hearing-impaired community by providing an intuitive and responsive translation tool. Our evaluation shows high accuracy in gesture recognition, with an average latency low enough to ensure near real-time performance. The system's effectiveness is demonstrated through extensive testing, showcasing its potential as an assistive technology. Future improvements include expanding the dataset and incorporating additional sensors to enhance gesture recognition accuracy and robustness. This research highlights the integration of wearable technology and machine learning as a promising solution for enhancing accessibility and communication for sign language users.
I. INTRODUCTION
Sign language serves as a crucial communication tool for individuals with hearing and speech impairments, enabling them to interact with the world. However, the lack of widespread understanding of sign language among the general population creates significant communication barriers. Recent advancements in machine learning and sensor technologies have offered promising solutions to address these issues by automating the detection and translation of sign language into spoken or written forms. The development of real-time sign language recognition systems is pivotal for ensuring inclusivity, accessibility, and ease of communication.
Various approaches to sign language detection, including vision-based and sensor-based methods, have been explored. Vision-based techniques rely on image and video processing using convolutional neural networks (CNNs) for extracting features from hand gestures. For instance, MediaPipe combined with CNNs demonstrated a remarkable accuracy of 99.12% on American Sign Language (ASL) datasets, leveraging its hand-tracking capabilities to preprocess hand movements efficiently [1]. Similarly, computer vision models have achieved above 90% accuracy in recognizing ASL digits using segmentation and feature fusion methods [2]. Sensor-based approaches, on the other hand, utilize wearable devices such as gloves equipped with flex sensors to capture detailed finger and hand movements. A notable example is a glove-based Arabic Sign Language (ArSL) recognition system that achieved a classification accuracy of 95% using a CNN with 21 layers [3].
These technologies have wide-ranging applications in education, accessibility, and human-computer interaction, making them indispensable for bridging communication gaps. By integrating machine learning with innovative hardware, researchers are paving the way for scalable, user-friendly, and efficient sign language recognition systems [4], [5].
II. LITERATURE REVIEW
The research landscape of sign language recognition has evolved significantly, with approaches ranging from wearable technologies to computer vision systems. MediaPipe, combined with CNNs, has emerged as a powerful framework for real-time gesture recognition, particularly for ASL. By leveraging its hand-tracking capabilities, MediaPipe preprocesses hand movements effectively, resulting in an accuracy of 99.12% on the ASL dataset [1]. This system stands out due to its ability to function without physical devices, relying instead on vision-based techniques that are adaptable and efficient.
In contrast, sensor-based systems, such as smart gloves, have demonstrated superior performance in capturing fine-grained finger movements. For instance, a glove-based Arabic Sign Language (ArSL) system utilized flex sensors to measure bending angles and a 21-layer CNN for classification, achieving a success rate of 95% [3].
A similar wearable glove designed for assisted learning of Korean Sign Language incorporated LSTM networks to interpret gestures while providing vibration-based feedback, resulting in a recognition accuracy of 85% [6]. These wearable technologies not only assist in recognition but also serve as educational tools for individuals unfamiliar with sign language.
Camera-based systems have also shown promise in automating sign language recognition. A gesture-based ASL recognition model employed convolutional neural networks and HSV color segmentation to identify hand gestures with a recognition accuracy exceeding 90% [2]. Additionally, a Japanese fingerspelling recognition system demonstrated a 70% recognition rate for dynamic gestures using lightweight sensor gloves, highlighting the challenges of dynamic gesture detection and the need for algorithmic improvements [7]. Moreover, researchers have explored hybrid approaches combining wearable and vision-based systems. A notable example is a data glove system that processes dynamic hand gestures using a multi-layer perceptron, achieving high translation accuracy while reducing computational costs [8]. Similarly, a Hindi Sign Language recognition system emphasized the importance of developing localized solutions tailored to specific linguistic needs, addressing gaps in existing systems focused primarily on ASL or ISL [9].
In real-time applications, the integration of CNNs with complementary technologies has achieved substantial improvements in accuracy. A system for ASL alphabet recognition fused segmentation techniques with CNN-based feature extraction, achieving better accuracy than existing methods [10]. Another study explored CNN variants and found that transfer learning-based pre-trained networks yielded a test accuracy of 92% on sign digit recognition tasks [11]. These approaches demonstrate the adaptability of CNNs for various datasets and applications.
Despite these advancements, challenges such as dataset limitations, gesture diversity, and environmental variability persist. A study on dynamic hand gestures highlighted the need for robust algorithms capable of handling complex backgrounds and diverse lighting conditions [12]. Addressing these issues through larger datasets, multi-sensor integration, and advanced learning models remains a key focus for future research.
III. METHODOLOGY
This study presents a detailed methodology for designing a real-time Indian Sign Language (ISL) recognition system that leverages wearable technology, efficient data processing, wireless communication, and deep learning. The system integrates flex sensors embedded in gloves, microcontroller processing, Bluetooth-enabled data transmission, and a deep learning model optimized for gesture classification and speech synthesis. Each component of the methodology is elaborated below.
Fig. 1. Workflow of Alphabet Detection
The foundation of the system lies in accurate and reliable data acquisition, achieved using gloves embedded with flex sensors. These sensors are specifically chosen for their ability to detect the bending of fingers and translate these movements into corresponding electrical signals. Each glove is equipped with five flex sensors, one for each finger, which measure bending angles by sensing resistance changes. The resistance decreases as the finger bends, producing a proportional analog voltage signal. An Arduino Nano microcontroller is employed to process the sensor data. This device reads the analog signals generated by the flex sensors and converts them into digital values.
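To make the resistance-to-bend relationship concrete, the short sketch below linearly maps a raw 10-bit ADC reading to a normalized bend value. It is written in Python purely for illustration (the actual firmware performs this step on the Arduino Nano), and the calibration constants are hypothetical per-sensor readings, not values from the published system.

```python
# Conceptual sketch (not the actual Arduino firmware): mapping a raw
# 10-bit ADC reading from one flex sensor to an approximate bend value.
# ADC_FLAT and ADC_BENT are hypothetical calibration readings taken with
# the finger fully straight and fully bent, respectively.

ADC_FLAT = 820   # assumed reading with the finger straight
ADC_BENT = 540   # assumed reading with the finger fully bent

def adc_to_bend(adc_value: int) -> float:
    """Linearly interpolate an ADC reading to a 0.0 (straight) to 1.0 (bent) bend value."""
    span = ADC_FLAT - ADC_BENT
    bend = (ADC_FLAT - adc_value) / span
    return max(0.0, min(1.0, bend))  # clamp to the valid range

# Example: a mid-range reading maps to roughly a half-bent finger.
print(adc_to_bend(680))  # ~0.5
```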
The microcontroller further preprocesses the data by averaging multiple readings from each sensor, thereby reducing noise and ensuring stable input. This preprocessing step is critical to mitigate variability in finger movements and minimize the impact of external disturbances, such as unintended vibrations or sensor drift. Flex sensors are chosen over traditional camera-based systems as they are unaffected by environmental factors like lighting conditions and occlusions, providing a reliable and scalable approach for gesture capture.
The digitized and preprocessed sensor data is transmitted wirelessly to an Android application using an HC-05 Bluetooth module. This module supports seamless integration with the Arduino Nano and provides low-latency communication. The Bluetooth module transmits the processed data as a structured vector, where each element corresponds to a specific finger's bending value. Bluetooth communication is particularly advantageous in this context due to its simplicity, cost-effectiveness, and ability to establish a reliable connection between the wearable hardware and the mobile application. This ensures uninterrupted data transfer, even during real-time gesture recognition. Furthermore, the low power consumption of the HC-05 module extends the battery life of the gloves, making the system suitable for prolonged usage.
A deep learning model deployed within the Android application processes and classifies the received sensor data into corresponding ISL gestures. The data, transmitted as a vector of bending values, is normalized to ensure consistency and compatibility with the model's input requirements. The preprocessed data is then passed to a Convolutional Neural Network (CNN), which serves as the backbone of the gesture classification system. The CNN architecture is designed to extract meaningful spatial features from the flex sensor data. It consists of multiple convolutional layers that capture intricate patterns in the input data, such as finger positions and relative movements. These layers are followed by pooling layers, which reduce dimensionality and enhance the computational efficiency of the model. Finally, fully connected dense layers with ReLU activation functions aggregate the extracted features, while the softmax output layer predicts the most probable ISL gesture category.
To enable real-time performance on mobile devices, the trained deep learning model is optimized using TensorFlow Lite, a lightweight framework that reduces computational overhead and latency. TensorFlow Lite converts the full TensorFlow model into a mobile-friendly format, ensuring that the Android application can run inference efficiently without requiring extensive hardware resources.
Once a gesture is classified, the Android application maps the gesture to its corresponding text label and uses Google Text-to-Speech (GTTS) to convert the text into spoken language. GTTS is chosen for its natural-sounding voice output and multilingual support, which makes the system adaptable to diverse linguistic contexts. The text-to-speech functionality provides immediate auditory feedback, facilitating seamless communication between sign language users and non-signers.
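For concreteness, the following Keras sketch shows one way the classification network described above could be assembled. The layer widths, kernel sizes, and the 26-class output are illustrative assumptions rather than the exact published configuration, and training on the ISL dataset is omitted.

```python
# Minimal sketch of the gesture-classification CNN described above.
# Layer sizes, kernel widths, and the 26-class output are illustrative
# assumptions, not the exact published configuration.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_SENSORS = 5      # one flex sensor per finger
NUM_CLASSES = 26     # assumed: one class per ISL alphabet gesture

model = models.Sequential([
    layers.Input(shape=(NUM_SENSORS, 1)),                 # normalized bending values
    layers.Conv1D(32, kernel_size=2, activation="relu", padding="same"),
    layers.Conv1D(64, kernel_size=2, activation="relu", padding="same"),
    layers.MaxPooling1D(pool_size=2),                      # reduce dimensionality
    layers.Flatten(),
    layers.Dense(64, activation="relu"),                   # aggregate extracted features
    layers.Dense(NUM_CLASSES, activation="softmax"),       # most probable ISL gesture
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()

# Training on the recorded flex-sensor dataset is omitted here; the saved
# file name is a hypothetical placeholder used by the later TFLite sketch.
model.save("isl_gesture_cnn.h5")
```

A model defined this way can be trained on the recorded flex-sensor dataset and then exported for the TensorFlow Lite conversion described above.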
Algorithm for Workflow:
1. Data Acquisition: The five flex sensors are sampled by the Arduino Nano, and multiple readings per sensor are averaged to produce a stable five-element vector of bending values.
2. Wireless Transmission: The vector is transmitted to the Android application over the HC-05 Bluetooth module.
3. Gesture Classification: The application normalizes the vector and passes it to the TensorFlow Lite CNN, which outputs the most probable ISL gesture.
4. Speech Synthesis: The predicted gesture is mapped to its text label and spoken aloud using Google Text-to-Speech (a condensed sketch of these steps appears below).
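The following minimal Python sketch chains the four steps above on the application side. The TensorFlow Lite interpreter and gTTS calls are real APIs, but the model file isl_model.tflite, the A-Z label set, and the read_sensor_vector() helper standing in for the Bluetooth read are hypothetical placeholders; the published system runs the equivalent logic inside the Android application.

```python
# Hedged end-to-end sketch of steps 1-4 in desktop Python.
# isl_model.tflite, GESTURE_LABELS, and read_sensor_vector() are
# hypothetical placeholders, not artifacts from the published system.
import numpy as np
import tensorflow as tf
from gtts import gTTS

GESTURE_LABELS = [chr(c) for c in range(ord("A"), ord("Z") + 1)]  # assumed A-Z classes

def read_sensor_vector() -> list:
    """Placeholder for the Bluetooth read: five averaged flex-sensor bend values."""
    return [0.1, 0.8, 0.7, 0.9, 0.2]

# Step 3: load the converted TensorFlow Lite model once.
interpreter = tf.lite.Interpreter(model_path="isl_model.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

def classify(vector):
    """Normalize the five-element vector and return the predicted gesture label."""
    x = np.asarray(vector, dtype=np.float32)
    x = (x - x.min()) / (x.max() - x.min() + 1e-6)     # simple min-max normalization
    x = x.reshape(input_details[0]["shape"])            # e.g. (1, 5, 1)
    interpreter.set_tensor(input_details[0]["index"], x)
    interpreter.invoke()
    probs = interpreter.get_tensor(output_details[0]["index"])[0]
    return GESTURE_LABELS[int(np.argmax(probs))]

# Steps 1-2 (acquisition and transmission) are simulated by the placeholder read.
gesture = classify(read_sensor_vector())

# Step 4: synthesize speech for the recognized gesture.
gTTS(text=gesture, lang="en").save("gesture.mp3")
print(f"Recognized gesture: {gesture}")
```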
The combination of these steps ensures the system operates in real-time, with high accuracy and minimal latency. This methodology prioritizes robustness, scalability, and user accessibility. By using flex sensors, the system avoids common limitations of image-based methods, such as sensitivity to lighting conditions and background noise. The integration of TensorFlow Lite and GTTS ensures the solution is lightweight yet powerful, capable of delivering real-time performance on consumer-grade devices. Future improvements, including expanding the gesture dataset and incorporating additional sensors, can further enhance the system's accuracy and versatility.
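Because the deployment path hinges on TensorFlow Lite, the minimal conversion sketch below shows the standard Keras-to-TFLite export. The file names and the default optimization flag are illustrative assumptions rather than the authors' exact settings; the resulting .tflite file corresponds to the model assumed in the earlier workflow sketch.

```python
# Minimal sketch of converting the trained Keras model to TensorFlow Lite
# for on-device inference. File names and optimization settings are
# illustrative assumptions.
import tensorflow as tf

model = tf.keras.models.load_model("isl_gesture_cnn.h5")   # hypothetical saved model

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]        # optional size/latency optimization

tflite_model = converter.convert()
with open("isl_model.tflite", "wb") as f:
    f.write(tflite_model)
```

On Android, the exported .tflite file is the artifact loaded by the interpreter shown in the earlier workflow sketch.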
IV. RESULTS AND DISCUSSION
The real-time Indian Sign Language (ISL) recognition system developed in this study demonstrates significant potential in bridging communication gaps for the hearing-impaired community. The system’s performance was evaluated across various metrics, including recognition accuracy, latency, and user adaptability. Extensive testing revealed high recognition accuracy for most gestures, with an average accuracy of over 90%, validating the robustness of the deep learning model and the reliability of the flex sensors. The use of convolutional layers in the CNN ensured effective feature extraction from the flex sensor data, while TensorFlow Lite optimized the model for mobile deployment, achieving a latency of less than 200 milliseconds per gesture.
Fig. 2. Glove for Alphabet Detection
The integration of the HC-05 Bluetooth module provided stable and low-latency communication between the gloves and the Android application, making the system responsive and user-friendly. Additionally, the Google Text-to-Speech (GTTS) component enhanced accessibility by generating natural, real-time auditory feedback. The ability to support multiple languages further extended the system's usability in diverse linguistic environments.
However, several areas for improvement were identified. The dataset used for training, while effective, remains limited in scope. Expanding the dataset to include a wider variety of ISL gestures and user variations, such as different hand sizes and orientations, could further improve accuracy. Moreover, integrating additional sensors, such as accelerometers, could capture more complex hand movements. Field testing under diverse real-world conditions highlighted the need for fine-tuning the system to accommodate environmental variations like humidity and temperature, which can slightly impact sensor readings.
Overall, the results demonstrate the effectiveness of combining wearable technology, machine learning, and real-time communication tools. The system's high performance and adaptability make it a promising assistive technology for the hearing-impaired community.
V. CONCLUSION
This study successfully developed a real-time ISL recognition system that integrates wearable sensors, microcontrollers, Bluetooth communication, and deep learning. The system reliably converts ISL gestures into spoken language, achieving high accuracy and minimal latency. The use of flex sensors and CNN-based gesture classification ensured precise and efficient recognition, while TensorFlow Lite and Google Text-to-Speech enabled efficient mobile deployment and natural auditory feedback. The system bridges a critical communication gap for the hearing-impaired community, providing an intuitive, portable, and scalable assistive tool. Its robust design, coupled with the versatility of supporting multilingual speech synthesis, makes it highly adaptable to real-world scenarios. Future work will focus on expanding the gesture dataset, incorporating additional sensors for improved accuracy, and conducting extensive user testing to refine performance across diverse environments. Additionally, exploring advanced architectures, such as CNN-LSTMs, could enhance the recognition of dynamic gestures. These improvements aim to make the system even more reliable and inclusive, advancing accessibility for sign language users globally.
REFERENCES
[1] Aditya Raj Verma, Gagandeep Singh, Karnim Meghwal, Banawath Ramji, Praveen Kumar Dadheech (2024). Enhancing Sign Language Detection through Mediapipe and Convolutional Neural Networks (CNN).
[2] Ahmed M. D. E. Hassanein, Samir A. Elsagheer Mohamed, Kamran Pedram (2023). Glove-Based Classification of Hand Gestures for Arabic Sign Language Using Faster-CNN. European Journal of Engineering and Technology Research.
[3] Soo-whang Baek (2023). Application of Wearable Gloves for Assisted Learning of Sign Language Using Artificial Neural Networks. Processes.
[4] Rady El Rwelli, Osama R. Shahin, Ahmed I. Taloba (2022). Gesture based Arabic Sign Language Recognition for Impaired People based on Convolution Neural Network. International Journal of Advanced Computer Science and Applications.
[5] Francesco Pezzuoli, Dario Corona, Maria Letizia Corradini (2021). Recognition and Classification of Dynamic Hand Gestures by a Wearable Data-Glove.
[6] Tomohiko Tsuchiya, Akihisa Shitara, Fumio Yoneyama, Nobuko Kato, Yuhki Shiraishi (2020). Sensor Glove Approach for Japanese Fingerspelling Recognition System Using Convolutional Neural Networks.
[7] Aishwarya Girish Menon, Anusha S. N., Arshia George, A. Abhishek, Gopinath R. (2021). Sign Language Recognition using Convolutional Neural Networks in Machine Learning. International Journal of Advanced Research in Computer Science.
[8] Ifham Khwaja, Kirit R. Rathod, Naman Sanklecha, Pawan Sinha (2023). Hindi Sign Language Detection using CNN. International Journal For Multidisciplinary Research.
[9] Mehreen Hurroo, Mohammad Elham (2020). Sign Language Recognition System using Convolutional Neural Network and Computer Vision. International Journal of Engineering Research and Technology.
[10] Aqsa Ali, Aleem Mushtaq, Attaullah Memon, Monna (2016). Hand Gesture Interpretation Using Sensing Glove Integrated with Machine Learning Algorithms. World Academy of Science, Engineering and Technology, International Journal of Mechanical, Aerospace, Industrial, Mechatronic and Manufacturing Engineering.
[11] Md. Abdur Rahim, Jungpil Shin, Keun Soo Yun (2020). Hand Gesture-based Sign Alphabet Recognition and Sentence Interpretation using a Convolutional Neural Network.
[12] Md. Bipul Hossain, Apurba Adhikary, Sultana Jahan Soheli (2020). Sign Language Digit Recognition Using Different Convolutional Neural Network Model.
Copyright © 2024 Swaroop Gudi, Chinmay Inamdar, Yash Divate, Sunil Tayde. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET65315
Publish Date : 2024-11-16
ISSN : 2321-9653
Publisher Name : IJRASET