American Sign Language (ASL) is a visual language expressed through hand signs, gestures, facial expressions, and body movements. This paper focuses on building a real-time system that recognizes ASL hand gestures using deep learning and computer vision. The primary goal is to help individuals with hearing or speech difficulties communicate more easily by translating their hand signs into text and speech. The system uses a webcam to capture live hand movements and applies Google's MediaPipe technology to track 21 key points on the hand. Instead of working with raw images, the system uses these points to represent each gesture. A deep learning model is trained using a combination of a CNN (Convolutional Neural Network), which captures the shape of the hand, and LSTM (Long Short-Term Memory) networks, which model how the hand moves over time. Each gesture is treated as a sequence of 30 frames, which helps the model learn how signs are made in real life. Once a gesture is recognized, the system shows the result in a simple, easy-to-use interface built with Tkinter. It can also read the recognized words aloud using a text-to-speech feature, making the system helpful even for users with visual challenges.
Introduction
This paper presents a real-time American Sign Language (ASL) recognition system designed to reduce communication barriers between people who use sign language and those who do not understand it. The system uses a webcam and Google's MediaPipe framework to track 21 hand landmarks and capture hand movements. The landmark data, collected over sequences of 30 frames, is processed by a deep learning model built on LSTM-based sequence learning to recognize ASL gestures accurately.
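The landmark-to-sequence step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: it assumes each frame's 21 landmarks arrive as (x, y, z) tuples, flattens them into 63 values, and keeps a rolling window of the most recent 30 frames. The names flatten_landmarks and SequenceBuffer are hypothetical.

```python
from collections import deque

NUM_LANDMARKS = 21    # hand key points tracked per frame
SEQUENCE_LENGTH = 30  # frames that make up one gesture sequence


def flatten_landmarks(landmarks):
    """Turn 21 (x, y, z) tuples into one flat list of 63 floats."""
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(
            f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}"
        )
    return [float(value) for point in landmarks for value in point]


class SequenceBuffer:
    """Rolling window holding the most recent SEQUENCE_LENGTH frames.

    Hypothetical helper: once full, each new frame pushes out the oldest one.
    """

    def __init__(self, length=SEQUENCE_LENGTH):
        self.frames = deque(maxlen=length)

    def add(self, landmarks):
        """Flatten one frame of landmarks and append it to the window."""
        self.frames.append(flatten_landmarks(landmarks))

    def ready(self):
        """True once the window holds a full sequence."""
        return len(self.frames) == self.frames.maxlen

    def as_sequence(self):
        """Return the window as a list of frames, oldest first."""
        return [list(frame) for frame in self.frames]
```

When ready() returns True, as_sequence() yields a 30 x 63 nested list, which is the shape a sequence model would consume for one gesture.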
A custom dataset containing all 26 ASL alphabet signs, along with Space and Delete gestures, was created using webcam recordings. MediaPipe extracts 3D coordinates of hand landmarks, which are stored and used to train the model. The architecture consists of multiple LSTM layers, followed by dense layers and a softmax classifier that predicts gesture classes. Recognized gestures are displayed through a Tkinter graphical interface, allowing letters to form words and sentences. A text-to-speech module converts recognized text into spoken language, improving accessibility for users with hearing, speech, or visual impairments.
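How recognized gestures build up words and sentences can be illustrated with a short sketch. It assumes the classifier emits one label per recognized gesture, with the control gestures labelled "Space" and "Delete"; those label strings and the function names are assumptions made for illustration, not details taken from this work.

```python
SPACE = "Space"    # assumed label for the Space gesture
DELETE = "Delete"  # assumed label for the Delete gesture


def apply_gesture(text, label):
    """Return the text after applying one recognized gesture label."""
    if label == SPACE:
        return text + " "
    if label == DELETE:
        return text[:-1]  # removing from an empty string is a no-op
    if len(label) == 1 and label.isascii() and label.isalpha():
        return text + label.upper()
    return text  # unknown labels are ignored


def compose(labels):
    """Fold a sequence of gesture labels into the resulting text."""
    text = ""
    for label in labels:
        text = apply_gesture(text, label)
    return text
```

For example, compose(["H", "I", "Space", "A", "S", "L", "L", "Delete"]) returns "HI ASL": the extra "L" is removed by the Delete gesture. The resulting string is what a display widget or a text-to-speech module would then receive.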
The system operates in real time at approximately 15–20 frames per second on standard hardware. Experimental results demonstrate an overall recognition accuracy of about 95%, with training accuracy approaching 100% and loss converging close to zero after sufficient training. MediaPipe provided reliable hand tracking across different lighting conditions and backgrounds, contributing to robust performance.
The study highlights the advantages of combining MediaPipe, OpenCV, TensorFlow, and LSTM networks for dynamic gesture recognition. Although some visually similar signs were occasionally misclassified, the system achieved strong real-time performance and user-friendly interaction. Overall, the proposed ASL recognition system effectively translates sign language into text and speech, offering a practical assistive technology for enhancing communication and inclusivity for people with hearing and speech impairments.
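One way to see which signs are misclassified is to compare predicted labels with true labels and count the confused pairs. The sketch below is only an illustration: the function names are hypothetical, and the labels in the usage note are invented toy data, not results from this study.

```python
from collections import Counter


def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true labels."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def confused_pairs(true_labels, predicted_labels):
    """Return ((true, predicted), count) pairs for errors, most frequent first."""
    errors = Counter(
        (t, p) for t, p in zip(true_labels, predicted_labels) if t != p
    )
    return errors.most_common()
```

With the toy inputs true = ["A", "M", "N", "M", "S", "A"] and predicted = ["A", "N", "N", "N", "S", "A"], accuracy gives 4/6 and confused_pairs gives [(("M", "N"), 2)], showing that every error in this made-up sample comes from a single pair of signs.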
Conclusion
This research demonstrates the development of a real-time American Sign Language (ASL) recognition system that uses MediaPipe for hand landmark detection and an LSTM-based deep learning model for gesture classification. By capturing 21 hand landmarks and analyzing sequences of 30 frames, the system identifies both static and dynamic gestures with an accuracy of about 95%. The integration of a simple GUI and text-to-speech support further enhances usability, making the system accessible not only to hearing- or speech-impaired users but also to those with visual impairments. Overall, the system provides a smooth, responsive, and practical solution for real-time sign language communication.
Looking ahead, there are several directions to improve and expand this work. Future iterations of the system can be enhanced to recognize more complex signs, support two-handed gestures, and handle continuous sentence-level inputs. Improving model robustness under varying lighting, backgrounds, and hand orientations is also essential for real-world deployment. Additionally, training the model on a more diverse dataset will improve its adaptability across different users. Porting the system to mobile or web-based platforms and incorporating multilingual speech output can make it more versatile and accessible for broader use across different regions and user communities.