Abstract
Sign language serves as a vital means of communication for individuals who are deaf or speech-impaired. Despite its growing use, a communication barrier still exists between signers and non-signers. Recent advances in computer vision and deep learning have enabled the development of gesture recognition systems that can bridge this gap. In this research, we propose a real-time sign language recognition system that uses transfer learning with MobileNetV2 and a custom classification head. The system captures American Sign Language (ASL) gestures through a webcam and converts them into corresponding text in real time. The model is trained on a preprocessed ASL dataset and achieves high accuracy using efficient neural architectures, data augmentation techniques, and optimized training workflows. The system includes gamified learning levels—ranging from easy to complex—that provide feedback, scoring, and progress tracking to promote consistent user engagement and structured skill development.
Introduction
People with speech and hearing impairments use sign language to communicate, but many non-signers struggle to understand it, creating a need for trained interpreters, especially in medical, legal, and educational contexts. While video remote interpretation services exist, they have practical limitations. To improve accessibility, this paper proposes a custom Convolutional Neural Network (CNN) model that automatically recognizes American Sign Language (ASL) gestures from video frames.
The system captures real-time hand images using a webcam, extracts key hand landmarks via MediaPipe, preprocesses these images, and uses a MobileNetV2-based CNN architecture to classify gestures into 26 English alphabet letters. The model is trained on a labeled ASL dataset with data augmentation to improve accuracy and generalization.
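The classifier described above can be assembled with the Keras transfer-learning API. The sketch below is illustrative rather than the exact training code: the 224×224 input resolution, augmentation policy, dropout rate, head width, and optimizer settings are assumptions, while the MobileNetV2 backbone and the 26-class softmax output follow the description in this section.

```python
# Minimal transfer-learning sketch (not the authors' exact code): a frozen
# MobileNetV2 backbone with a custom classification head for 26 ASL letters.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 26          # ASL alphabet letters A-Z
IMG_SIZE = (224, 224)     # assumed input resolution

# Pretrained ImageNet backbone, frozen for the initial training phase
base = tf.keras.applications.MobileNetV2(
    input_shape=IMG_SIZE + (3,), include_top=False, weights="imagenet")
base.trainable = False

model = models.Sequential([
    layers.Input(shape=IMG_SIZE + (3,)),
    layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNetV2 expects inputs in [-1, 1]
    layers.RandomRotation(0.05),               # light augmentation (assumed policy)
    layers.RandomZoom(0.1),
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.2),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

The augmentation layers are active only during training, so the same model can be saved and reused for real-time inference without modification.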
The literature review discusses various deep learning and computer vision methods for sign language recognition, including CNNs, RNNs, transfer learning, multimodal approaches, and wearable sensor-based systems across multiple sign languages.
The proposed system uses TensorFlow/Keras for training, OpenCV for video capture, and Django for backend application development. Future directions include real-time subtitle-to-sign conversion, mobile app integration, multi-language support, educational platform embedding, and adding facial expression and emotion recognition to enhance sign interpretation.
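At inference time, webcam frames can be routed through MediaPipe to localize the hand and then through the trained network. The loop below is a minimal standalone sketch under the same assumptions as the model sketch above (a saved model file named asl_model.h5, 224×224 RGB inputs); in the proposed system, prediction is served through the Django backend rather than a bare script.

```python
# Illustrative real-time capture-and-classify loop: OpenCV for video capture,
# MediaPipe Hands for landmark-based hand localization, Keras for prediction.
import cv2
import numpy as np
import mediapipe as mp
import tensorflow as tf

model = tf.keras.models.load_model("asl_model.h5")   # hypothetical filename
labels = [chr(ord("A") + i) for i in range(26)]

mp_hands = mp.solutions.hands
cap = cv2.VideoCapture(0)

with mp_hands.Hands(max_num_hands=1, min_detection_confidence=0.5) as hands:
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            # Bounding box around the detected hand landmarks, with a margin
            h, w, _ = frame.shape
            lm = results.multi_hand_landmarks[0].landmark
            xs = [int(p.x * w) for p in lm]
            ys = [int(p.y * h) for p in lm]
            x1, y1 = max(min(xs) - 20, 0), max(min(ys) - 20, 0)
            x2, y2 = min(max(xs) + 20, w), min(max(ys) + 20, h)
            crop = cv2.cvtColor(frame[y1:y2, x1:x2], cv2.COLOR_BGR2RGB)
            crop = cv2.resize(crop, (224, 224)).astype("float32")
            probs = model.predict(np.expand_dims(crop, 0), verbose=0)[0]
            letter = labels[int(np.argmax(probs))]
            cv2.putText(frame, letter, (x1, y1 - 10),
                        cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
        cv2.imshow("ASL recognition", frame)
        if cv2.waitKey(1) & 0xFF == ord("q"):
            break

cap.release()
cv2.destroyAllWindows()
```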
Conclusion
The proposed sign language recognition system leverages MediaPipe Hands for precise hand tracking and a MobileNetV2-based CNN model for accurate real-time gesture recognition, processing video input frame by frame within an interactive web platform. Designed as a gamified learning environment, the system motivates users to practice and improve their sign language skills effectively.
The platform incorporates SQLite to manage user authentication and store performance scores securely. By fostering inclusive communication and breaking down language barriers, this system holds significant social relevance in promoting accessibility and understanding across diverse communities.
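Score storage of this kind maps naturally onto Django's ORM over SQLite. The model below is only a sketch of such a schema; the field names and level labels are assumptions, not the authors' actual database design.

```python
# Hypothetical Django model for per-user, per-level performance scores,
# persisted in the project's default SQLite database.
from django.conf import settings
from django.db import models

class PracticeScore(models.Model):
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)
    level = models.CharField(max_length=20)        # e.g. "easy", "medium", "complex"
    score = models.PositiveIntegerField()
    recorded_at = models.DateTimeField(auto_now_add=True)
```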
Overall, it offers an engaging and accessible solution to enhance communication and learning for a wide range of users.