Automatic sign language recognition is a growing field of research, significant for its potential to improve communication between deaf and hard-of-hearing people and the rest of the population. This paper describes a convolutional neural network (CNN)-based model that identifies American Sign Language (ASL) alphabet letters in real time within a computer vision system. The model is trained and evaluated on the Sign Language MNIST dataset [1], covering the 24 static ASL letters (A-Z, excluding the dynamic letters J and Z). The system achieves a test accuracy of 96.49%, with 11 of the 24 classes reaching 100% precision and recall. Macro and weighted F1-scores of 0.96 and 0.97, respectively, indicate consistent performance across categories. The model is computationally lightweight and supports real-time inference, making it applicable to practical assistive applications.
Introduction
This paper presents the development of an automatic sign language recognition system intended to bridge communication gaps between sign language users and non-users. Building on advances in deep learning, particularly convolutional neural networks (CNNs), the study focuses on recognizing the 24 static American Sign Language (ASL) alphabet letters (excluding the dynamic letters J and Z).
The proposed model is a lightweight CNN designed for real-time use on standard hardware, achieving high accuracy (96.49% on the test set) while maintaining computational efficiency. It is trained on the Sign Language MNIST dataset, with preprocessing steps such as normalization, reshaping, and class filtering applied to improve performance.
Compared with earlier approaches, including sensor-based systems and more complex deep learning models such as transfer-learning and RNN-based architectures, this model strikes a balance between accuracy and efficiency. It extracts features automatically through its convolutional layers, avoiding manual feature engineering, and handles classification with a simple yet effective architecture.
The system demonstrates strong performance, with high precision and recall across most classes, although some visually similar gestures remain challenging. Overall, the model proves suitable for real-time applications in assistive technology and human-computer interaction, offering an efficient and practical solution for static sign language recognition.
Conclusion
This study designed and validated a deep learning model for real-time American Sign Language recognition, attaining 96.49% accuracy on the Sign Language MNIST dataset. The proposed convolutional neural network performed strongly on the detection of the 24 static ASL letters, with 11 of the 24 classes achieving 100% precision and recall. The combination of batch normalization, substantial dropout regularization, and adaptive learning-rate optimization enabled stable convergence during training and effectively prevented overfitting. The system supports real-time processing with low computational cost, making it particularly applicable to real-world assistive technologies and human-computer interaction systems. Although Class 16 remains difficult (63% precision), the overall performance compares favorably with existing methods, and macro and weighted F1-scores of 0.96 and 0.97, respectively, confirm balanced performance across the gesture categories.
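The adaptive learning-rate optimization credited above is not specified in detail; one common Keras realization, which we sketch here purely as an assumption, is to pair the Adam optimizer with a `ReduceLROnPlateau` callback that halves the learning rate when validation loss stalls:

```python
from tensorflow.keras.callbacks import ReduceLROnPlateau

# Hypothetical training setup: lower the learning rate by half whenever
# validation loss fails to improve for two consecutive epochs.
lr_schedule = ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                                patience=2, min_lr=1e-5)
# model.fit(x_train, y_train, validation_split=0.1,
#           epochs=20, callbacks=[lr_schedule])
```

Schedules of this kind let training start with a large step size for fast early progress and shrink it near convergence, which is consistent with the stable convergence behavior reported.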
This work contributes both to gesture recognition research and to practical accessibility technology. The proposed system helps bridge communication gaps for the deaf and hard-of-hearing community while setting a strong benchmark for accuracy and efficiency in sign language recognition. Future work will focus on enlarging the vocabulary to cover dynamic gestures and on improving robustness under varied environmental conditions, further increasing the system's practical value in real-life settings.
References
[1] Z. Chassagneux, “Sign Language MNIST,” Kaggle, 2017. [Online]. Available: https://www.kaggle.com/datasets/datamunge/sign-language-mnist
[2] S. Kumar, A. Sharma, and R. Gupta, “Real-Time American Sign Language Recognition Using Deep Learning,” IEEE Access, vol. 10, pp. 135–144, 2022, doi: 10.1109/ACCESS.2022.3198754.
[3] TensorFlow Development Team, “TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems,” 2023. [Online]. Available: https://www.tensorflow.org
[4] R. Gupta and R. Singh, “A CNN Approach to Sign Language Recognition,” Journal of Computer Vision and Pattern Recognition, vol. 8, no. 3, pp. 112–125, 2021.
[5] F. Chollet et al., “Keras: The Python Deep Learning API,” 2023. [Online]. Available: https://keras.io
[6] A. Sharma and S. Kumar, “Hand Gesture Recognition Using Convolutional Neural Networks,” International Journal of Computer Applications, vol. 174, no. 12, pp. 15–21, 2021.
[7] S. T. Abd Al-Latief, “Deep Learning for Sign Language Recognition: A Comparative Review,” Journal of Smart Internet of Things, vol. 2024, no. 1, pp. 78–116, Jun. 2024. A broad survey of over 140 works on SLR, covering datasets, architectures, and challenges.
[8] A. Gangal, C. Kuppahally, and M. Ravindran, “Sign Language Recognition with Convolutional Neural Networks,” tech. rep., Stanford CS231n, 2024. Includes ablation on hyperparameters and data augmentation for static ASL recognition.
[9] R. Rastgoo, “Sign Language Recognition: A Deep Survey,” Expert Systems with Applications, vol. 182, p. 115123, 2021. A widely cited (610+) survey on vision-based SLR using deep learning.
[10] P. Jayanthi, R. K. Ponsy, K. Swetha, and S. A. Subash, “Real Time Static and Dynamic Sign Language Recognition using Deep Learning,” Journal of Scientific & Industrial Research, vol. 81, no. 11, pp. 1186–1194, 2022. Reports static and video-based recognition results, including CNN variants with batch normalization.
[11] “Real-Time Sign Language Recognition Using Deep Learning and Computer Vision,” Research paper, Mar. 2025. Focuses on CNN + computer vision for real-time ASL recognition and addresses segmentation and lighting challenges.
[12] D. Key, Real-Time American Sign Language Recognition Using 3D CNNs and LSTM: Architecture, Training, and Deployment, preprint, Dec. 2025. Hybrid 3D CNN + LSTM for spatial-temporal ASL recognition on large benchmarks.
[13] K. Hirooka et al., Stack Transformer Based Spatial-Temporal Attention Model for Dynamic Multi-Culture Sign Language Recognition, preprint, Mar. 2025. Uses transformer attention mechanisms to capture motion and spatial patterns across multiple sign languages.
[14] Q. Zhu et al., Continuous Sign Language Recognition Based on Motor Attention Mechanism and Frame-Level Self-Distillation, preprint, Feb. 2024. Proposes an attention-based continuous SLR model with improved dynamic representation.