Sign language is an important communication method for people with hearing or speech disabilities. However, many people are not familiar with sign language, which creates a communication gap between signers and non-signers. The proposed system aims to reduce this gap by detecting hand gestures and converting them into readable text using Artificial Intelligence and Machine Learning techniques. In this project, hand gestures are captured using the built-in laptop webcam. A custom dataset consisting of 10–15 unique gestures is created, where each gesture represents a specific word or message. The collected gesture data is used to train a Recurrent Neural Network (RNN) model for gesture recognition. During real-time operation, the system captures gesture images from the webcam, processes them using image processing techniques, and predicts the gesture using the trained model. Once a gesture is recognized, it is automatically converted into the associated text and displayed on the screen. The system provides a simple and efficient way to translate sign language gestures into text without requiring any additional hardware devices.
Introduction
The proposed sign language detection system addresses communication barriers faced by individuals with hearing or speech disabilities. Traditional sign language systems often assign a gesture to each individual letter, requiring many gestures to spell out a sentence, which is time-consuming and inefficient for real-time communication.
The proposed system leverages Artificial Intelligence and Machine Learning to recognize hand gestures using a standard laptop webcam and directly convert them into commonly used words or sentences. Key components include:
Input module: Real-time video capture via webcam.
Image processing: Hand detection and feature extraction using libraries like OpenCV and NumPy.
Text conversion and display: Recognized gestures are converted into readable text and optionally speech output.
Workflow: Captured frames → hand keypoint detection → preprocessing and feature extraction → gesture recognition via RNN → selection of highest-confidence prediction → text/speech output.
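The preprocessing and prediction-selection steps of this workflow can be sketched as below. The sketch assumes MediaPipe-style hand keypoints (21 (x, y) landmarks with the wrist first); the helper names are illustrative and the actual RNN inference is stubbed out, with the class scores standing in for the model's softmax output.

```python
import numpy as np

def normalize_landmarks(landmarks):
    """Translate hand keypoints so the wrist is the origin and
    scale by the largest coordinate magnitude, making the features
    invariant to hand position and distance from the camera."""
    pts = np.asarray(landmarks, dtype=float)  # shape (21, 2): wrist first
    pts -= pts[0]                             # wrist-relative coordinates
    scale = np.abs(pts).max()
    return pts / scale if scale > 0 else pts

def pick_prediction(scores, labels, threshold=0.5):
    """Return the highest-confidence gesture label, or None when no
    class score clears the confidence threshold."""
    idx = int(np.argmax(scores))
    return labels[idx] if scores[idx] >= threshold else None
```

In the real pipeline, the landmark array would come from a hand-tracking library such as MediaPipe running on webcam frames, and the scores from the trained RNN; only the prediction that clears the threshold is converted to text.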
Advantages:
Reduces the communication gap between individuals with hearing or speech disabilities and others.
Requires only a standard webcam, with no extra hardware.
Supports training additional gestures for scalability.
Maps gestures directly to sentences for faster communication.
Limitations:
Recognition is limited to trained gestures.
Accuracy may vary with lighting and camera quality.
Adding new gestures requires retraining the model.
Overall, the system offers a flexible, real-time, and accessible solution for improving interaction between hearing-impaired users and the general public.
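Because each trained gesture maps directly to a word or sentence rather than a single letter, the final output step can be as simple as a dictionary lookup from the predicted class label to its display text. The labels and sentences below are hypothetical, not the project's actual gesture set.

```python
# Hypothetical gesture-to-sentence mapping; the real project defines
# its own set of 10-15 gestures and their associated messages.
GESTURE_TEXT = {
    "hello":  "Hello, how are you?",
    "thanks": "Thank you very much.",
    "help":   "I need help, please.",
}

def gesture_to_text(label):
    """Look up the display text for a recognized gesture label;
    return an empty string for unrecognized labels."""
    return GESTURE_TEXT.get(label, "")
```

Adding a new gesture then only requires collecting its samples, retraining the model, and adding one entry to the mapping.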
Conclusion
The Sign Language Detection and Text Conversion system was successfully developed using Python and AI/ML techniques. The system captures hand gestures through a webcam and converts them into readable text or words. In this project, a limited number of sign language gestures were trained into the model. Instead of training separate gestures for each letter, specific gestures were assigned to commonly used sentences, because training gestures for every letter would require a much larger dataset and more training time. The trained model was able to recognize the defined gestures and display the corresponding text output accurately.
This system demonstrates how artificial intelligence and image processing can help reduce the communication gap between deaf or mute individuals and others, providing a simple way to translate hand gestures into text that others can easily understand.
The project also shows that machine learning and computer vision technologies can be effectively applied in real-world communication systems. In the future, the system can be expanded by adding more gestures, training the model on complete sign language alphabets, improving model accuracy, and converting the text output into speech for better communication.
References
[1] OpenCV Contributors, "OpenCV-Python Tutorials: Video Capture and Processing," OpenCV Documentation, https://docs.opencv.org/4.x/dd/d43/tutorial_py_video_display.html, Accessed: March 2025.
[2] Scikit-learn Developers, "Scikit-learn 1.3.0: Machine Learning Utilities for Evaluation," https://scikit-learn.org/1.3/, Accessed: March 2026.
[3] Matplotlib Development Team, "Matplotlib 3.7.2: Visualization of Training Metrics and Results," https://matplotlib.org/3.7.2/, Accessed: March 2026.
[4] Google AI Edge, "MediaPipe Solutions: Holistic Landmark Detection (v0.10.7)," MediaPipe Documentation, https://ai.google.dev/edge/mediapipe/solutions/vision/holistic, Accessed: March 2026.
[5] Pandas Development Team, "Pandas 2.0.3 Documentation: Data Handling and Preprocessing," https://pandas.pydata.org/docs/, Accessed: March 2026.
[6] Abdulrahman-tech, "ASL Recognition: Real-time Sign Language using PyTorch and MediaPipe," GitHub Repository, 2024–2025. [Online]. Available: https://github.com/Abdulrahman-tech/asl-recognition
[7] N. M. Bhat, "pyttsx3: Offline Text-to-Speech Conversion Library (v2.90)," PyPI and Documentation, https://pypi.org/project/pyttsx3/, Accessed: March 2026.