Deaf and Dumb Gesture Recognition System

Authors: Vaibhav Shah, Nikhil Sharma, Prince Solanki, Prof. Hemalata Mote

DOI Link: https://doi.org/10.22214/ijraset.2022.41413

Abstract

The rising incidence of visual and hearing impairment is a matter of global concern. India alone has around 12 million visually impaired people, and over 21 million people are blind, deaf, or both. Various solutions exist, such as eye donation for the blind and hearing aids for the deaf, but not everyone can afford them. The purpose of our project is to provide an effective method of communication between impaired people and everyone else. According to a research article in the “Global World” on January 4, 2017, with a deaf community of millions, hearing India is only just beginning to sign. To address this problem, we propose a model based on modern technologies such as machine learning, image processing, and artificial intelligence to provide a potential solution and bridge the communication gap. Sign language is the most widely accepted means of communication for impaired people. The model will produce output as text and voice in regional languages as well as English, so that it can benefit the vast majority of the population in rural as well as urban India. This project aims to provide accessibility, convenience, and safety to our impaired brothers and sisters, who are looked down upon by society just because of their disability.

Introduction

I. INTRODUCTION

Generally, hearing people use speech to communicate with others, whereas verbally impaired people use sign language. Sign language is expressed in the form of gestures. A gesture is a movement of the fingers and hands that forms a specific shape. Because not everyone knows sign language, verbally impaired people face difficulty in expressing their thoughts and ideas. There are multiple sign languages, such as American Sign Language (ASL), Indian Sign Language (ISL), British Sign Language (BSL), and Australian Sign Language (Auslan). Our system uses Indian Sign Language (ISL) because our target audience is Indian. Advances in technology have made it possible to build a system that lets impaired people communicate and express their ideas. The proposed system aims to bridge the communication gap between verbally impaired people and others by developing a device that converts gestures into speech. For easy accessibility and convenience, this system will remove communication barriers such as linguistic, physical, and personal barriers. Using technologies such as Artificial Intelligence (AI), Machine Learning (ML), Deep Learning (DL), and Data Science (DS), we can deliver a potential solution that can help millions of impaired people around the globe stand shoulder to shoulder with everyone.

II. LITERATURE REVIEW

The paper Real-time sign language recognition using PCA [1] deals with image processing, recognizing 26 gestures from Indian Sign Language with a machine learning algorithm. The system takes input gestures against a uniform white background, captured with a webcam. The training database contains 26 Indian signs made with the right hand. The captured input gestures are first pre-processed, and segmentation separates the hand from the background. The authors describe a sign recognition procedure consisting of data acquisition, pre-processing and hand segmentation, feature extraction, sign recognition, and sign-to-text and voice conversion. Segmentation is done by means of image processing. For recognition, features such as eigenvalues and eigenvectors are extracted. The algorithm used for gesture recognition is Principal Component Analysis (PCA): the input gesture is tested against all the images in the training database, and the recognized gesture is converted into text as well as voice.
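
The recognition step described above can be illustrated with a minimal sketch. It assumes NumPy, flattened grayscale images, and a nearest-neighbour match in the PCA subspace; the function names, the toy four-pixel "images", and the labels are hypothetical and are not taken from [1].

```python
import numpy as np

def fit_pca(train, k):
    """Return the mean image and the top-k principal axes of `train`
    (shape: n_samples x n_pixels)."""
    mean = train.mean(axis=0)
    centered = train - mean
    # Rows of vt are eigenvectors of the covariance matrix, ordered by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]

def project(images, mean, axes):
    """Project images onto the principal axes."""
    return (images - mean) @ axes.T

def recognize(sample, train_proj, labels, mean, axes):
    """Label of the training gesture closest to `sample` in PCA space."""
    z = project(sample[None, :], mean, axes)
    distances = np.linalg.norm(train_proj - z, axis=1)
    return labels[int(np.argmin(distances))]

# Toy data: four flattened 2x2 "images", two per hypothetical gesture label.
train = np.array([[0.9, 0.1, 0.9, 0.1],
                  [1.0, 0.0, 0.8, 0.2],
                  [0.1, 0.9, 0.1, 0.9],
                  [0.0, 1.0, 0.2, 0.8]])
labels = ["A", "A", "B", "B"]

mean, axes = fit_pca(train, k=2)
train_proj = project(train, mean, axes)
predicted = recognize(np.array([0.95, 0.05, 0.85, 0.15]),
                      train_proj, labels, mean, axes)
print(predicted)
```

In a real system the rows of `train` would be the segmented hand images from the training database, and `k` would be chosen to retain most of the variance.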

The paper Finger Gesture and pattern recognition-based device security system [2] describes a hand-gesture-recognition-based system that compares patterns against a database of images, triggering the unlocking of mobile devices by matching image pairs, and that also recognizes real-time gestures.

A Human Machine Interface (HMI) is used to bridge the communication gap between humans and systems, so that natural variables such as voice and gestures can be translated into the digital domain.

The author developed the system for security purposes. Mobile devices today offer various security techniques such as pattern unlock, passwords, and advanced features like face and voice unlock. At the same time, there is a need for a more advanced method. The author aims to build a system that recognizes a hand-drawn pattern input through the webcam of a portable device and compares patterns through template matching. Image acquisition is done using a MATLAB function.

Pattern matching against templates can be done in two ways: template matching or neural networks. According to the author's experiments, template matching produces fewer errors than neural networks, and with it the system achieves an appreciable rate of accuracy. As future scope, the system could be deployed on portable devices such as laptops and smartphones.
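
Template matching of the kind compared above can be sketched as a sliding-window search that scores each position by the sum of squared differences (SSD). This is only an illustration: [2] used MATLAB, and the function `match_template` and the toy arrays below are assumptions made for the example.

```python
def match_template(image, template):
    """Slide `template` over `image` (both lists of rows of numbers) and
    return (best_row, best_col, best_score) for the lowest SSD."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            ssd = sum(
                (image[r + dr][c + dc] - template[dr][dc]) ** 2
                for dr in range(th)
                for dc in range(tw)
            )
            if best is None or ssd < best[2]:
                best = (r, c, ssd)
    return best

# Toy 4x5 "image" containing the 2x2 pattern at row 1, column 2.
image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 7, 9, 0],
    [0, 0, 0, 0, 0],
]
template = [
    [9, 8],
    [7, 9],
]

row, col, score = match_template(image, template)
print(row, col, score)
```

An unlock decision would typically accept the pattern only when the best score falls below a chosen threshold.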

The paper Hand gesture to speech conversion using MATLAB [3] uses image segmentation and feature extraction algorithms to recognize hand gestures captured by a camera. These gestures are made by verbally impaired people.

The authors implemented the system with three functional blocks: a camera, a personal computer, and speakers. The camera is used for image acquisition. Video acquisition is generally subject to several concerns, such as the number of cameras used, the position of the camera, background conditions, and lighting sensitivity. The camera is positioned at the center of the chest, that is, it is worn by the hearing-impaired person. The personal computer is the main block, where the complete image processing is carried out. Finally, the recognized gesture is converted into a speech track, which is played through the speaker.

This system does not require any gloves or electromechanical devices. The proposed system has several stages: image segmentation, feature extraction, gesture classification, and speech playback. The system has a maximum accuracy of 80%. As future scope, it is to be extended to all phonemes in English. The drawback of this system is that the algorithm is too simple and would need to be improved for more challenging operations.

The paper Sign Language to Speech Conversion [4] starts from the observation that a hearing person typically does not understand Indian Sign Language and therefore cannot follow the views and thoughts expressed by impaired people. The authors therefore aim to build a system through which impaired people can communicate their thoughts, views, and feelings to others. A sensor-based glove is built using flex sensors, which sense the bend of the fingers and the tilt of the fist in hand gestures. Text is displayed as English alphabets and a few words. This text is then converted into speech by a text-to-speech synthesizer built using an HMM.

This involves capturing and combining orientation and hand movements, facial expressions, hand shapes, and arm or body movements, and conveying the thoughts fluidly to the receiver through all these movements. The system has two modules: fingerspelling (the gesture recognition module) and the text-to-speech synthesis module. In the gesture recognition module, a camera captures the hand gesture, and the captured image then passes through image processing techniques as input.

This preprocessed image undergoes feature extraction; the extracted features are trained using static images, and the corresponding images are recognized using various image recognition algorithms. A flex sensor allows full freedom of hand movement and is also sensitive and more accurate. Using a flex-sensor-based system, a recognition rate of 99% can be achieved in real time. An accelerometer is also used to find hand movement and orientation. The system consumes very little power and is portable. It also reduces ambiguity in gestures and shows improved accuracy. The average recognition rate for all gestures is 87.5%, and the recognition rate for each gesture is 80-90%. As scope for improvement, the system could form words, phrases, and simple sentences by concatenating alphabets, and a greater number of flex sensors could be employed to recognize gestures.

In the paper Hand Gesture Recognition to Speech Conversion in Regional Language [5], the authors aim to develop a system that bridges the communication gap between impaired people and others by converting the hand gestures of impaired people into text and speech. In this methodology, digital wireless gloves are developed and fitted with flex sensors and an accelerometer. The flex sensors sense the bend of the fingers and the tilt of the fist in hand gestures, and the accelerometer captures movements of the hands and arms. The output is in the form of text and speech. The text is displayed in English. The system uses an LCD module to display text and a voice playback IC to give real-time speech output.

This device provides more flexibility because it acts as a communicator. Its voice output is in a regional language (here Marathi). An instruction given to the LCD performs a predefined task such as initializing it, clearing its screen, setting the cursor position, or controlling the display. The data displayed on the LCD is the ASCII value of a character. After performing a hand gesture, the user has to hold it for 2 seconds to ensure proper recognition. The system is implemented using the PIC18F4520 microcontroller. As future scope, it can be developed for additional languages using Arduino and R programming.

In the paper Hand Talk-Implementation of a Gesture Recognizing Glove [6], the authors aim to develop a glove that recognizes gestures. The glove is embedded with flex sensors, which detect the bend of the fingers, and the data is mapped to a character set by a Minimum Mean Square Error machine learning algorithm. After gesture recognition, the recognized character is transmitted via Bluetooth to an Android phone, which performs text-to-speech conversion. Technology has advanced to a level where developments such as low-power electronics, wireless devices, and the ability to design both the analog front end and the digital processing back end have inspired a new range of wearable micro-devices.
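
The mapping from flex-sensor data to characters can be sketched as a minimum mean square error lookup: the reading is compared with one stored reference vector per character, and the character with the smallest error wins. The reference values, the five-sensor layout, and the function names below are hypothetical and are not taken from [6].

```python
def mean_squared_error(reading, reference):
    """Mean of the squared differences between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(reading, reference)) / len(reading)

def classify(reading, references):
    """Return the character whose reference vector has the minimum MSE
    with respect to `reading`."""
    return min(references,
               key=lambda ch: mean_squared_error(reading, references[ch]))

# Hypothetical normalized bend values (0 = straight, 1 = fully bent),
# one value per finger; these are NOT real sign language calibrations.
references = {
    "A": [0.1, 0.9, 0.9, 0.9, 0.9],
    "B": [0.8, 0.1, 0.1, 0.1, 0.1],
    "L": [0.1, 0.1, 0.9, 0.9, 0.9],
}

reading = [0.15, 0.2, 0.85, 0.9, 0.8]
letter = classify(reading, references)
print(letter)
```

Reference vectors of this kind would normally be learned by averaging several recorded readings per gesture.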

The authors developed this system to provide a low-cost way for an impaired person to communicate with others through an artificial voice. The glove is trained so that it can learn from gestures and convert them into speech in a wide range of languages. In this system, the primary input is the pose and orientation of the hand. The main focus is on the extent to which the fingers are bent.

The acquired data is encoded and wirelessly transmitted to a mobile device. Software on the mobile device recognizes the shape and orientation of the hand. The authors aimed to build a low-cost solution. The only constraint was the cost of flex sensors, which sell for $10 per sensor. The authors therefore aimed to build their own flex sensors at a cost of less than 1 rupee per sensor. As future scope, other sensors can be fitted to detect more complex gestures.

The paper Hand Gesture Recognition and Voice Conversion System for Dumb People [7] notes that it is difficult for impaired people to communicate with others who do not understand Indian Sign Language, so the authors aim to build a system that helps them communicate. In emergencies it is very difficult for them to convey a message, so the proposed solution is to convert gestures into an audible voice. Vision and non-vision are the two major techniques for detecting hand movements and gestures. In the vision technique a camera is used, and in the non-vision technique sensors are used.

The authors aim to build this system using the non-vision technique. Because mute people are mostly deaf as well, messages from hearing people are converted into sign language, and in an emergency the message is delivered to relatives and friends. The components used in this system are a flex sensor, an analog-to-digital converter (ADC0804), microcontrollers such as PIC and Arduino, and a speaker. Flex sensors, which are analog, are used to detect hand motion. An Arduino has a built-in analog-to-digital converter, so the sensor output is fed to it directly and converted into a digital value on the board. On a Raspberry Pi, an external ADC IC such as the ADC0804 is used. The device is portable and lightweight, and the system is beneficial to both hearing and mute people.
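
The analog-to-digital step can be illustrated with a short conversion sketch. It assumes a 10-bit converter with a 5 V reference (as on an Arduino Uno) and an 8-bit converter for the ADC0804; the calibration voltages for a straight and a fully bent finger are hypothetical, as are the function names.

```python
def adc_to_voltage(counts, bits=10, vref=5.0):
    """Convert a raw ADC count to volts; full scale (2**bits - 1) maps to vref."""
    full_scale = 2 ** bits - 1
    if not 0 <= counts <= full_scale:
        raise ValueError("count out of range for this resolution")
    return counts * vref / full_scale

def bend_fraction(voltage, v_straight, v_bent):
    """Linearly map a voltage to 0.0 (straight) .. 1.0 (fully bent), clamped."""
    fraction = (voltage - v_straight) / (v_bent - v_straight)
    return max(0.0, min(1.0, fraction))

v10 = adc_to_voltage(1023, bits=10)   # 10-bit converter at full scale
v8 = adc_to_voltage(255, bits=8)      # 8-bit converter at full scale

# Hypothetical calibration: 1.0 V when straight, 4.0 V when fully bent.
mid = bend_fraction(adc_to_voltage(512), v_straight=1.0, v_bent=4.0)
print(v10, v8, round(mid, 3))
```

The resulting bend fractions, one per finger, form the feature vector that a gesture classifier would consume.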

III. COMPONENTS OF PROPOSED SYSTEM

A. System Requirements

1. Hardware

a. CPU: 64-bit Intel or AMD Processor

b. Speed: 1.1 GHz

c. RAM: 8 GB (min)

d. Operating System: Windows 7 and above

e. HD Webcam

2. Software

a. Anaconda Distribution

b. Python 3.x.x (Preferably 3.6.8)

c. Integrated Development Environments such as Visual Studio Code or PyCharm.

d. Notebooks such as Jupyter or Google Colab.

IV. BLOCK DIAGRAM OF PROPOSED SYSTEM

V. METHODOLOGY

CNN (Convolutional Neural Network) Feature Comparison: The Convolutional Neural Network is a well-known deep learning algorithm used to extract high-level representations of image content. Rather than pre-processing the data to derive features such as textures and shapes, a CNN takes the image's raw pixel data as input, learns how to extract these features, and ultimately infers what object they represent. In Indian Sign Language (ISL), every letter has a symbol. For the machine to learn these symbols, we collect a large number of images. The recorded video is divided into multiple images; the CNN assigns importance to various aspects and objects in each image and learns to differentiate one from another. The images are then arranged starting from the most informative, organized in a proper sequence, and used to train the model. At recognition time, frames extracted from the input are compared with the trained model. If a match is found, the corresponding output is displayed; if no match is found, the gesture is not identified.
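
The final match or no-match decision can be sketched as follows. The text does not say how a match is decided, so this example assumes that the trained model returns one raw score per gesture class and that a gesture counts as identified only when its softmax probability reaches a threshold; the labels, scores, and the 0.8 threshold are hypothetical.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)                       # subtract the max for stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def decide(scores, labels, threshold=0.8):
    """Return the best label, or None when no class is confident enough."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best] if probs[best] >= threshold else None

labels = ["A", "B", "C"]                     # hypothetical gesture classes
confident = decide([6.0, 1.0, 0.5], labels)  # one score clearly dominates
uncertain = decide([1.1, 1.0, 0.9], labels)  # scores are nearly equal
print(confident, uncertain)
```

Returning `None` corresponds to the case in which the gesture is not identified.
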
OpenCV (Python Library) for Invoking the Camera: OpenCV (Open Source Computer Vision Library) is an open-source computer vision and machine learning software library. The library has more than 2500 optimized algorithms. These algorithms can be used to detect and recognize faces, identify objects, classify human actions in videos, track camera movements, track moving objects, extract 3D models of objects, produce 3D point clouds from stereo cameras, stitch images together to produce a high-resolution image of an entire scene, find similar images in an image database, remove red eyes from images taken using flash, follow eye movements, recognize scenery, and much more. In simple terms, the library is used for image processing and handles most operations on images. In our system, the OpenCV Python library is invoked to open the camera, after which the user performs the gesture. As soon as the gesture is performed, the system acquires the hand shape, motion, and gesture.
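
Camera capture with OpenCV can be sketched as below. This is a minimal illustration, assuming the `opencv-python` package and a webcam at index 0; the frame size, the frame limit, and the helper `sample_indices` are assumptions for the example rather than part of the described system.

```python
def capture_gray_frames(camera_index=0, max_frames=30, size=(64, 64)):
    """Grab up to `max_frames` frames from a webcam as resized grayscale arrays."""
    import cv2  # imported lazily so the helper below works without OpenCV

    capture = cv2.VideoCapture(camera_index)
    frames = []
    try:
        while len(frames) < max_frames:
            ok, frame = capture.read()
            if not ok:           # camera missing or stream ended
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            frames.append(cv2.resize(gray, size))
    finally:
        capture.release()        # always free the camera
    return frames

def sample_indices(total_frames, step):
    """Indices of the frames kept when only every `step`-th frame is used."""
    return list(range(0, total_frames, step))

kept = sample_indices(total_frames=10, step=3)
print(kept)
```

Calling `capture_gray_frames()` on a machine with a webcam returns the frames that would then be passed to the trained model.
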
TensorFlow (an Open-Source Deep Learning Library by Google): In machine learning and deep learning, all the algorithms depend heavily on mathematics. In general, topics such as matrices, statistics and distributions, probability, and linear algebra are the most dominant. Algorithms such as the ANN (Artificial Neural Network), CNN (Convolutional Neural Network), and RNN (Recurrent Neural Network) are widely used to build solutions. To spare developers from coding this mathematics by hand and to reduce complexity, Google developed a library known as TensorFlow. In mathematical terms, a tensor is a multidimensional array that generalizes vectors and matrices. Because matrices and vectors are used heavily when implementing an ML/DL model, the library is named TensorFlow. TensorFlow is a high-level library that makes use of resources such as the GPU (Graphics Processing Unit) for faster calculation than on the CPU (Central Processing Unit). An alternative to TensorFlow is PyTorch, developed by Facebook.
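
A small Keras model shows how tensor shapes flow through a CNN. The architecture below is a sketch under stated assumptions (64x64 grayscale input, two convolution and pooling stages, unpadded windows); it is not the authors' network, and the layer sizes and function names are hypothetical.

```python
def conv_out(size, kernel, stride=1):
    """Output width of a 'valid' (unpadded) convolution or pooling window."""
    return (size - kernel) // stride + 1

def flattened_features(input_size=64, filters=32):
    """Feature count after conv3 -> pool2 -> conv3 -> pool2 on a square image."""
    s = conv_out(input_size, 3)         # 64 -> 62
    s = conv_out(s, 2, stride=2)        # 62 -> 31
    s = conv_out(s, 3)                  # 31 -> 29
    s = conv_out(s, 2, stride=2)        # 29 -> 14
    return s * s * filters

def build_model(num_classes, input_size=64):
    """Small Keras CNN matching the shape arithmetic above."""
    import tensorflow as tf  # lazy import: only needed when a model is built

    return tf.keras.Sequential([
        tf.keras.Input(shape=(input_size, input_size, 1)),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(2),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])

features = flattened_features()
print(features)
```

With TensorFlow installed, `build_model(num_classes=26)` would create one output per letter, and the `Flatten` layer would emit the feature count computed by `flattened_features()`.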

VI. FUTURE ENHANCEMENT

ISL translation can promote the use and awareness of computer interfaces. Education and training will become simpler for deaf and mute individuals through ISL translation and visualization. The system serves humanity through the use of technology. Social values such as humanity can grow in an individual's mind when physically impaired people are included in everyday life. Visually impaired individuals can also use the same system if it is extended with a voice interface.

Conclusion

Through this project we are trying to build a flexible system for physically impaired people that will ease their lives. It is an attempt to create a wireless, lightweight sign-language-to-text conversion system in which the information gestured by a physically impaired person is effectively conveyed to others. The system will try to convert the sign language of the physically impaired person into text that can be understood both by their own community and by the rest of the world.

References

[1] Sawant, S. N., & Kumbhar, M. S. (2014, May). Real time sign language recognition using PCA. In 2014 IEEE International Conference on Advanced Communications, Control and Computing Technologies (pp. 1412-1415). IEEE.

[2] Khare, S. (2015, March). Finger gesture and pattern recognition based device security system. In 2015 International Conference on Signal Processing and Communication (ICSC) (pp. 443-447). IEEE.

[3] Itkarkar, R. R., & Nandi, A. V. (2013, July). Hand gesture to speech conversion using Matlab. In 2013 Fourth International Conference on Computing, Communications and Networking Technologies (ICCCNT) (pp. 1-4). IEEE.

[4] Vijayalakshmi, P., & Aarthi, M. (2016). Sign language to speech conversion. In 2016 International Conference on Recent Trends in Information Technology (ICRTIT), Chennai (pp. 1-6).

[5] Jadhav, B. D., Munot, N., Hambarde, M., & Ashtikar, J. (2015). Hand Gesture Recognition to Speech Conversion in Regional Language. IJCSN International Journal of Computer Science and Network, 4(1), 161-166.

[6] Preetham, C., Ramakrishnan, G., Kumar, S., Tamse, A., & Krishnapura, N. (2013, April). Hand talk-implementation of a gesture recognizing glove. In 2013 Texas Instruments India Educators' Conference (pp. 328-331). IEEE.

[7] Padmanabhan, V., & Sornalatha, M. (2014). Hand gesture recognition and voice conversion system for dumb people. International Journal of Scientific & Engineering Research, 5(5), 427.

Copyright

Copyright © 2022 Vaibhav Shah, Nikhil Sharma, Prince Solanki, Prof. Hemalata Mote. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Paper Id : IJRASET41413

Publish Date : 2022-04-12

ISSN : 2321-9653

Publisher Name : IJRASET
