Face Mask Detection Using Machine Learning

Authors: Kavita Saxena, Rishabh Jain, Rishabh, Rohit Kumar

DOI Link: https://doi.org/10.22214/ijraset.2021.39262

Abstract

The COVID-19 pandemic has affected our daily lives and disrupted world trade and transport. Wearing a face mask has become a new necessity for safety, and in the near future many institutions will ask customers to wear masks to use their services. Face mask detection has therefore become an important tool for society. This paper presents a simplified approach to this task using packages such as TensorFlow, Keras, OpenCV and Scikit-Learn. The method detects faces in an image frame and then identifies whether each face is wearing a mask. As in a surveillance task, it can also detect faces, with or without masks, in motion. The method attains accuracies of up to 93% and 91.2% on two datasets, respectively. We explore optimized parameter values for a Sequential CNN (Convolutional Neural Network) model to detect the presence of masks correctly.

I. INTRODUCTION

The coronavirus (COVID-19) pandemic is causing a global health crisis, and according to the World Health Organization (WHO), wearing a face mask in public areas is an effective method of protection. The pandemic forced governments across the world to impose lockdowns to prevent virus transmission, and in many countries people are required by law to wear face masks in public. Wearing a clinical mask is compulsory to limit the spread of certain respiratory illnesses, including COVID-19. The public should know whether to wear a mask for source control or for protection against COVID-19. The potential benefits of masks lie in reducing the risk of transmission from an infectious individual during the "pre-symptomatic" period and in reducing the stigmatization of individuals who wear masks to restrain the spread of the virus. WHO stresses prioritizing medical masks and respirators for health care workers. Face mask detection has therefore become a crucial task in today's global society. Monitoring large groups of people in public areas is becoming more difficult, so we create an automated process for detecting faces. Here we introduce a face mask detection model based on computer vision and deep learning.

Face mask detection involves detecting the location of a face and then determining whether it has a mask on it. The problem is closely related to general object detection, which detects classes of objects, while face detection deals specifically with distinguishing one class of entity, namely faces. The proposed model can be integrated with surveillance cameras to help impede COVID-19 transmission by detecting which people are wearing face masks and which are not. The model combines deep learning and classical machine learning techniques using OpenCV, TensorFlow and Keras. The goal is to achieve high accuracy while keeping training and detection time low.

II. LITERATURE SURVEY

A. Face Mask Detector

In this system, a face mask detector can be deployed in many areas, such as markets, airports, schools and other high-traffic places, to monitor the public and help avoid the spread of the disease by checking who is following basic rules and who is not. However, it did not allow webcam access, which posed a hurdle for testing on images and video streams. We have modelled a face mask detector using deep learning. We made the system computationally efficient using MobileNetV2, which makes it easier to extract features from the datasets. We use a CNN architecture for better performance, and the detector can be fitted to any kind of camera.

B. Face Detection Techniques

Human beings do not have a tremendous ability to identify many different faces, so automatic face detection systems play an important role in face recognition, head pose estimation, etc. Face detection has some problems, such as face occlusion and non-uniform illumination. We use a neural network to detect faces in the live video stream, and TensorFlow is also used in this system. We use the MobileNet CNN architecture in our proposed system and aim to overcome these problems in this paper.

C. Multi-Stage CNN Architecture for Face Mask Detection

This system consists of a dual-stage CNN architecture capable of detecting masked and unmasked faces, and it can be integrated with pre-installed cameras. This helps in tracking safety violations, promoting the use of face masks and ensuring a safe working environment. The datasets were collected from the public domain, along with some data scraped from the internet. Only pretrained datasets are used for detection. Any camera can be used to detect faces. Such a system can be very useful to society in helping to prevent virus transmission. Live video detection is performed using OpenCV (a Python library).

D. Real Time Face Mask Recognition With Alarm System Using Deep Learning

This approach gives precise and fast results for face mask detection. The system uses the architectural features of VGG-16 as the foundation network for face recognition. Deep learning techniques are applied to construct a classifier that classifies images of people wearing face masks and not wearing masks. Our proposed study uses the architectural features of a CNN as the foundation network for face detection. It shows good accuracy in detecting people wearing and not wearing face masks. This study presents a useful tool in fighting the spread of the COVID-19 virus.

III. TECHNOLOGY STACK

A. Deep Learning

Deep learning is a subfield of machine learning concerned with algorithms inspired by the structure and function of the brain, called artificial neural networks. Deep learning is one of the most promising ways to improve face recognition technology. The idea is to extract face embeddings from images containing faces; such embeddings should be distinct for different faces, and training a deep neural network is an effective way to learn them.

B. Library Used

TensorFlow: TensorFlow, an interface for expressing machine learning algorithms, is used to deploy ML systems into production across many areas of computer science, including sentiment analysis, voice recognition, geographic information extraction, computer vision, text summarization, information retrieval, computational drug discovery and flaw detection. In the proposed model, the entire Sequential CNN architecture (which consists of several layers) uses TensorFlow as its backend. TensorFlow is also used to reshape the image data during data processing.
Keras: Keras provides essential abstractions and building blocks for developing and shipping ML solutions with high iteration velocity. It takes full advantage of the scalability and cross-platform capabilities of TensorFlow. The core data structures of Keras are layers and models. All the layers used in the CNN model are implemented using Keras. Keras is also used to convert the class vector into a binary class matrix during data processing and to compile the overall model.
OpenCV: OpenCV (Open Source Computer Vision Library), an open-source computer vision and ML software library, is used to detect and recognize faces, identify objects, classify human actions in videos, track moving objects, follow eye movements, track camera movements, remove red eyes from images taken using flash, find similar images in an image database, recognize scenery and establish markers to overlay it with augmented reality, and so on. The proposed method uses these features of OpenCV for resizing and colour conversion of the data images.
[Figure: Samples from Dataset 1, including faces without masks and with masks.]
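Keras exposes the class-vector-to-binary-matrix conversion described above as `tf.keras.utils.to_categorical`. The sketch below reproduces that conversion in plain NumPy so its output is easy to inspect without TensorFlow; the helper name `to_one_hot` and the label order (0 = 'with mask', 1 = 'without mask') are illustrative assumptions, not the authors' code.

```python
import numpy as np

def to_one_hot(labels, num_classes=2):
    """Convert a vector of integer class labels into a binary class matrix.

    Row i has 1.0 in column labels[i] and 0.0 elsewhere, the same layout that
    tf.keras.utils.to_categorical produces (float32 by default).
    """
    labels = np.asarray(labels, dtype=np.int64)
    # Indexing the identity matrix with the label vector picks one row per label.
    return np.eye(num_classes, dtype=np.float32)[labels]

# Hypothetical label order: 0 = 'with mask', 1 = 'without mask'.
print(to_one_hot([0, 1, 1]))
```

For the labels [0, 1, 1] this yields the rows [1, 0], [0, 1] and [0, 1], one row per image, which matches the two-unit softmax output of the final layer.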

IV. ARCHITECTURE

A. System Design

This project is implemented in the Python programming language using deep learning, machine learning and computer vision techniques together with Python libraries. The architecture uses MobileNet as the backbone, which makes it usable in both high- and low-computation scenarios.

The proposed system uses a CNN algorithm.

Implementation:

The implementation consists of four modules:

Datasets Collecting: A number of images of faces with and without masks are collected; the accuracy that can be achieved depends on the number of images collected.
Datasets Extracting: Features of the mask and no-mask sets are extracted using MobileNetV2.
Models Training: The model is trained using a CNN with OpenCV and Keras (Python libraries).
Facemask Detection: Masks can be detected in pre-processed images and in live video. If a person is wearing a mask, they are permitted entry; if not, a buzzer prompts them to wear a mask to help prevent virus transmission.

V. THE PROPOSED METHOD

The proposed method consists of a cascade classifier and a pre-trained CNN which contains two 2D convolution layers connected to layers of dense neurons. The algorithm for face mask detection is as follows:

A. Dataset

Two datasets have been used to evaluate the current method. Dataset 1 consists of 1376 images, of which 690 show people wearing face masks and the remaining 686 show people not wearing face masks. It mostly contains frontal face poses with a single face in the frame and the same type of mask, white in colour only.

Dataset 2, from Kaggle, consists of 853 images whose faces are annotated as either with a mask or without a mask. As shown in Fig. 2, some images contain turned, tilted and slanted heads, multiple faces in the frame, and different types of masks in different colours.

1. Data Processing: Data pre-processing involves converting data from a given format into a much more user-friendly, desired and meaningful format. The data can take any form, such as tables, images, videos or graphs. This organized information fits an information model or schema and captures the relationships between different entities. The proposed method deals with image and video data using NumPy and OpenCV.
2. Data Visualization: Data visualization is the process of transforming abstract data into meaningful representations through visual encodings that support knowledge communication and insight discovery. It is helpful for studying a particular pattern in the dataset. The total number of images in the dataset is visualized for both categories, 'with mask' and 'without mask'. The statement categories = os.listdir(data_path) lists the directories in the specified data path, so the variable categories now looks like: ['with mask', 'without mask']. To find the number of labels, the categories are distinguished using labels, which are set as: [0, 1]. Each category is then mapped to its respective label using label_dict = dict(zip(categories, labels)); zip returns an iterator of tuples in which the items of the passed iterables are paired together in order. The mapped variable label_dict looks like: {'with mask': 0, 'without mask': 1}.
3. Conversion of RGB Image to Gray Image: Modern descriptor-based image recognition systems regularly work on grayscale images without elaborating on the method used to convert from colour to grayscale, because the colour-to-grayscale method is of little consequence when using robust descriptors. Introducing nonessential information could increase the size of the training data required to achieve good performance. Because grayscale simplifies the algorithm and reduces computational requirements, it is used for extracting descriptors instead of working directly on colour images.
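The category-to-label mapping from the Data Visualization step can be written as a small, self-contained function. It is a sketch: `build_label_dict` is a hypothetical name, and the sorting and directory filter are added assumptions, because `os.listdir` returns names in arbitrary order and could otherwise swap the labels between runs or machines.

```python
import os
import tempfile

def build_label_dict(data_path):
    """Map each class sub-directory of data_path to an integer label.

    Names are sorted (an added assumption) so the mapping is reproducible,
    since os.listdir makes no ordering guarantee.
    """
    categories = sorted(
        name for name in os.listdir(data_path)
        if os.path.isdir(os.path.join(data_path, name))
    )
    labels = list(range(len(categories)))
    return dict(zip(categories, labels))

# Usage on a throwaway directory tree with the two class folders.
with tempfile.TemporaryDirectory() as root:
    for name in ("without mask", "with mask"):
        os.mkdir(os.path.join(root, name))
    print(build_label_dict(root))  # {'with mask': 0, 'without mask': 1}
```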

We use the function cv2.cvtColor(input_image, flag) to change the colour space, where flag determines the type of conversion. In this case, the flag cv2.COLOR_BGR2GRAY is used for grayscale conversion.
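`cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)` computes a weighted sum of the three channels, documented by OpenCV as Y = 0.299 R + 0.587 G + 0.114 B. The NumPy sketch below reproduces that formula to make the weighting and the BGR channel order explicit. It is an illustrative approximation (the `bgr_to_gray` helper is hypothetical), not OpenCV's implementation, which uses fixed-point arithmetic and can differ by one grey level on some pixels.

```python
import numpy as np

def bgr_to_gray(image_bgr):
    """Approximate cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY) for uint8 images.

    Channel 0 is blue and channel 2 is red because OpenCV stores images as BGR.
    """
    img = np.asarray(image_bgr, dtype=np.float64)
    b, g, r = img[..., 0], img[..., 1], img[..., 2]
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    # Round to the nearest grey level and keep the uint8 range.
    return np.clip(np.rint(gray), 0, 255).astype(np.uint8)

# One row of four pixels in BGR order: blue, green, red, white.
pixels = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255], [255, 255, 255]]],
                  dtype=np.uint8)
print(bgr_to_gray(pixels))
```

Pure blue, green, red and white pixels map to grey levels 29, 150, 76 and 255, which shows how strongly the green channel dominates the result.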

4. Image Reshaping: When an image is classified, the input is a three-dimensional tensor in which each channel holds its own pixel values. All images must have the same size, corresponding to the 3D feature tensor. However, images are usually not the same size, and neither are their corresponding feature tensors. Most CNNs can only accept images of a fixed size, which creates several problems during data collection and model implementation. Resizing the input images before feeding them into the network helps overcome this constraint. The images are normalized to bring the pixel range between 0 and 1. They are then converted to 4-dimensional arrays using data = np.reshape(data, (data.shape[0], img_size, img_size, 1)), where 1 indicates the single grayscale channel. Because the final layer of the neural network has 2 outputs, with mask and without mask (i.e., a categorical representation), the labels are converted to categorical form.
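The normalization and reshaping step can be sketched as follows, assuming the faces have already been converted to grayscale and resized (for example with `cv2.resize`) to a square `img_size`. The value 100 used here and the name `prepare_images` are hypothetical, since the text does not give `img_size`.

```python
import numpy as np

IMG_SIZE = 100  # hypothetical; the text does not state img_size

def prepare_images(gray_images, img_size=IMG_SIZE):
    """Scale grayscale images to [0, 1] and stack them into a 4D tensor.

    gray_images: sequence of uint8 arrays already resized to (img_size, img_size).
    Returns shape (n, img_size, img_size, 1); the trailing 1 is the gray channel.
    """
    data = np.asarray(gray_images, dtype=np.float32) / 255.0
    return np.reshape(data, (data.shape[0], img_size, img_size, 1))

# Two synthetic images: one all white, one all black.
batch = prepare_images([np.full((IMG_SIZE, IMG_SIZE), 255, dtype=np.uint8),
                        np.zeros((IMG_SIZE, IMG_SIZE), dtype=np.uint8)])
print(batch.shape, batch.min(), batch.max())
```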

VI. TRAINING OF MODEL

The model is then trained on the processed dataset as follows:

A. Building the Model using CNN Architecture

CNNs have become dominant in many computer vision tasks. The current method uses a Sequential CNN. The first convolution layer is followed by Rectified Linear Unit (ReLU) and MaxPooling layers. This convolution layer learns 200 filters, and its kernel size is set to 3 x 3, which specifies the height and width of the 2D convolution window. Because the model needs to know the shape of its expected input, the first layer in the model must be given the input shape; subsequent layers infer their shapes automatically. In this case, the input shape is specified as data.shape[1:], which returns the dimensions of the data array from index 1 onward. The default padding is "valid", which means the input volume is not zero-padded, so the spatial dimensions shrink. The activation parameter of the Conv2D class is set to "relu". ReLU is a piecewise linear function that retains many of the properties that make linear models easy to optimize with gradient-descent methods, and in terms of performance and generalization in deep learning it compares favourably with other activation functions. Max pooling is used to reduce the spatial dimensions of the output volume. The pool size is set to 3 x 3, and with "valid" padding the resulting number of output rows (or columns) is: output size = floor((input size - pool size) / stride) + 1, where in Keras the stride defaults to the pool size. The second convolution layer has 100 filters with a 3 x 3 kernel and is also followed by ReLU and MaxPooling layers. Finally, a Flatten layer transforms the matrix of features into a long vector that can be fed into a fully connected neural network classifier.
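To see how the spatial size shrinks through the convolution and pooling layers described above, the sketch below applies the "valid" output-size formulas layer by layer. It assumes a hypothetical 100 x 100 input and Keras's default pooling stride (equal to the pool size); the helper names are illustrative.

```python
def conv_valid(size, kernel):
    """Output rows/cols of a stride-1 convolution with 'valid' (no) padding."""
    return size - kernel + 1

def pool_valid(size, pool, stride=None):
    """Output rows/cols of max pooling with 'valid' padding.

    As in Keras's MaxPooling2D, the stride defaults to the pool size.
    """
    stride = pool if stride is None else stride
    return (size - pool) // stride + 1

size = 100  # hypothetical img_size
trace = [size]
for layer in ("conv", "pool", "conv", "pool"):  # Conv2D(3x3), MaxPool(3x3), twice
    size = conv_valid(size, 3) if layer == "conv" else pool_valid(size, 3)
    trace.append(size)
print(trace)  # [100, 98, 32, 30, 10]
```

With the default stride, the second pooling layer outputs 10 x 10 x 100, so Flatten produces a vector of 10,000 values. With an explicit stride of 1, each 3 x 3 pooling layer would instead remove only two rows and columns (for example 98 to 96).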

To reduce overfitting, a Dropout layer that sets inputs to zero with a probability of 50% is added to the model. Then a Dense layer of 64 neurons with a ReLU activation function is added. The final Dense layer, with two outputs for the two categories, uses the softmax activation function.
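Putting the layers from this section together, a tf.keras sketch of the described Sequential CNN might look like the following. It is not the authors' code: the 100 x 100 x 1 input shape, the Adam optimizer and the categorical cross-entropy loss are assumptions (the text does not name them), and pooling uses Keras's default stride, which equals the pool size.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_model(input_shape=(100, 100, 1)):
    """Sequential CNN following the layer description in the text.

    input_shape plays the role of data.shape[1:]; 100 x 100 is a hypothetical
    img_size, and the optimizer/loss choices are assumptions.
    """
    model = tf.keras.Sequential([
        tf.keras.Input(shape=input_shape),
        layers.Conv2D(200, (3, 3), activation="relu"),  # padding="valid" by default
        layers.MaxPooling2D(pool_size=(3, 3)),          # strides default to pool_size
        layers.Conv2D(100, (3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(3, 3)),
        layers.Flatten(),                               # 10 x 10 x 100 -> 10,000 values
        layers.Dropout(0.5),                            # zero 50% of inputs while training
        layers.Dense(64, activation="relu"),
        layers.Dense(2, activation="softmax"),          # 'with mask' / 'without mask'
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_model()
model.summary()
```

Under these assumptions the model has 822,294 trainable parameters, most of them (640,064) in the 64-neuron Dense layer that follows Flatten.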

Splitting the Data and Training the CNN Model: After defining the model, it needs to be trained on one dataset and then tested against a different dataset. A proper model and an optimized train-test split help produce accurate predictions. The test size is set to 0.1, i.e., 90% of the dataset is used for training and the remaining 10% for testing. The validation loss is monitored using ModelCheckpoint. The images in the training set are then fitted to the Sequential model, with 20% of the training data used as validation data. The model is trained for 20 epochs (full passes over the training data), which maintains a trade-off between accuracy and the chance of overfitting.
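The proportions above (10% test, then 20% of the remaining training data for validation) can be illustrated with a NumPy-only index split. This is a sketch: in practice scikit-learn's `train_test_split(..., test_size=0.1)` and Keras's `model.fit(..., validation_split=0.2, epochs=20, callbacks=[ModelCheckpoint(filepath, monitor="val_loss", save_best_only=True)])` would do this work, where `filepath` is a placeholder; `split_indices`, the seed and the round-up rule are hypothetical choices.

```python
import numpy as np

def split_indices(n, test_size=0.1, val_fraction=0.2, seed=0):
    """Shuffle n sample indices and split them into train/validation/test sets.

    The test count is rounded up, as scikit-learn's train_test_split does for
    a float test_size; validation is then taken from the remaining samples.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_test = int(np.ceil(test_size * n))
    test, rest = idx[:n_test], idx[n_test:]
    n_val = int(np.ceil(val_fraction * len(rest)))
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test

train, val, test = split_indices(1376)  # size of Dataset 1
print(len(train), len(val), len(test))  # 990 248 138
```

One difference worth noting: Keras's `validation_split` takes the last 20% of the arrays passed to `fit` without shuffling them first, whereas this sketch shuffles before splitting.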

VII. RESULTS AND ANALYSIS

The model is trained, validated and tested on two datasets. On dataset 1, the method attains an accuracy of up to 93%, and the corresponding figure depicts how this optimized accuracy mitigates the cost of error. Dataset 2 is more varied than dataset 1, as it has multiple faces in the frame and different types of masks in different colours. Accordingly, the model attains a lower accuracy of 91.2% on dataset 2, and the corresponding figure depicts the contrast between training and validation loss for dataset 2. One of the main reasons behind this accuracy is max pooling. It provides basic translation invariance to the internal representation and reduces the number of parameters that the later layers have to learn. This sample-based discretization process down-samples the input representation (the image) by reducing its dimensionality.
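The down-sampling and rough translation invariance attributed to max pooling can be seen on a toy array. The sketch below uses a hypothetical `max_pool` helper with non-overlapping 3 x 3 windows, matching the stated pool size: it reduces a 6 x 6 input to 2 x 2 and gives the same output when a bright pixel moves within its window. Pooling itself has no learnable weights; the parameter savings come from the smaller inputs it passes to later layers.

```python
import numpy as np

def max_pool(x, k):
    """k x k max pooling with stride k and 'valid' padding on a 2D array."""
    h, w = x.shape[0] // k, x.shape[1] // k
    # Drop rows/cols that do not fill a whole window, then take each window's max.
    return x[:h * k, :w * k].reshape(h, k, w, k).max(axis=(1, 3))

a = np.zeros((6, 6)); a[0, 0] = 1.0  # a single bright pixel
b = np.zeros((6, 6)); b[1, 2] = 1.0  # the same pixel shifted inside its 3x3 window
print(max_pool(a, 3))                                  # 36 values reduced to 4
print(np.array_equal(max_pool(a, 3), max_pool(b, 3)))  # True
```

A shift that crosses a window boundary would change the output, which is why the invariance is only approximate.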

The number of neurons has an optimized value of 64, which is not too high; a much higher number of neurons and filters can lead to worse performance. The optimized filter values and pool size help isolate the main portion of the image (the face) so that the presence of a mask is detected correctly without over-fitting.

Conclusion

The aim of this paper is to develop a face mask detection system that can detect whether a person is wearing a face mask and, on that basis, whether to allow their entry, which would be of great help to society. Optimizing the model is a continuous process, and we are working towards a highly accurate solution. Such a system can help prevent virus transmission.

References

[1] A. Kumar, A. Kaur, M. Kumar, "Face detection techniques: A review," Artificial Intelligence Review, vol. 52.
[2] Guangcheng Wang, Yumiko, "Masked face recognition data sets and application," National Natural Science Foundation of China, 2020.
[3] Raza Ali, Saniya Adeel, Akhyar Ahmed, "Face Mask Detector," July 2020.
[4] Amit Chavda, Jason Dsouza, Sumeet Badgujar, "Multi-Stage CNN Architecture for Face Mask Detection," September 2020.
[5] Amrit Kumar Bhadani, Anurag Sinha, "A Facemask Detector Using Machine Learning and Image Processing Techniques," Engineering Science, November 2020.

Copyright

Copyright © 2022 Kavita Saxena, Rishabh Jain, Rishabh, Rohit Kumar. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET39262

Publish Date : 2021-12-04

ISSN : 2321-9653

Publisher Name : IJRASET