Driver Drowsiness Detection using Haar Cascade

Authors: V. Andrew Spourgeon, S. Mrudula, S. Anil, V. Pravallika, Dr. G. A. V. Rama Chandra Rao

DOI Link: https://doi.org/10.22214/ijraset.2022.43815

Abstract

Driver drowsiness is a major cause of road accidents all over the world. Fatigue and drowsiness, brought on by lack of sleep and tiredness, are common among drivers and often lead to road accidents. Alerting the driver ahead of time is the best way to avoid accidents caused by drowsiness. In this work, drowsiness is detected in two steps: first, the driver's face is captured; then, eye detection and facial feature extraction are performed. We propose new detection methods using deep learning techniques. To estimate the driver’s state, we use the facial and eye regions.

I. INTRODUCTION

Statistics reveal that in 2015, in India alone, 148,707 people died in car-related accidents. Of these, at least 21 percent were caused by fatigue leading drivers to make mistakes. The true share may be higher still, because among the multiple causes that can lead to an accident, the involvement of fatigue is generally underestimated.

Fatigue combined with bad infrastructure in developing countries like India is a recipe for disaster. Fatigue is also very difficult to measure or observe, unlike alcohol and drugs, which have clear key indicators and readily available tests. Probably the best solutions to this problem are raising awareness of fatigue-related accidents and encouraging drivers to admit fatigue when needed. When demand for a job increases, the wages associated with it rise, leading more and more people to take it up. Such is the case for driving transport vehicles at night: money motivates drivers to take on this work even when they are fatigued.

The development of technology allows us to introduce more advanced solutions into everyday life. According to the NHTSA, about 100,000 reported crashes each year involve drowsy driving, and the exact figure is likely far higher. Facial expressions can offer deep insights into many physiological conditions of the body. Numerous algorithms and techniques are available for face detection, which is the fundamental first step in the process. Drowsiness in humans is characterized by some very specific movements and facial expressions; for example, the eyes begin to shut. To counter this worldwide problem, one solution is to track the eyes in order to detect drowsiness and classify the driver as drowsy. For real-time application of the model, the input video is acquired by mounting a camera on the dashboard of the car and capturing the driver’s face. The Dlib model is trained to locate 68 facial landmarks, from which the drowsiness features are extracted, and the driver is alerted if drowsiness is detected. A lot of research has been done in the field of driving safety to reduce the number of accidents. The following work was referred to for the study of the proposed system.

A. Drowsiness

Drowsiness refers to feeling sleepy or tired, or being unable to keep your eyes open. Drowsiness, also called excess sleepiness, can be accompanied by lethargy, weakness, and lack of mental agility. In fact, continuous fatigue can cause levels of performance impairment similar to those caused by alcohol. While driving, these symptoms are extremely dangerous, since they significantly increase the probability of drivers missing road signs or exits, drifting into other lanes, or even crashing their vehicle. The analysis of face images is a popular research area with applications such as face recognition and human identification and tracking for security systems. This project focuses on the localization of the eyes and mouth, which involves looking at the entire image of the face and determining the position of the eyes and mouth by applying existing image processing methods. Once the eyes and mouth are located, the system determines whether they are open or closed and uses this to detect fatigue and drowsiness.

B. Facial Recognition

A facial recognition system is a technology capable of matching a human face from a digital image or a video frame against a database of faces. It is typically employed to authenticate users through ID verification services, and it works by pinpointing and measuring facial features in a given image.

C. Deep Learning

Artificial intelligence is essentially the field concerned with machines performing tasks that typically require human intelligence. It encompasses machine learning, where machines learn from experience and acquire skills without human involvement. Deep learning is a subset of machine learning in which artificial neural networks, algorithms inspired by the human brain, learn from large amounts of data. Similar to how we learn from experience, a deep learning algorithm performs a task repeatedly, each time tweaking it a little to improve the outcome. We refer to ‘deep learning’ because the neural networks have many (deep) layers that enable learning. Just about any problem that requires “thought” to figure out is a problem deep learning can learn to solve.

The amount of data we generate every day is staggering—currently estimated at 2.6 quintillion bytes—and it’s the resource that makes deep learning possible. Since deep learning algorithms require a ton of data to learn from, this increase in data creation is one reason that deep learning capabilities have grown in recent years. In addition to more data creation, deep learning algorithms benefit from the stronger computing power that’s available today as well as the proliferation of Artificial Intelligence (AI) as a Service. AI as a Service has given smaller organizations access to artificial intelligence technology and specifically the AI algorithms required for deep learning without a large initial investment.

Deep learning allows machines to solve complex problems even when using a data set that is very diverse, unstructured and inter-connected. The more the deep learning algorithms learn, the better they perform.

II. LITERATURE SURVEY

A literature survey was conducted on various face detection methods, existing drowsiness detection methods, and image processing techniques.

One system uses the front camera of the driver’s mobile phone, which is mounted on the windshield. The phone is held in place by a holder so that the driver does not have to hold it and it remains stable. The front camera continuously captures images of the driver at a rate of 5 frames per second and analyses the driver's behaviour, i.e., whether the driver’s eyes are closed or not. When a face is detected, about 30 points are placed on facial features such as the eyes and nose. The captured images are then compared with images previously stored on a server, with machine learning algorithms used to achieve the best results, and the server returns the magnitude of different facial expressions in the drowsy state as a value in the range 0–100. Whenever this value reaches 100 (eyes fully closed), a timer is started. The timer is stopped if the eyes open in between. If the timer exceeds 3 seconds, an alert is generated to wake the driver: the phone rings and an alarm is triggered with the help of a Bluetooth signal sent from the driver's phone.

Another system uses the Haar cascade classifier for face detection. First, an image frame is retrieved from the camera by a video query process. The input image is pre-processed by Gaussian filtering to remove noise. Using the Haar cascade, the proposed algorithm then detects the face in the pre-processed image.
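The timer rule of the phone-based system described above (start timing when the score reaches 100, reset when the eyes open, alert after 3 seconds) can be illustrated with a minimal sketch. The class name `ClosureTimer`, its fields, and the choice to alert once exactly 3 seconds have elapsed are assumptions made for illustration; the surveyed work does not publish its implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClosureTimer:
    """Hypothetical helper: alerts when the eyes stay fully closed too long."""

    closed_score: int = 100               # score reported for fully closed eyes
    limit_s: float = 3.0                  # alert once closed for this many seconds
    closed_since: Optional[float] = None  # timestamp of the first closed frame

    def update(self, score: int, now_s: float) -> bool:
        """Feed one score (0-100) with its timestamp; return True to alert."""
        if score < self.closed_score:
            # The eyes opened in between, so the timer is stopped.
            self.closed_since = None
            return False
        if self.closed_since is None:
            # First fully closed frame: start the timer.
            self.closed_since = now_s
        return now_s - self.closed_since >= self.limit_s
```

At 5 frames per second, feeding timestamps `i / 5` with a score of 100 returns `True` for the first time on the frame at 3.0 s, provided no frame in between reports open eyes.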
In another work, the main concept is detecting the driver’s face and setting it as the region of interest (ROI). The ROI is then used to detect the eyes and mouth. The input comes from an infrared 2D camera. The flow chart has five steps: image acquisition, face detection, eye detection, mouth detection, and eye closure and yawning detection.

Tiesheng Wang et al. (2005) developed a system based on yawning detection for determining driver drowsiness. The system aimed to detect driver drowsiness or fatigue on the basis of video analysis, and the main focus of the study was how to extract driver yawning. A real-time face detector was implemented to track the driver's face region. The mouth window was then tracked within the face region, and the degree of mouth openness was extracted to find driver yawning in the video. The method was computationally efficient, running in real time on average. However, when the driver turned his or her head away through lack of concentration, the eyes and mouth might be occluded and might not be detected. This is another situation in which the driver should be alerted, and other methods must be found to deal with it.

Jian-Da Wu et al. (2007) developed and investigated a driving warning system using image processing techniques with a fuzzy logic interface. The system analysed facial images to warn the driver of drowsiness or inattention and so prevent traffic accidents. The facial images of the driver were taken by a CCD camera installed on the dashboard in front of the driver. A fuzzy logic algorithm and an interface were proposed to determine the level of fatigue by measuring the blinking duration and its frequency, and to warn the driver accordingly. Experiments were carried out to evaluate the effect of the proposed drowsiness warning system under various operating conditions. The results indicated that the proposed expert system was effective in increasing driving safety. The study demonstrated the feasibility of applying image processing techniques to vehicle safety. Besides judging the driver's level of fatigue, the system also allowed the driver's head to move within an acceptable region.

Pooneh R. Tabrizi et al. (2008) proposed a simple algorithm for pupil center and iris boundary localization based on Eye Map, and a new algorithm for eye state analysis, which they incorporated into a four-step system for drowsiness detection: face detection, eye detection, eye state analysis, and drowsy decision using the PERCLOS parameter. The eye detection algorithm achieved excellent pupil center and iris boundary localization results on the IMM database and responded with high accuracy over a wide range of lighting conditions. The eye state analysis algorithm was chromatic-based: it detected the eye state using the saturation (S) channel of the HSV color space. Evaluated on five video sequences, it showed superior results compared with the common technique based on the distance between the eyelids, with a better detection rate for closed eyes. The proposed system was simple and non-intrusive, required no training data at any step and no special cameras, and was safe in comparison with IR illuminators. Its main limitation was that it was applicable only when the eyes were visible in the image, that is, in daylight and without dark sunglasses.

Mandalapu Sarada Devi et al. (2008) observed that a system able to detect oncoming driver fatigue and issue a timely warning could help prevent many accidents, and consequently save money and reduce personal suffering. The authors attempted to design such a system using a video camera pointed directly at the driver's face. If fatigue was detected, a warning signal was issued to alert the driver. The authors worked on the video files recorded by the camera, converting each video file into frames. Once the eyes were located in a frame, the distances between the intensity changes in the eye area were measured to determine whether the eyes were open or closed. If the eyes were found closed for 5 consecutive frames, the system concluded that the driver was falling asleep and issued a warning signal.
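The consecutive-frame rule in the preceding paragraph is straightforward to express in code. The sketch below assumes the per-frame open/closed decision is already available as a boolean; the function name `warning_frames` and the choice to warn once per closed run are illustrative assumptions, not details taken from the cited work.

```python
from typing import Iterable, Iterator


def warning_frames(eyes_closed: Iterable[bool], limit: int = 5) -> Iterator[int]:
    """Yield the index of each frame at which a warning would be issued.

    A warning is issued once per run, at the frame where the eyes have been
    closed for `limit` consecutive frames.
    """
    run = 0
    for index, closed in enumerate(eyes_closed):
        # Extend the current run of closed frames, or reset it on an open frame.
        run = run + 1 if closed else 0
        if run == limit:
            yield index
```

For example, four closed frames followed by one open frame do not trigger a warning, because the run is reset before it reaches five.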

Image input from infrared: First, video of the driver is recorded from the driver's seat using a camera positioned at approximately 15 degrees to the driver's face. The frame rate is set at 25 frames per second.
Face detection: Face detection uses Haar-like features, a method introduced by Paul Viola and Michael Jones. This step finds the driver’s face. Faces are classified using Haar-like features by determining the difference between the shaded rectangle and the unshaded rectangle and comparing it with a threshold and polarity. The sum over a rectangular area can be written as $S = \sum_{x=x_1}^{x_2} \sum_{y=y_1}^{y_2} I(x, y)$, where $I(x, y)$ is the shading intensity at pixel $(x, y)$, $S$ is the sum of the shading values, and $x_1$, $x_2$, $y_1$ and $y_2$ are the coordinates of the rectangle’s corners. After detecting the driver’s face, the author sets it as the ROI, and the remaining area is set to black.
Split image into upper and lower halves: The image of the face is divided into two halves. This limits the search area for detecting the driver’s eyes and mouth.
Detecting the driver’s eyes and mouth: This step detects the driver’s eyes and mouth using Haar-like features. Because of the camera setup, the driver’s image will be slightly tilted, so this step rotates the image by about 3 degrees to the left and right.
Detecting eyes' closure and yawning: To classify the data, the author uses a support vector machine (SVM). Before training, the images are converted into vector data using histograms of oriented gradients (HOG), a method that divides an image into cells and calculates the gradient magnitude and direction for each cell. The SVM can then distinguish between open and closed eyes, as well as detect a yawning mouth.

In another work, the video is first recorded using a webcam. To capture a frontal image, the camera is positioned in front of the driver. Frames are extracted from the video to obtain 2-D images, and a face is detected in the frames. After the face is detected, facial landmarks such as the positions of the eyes, nose, and mouth are marked on the images. The eye aspect ratio, the mouth opening ratio and the position of the head are calculated from the facial landmarks, and, using these features with a machine learning approach, a decision is obtained about the driver's drowsiness. The videos used for yawning detection were taken from two different angles: some were recorded with a camera mounted under the front mirror of the vehicle, while others were recorded from the dashboard. Because of their similarity to UTA-RLDD videos, in this work, we use the dashboard videos to detect if the driver is yawning.
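The rectangle sum used by the Haar-like features in the face detection step above is usually computed with an integral image (summed-area table), so that any rectangle costs four lookups. The following is a minimal pure-Python sketch of that idea together with a two-rectangle feature and a threshold/polarity test in the Viola-Jones style. All function names are hypothetical, and the top-minus-bottom rectangle layout is just one assumed feature shape; it is not the surveyed author's code.

```python
from typing import List

Image = List[List[int]]


def integral_image(img: Image) -> Image:
    """Return a (h+1) x (w+1) table where ii[y][x] sums img rows < y, cols < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: Image, x1: int, y1: int, x2: int, y2: int) -> int:
    """Sum of I(x, y) for x1 <= x <= x2 and y1 <= y <= y2 (inclusive corners)."""
    return ii[y2 + 1][x2 + 1] - ii[y1][x2 + 1] - ii[y2 + 1][x1] + ii[y1][x1]


def two_rect_feature(ii: Image, x1: int, y1: int, x2: int, y2: int) -> int:
    """Assumed feature shape: sum of the top half minus sum of the bottom half."""
    mid = (y1 + y2) // 2
    return rect_sum(ii, x1, y1, x2, mid) - rect_sum(ii, x1, mid + 1, x2, y2)


def weak_classifier(feature: int, threshold: int, polarity: int) -> int:
    """Return 1 if polarity * feature < polarity * threshold, else 0."""
    return 1 if polarity * feature < polarity * threshold else 0
```

The polarity (+1 or -1) only flips the direction of the inequality, which lets the same feature vote for a face when it is either below or above the threshold.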

III. MATERIALS AND METHODS

A. Existing System

The main aim of our project is to detect whether the person driving the vehicle is drowsy. Many existing systems work well, but they have the following limitations:

In the field of data science, there are not enough data resources to run the tests needed to improve accuracy.
This leads to low accuracy in the end product, and poor image quality further limits the effectiveness of facial recognition.

a. Limitations

When measuring the drowsiness level of a driver, there are two different approaches, depending on the origin of the data used. On the one hand, there are systems that monitor the vehicle state to assess the fatigue of the driver; on the other hand, there are systems that use parameters obtained from the driver.

Among works that focus on the analysis of the vehicle state and its relation to fatigue, the most commonly studied measures are steering wheel behaviour and lane departures. Other works use further parameters of the car, such as the vehicle position or the steering wheel angle, and perform data fusion on multiple measures to achieve a more reliable system. However, even if the driver's diminishing performance on skill-based tasks can be a consequence of drowsiness, it appears at a later stage and cannot be used to detect the early symptoms of fatigue.

One of the most reliable ways of estimating fatigue is by using electroencephalograms (EEG) in combination with electrooculograms (EOG), but in real driving environments, these kinds of systems are usually rejected by drivers. Their main drawback is that they require that the driver has attached electrodes around the eyes and over the head, which makes them intrusive systems that produce discomfort and rejection by drivers. Because of this limitation, the most used fatigue detection systems are those in which the driver’s state is detected through a camera placed on the vehicle that takes images of the driver. In this work, we will focus on the detection of the early symptoms of drowsiness by using the driver’s state.

b. Disadvantages

Many works follow this approach, using numerous and varied parameters and techniques for detection. For example, in one work, the landmarks of the driver’s face (that is, a group of points that locate the most important elements of the face: eyes, eyebrows, nose, mouth, and facial shape) are obtained, and then, using these landmarks, parameters such as the percentage of eye closure (PERCLOS) are calculated. Afterwards, these features are fed to a support vector machine (SVM) that classifies whether the driver is tired or not.
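As a rough illustration of the PERCLOS parameter mentioned above, the sketch below computes the fraction of frames in a window whose eye openness falls under a closure threshold. Using per-frame openness values and a threshold of 0.20 are assumptions made for this example; the cited work may define closure differently (stricter PERCLOS definitions count only frames in which the eye is at least 80% closed).

```python
from typing import Sequence


def perclos(openness: Sequence[float], closed_threshold: float = 0.20) -> float:
    """Fraction of frames in the window in which the eyes count as closed.

    `openness` holds one eye-openness value per frame; a frame counts as
    closed when its value is below `closed_threshold` (an assumed cutoff).
    """
    if not openness:
        raise ValueError("at least one frame is required")
    closed = sum(1 for value in openness if value < closed_threshold)
    return closed / len(openness)
```

The resulting value between 0 and 1 can then be used as one of the features passed to a classifier such as the SVM described above.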

Fuzzy logic is a powerful tool for developing systems that help protect drivers: on the one hand, because it is easy and intuitive to create rules that are accurate and whose results are easily understood, and, on the other hand, because such systems compute quickly, which allows them to be used in real time. Examples of these systems are not limited to fatigue detection: one work, for example, proposes a fuzzy-based alarm system that alerts the driver to dangerous situations. Among works that combine fuzzy logic and fatigue detection, one proposes a system that analyzes the mouth and eyes of the driver, measuring their openness to assess whether the driver is fatigued, and raises an alarm if drowsiness is detected over several consecutive frames. In contrast, the authors of another work use measures that represent the driver’s behavior over a window of time, such as the average PERCLOS, the driver’s blinking rate or the head position, all measured over the last 60 s. These parameters are then fed into a fuzzy inference system (FIS) formed by 32 different rules, and the drowsiness level of the driver is calculated.

c. Challenges Faced

Current detection systems repurpose classifiers to perform detection. To detect an object, these systems take a classifier for that object and evaluate it at various locations and scales in a test image. More recent approaches like R-CNN use region proposal methods to first generate potential bounding boxes in an image and then run a classifier on these proposed boxes. After classification, post-processing is used to refine the bounding boxes, eliminate duplicate detections, and rescore the boxes based on other objects in the scene. These complex pipelines are slow and hard to optimize because each individual component must be trained separately.

Even YOLO must run on a GPU to reach the speed needed to detect objects in real time.

In another work, a combination of depth videos and deep learning is used for fatigue detection. In particular, it uses two CNNs: a spatial CNN, which detects objects' positions, and a temporal CNN, which looks for information between two neighboring frames. By using these two CNNs, the system is able to calculate motion vectors from one frame to another, which makes it possible to detect yawns even when the driver covers his or her mouth with a hand. Although both works report good results, it is important to note that the experiments were performed on people who simulated their drowsiness state. When faking these situations, subjects tend to exaggerate their expressions and show symptoms that are clearly visible, which causes the developed systems to be less reliable in real environments. To avoid the problem of simulated data, in this work, we will use the UTA RealLife Drowsiness Dataset (UTA-RLDD). This dataset contains frontal videos of 60 different people performing a simple task (reading or watching something on a computer), with a duration of 10 min per recording. These videos are classified based on the state of drowsiness of the subjects when they were recorded (awake, low vigilant or drowsy), and each person has at least one video of each category. UTA-RLDD was created for the task of multi-stage drowsiness detection, targeting not only extreme and easily visible cases, but also less explicit cases, where subtle micro-expressions are the discriminative factors. Because of this, it is a suitable dataset to search for evidence of real drowsiness, which is the purpose of this work.

B. Proposed System

The drowsiness detection system developed in this work is part of a driver-based ADAS system, with two important restrictions: early detection and minimization of the number of false positives. The idea is that the system warns the driver only in real cases of fatigue, because false positives would annoy drivers and lead them to turn off the ADAS, losing the rest of its functionality. When it comes to recording the driver, it is important to determine the frame rate at which the camera feeds the system. A high frame rate will overload the system because of the high number of frames per second (FPS) that have to be evaluated, but a low frame rate can negatively affect system performance. In this domain, there must be enough FPS to capture details of the image sequence that have a very short duration, such as blinks. Since the average blink duration ranges from 100 to 400 ms, in this work, a frame rate of 10 FPS is used, which is enough to detect blinks and avoid overloading the system. This way, 600 frames are evaluated every time a new frame is captured by the camera. To do this, the system stores the previous 599 frames, so that a full sequence of 60 s is analyzed at each instant.
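The 60-second sliding window described above (10 FPS, 600 frames, with the oldest frame dropped as each new one arrives) maps naturally onto a bounded queue. The sketch below is one possible way to hold that window; the class name `FrameWindow` and its methods are hypothetical and are not taken from the original system.

```python
from collections import deque
from typing import Any, Deque, List

FPS = 10                              # frame rate used in this work
WINDOW_SECONDS = 60                   # length of the analyzed sequence
WINDOW_FRAMES = FPS * WINDOW_SECONDS  # 600 frames per evaluation


class FrameWindow:
    """Keeps the newest frame plus the previous 599 frames."""

    def __init__(self, size: int = WINDOW_FRAMES) -> None:
        # A deque with maxlen discards the oldest item automatically.
        self._frames: Deque[Any] = deque(maxlen=size)

    def push(self, frame: Any) -> bool:
        """Add a frame; return True once a full 60 s sequence is available."""
        self._frames.append(frame)
        return len(self._frames) == self._frames.maxlen

    def snapshot(self) -> List[Any]:
        """Return the current window, oldest frame first."""
        return list(self._frames)
```

With this structure, appending a frame and discarding the oldest one are both constant-time operations, so the buffering itself does not add noticeable load at 10 FPS.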

These images are received by the preprocessing module, whose objective is to transform the received image into data that can be used by the drowsiness detection model. The preprocessed data are then sent to the analysis module, which performs the fatigue detection tasks and assesses the level of drowsiness of the driver at that moment, based on the information from the last 60 s. Lastly, the calculated drowsiness level is transmitted to the alarm activation module, which uses the most recent levels of drowsiness to determine whether it is necessary to alert the driver.

One of the main objectives of the alarm activation module is to minimize the number of false positives of the system (drowsiness alerts when the driver is actually awake), since a high number of false positives disturbs the driver and increases the likelihood that the system is turned off. This is one of the reasons for experimenting on videos instead of frames, which makes the tests more exacting: the activation of a single alarm in a 10 min video means that, regardless of what is detected before or after that moment, the video will be classified as “drowsy”. Once the system has decided whether to alert the driver (a yes/no outcome), it communicates its decision to the human–computer interaction system responsible for warning the driver by using visual and/or sound stimuli.

The preprocessing module crops and preprocesses the original frame. To prevent the model from being affected by possible noise, the first step is to apply a Gaussian blur to the original image. Blurring is a common technique used to smooth edges and remove noise from an image while leaving most of the image intact. From the blurred image, we extract the region that contains the face, for which the DLIB library [28] is used. In particular, we use its face detector, which calculates the coordinates of the face location by using histograms of oriented gradients (HOG) and a linear support vector machine (SVM). This AI technique is thus used to crop the original image and leave only the face of the driver.

1. Alarm Activation

a. The driver is considered to be drowsy at a specific moment when the output of the analysis module is greater than a threshold, called drowsiness threshold. This value ranges between 0 and 1.

b. The driver has to be considered drowsy for multiple instants of the last 60 s to raise an alarm. This is determined by the variable min_time, which represents how many seconds the driver has to be drowsy before alerting him or her. This value ranges between 0 and 60. This way, when the driver is considered drowsy for at least the established minimum time, an alarm is raised. If the condition to raise an alarm continues after alerting the driver, a second alarm is not raised. Instead, another alarm is raised only if the conditions for alerting the driver are no longer met, and after that, drowsiness is detected again.
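The two rules above (a per-instant drowsiness threshold, and a minimum drowsy time within the last 60 s, with no repeated alarm while the condition persists) can be sketched as a small state machine. The sketch assumes one analysis output per frame at 10 FPS; the class name `AlarmActivation` and the default values of `drowsiness_threshold` and `min_time` are hypothetical, since the text only gives their ranges.

```python
from collections import deque
from typing import Deque


class AlarmActivation:
    """Raises one alarm per drowsiness episode."""

    def __init__(self, drowsiness_threshold: float = 0.5, min_time: float = 10.0,
                 fps: int = 10, window_s: int = 60) -> None:
        self.drowsiness_threshold = drowsiness_threshold  # in [0, 1]; assumed default
        self.min_time = min_time                          # seconds in [0, 60]; assumed default
        self.fps = fps
        # One drowsy/not-drowsy flag per analyzed instant of the last window.
        self._flags: Deque[bool] = deque(maxlen=fps * window_s)
        self._condition_active = False

    def update(self, level: float) -> bool:
        """Feed one analysis-module output; return True if an alarm is raised now."""
        self._flags.append(level > self.drowsiness_threshold)
        drowsy_seconds = sum(self._flags) / self.fps
        condition = drowsy_seconds >= self.min_time
        # Alarm only on the transition from "not met" to "met", so a second
        # alarm is not raised while the condition continues.
        raise_alarm = condition and not self._condition_active
        self._condition_active = condition
        return raise_alarm
```

Because the alarm fires only on the transition into the drowsy condition, a new alarm requires the condition to lapse first and then be met again, which matches the re-arming behaviour described in item b.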

2. Pre-processing

The first step in calculating these parameters is locating the driver’s face. To do this, DLIB’s face detector is used once again. However, in this case, we also use the landmark detector that DLIB provides, whose shape predictor uses an ensemble of regression trees. DLIB’s landmark detector represents the facial features with a group of 68 points, 12 of them representing the eyes (6 points per eye), which allows us to know the position of the driver’s eyes. To calculate the parameters related to eyes and blinks, it is first necessary to calculate how open or closed the driver’s eyes are.

To do this, we calculate the eye aspect ratio (EAR): we measure the distance between the top eyelid and the bottom eyelid and divide it by the eye width, obtaining openness values that usually range between 0.16 and 0.36. A threshold of 0.20 was determined experimentally, so that whenever the measured EAR is under 0.20, the driver is considered to have closed his or her eyes. By calculating how many times and for how long the driver blinks, we gather the first three measures.
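A minimal sketch of the EAR calculation is given below. It assumes the commonly used six-point formulation, in which the two vertical eyelid distances are averaged and divided by the eye width, and it assumes the six points of an eye are ordered as in DLIB's 68-point model (corner, two upper-lid points, opposite corner, two lower-lid points). Averaging both eyes before thresholding is also an assumption; the text does not say how the two eyes are combined.

```python
from math import dist
from typing import Sequence, Tuple

Point = Tuple[float, float]
EAR_THRESHOLD = 0.20  # experimentally determined threshold from the text


def eye_aspect_ratio(eye: Sequence[Point]) -> float:
    """EAR for one eye given its six landmarks p1..p6 (assumed ordering)."""
    p1, p2, p3, p4, p5, p6 = eye
    # Two vertical distances between the upper and lower eyelid.
    vertical = dist(p2, p6) + dist(p3, p5)
    # Horizontal distance between the eye corners (eye width).
    horizontal = dist(p1, p4)
    return vertical / (2.0 * horizontal)


def eyes_closed(left: Sequence[Point], right: Sequence[Point],
                threshold: float = EAR_THRESHOLD) -> bool:
    """True when the mean EAR of both eyes is under the threshold."""
    ear = (eye_aspect_ratio(left) + eye_aspect_ratio(right)) / 2.0
    return ear < threshold
```

Because the EAR is a ratio of distances, it is largely insensitive to the distance between the driver and the camera, which is one reason it suits a fixed dashboard setup.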

Next, we want to use the face to detect if the driver is yawning. To do this, the face of the driver is cropped and preprocessed following the same process as described above.

3. Advantages

After carefully observing and monitoring the limitations above, we have tried to overcome them and improve the accuracy of the project in the following ways:

a. Giving better accuracy by using more accurate and efficient algorithms.

b. Using camera equipment with better resolution, and training and testing with more data, so that the system can accurately detect the drowsiness of the person.

c. Adding an audio alert system so that the person can recover from the drowsiness or take the necessary action.

IV. RESULTS

This section presents the results derived from the experimental evaluation. These results were obtained by following the experimentation methodology described earlier with each of the two solutions proposed in this work, so there is a subsection for the performance of each alternative.

A. Showing Screen During Yawn Detection and Drowsiness Detection


[Figures: screen output during yawn detection and drowsiness detection]

Conclusion

The various measures of driver drowsiness reviewed in this work are based purely on the level of drowsiness induced in the subject, which, in turn, depends on the time of day, the duration of the task and the time that has elapsed since the last sleep. However, developing a better drowsiness detection system requires several other issues to be addressed. The ultimate aim of our work is to activate an alarm when the system detects that the driver is drowsy, which means that the alarm activation module follows a binary behavior (on/off, depending on the fatigue level of the driver). Because of this, only the “awake” and “drowsy” classes are used to train and test the system (60 awake videos, 62 drowsy videos). These videos were recorded with a web camera, resulting in good quality and resolution. This is useful for emulating real situations in a car, where there are light changes, but the results obtained can be moderate, as would correspond to this situation. The various measures used to detect drowsiness include subjective, vehicle-based, physiological and behavioral measures.


Copyright

Copyright © 2022 V. Andrew Spourgeon , S. Mrudula , S. Anil , V. Pravallika, Dr. G. A. V. Rama Chandra Rao. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET43815

Publish Date : 2022-06-04

ISSN : 2321-9653

Publisher Name : IJRASET
