Abstract
Driver fatigue and distraction are major causes of road accidents worldwide. This paper presents DriveSafe-AI, a real-time browser-based driver monitoring system designed to detect drowsiness, yawning, head movement, and mobile phone usage using computer vision techniques.
The system integrates MediaPipe Face Mesh for 468 facial landmark detection, Eye Aspect Ratio (EAR) and Mouth Aspect Ratio (MAR) for behavioral analysis, and COCO-SSD object detection for distraction monitoring. Unlike traditional systems, DriveSafe-AI operates entirely on the client side using TensorFlow.js, ensuring low latency and privacy preservation without transmitting video data to external servers. A rule-based risk scoring mechanism aggregates multiple behavioral indicators to classify driver state into safe, warning, and critical levels. Experimental evaluation demonstrates reliable real-time performance with minimal computational overhead. The proposed system provides an effective and privacy-preserving solution for intelligent driver safety monitoring.
Introduction
Driver fatigue and distraction are major causes of road accidents worldwide, contributing to millions of injuries and over 1.3 million deaths annually. Fatigue affects reaction time, awareness, and decision-making ability, increasing the risk of crashes. To address this issue, vision-based driver monitoring systems have been developed as non-intrusive solutions that detect drowsiness and distraction using cameras and computer vision techniques.
The proposed system, DriveSafe-AI, is a real-time browser-based driver monitoring framework designed to detect unsafe driving behaviors such as drowsiness, yawning, head pose deviation, and mobile phone usage. Unlike traditional systems that rely on cloud processing, DriveSafe-AI performs all computations locally on the user’s device using MediaPipe Face Mesh and TensorFlow.js, ensuring low latency and stronger privacy protection.
The system analyzes facial features and driver behavior using several indicators:
Eye Aspect Ratio (EAR) to detect eye closure and blinking patterns.
Mouth Aspect Ratio (MAR) to identify yawning.
Head pose estimation to detect when the driver is not facing the road.
Object detection models (COCO-SSD) to identify mobile phone usage.
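Under the standard formulation from the blink-detection literature, the first two indicators reduce to ratios of distances between facial landmarks. A minimal sketch follows; the landmark ordering and the specific points chosen are illustrative assumptions, not the paper's exact MediaPipe Face Mesh indices:

```javascript
// Euclidean distance between two 2-D landmarks { x, y }.
const dist = (a, b) => Math.hypot(a.x - b.x, a.y - b.y);

// Eye Aspect Ratio: two vertical eye distances over the horizontal one.
// Approaches 0 as the eye closes.
// p = [outerCorner, upper1, upper2, innerCorner, lower2, lower1]
function eyeAspectRatio(p) {
  return (dist(p[1], p[5]) + dist(p[2], p[4])) / (2 * dist(p[0], p[3]));
}

// Mouth Aspect Ratio: vertical mouth opening over mouth width.
// Rises sharply during a yawn.
function mouthAspectRatio(top, bottom, left, right) {
  return dist(top, bottom) / dist(left, right);
}
```

In practice a threshold such as EAR below roughly 0.2 sustained over several consecutive frames flags eye closure; the exact cutoffs would be tuned empirically.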
These indicators are combined using a weighted risk scoring mechanism that classifies the driver’s condition into Safe, Warning, or Critical levels. The system operates in real time at 22–28 frames per second with low latency (80–120 ms), making it suitable for practical driver assistance.
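The weighted aggregation described above might be sketched as follows; the weights, thresholds, and indicator names here are illustrative assumptions rather than the system's tuned values:

```javascript
// Illustrative per-indicator weights (assumed values, not the paper's).
const WEIGHTS = { eyesClosed: 40, yawning: 20, headAway: 25, phoneInUse: 35 };

// indicators: boolean flags, e.g. { eyesClosed: true, phoneInUse: false }.
// Returns the aggregate risk score and one of the three driver-state levels.
function scoreDriverState(indicators) {
  let score = 0;
  for (const [name, weight] of Object.entries(WEIGHTS)) {
    if (indicators[name]) score += weight;
  }
  const level = score >= 60 ? "Critical" : score >= 30 ? "Warning" : "Safe";
  return { score, level };
}
```

A rule-based scheme like this keeps the classification interpretable: each alert can be traced back to the specific indicators that fired.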
Compared with previous driver monitoring approaches that rely on single indicators or cloud processing, DriveSafe-AI integrates multiple behavioral indicators with object detection while maintaining privacy by processing all data locally. No video frames, facial landmarks, or biometric data are transmitted or stored externally.
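On the object-detection side, phone-usage monitoring reduces to filtering the detector's output for the "cell phone" COCO class. A sketch, with the @tensorflow-models/coco-ssd loading calls shown in comments and an assumed confidence threshold:

```javascript
// With @tensorflow-models/coco-ssd loaded in the page, detections come from
//   const model = await cocoSsd.load();
//   const detections = await model.detect(videoElement);
// each shaped like { class: "cell phone", score: 0.87, bbox: [x, y, w, h] }.

// Pure filter: true if any sufficiently confident "cell phone" detection exists.
// minScore = 0.5 is an assumed threshold, not the paper's tuned value.
function phoneDetected(detections, minScore = 0.5) {
  return detections.some(d => d.class === "cell phone" && d.score >= minScore);
}
```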
The system architecture includes five main components:
Webcam Module – Captures live video frames.
Face Landmark Detection – Extracts facial landmarks using MediaPipe.
Feature Extraction Module – Calculates EAR, MAR, and head orientation.
Object Detection Module – Detects mobile phone usage.
Risk Scoring Engine – Combines indicators to evaluate driver safety.
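The five components above can be wired into a per-frame loop. The following is a structural sketch with injected stage functions; every stage name is a hypothetical stand-in for the concrete MediaPipe, feature-extraction, and COCO-SSD calls:

```javascript
// Each stage is passed in, so the loop stays independent of the concrete
// MediaPipe / TensorFlow.js APIs (all stage names here are hypothetical).
function makePipeline({ detectLandmarks, extractFeatures, detectObjects, scoreRisk }) {
  return async function processFrame(frame) {
    const landmarks = await detectLandmarks(frame); // Face Landmark Detection
    const features = extractFeatures(landmarks);    // EAR, MAR, head orientation
    const objects = await detectObjects(frame);     // e.g. "cell phone" class
    return scoreRisk(features, objects);            // Safe / Warning / Critical
  };
}
```

In the browser, `frame` would come from a `<video>` element fed by `getUserMedia`, with the loop driven by `requestAnimationFrame`.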
Performance evaluation is based on metrics such as detection accuracy, false positive rate, processing speed (FPS), and system latency. Overall, DriveSafe-AI provides a privacy-preserving, real-time, and lightweight driver monitoring solution that can run on standard consumer devices without specialized hardware, contributing to improved road safety.
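An FPS figure like the one reported can be measured with a rolling window over frame timestamps. This is a generic sketch, not the system's actual instrumentation; in the browser the timestamps would come from `performance.now()`:

```javascript
// Rolling frames-per-second estimator over a sliding time window.
function makeFpsMeter(windowMs = 1000) {
  const stamps = [];
  return function tick(nowMs) {
    stamps.push(nowMs);
    // Drop timestamps that have fallen out of the window.
    while (stamps.length && stamps[0] <= nowMs - windowMs) stamps.shift();
    return stamps.length * (1000 / windowMs);
  };
}
```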
Conclusion
This paper presented DriveSafe-AI, a real-time driver monitoring system designed to detect drowsiness, distraction, and unsafe driving behaviors using browser-based artificial intelligence. The proposed framework integrates facial landmark analysis, geometric feature extraction, and object detection techniques to identify critical behavioral indicators including eye closure, head pose deviation, yawning, and mobile phone usage.
Unlike cloud-based driver monitoring solutions, DriveSafe-AI operates entirely on the client side using TensorFlow.js and MediaPipe, ensuring that raw video frames and biometric data remain confined to the local device. This privacy-preserving architecture significantly reduces security risks associated with external data transmission while maintaining real-time responsiveness.
Experimental evaluation under live operational conditions demonstrated stable performance at 22–28 FPS. The system effectively classified driver states into safe, warning, and critical categories based on a weighted risk scoring mechanism. Real-time monitoring showed consistent detection of prolonged eye closure, directional head distraction, and handheld mobile device usage.
The results indicate that integrating multiple behavioral indicators reduces false alarms compared to single-feature detection systems. The proposed approach provides a scalable and lightweight solution suitable for browser-based deployment without specialized hardware.
Future work will focus on adaptive threshold optimization, improved robustness under extreme lighting conditions, and integration with vehicular IoT systems for enhanced in-vehicle safety assistance. Additionally, incorporating lightweight deep learning models for edge deployment could further improve performance on resource-constrained devices.
Beyond individual driver assistance, browser-based monitoring frameworks such as DriveSafe-AI may support scalable deployment in fleet management systems, intelligent transportation platforms, and low-cost safety monitoring solutions. The privacy-preserving design enables deployment in privacy-sensitive environments without requiring centralized biometric data storage.