Abstract
Traditional computing devices rely on physical input devices for navigation, which can be challenging for people with physical disabilities. This work presents a webcam-based eye-tracking system for hands-free interaction that enables users to control the computer cursor through eye movements. Using a standard webcam, the system detects the face, eyes, and iris in real time with MediaPipe and maps these features to screen coordinates, which drive the cursor via PyAutoGUI. A calibration screen with nine reference points distributed across the display provides the training data for the machine learning models that predict gaze. Slight head movements are also compensated for so that they do not disrupt gaze tracking. The system further implements gaze-based navigation: dwelling at the top or bottom of the screen triggers scrolling, a double blink performs a single click, and a triple blink performs a double click. Future enhancements may include voice-controlled navigation. The system demonstrates the viability of affordable, camera-based eye tracking as an effective assistive technology for computer accessibility.
Introduction
In today’s digital era, making technology accessible to all, especially individuals with motor impairments, is critical. Traditional input devices such as keyboards and mice are often unusable for these users, and existing assistive eye-tracking systems tend to be costly, intrusive, and hardware-dependent.
This project proposes a video-based, hands-free navigation system using only a standard webcam and computer vision techniques, eliminating the need for specialized hardware or invasive devices. The system tracks facial landmarks and eye gaze with Google’s MediaPipe FaceMesh and employs polynomial regression to map gaze direction to cursor movement. It supports gesture-based commands such as double and triple blinks for clicking, and dwell-based scrolling for seamless interaction.
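As a concrete illustration of the detection stage, the sketch below shows how per-frame iris coordinates could be obtained from a standard webcam with MediaPipe FaceMesh. It is a minimal sketch rather than the project's actual code: the single-face assumption, the use of iris landmark indices 468 and 473 as eye centres, and the confidence thresholds are illustrative choices.

```python
# Minimal sketch of webcam capture + MediaPipe FaceMesh iris detection.
# Not the authors' exact code; parameters and landmark choices are illustrative.
import cv2
import mediapipe as mp

face_mesh = mp.solutions.face_mesh.FaceMesh(
    max_num_faces=1,
    refine_landmarks=True,          # adds the 10 iris landmarks (indices 468-477)
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5,
)

cap = cv2.VideoCapture(0)           # default webcam
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    results = face_mesh.process(rgb)
    if results.multi_face_landmarks:
        lm = results.multi_face_landmarks[0].landmark
        # Normalised (0-1) iris centres; averaging both eyes gives one gaze point.
        gaze_x = (lm[468].x + lm[473].x) / 2.0
        gaze_y = (lm[468].y + lm[473].y) / 2.0
        # gaze_x, gaze_y feed the calibration/regression stage described later.
    if cv2.waitKey(1) & 0xFF == 27:  # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```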
Key Features and Innovations:
Webcam-based eye tracking: Low-cost, non-intrusive, using MediaPipe for precise facial and iris landmark detection.
Personalized calibration: A nine-point calibration adapts the system to individual eye characteristics for improved accuracy.
Gesture-based controls: Double blinks simulate left clicks, triple blinks simulate right clicks, and vertical eye movements trigger scrolling (a blink-detection sketch follows this list).
Head movement compensation: Uses Kalman filtering and depth (z-index) tracking to maintain cursor accuracy despite natural head motion.
Real-time operation: Enables smooth and responsive control without physical input devices.
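A minimal sketch of the blink-gesture logic referenced above: the eye aspect ratio (EAR) is computed from six eye landmarks, a blink is counted whenever the ratio dips below a threshold and recovers, and blinks grouped within a short time window are mapped to clicks via PyAutoGUI. The threshold, the grouping window, and the helper names are assumptions rather than the system's actual parameters.

```python
# Hedged sketch of blink-gesture detection; thresholds and names are assumed.
import time
import math
import pyautogui

EAR_THRESHOLD = 0.21        # assumed: eye counts as closed below this ratio
GROUP_WINDOW = 0.7          # assumed: seconds within which blinks form one gesture

def eye_aspect_ratio(p):
    """p: six (x, y) eye points ordered corner, top, top, corner, bottom, bottom."""
    v1 = math.dist(p[1], p[5])
    v2 = math.dist(p[2], p[4])
    h = math.dist(p[0], p[3])
    return (v1 + v2) / (2.0 * h)

class BlinkGestures:
    def __init__(self):
        self.closed = False
        self.blink_times = []

    def update(self, ear):
        """Feed one EAR sample per frame; fires an action when a blink group completes."""
        now = time.time()
        if ear < EAR_THRESHOLD and not self.closed:
            self.closed = True                      # eyelid just closed
        elif ear >= EAR_THRESHOLD and self.closed:
            self.closed = False                     # reopened -> one blink counted
            self.blink_times.append(now)
        # A group is complete once GROUP_WINDOW has passed since the last blink.
        if self.blink_times and now - self.blink_times[-1] > GROUP_WINDOW:
            n = len(self.blink_times)
            self.blink_times.clear()
            if n == 2:
                pyautogui.click()                   # double blink -> left click
            elif n >= 3:
                pyautogui.rightClick()              # triple blink -> right click
```

In use, the EAR computed from the detected eye landmarks on each frame would simply be passed to BlinkGestures.update() alongside the dwell-based scroll check.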
Challenges and Future Improvements:
Lighting variability affects iris detection.
Head movement requires continuous recalibration.
Blink detection may vary between users.
Future enhancements could include voice commands and advanced machine learning for better gaze prediction.
Objectives:
Deliver an affordable, inclusive alternative to physical input devices.
Enhance accessibility for users with motor disabilities.
Implement a scalable system usable with existing hardware (standard webcams).
Enable full mouse functionality through eye movements and blinks.
Methodology:
System architecture: Webcam input, facial landmark detection, gaze estimation, and interaction control modules.
Data acquisition: Real-time video capture with facial feature extraction via MediaPipe.
Calibration: Nine-point gaze mapping with polynomial regression to personalize cursor control (see the calibration sketch after this list).
Cursor control: Real-time gaze tracking with noise reduction and head-movement adjustment (see the Kalman filter sketch after this list).
Gesture interaction: Blink detection for clicks and eye movements for scrolling.
Tools: Python, OpenCV, MediaPipe, PyAutoGUI, and Scikit-learn.
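The calibration step can be illustrated as follows. This is a hedged sketch, not the authors' implementation: it assumes a second-degree polynomial, a placeholder set of recorded gaze samples, and the helper name gaze_to_cursor. In the real system, the gaze samples would be the averaged iris coordinates captured while the user fixates each of the nine reference points.

```python
# Illustrative nine-point calibration with polynomial regression (scikit-learn).
# Degree, feature layout, and sample values are assumptions, not the paper's code.
import numpy as np
import pyautogui
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

screen_w, screen_h = pyautogui.size()

# Nine reference points (normalised screen positions) shown during calibration.
targets = np.array([(x, y) for y in (0.1, 0.5, 0.9) for x in (0.1, 0.5, 0.9)])
target_px = targets * [screen_w, screen_h]

# Placeholder gaze samples: in practice, the mean normalised iris coordinates
# recorded while the user fixates each reference point.
gaze_samples = np.array([
    [0.46, 0.44], [0.50, 0.44], [0.54, 0.44],
    [0.46, 0.50], [0.50, 0.50], [0.54, 0.50],
    [0.46, 0.56], [0.50, 0.56], [0.54, 0.56],
])

# Separate polynomial models for the x and y screen coordinates.
model_x = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model_y = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model_x.fit(gaze_samples, target_px[:, 0])
model_y.fit(gaze_samples, target_px[:, 1])

def gaze_to_cursor(gaze_xy):
    """Map one normalised gaze point to a clamped screen position and move the cursor."""
    g = np.asarray(gaze_xy).reshape(1, -1)
    x = float(np.clip(model_x.predict(g)[0], 0, screen_w - 1))
    y = float(np.clip(model_y.predict(g)[0], 0, screen_h - 1))
    pyautogui.moveTo(x, y)
    return x, y
```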
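For the noise-reduction step mentioned under cursor control, one plausible realisation is a constant-velocity Kalman filter built with OpenCV, which smooths raw cursor estimates so that small gaze or head tremors do not produce pointer jitter. The state layout and noise covariances below are assumed values, not parameters reported for the system.

```python
# Hedged sketch of cursor smoothing with a constant-velocity Kalman filter.
import cv2
import numpy as np

kf = cv2.KalmanFilter(4, 2)                       # state: x, y, vx, vy; measurement: x, y
kf.transitionMatrix = np.array([[1, 0, 1, 0],
                                [0, 1, 0, 1],
                                [0, 0, 1, 0],
                                [0, 0, 0, 1]], dtype=np.float32)
kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                 [0, 1, 0, 0]], dtype=np.float32)
kf.processNoiseCov = np.eye(4, dtype=np.float32) * 1e-3      # assumed process noise
kf.measurementNoiseCov = np.eye(2, dtype=np.float32) * 1e-1  # assumed measurement noise

def smooth_cursor(raw_x, raw_y):
    """Fuse one raw cursor estimate with the filter and return the smoothed position."""
    kf.correct(np.array([[raw_x], [raw_y]], dtype=np.float32))
    pred = kf.predict()
    return float(pred[0, 0]), float(pred[1, 0])
```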
Conclusion
In this paper, we propose a webcam-based eye-tracking system that offers an effective, low-cost solution for hands-free computer interaction, leveraging standard hardware and real-time gaze estimation through MediaPipe. By integrating a robust nine-point calibration procedure, blink-based selection mechanisms, and head pose compensation, the system ensures accurate and stable control across varied user environments. The incorporation of polynomial regression and Kalman filtering enhances precision and responsiveness, while intuitive blink gestures enable seamless navigation, clicks, and scrolling. The approach not only demonstrates the viability of webcam-based gaze tracking but also advances digital accessibility, particularly for users with motor impairments, by removing dependence on conventional input devices. Overall, the system represents a meaningful step toward inclusive human-computer interaction, with potential for further extension into multimodal interfaces and adaptive assistive technologies.