This paper presents a real-time, multimodal human–computer interaction (HCI) system that combines slow eye blink detection with static hand gesture recognition to enable intuitive, touch-free communication. The system uses Google's MediaPipe framework to track facial and hand landmarks, applying a custom algorithm to distinguish intentional slow blinks from natural ones: a single slow blink is interpreted as "No," while a double slow blink triggers a "Yes" response. In parallel, the system recognizes six distinct hand gestures (thumbs up, thumbs down, open palm, fist, peace sign, and pointing) to support broader interaction, and integrated voice feedback enhances accessibility. The proposed system runs efficiently on standard hardware, achieving 93% blink detection accuracy and 95% gesture recognition accuracy. Designed for constrained environments such as assistive technology, sterile conditions, and public kiosks, the system demonstrates strong potential for inclusive, non-contact user interface design.
Introduction
This paper presents a low-cost, contactless multimodal human–computer interaction (HCI) system designed for individuals who cannot speak or use traditional input devices due to paralysis, neurological disorders, or limited mobility, as well as for professional environments that require hands-free, sterile interaction. Conventional input tools such as keyboards, mice, touchscreens, and voice commands are often impractical in these contexts, creating a need for intuitive, accessible alternatives.
The proposed system combines eye blink detection and hand gesture recognition to enable effective communication and device control using only a standard webcam. Intentional slow blinks are used as binary inputs, where a double slow blink indicates “Yes” and a single slow blink indicates “No,” with blink duration and timing rules distinguishing deliberate actions from natural blinking. Hand gestures such as thumbs up/down, open palm, fist, peace sign, and pointing provide additional command options, enabling richer interaction.
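To make the blink logic concrete, the following is a minimal sketch of slow-blink detection over MediaPipe FaceMesh landmarks, using the eye aspect ratio (EAR) formulation of Soukupova and Cech. The EAR threshold, minimum closure duration, and double-blink window below are illustrative assumptions, not the tuned parameters of the evaluated system.

```python
import time

import cv2
import mediapipe as mp
import numpy as np

EAR_THRESHOLD = 0.21       # eye treated as closed below this EAR (assumed value)
SLOW_BLINK_MIN = 0.5       # closure must last at least this long to count (seconds)
DOUBLE_BLINK_WINDOW = 1.5  # two slow blinks within this window -> "Yes"

# Commonly used FaceMesh indices for one eye, ordered p1..p6 for the EAR formula
LEFT_EYE = [33, 160, 158, 133, 153, 144]

def eye_aspect_ratio(pts):
    """EAR = (|p2-p6| + |p3-p5|) / (2 * |p1-p4|)."""
    p1, p2, p3, p4, p5, p6 = pts
    return (np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)) / (2.0 * np.linalg.norm(p1 - p4))

face_mesh = mp.solutions.face_mesh.FaceMesh(max_num_faces=1, refine_landmarks=True)
cap = cv2.VideoCapture(0)
closed_since = None    # timestamp when the eye first closed, or None while open
last_slow_blink = 0.0  # timestamp of the most recent slow blink

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    h, w = frame.shape[:2]
    result = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    now = time.time()
    if result.multi_face_landmarks:
        lm = result.multi_face_landmarks[0].landmark
        pts = np.array([(lm[i].x * w, lm[i].y * h) for i in LEFT_EYE])
        if eye_aspect_ratio(pts) < EAR_THRESHOLD:
            closed_since = closed_since or now       # eye just closed
        elif closed_since is not None:
            duration, closed_since = now - closed_since, None
            if duration >= SLOW_BLINK_MIN:           # deliberate slow blink
                if now - last_slow_blink <= DOUBLE_BLINK_WINDOW:
                    print("Yes")                     # second slow blink in window
                    last_slow_blink = 0.0
                else:
                    last_slow_blink = now            # wait for a possible second blink
    # A lone slow blink reads as "No" once the double-blink window lapses
    if closed_since is None and last_slow_blink and now - last_slow_blink > DOUBLE_BLINK_WINDOW:
        print("No")
        last_slow_blink = 0.0
    cv2.imshow("blink", frame)
    if cv2.waitKey(1) & 0xFF == 27:                  # Esc to quit
        break
cap.release()
```

In this sketch, "Yes" fires when a second slow blink lands inside the window, and "No" fires only after the window lapses following a single slow blink, which is one straightforward way to realize the duration and timing rules described above.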
The system leverages MediaPipe’s face and hand landmark tracking for real-time detection without requiring specialized hardware or computationally intensive models. Recognized gestures and blinks are confirmed through speech feedback, enhancing usability for visually impaired users and accessibility-focused applications.
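A comparable sketch covers the gesture and feedback path, assuming rule-based classification over MediaPipe Hands landmarks and the pyttsx3 library for offline speech output. The finger-state heuristics here are illustrative, since the exact classification rules are not specified.

```python
import cv2
import mediapipe as mp
import pyttsx3

engine = pyttsx3.init()  # offline text-to-speech engine for spoken confirmation
hands = mp.solutions.hands.Hands(max_num_hands=1, min_detection_confidence=0.7)

TIPS, PIPS = [8, 12, 16, 20], [6, 10, 14, 18]  # index..pinky fingertip / PIP joint indices

def classify(lm):
    """Map 21 hand landmarks to one of the six gestures, or None if ambiguous."""
    # A finger counts as extended when its tip sits above its PIP joint
    # (image y grows downward); this assumes a roughly upright hand.
    fingers = [lm[t].y < lm[p].y for t, p in zip(TIPS, PIPS)]
    thumb_up = lm[4].y < lm[3].y < lm[2].y    # thumb tip above IP above MCP
    thumb_down = lm[4].y > lm[3].y > lm[2].y
    if all(fingers):
        return "open palm"
    if not any(fingers):
        return "thumbs up" if thumb_up else "thumbs down" if thumb_down else "fist"
    if fingers == [True, True, False, False]:
        return "peace"
    if fingers == [True, False, False, False]:
        return "pointing"
    return None

cap = cv2.VideoCapture(0)
last = None
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    result = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if result.multi_hand_landmarks:
        gesture = classify(result.multi_hand_landmarks[0].landmark)
        if gesture and gesture != last:  # announce only when the gesture changes
            engine.say(gesture)
            engine.runAndWait()          # blocks briefly while speaking
            last = gesture
    cv2.imshow("gesture", frame)
    if cv2.waitKey(1) & 0xFF == 27:      # Esc to quit
        break
cap.release()
```

Announcing only on gesture change avoids repeating the same spoken label on every frame; a deployed system would likely also debounce over several consecutive frames before confirming a gesture.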
Evaluation shows strong performance: 93% blink detection accuracy and 95% hand gesture recognition accuracy, with the system running in real time at 18–20 FPS under varying conditions. User feedback highlighted the effectiveness of the speech output and the overall ease of use.
Overall, the system offers a robust, affordable, and accessible solution for speech-free and hands-free interaction, empowering individuals with severe mobility or speech impairments and supporting specialized environments such as hospitals, rehabilitation centers, and hazardous workplaces.
Conclusion
We presented a low-cost, real-time interaction system that combines eye and hand modalities to enable touch-free binary and gesture-based control. The system shows potential for practical deployment in assistive environments and can be extended with gaze tracking and adaptive gesture sets.