Vision-based cursor control technologies have advanced considerably, moving from hardware-dependent systems toward more intelligent, accessible, and user-friendly solutions. Early techniques relied on specialized devices such as the Kinect and RGB-D cameras for gesture recognition, offering high accuracy but limited portability and poor robustness to environmental factors. The advent of webcam-based systems then broadened access through colored gloves, facial gesture recognition, and later markerless hand tracking built on computer vision algorithms. Integration with gaze tracking and the vestibulo-ocular reflex further improved accuracy for hands-free operation, although calibration requirements and lighting sensitivity remained. More recent work has embraced multimodal systems combining gaze, speech, and lip detection, alongside non-invasive EEG/EOG wearables and brain-computer interfaces, for greater functionality. Simpler blink-based interfaces and thin vision-based tactile sensors have also emerged to assist people with more severe mobility impairments, trading simplicity against technical compromises.
Introduction
Over the past five years, vision-based cursor control has evolved from relying on specialized hardware to more versatile, multimodal systems. Early developments used devices such as the Microsoft Kinect V2 and RGB-D cameras for gesture sensing [2], but faced issues with hardware discontinuation, cost, and sensitivity to lighting and occlusion. Webcam-based systems then emerged, such as the Swift Controller, which used colored gloves for precise gesture detection at the expense of user comfort [3]. Facial gesture interfaces provided alternatives for users with limited hand mobility but supported only small gesture vocabularies [4].
Gaze-based control became more accessible through deep learning and ordinary webcams, despite calibration and lighting challenges [5]. Enhancements such as vestibulo-ocular reflex tracking improved accuracy but increased system complexity [6]. Markerless hand gesture recognition using OpenCV advanced accessibility further but still struggled with lighting variation and limited testing on diverse user populations [7, 8]. A minimal sketch of this markerless approach is given below.
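To illustrate the markerless pipeline described above, the following is a minimal sketch using OpenCV only: segment the hand by a skin-color mask, take the topmost point of the largest contour as the fingertip, and map it to screen coordinates. The HSV thresholds and the use of pyautogui for OS-level cursor movement are illustrative assumptions, not details taken from the cited systems.

# Minimal markerless hand-to-cursor sketch (illustrative, not the cited implementations)
import cv2
import numpy as np
import pyautogui  # assumed here for OS-level cursor control

screen_w, screen_h = pyautogui.size()
cap = cv2.VideoCapture(0)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    frame = cv2.flip(frame, 1)                         # mirror for natural pointing
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Crude skin-tone mask; thresholds are lighting- and user-dependent
    mask = cv2.inRange(hsv, (0, 40, 60), (25, 180, 255))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        hand = max(contours, key=cv2.contourArea)      # assume largest blob is the hand
        if cv2.contourArea(hand) > 3000:               # ignore small noise blobs
            tip_x, tip_y = hand[hand[:, 0, 1].argmin(), 0]   # topmost point ~ fingertip
            h, w = frame.shape[:2]
            pyautogui.moveTo(tip_x * screen_w / w, tip_y * screen_h / h)
            cv2.circle(frame, (int(tip_x), int(tip_y)), 8, (0, 255, 0), -1)
    cv2.imshow("markerless hand tracking", frame)
    if cv2.waitKey(1) & 0xFF == 27:                    # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()

This simple color-based segmentation makes the lighting sensitivity noted in [7, 8] apparent: the skin mask must be re-tuned whenever illumination or skin tone changes, which is precisely the limitation that motivates more robust sensing.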
By 2023, multimodal systems combining gaze, speech, and lip movement allowed complex interactions but demanded higher computational power [9]. Privacy-conscious systems using EEG-EOG glasses [10] and brain-computer interfaces [11] offered promising camera-free control but faced challenges in comfort, user training, and signal quality.
Simplified blink-based cursor control was introduced for paralyzed users, offering immediate usability but limited precision [13]. In addition, vision-based tactile sensing systems such as ThinTact enabled ultra-thin, lensless interaction, trading off resolution and processing requirements [14]. A common blink-detection heuristic underlying such interfaces is sketched below.
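A widely used heuristic for webcam blink detection is the eye aspect ratio (EAR) computed over eye landmarks from any facial landmark detector (e.g., dlib or MediaPipe). The sketch below shows that heuristic; the landmark ordering, threshold, and frame count are illustrative and not necessarily those used in [13].

# Eye-aspect-ratio (EAR) blink heuristic; values are illustrative assumptions
import numpy as np

def eye_aspect_ratio(eye):
    """eye: six (x, y) landmarks around one eye, ordered as in the
    common 68-point facial landmark convention."""
    v1 = np.linalg.norm(eye[1] - eye[5])   # vertical eyelid distances
    v2 = np.linalg.norm(eye[2] - eye[4])
    h = np.linalg.norm(eye[0] - eye[3])    # horizontal eye width
    return (v1 + v2) / (2.0 * h)

EAR_THRESHOLD = 0.21    # below this the eye is treated as closed (tune per user/camera)
CONSEC_FRAMES = 3       # frames the eye must stay closed to count as a deliberate blink
closed_frames = 0

def update(eye_landmarks):
    """Call once per video frame; returns True when a deliberate blink
    (e.g., a click command) has just completed."""
    global closed_frames
    if eye_aspect_ratio(np.asarray(eye_landmarks, dtype=float)) < EAR_THRESHOLD:
        closed_frames += 1
        return False
    blink = closed_frames >= CONSEC_FRAMES
    closed_frames = 0
    return blink

Requiring several consecutive closed frames distinguishes intentional blinks from involuntary ones, which is what gives such interfaces their immediate usability at the cost of input bandwidth and precision.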
Conclusion
In short, given the limitations of RGB-based cursor control under varying illumination and occlusion [2, 5, 7], a thermovision approach combining infrared imaging with deep learning is a viable candidate. By relying on thermal signatures, it promises greater robustness across lighting conditions and potentially better occlusion handling than webcam-based [5, 7] and even RGB-D systems [2]. Deep learning, already successful in gaze estimation [5] and BCI control [11], can be applied to decode thermal hand movements for accurate cursor manipulation, building on breakthroughs in markerless gesture recognition [7, 8]. As a non-visual-spectrum input modality, this also aligns with the current trend towards privacy-conscious sensing [10]. Future work should focus on constructing deep models for thermal imagery and comparing them against existing RGB approaches, e.g., colored gloves [3] and markerless schemes [7], to fully exploit thermovision-based control. A sketch of how such a thermal decoder could be structured is given below.
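To make the proposed direction concrete, the following is a purely illustrative sketch, assuming PyTorch, of a small convolutional network that maps a short stack of single-channel thermal frames to a 2-D cursor displacement. The architecture, input size, and output convention are assumptions for discussion, not a model from the cited works.

# Illustrative thermal-gesture decoder (assumed PyTorch; not from the cited literature)
import torch
import torch.nn as nn

class ThermalCursorNet(nn.Module):
    """Maps a stack of thermal frames to a cursor displacement (dx, dy)."""
    def __init__(self, frames=4):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(frames, 16, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),                   # global feature vector
        )
        self.head = nn.Linear(64, 2)                   # (dx, dy) in normalized screen units

    def forward(self, x):                              # x: (batch, frames, H, W) thermal stack
        feats = self.backbone(x).flatten(1)
        return self.head(feats)

# Example: one 4-frame 120x160 thermal clip -> predicted cursor displacement
model = ThermalCursorNet()
clip = torch.randn(1, 4, 120, 160)                     # stand-in for a real thermal clip
dx, dy = model(clip)[0]

Because the network consumes only low-resolution thermal intensity, a comparison against the RGB baselines [3, 7] could hold the cursor-mapping head fixed and vary only the input modality, isolating the contribution of thermal sensing itself.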
References
[1] A. Sharma et al., “Virtual Mouse Control Using Colored Fingertips and Hand Gesture Recognition,” IEEE Int. Conf. Human-Computer Interaction, 2020.
[2] K. Singh and M. Patel, “Real-time Virtual Mouse System Using RGB-D Images and Fingertip Detection,” IEEE Access, vol. 8, 2020.
[3] S. Rahman et al., “Swift Controller: A Computer Vision Based Mouse Controller,” Proc. 2021 Int. Conf. Intelligent Systems, 2021.
[4] T. Liu and J. Wong, “Cursor Control Using Face Gestures,” Journal of Assistive Technologies, vol. 15, no. 2, 2021.
[5] R. Choudhury and P. Gupta, “Real-Time Webcam-Based Eye Tracking for Gaze Estimation,” IEEE Trans. Human-Machine Systems, vol. 52, no. 1, 2022.
[6] J. Smith et al., “Addressing the Eye-Fixation Problem in Gaze Tracking Using the Vestibulo-ocular Reflex,” ACM Trans. Interactive Intelligent Systems, vol. 12, no. 4, 2022.
[7] A. Kapoor and M. Verma, “Hand Gesture-Based Virtual Mouse using OpenCV,” IEEE Conf. Vision and Signal Processing, 2023.
[8] D. Wang and L. Zhang, “Vision-Powered Cursor Maneuvering,” Computer Vision and Image Understanding, vol. 235, 2023.
[9] F. Ali et al., “Advancing Multimodal Fusion in HCI,” IEEE Trans. Neural Networks and Learning Systems, vol. 34, no. 5, 2023.
[10] N. Kumar and Y. Chen, “Privacy-Preserving Eye Movement Classification with EOG-EEG Glasses,” IEEE Sensors Journal, vol. 24, no. 3, 2024.
[11] M. He and Z. Lin, “Enhancing Cursor Control with Motor Imagery and Deep Neural Networks,” IEEE Trans. Biomedical Engineering, vol. 71, no. 1, 2024.
[12] L. Brown et al., “An Analysis on Virtual Mouse Control using Human Eye,” IET Computer Vision, vol. 18, no. 2, 2024.
[13] P. Roy and A. Das, “Eyeball-Based Cursor Control for Paralyzed Individuals using Eye Blink Detection,” IEEE Conf. Rehabilitation Robotics, 2025.
[14] Y. Suzuki et al., “ThinTact: Thin Vision-Based Tactile Sensor by Lensless Imaging,” IEEE Trans. Robotics, vol. 41, no. 2, 2025.