This paper presents the development of a voice-activated human-following robot that integrates computer vision and sensor-based obstacle avoidance to enable autonomous navigation in dynamic environments. The system employs a Raspberry Pi 4 as the primary processing unit and a Pi Camera to perform real-time human detection using a lightweight YOLOv5 model. Voice commands are captured through a microphone and processed using speech recognition to activate or control the robot. Ultrasonic sensors are used to detect and avoid obstacles, enhancing safety and reliability. The robot successfully combines vision, voice, and proximity sensing into a low-cost, flexible platform suitable for applications such as personal assistance, industrial tool handling, and smart surveillance. Experimental results demonstrate effective human tracking, responsive voice control, and reliable navigation in indoor settings.
Introduction
Overview:
The integration of robotics and AI has enabled the development of intelligent human-following robots with applications in healthcare, logistics, smart homes, and more. This project presents a low-cost, voice-controlled human-following robot using YOLO-based computer vision, ultrasonic obstacle detection, and speech recognition, all built on a Raspberry Pi 4 platform.
Key Features:
Visual Tracking: Uses a Pi Camera and a YOLOv5 Nano model for real-time human detection and tracking (a minimal detection sketch follows this list).
Voice Control: Allows users to issue hands-free commands (e.g., "start," "stop") using speech recognition.
Obstacle Avoidance: Employs ultrasonic sensors to detect obstacles and prevent collisions.
Hardware: 4-wheel robot chassis, Raspberry Pi, L298N motor driver, ultrasonic sensors, and 12V battery.
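As a concrete illustration of the visual-tracking feature above, the sketch below loads a YOLOv5 Nano model through torch.hub and keeps the strongest "person" detection in each camera frame. The capture path (cv2.VideoCapture on the default camera), the 0.4 confidence threshold, and the frame size are assumptions; the paper does not specify the exact capture and inference pipeline.

```python
# Minimal person-detection sketch (assumed pipeline: YOLOv5n via torch.hub,
# camera frames read through OpenCV; details not stated in the paper).
import cv2
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5n")  # lightweight YOLOv5 Nano
model.conf = 0.4                                          # assumed confidence threshold

cap = cv2.VideoCapture(0)                                 # Pi Camera / USB camera
while True:
    ok, frame = cap.read()
    if not ok:
        break
    rgb = frame[:, :, ::-1]                               # OpenCV is BGR; model expects RGB
    results = model(rgb)
    # results.xyxy[0]: rows of [x1, y1, x2, y2, confidence, class]
    persons = [d for d in results.xyxy[0].tolist() if int(d[5]) == 0]  # COCO class 0 = person
    if persons:
        x1, y1, x2, y2, conf, _ = max(persons, key=lambda d: d[4])     # strongest detection
        cx = (x1 + x2) / 2.0                                           # horizontal centre for steering
        print(f"person at cx={cx:.0f}px, conf={conf:.2f}")
```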
System Operation:
Captures live video for YOLO-based human detection.
Tracks the person using centroid tracking to adjust movement.
Responds to voice commands for activation and direction.
Avoids obstacles with front- and side-mounted ultrasonic sensors.
The Raspberry Pi centrally manages input/output, processing, and control (hedged sketches of these steps follow below).
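The centroid-tracking step above can be reduced to a simple rule: steer so the person's bounding-box centre stays near the image centre, and move forward until the box is tall enough to indicate the desired following distance. The thresholds below are illustrative, not values reported in the paper.

```python
# Centroid-based steering rule (dead-band and target height are assumed values).
FRAME_WIDTH = 640
CENTER_TOLERANCE = 60       # px dead-band around the image centre
TARGET_BOX_HEIGHT = 300     # px; proxy for the desired following distance

def steering_command(x1, y1, x2, y2):
    """Map the tracked person's bounding box to a motion command."""
    cx = (x1 + x2) / 2.0
    box_height = y2 - y1
    error = cx - FRAME_WIDTH / 2.0
    if error < -CENTER_TOLERANCE:
        return "turn_left"
    if error > CENTER_TOLERANCE:
        return "turn_right"
    if box_height < TARGET_BOX_HEIGHT:
        return "forward"      # person is far: close the gap
    return "stop"             # centred and close enough
```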
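For the voice-control step, one common approach on the Raspberry Pi is the SpeechRecognition package with a cloud backend; the paper only says "speech recognition", so the library choice, keyword set, and timing parameters below are assumptions.

```python
# Voice-command sketch; SpeechRecognition + Google's free recognizer are assumed,
# as are the "start"/"stop" keywords mentioned in the paper's feature list.
import speech_recognition as sr

recognizer = sr.Recognizer()

def listen_for_command():
    """Capture one utterance and map it to a robot command, or None."""
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source, duration=0.5)
        audio = recognizer.listen(source, phrase_time_limit=3)
    try:
        text = recognizer.recognize_google(audio).lower()
    except (sr.UnknownValueError, sr.RequestError):
        return None            # unintelligible speech or no network
    if "start" in text:
        return "start"
    if "stop" in text:
        return "stop"
    return None
```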
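The obstacle-avoidance step reads range from HC-SR04-style ultrasonic sensors. A minimal single-sensor read is sketched below; the GPIO pin numbers and the 30 cm stop threshold are illustrative assumptions, not values from the paper.

```python
# Ultrasonic distance read and stop-on-obstacle check (assumed wiring and threshold).
import time
import RPi.GPIO as GPIO

TRIG, ECHO = 23, 24          # BCM pins (assumed wiring)
OBSTACLE_CM = 30.0           # assumed stop threshold

GPIO.setmode(GPIO.BCM)
GPIO.setup(TRIG, GPIO.OUT)
GPIO.setup(ECHO, GPIO.IN)

def read_distance_cm():
    """Trigger one ultrasonic ping and return the echo distance in centimetres."""
    GPIO.output(TRIG, True)
    time.sleep(10e-6)                      # 10 microsecond trigger pulse
    GPIO.output(TRIG, False)
    start = stop = time.time()
    while GPIO.input(ECHO) == 0:
        start = time.time()
    while GPIO.input(ECHO) == 1:
        stop = time.time()
    return (stop - start) * 34300 / 2.0    # speed of sound ~343 m/s, out and back

def obstacle_ahead():
    return read_distance_cm() < OBSTACLE_CM
```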
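Finally, the Raspberry Pi turns the high-level commands into L298N pin states for the four-wheel chassis. The pin assignments, PWM frequency, and duty cycle below are assumed wiring details for illustration only.

```python
# Differential-drive sketch for an L298N motor driver (assumed BCM pin map and PWM values).
import RPi.GPIO as GPIO

IN1, IN2, IN3, IN4 = 17, 27, 22, 5    # direction pins (assumed)
ENA, ENB = 12, 13                      # PWM enable pins (assumed)

GPIO.setmode(GPIO.BCM)
GPIO.setup([IN1, IN2, IN3, IN4, ENA, ENB], GPIO.OUT)
left_pwm = GPIO.PWM(ENA, 1000)         # 1 kHz PWM
right_pwm = GPIO.PWM(ENB, 1000)
left_pwm.start(0)
right_pwm.start(0)

def drive(command, speed=60):
    """Translate a high-level command into L298N pin states and PWM duty."""
    if command == "forward":
        states = (1, 0, 1, 0)
    elif command == "turn_left":
        states = (0, 1, 1, 0)          # left side reverses, right side drives forward
    elif command == "turn_right":
        states = (1, 0, 0, 1)
    else:                              # "stop"
        states = (0, 0, 0, 0)
        speed = 0
    for pin, state in zip((IN1, IN2, IN3, IN4), states):
        GPIO.output(pin, state)
    left_pwm.ChangeDutyCycle(speed)
    right_pwm.ChangeDutyCycle(speed)
```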
Evaluation & Results:
Human Detection Accuracy: ~91% in bright and ~84% in low-light conditions.
Voice Recognition Accuracy: ~92.5% in quiet and ~84.7% in noisy environments.
Obstacle Avoidance Success Rate: 95%, with a fast reaction time (~0.4s).
Runtime: ~45 minutes per charge; stable across 25+ test runs.
Innovation & Impact:
Seamlessly integrates speech recognition, deep learning, and sensor fusion.
Outperforms traditional IR-based following systems in detection accuracy, interactivity, and safety.
Suitable for indoor human-robot interaction in homes, hospitals, labs, and warehouses.
Conclusion
This research presents the design and implementation of a voice-activated human-following robot that integrates computer vision, speech recognition, and ultrasonic sensing for autonomous navigation and human interaction. The system leverages a Raspberry Pi as the processing core, employing a lightweight YOLOv5 model for real-time human detection and tracking, alongside voice commands for hands-free control.
Experimental results demonstrated that the robot is capable of accurately following a human target within a defined range while avoiding obstacles with high reliability. Voice control performed well under normal conditions, offering an intuitive interface for activation and navigation. The integration of ultrasonic sensors added a safety layer, enabling the robot to operate effectively in indoor, dynamic environments.
The proposed system offers a cost-effective, modular, and flexible solution suitable for applications in personal assistance, smart homes, education, and warehouse logistics. Future improvements may include incorporating edge AI accelerators (e.g., Google Coral, NVIDIA Jetson), enabling outdoor capability using GPS modules, and expanding the gesture and voice command set for more advanced interaction.