This paper presents the design and implementation of a low-cost gesture-controlled robot using an ESP32-CAM-based mobile platform with live video streaming, ultrasonic obstacle detection and wireless control via the ESP-NOW protocol. The proposed system comprises an ESP8266/ESP12E-based wearable transmitter unit equipped with an MPU6050 inertial measurement unit (IMU) and an ESP32-CAM-based receiver unit integrated with a differential-drive robot chassis, an L298N motor driver, an HC-SR04 ultrasonic sensor and a buzzer.
Gesture information acquired from the IMU is translated into motion commands and transmitted over ESP-NOW to the ESP32-CAM, which simultaneously drives the motors, performs obstacle detection and streams real-time video over a Wi-Fi access point. ESP-NOW is selected for its low-latency, connectionless communication, which is advantageous for responsive robot teleoperation compared to conventional TCP/IP-based control. Experimental results show end-to-end command latency below 40 ms within a 10 m line-of-sight indoor environment, while QVGA video streaming is maintained at 20–24 frames per second. The integrated ultrasonic module detects frontal obstacles within 20 cm and triggers an audible alert, enhancing operational safety.
Introduction
This paper describes the design and implementation of a gesture-controlled telepresence robot using ESP32-CAM and ESP8266/ESP12E modules. The system enables real-time robot control through hand gestures, together with live video streaming and obstacle detection. It is designed to be low-cost, energy-efficient, and suitable for educational and remote applications.
Key Features:
Gesture-based control using an MPU6050 (accelerometer + gyroscope)
Real-time video streaming using ESP32-CAM with OV2640 camera
Wireless communication using ESP-NOW, enabling low-latency peer-to-peer control
Obstacle detection using HC-SR04 ultrasonic sensor
Buzzer alert system for safety
Bidirectional communication (control + video feedback)
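The ESP-NOW link only has to carry short motion commands. The sketch below shows one possible payload layout, written as portable C++ so it can be compiled and tested on a PC. The `Command` codes, the 4-byte format (command, little-endian sequence number, XOR checksum) and the `DeliveryStats` helper are hypothetical and are not taken from the original firmware. On the transmitter the encoded buffer would be handed to `esp_now_send()`, and on the ESP32-CAM the receive callback would pass the raw bytes to `decodePacket`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical command codes shared by transmitter and receiver.
enum class Command : std::uint8_t { Stop = 0, Forward = 1, Backward = 2, Left = 3, Right = 4 };

// Hypothetical 4-byte payload: [command][seq low byte][seq high byte][XOR checksum].
// ESP-NOW frames carry at most 250 bytes of payload, so this fits with room to spare.
constexpr std::size_t kPacketSize = 4;

// XOR of the first n bytes; catches any single-byte corruption, not all errors.
std::uint8_t checksum(const std::uint8_t* data, std::size_t n) {
    assert(data != nullptr);
    std::uint8_t x = 0;
    for (std::size_t i = 0; i < n; ++i) x ^= data[i];
    return x;
}

std::array<std::uint8_t, kPacketSize> encodePacket(Command cmd, std::uint16_t seq) {
    std::array<std::uint8_t, kPacketSize> buf{};
    buf[0] = static_cast<std::uint8_t>(cmd);
    buf[1] = static_cast<std::uint8_t>(seq & 0xFF);         // little-endian sequence number
    buf[2] = static_cast<std::uint8_t>((seq >> 8) & 0xFF);
    buf[3] = checksum(buf.data(), kPacketSize - 1);
    return buf;
}

// Returns false (leaving cmd/seq untouched) on wrong length, bad checksum or unknown command.
bool decodePacket(const std::uint8_t* data, std::size_t len, Command& cmd, std::uint16_t& seq) {
    if (data == nullptr || len != kPacketSize) return false;
    if (checksum(data, kPacketSize - 1) != data[kPacketSize - 1]) return false;
    if (data[0] > static_cast<std::uint8_t>(Command::Right)) return false;
    cmd = static_cast<Command>(data[0]);
    seq = static_cast<std::uint16_t>(data[1] | (data[2] << 8));
    return true;
}

// Receiver-side counters for estimating packet delivery ratio from sequence gaps.
// Assumes in-order delivery without duplicates; the 16-bit difference handles wrap-around.
struct DeliveryStats {
    bool started = false;
    std::uint16_t lastSeq = 0;
    std::uint32_t received = 0;
    std::uint32_t expected = 0;

    void onPacket(std::uint16_t seq) {
        expected += started ? static_cast<std::uint16_t>(seq - lastSeq) : 1u;
        started = true;
        lastSeq = seq;
        ++received;
    }

    double ratio() const {
        return expected == 0 ? 0.0 : static_cast<double>(received) / expected;
    }
};
```

Because every frame carries a sequence number, the receiver can estimate the packet delivery ratio from the gaps it observes without any return traffic to the transmitter.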
System Architecture:
The system consists of two main parts:
Wearable Gesture Transmitter
Built using ESP8266/ESP12E and MPU6050
Reads hand tilt movements
Sends motion commands via ESP-NOW
Telepresence Robot
Built around ESP32-CAM
Controls motors using L298N motor driver
Streams live video through an HTTP server
Detects obstacles and triggers alerts
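On the robot side, driving the L298N from a decoded command reduces to setting its four direction inputs. The following is a minimal sketch under stated assumptions: IN1/IN2 drive the left motor and IN3/IN4 the right motor, the enable pins are held high (no PWM speed control), and Left/Right are pivot turns with the wheels spinning in opposite directions. The pin writer is passed in as a callback (on the ESP32-CAM it could wrap `digitalWrite()`), and the pin numbers are left to the caller.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <functional>

enum class Command : std::uint8_t { Stop = 0, Forward = 1, Backward = 2, Left = 3, Right = 4 };

// Logic levels for the L298N direction inputs.
// Assumption: IN1/IN2 drive the left motor, IN3/IN4 the right motor,
// and ENA/ENB are held high (no PWM speed control in this sketch).
struct BridgeInputs {
    bool in1, in2, in3, in4;
};

BridgeInputs toBridgeInputs(Command cmd) {
    switch (cmd) {
        case Command::Forward:  return {true,  false, true,  false};
        case Command::Backward: return {false, true,  false, true};
        case Command::Left:     return {false, true,  true,  false};  // pivot: left wheel back, right wheel forward
        case Command::Right:    return {true,  false, false, true};   // pivot: left wheel forward, right wheel back
        case Command::Stop:
        default:                return {false, false, false, false};  // equal inputs: both motors stopped
    }
}

// Writes the four inputs through a caller-supplied pin writer, e.g. a lambda
// wrapping digitalWrite() on the robot or a recorder when testing on a PC.
void applyCommand(Command cmd, const std::array<int, 4>& pins,
                  const std::function<void(int, bool)>& writePin) {
    assert(writePin);  // an empty std::function would throw when called
    const BridgeInputs b = toBridgeInputs(cmd);
    writePin(pins[0], b.in1);
    writePin(pins[1], b.in2);
    writePin(pins[2], b.in3);
    writePin(pins[3], b.in4);
}
```

Because the camera occupies most of the ESP32-CAM's GPIOs, the actual pin assignment has to follow the board's pinout; none is assumed here.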
Working Principle:
Hand tilt is detected by the MPU6050.
Movement data is normalized and converted into motion commands.
ESP-NOW transmits commands with low latency (2–10 ms).
ESP32-CAM processes commands, controls motors, streams video, and monitors obstacles.
If an obstacle is detected within a threshold distance, a buzzer warning is activated.
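The normalization step can be illustrated by converting raw MPU6050 accelerometer counts into units of g and then into tilt angles. This sketch assumes the ±2 g full-scale range (16384 LSB/g per the MPU-6050 datasheet) and a hand that is mostly static, so gravity dominates the reading. The pitch/roll formulas are the common accelerometer-only tilt estimate, not necessarily the authors' exact method; gyroscope fusion is omitted, and the `nearlyStatic` tolerance is a hypothetical value.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

struct Accel { double x, y, z; };             // acceleration in units of g
struct Tilt  { double pitchDeg, rollDeg; };   // forward/backward and left/right tilt

// Scales raw MPU6050 counts to g. 16384 LSB/g is the +/-2 g range; the
// datasheet lists 8192, 4096 and 2048 LSB/g for +/-4, +/-8 and +/-16 g.
Accel normalize(std::int16_t ax, std::int16_t ay, std::int16_t az, double lsbPerG = 16384.0) {
    assert(lsbPerG > 0.0);
    return Accel{ax / lsbPerG, ay / lsbPerG, az / lsbPerG};
}

// Gravity-only tilt estimate: meaningful while the hand is roughly static.
Tilt tiltFromAccel(const Accel& a) {
    const double radToDeg = 180.0 / std::acos(-1.0);
    return Tilt{std::atan2(-a.x, std::sqrt(a.y * a.y + a.z * a.z)) * radToDeg,
                std::atan2(a.y, a.z) * radToDeg};
}

// Hypothetical guard: accept a sample only if |a| is within tolG of 1 g,
// i.e. the reading is dominated by gravity rather than hand acceleration.
bool nearlyStatic(const Accel& a, double tolG = 0.15) {
    const double mag = std::sqrt(a.x * a.x + a.y * a.y + a.z * a.z);
    return std::fabs(mag - 1.0) <= tolG;
}
```

The angles themselves do not depend on the scale factor, because `atan2` uses only ratios; normalizing to g matters when a threshold is expressed on the acceleration itself, as in the magnitude check.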
Control Method:
A threshold-based state machine is used for gesture mapping:
Forward
Backward
Left
Right
Stop
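A threshold-based state machine for these five commands could be realized as follows. This is a hypothetical sketch: the 20°/10° enter/exit thresholds, the sign conventions (positive pitch = Forward, positive roll = Right) and the dominant-axis rule are assumptions chosen for illustration. Separate enter and exit thresholds (hysteresis) give the machine memory, so a hand hovering near a single threshold does not make the robot flicker between a motion command and Stop.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

enum class Command : std::uint8_t { Stop = 0, Forward = 1, Backward = 2, Left = 3, Right = 4 };

// Threshold-based gesture state machine with hysteresis. Sign conventions
// (positive pitch = Forward, positive roll = Right) are assumptions.
class GestureStateMachine {
public:
    // Hypothetical defaults: enter a motion at 20 deg of tilt, leave it below 10 deg.
    explicit GestureStateMachine(double enterDeg = 20.0, double exitDeg = 10.0)
        : enter_(enterDeg), exit_(exitDeg) {
        assert(exitDeg >= 0.0 && exitDeg < enterDeg);  // hysteresis band must be non-empty
    }

    Command update(double pitchDeg, double rollDeg) {
        if (state_ != Command::Stop) {
            // Hold the current motion while its axis stays tilted past the exit threshold.
            if (activeTilt(pitchDeg, rollDeg) >= exit_) return state_;
            state_ = Command::Stop;
        }
        // From Stop, only the dominant axis may start a new motion.
        if (std::fabs(pitchDeg) >= std::fabs(rollDeg)) {
            if (pitchDeg >= enter_) state_ = Command::Forward;
            else if (pitchDeg <= -enter_) state_ = Command::Backward;
        } else {
            if (rollDeg >= enter_) state_ = Command::Right;
            else if (rollDeg <= -enter_) state_ = Command::Left;
        }
        return state_;
    }

    Command state() const { return state_; }

private:
    // Tilt measured in the direction of the current motion (positive = still tilted that way).
    double activeTilt(double pitchDeg, double rollDeg) const {
        switch (state_) {
            case Command::Forward:  return pitchDeg;
            case Command::Backward: return -pitchDeg;
            case Command::Right:    return rollDeg;
            case Command::Left:     return -rollDeg;
            default:                return 0.0;
        }
    }

    double enter_;
    double exit_;
    Command state_ = Command::Stop;
};
```

For example, tilting forward to 25° yields Forward, easing back to 15° keeps Forward, and only dropping below 10° returns the robot to Stop.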
Ultrasonic sensor readings are filtered using a moving average technique to reduce noise.
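The moving-average filtering of ultrasonic readings, combined with the 20 cm obstacle threshold stated in the abstract, can be sketched as below. The echo-to-distance conversion assumes a speed of sound of about 343 m/s (air near 20 °C) and halves the round-trip time; the window length is left to the caller because the paper does not specify it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Converts an HC-SR04 echo pulse width (microseconds) to distance in cm.
// 0.0343 cm/us is ~343 m/s; the pulse spans the round trip, hence the / 2.
double echoToCm(double echoUs) {
    return echoUs * 0.0343 / 2.0;
}

// Fixed-window moving average over the most recent `window` readings.
class MovingAverage {
public:
    explicit MovingAverage(std::size_t window) : buf_(window, 0.0) {
        assert(window > 0);
    }

    // Adds a sample and returns the mean of the samples held so far (up to `window`).
    double add(double x) {
        sum_ -= buf_[next_];          // drop the oldest sample (0.0 while still filling)
        buf_[next_] = x;
        sum_ += x;
        next_ = (next_ + 1) % buf_.size();
        if (count_ < buf_.size()) ++count_;
        return sum_ / static_cast<double>(count_);
    }

private:
    std::vector<double> buf_;
    std::size_t next_ = 0;
    std::size_t count_ = 0;
    double sum_ = 0.0;
};

// Obstacle threshold from the abstract: frontal obstacles within 20 cm.
constexpr double kObstacleCm = 20.0;

bool obstacleDetected(double filteredCm) {
    return filteredCm > 0.0 && filteredCm <= kObstacleCm;
}
```

Echo timeouts (for example a pulse-width reading of 0 when no echo returns) should be discarded before calling `add`; otherwise they pull the average toward zero and can trigger false buzzer alarms.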
Advantages:
Low-cost and compact design
Real-time gesture control
Low-latency communication (ESP-NOW)
Integrated video streaming
Obstacle detection for safety
Suitable for telepresence, remote monitoring, and education
Result:
The system successfully integrates gesture sensing, wireless communication, motor control, video streaming, and obstacle detection on a resource-constrained microcontroller platform, demonstrating smooth real-time operation and live monitoring capabilities.
Conclusion
This paper has presented the design, implementation and evaluation of a low-cost gesture-controlled telepresence robot based on an ESP32-CAM receiver and an ESP8266/ESP12E transmitter. The system integrates IMU-based gesture sensing, low-latency ESP-NOW wireless communication, real-time video streaming, ultrasonic obstacle detection and audible alerts within a compact robotic platform. Experimental results demonstrate that the proposed architecture can achieve responsive control with low command latency and a high packet delivery ratio while maintaining acceptable video streaming performance in indoor environments.
The modular nature of the design makes it suitable for educational laboratories, project-based learning and hobbyist robotics, while the combination of telepresence and obstacle detection opens avenues for applications in remote inspection, simple surveillance and human-robot interaction scenarios. Future work will investigate adaptive gesture recognition using embedded machine learning models, multi-robot control using a single transmitter, integration of additional sensors such as infrared or time-of-flight modules, and partial offloading of video analytics to edge or cloud platforms.