Abstract
This paper presents a low-cost vision-guided quadcopter capable of detecting, locking onto, and autonomously tracking a predefined target while transmitting the target's GPS coordinates in real time. The system integrates an ESP32-based custom flight controller with a Raspberry Pi 5 vision processing module to implement onboard face recognition and closed-loop visual servoing. Unlike conventional UAV platforms that rely on proprietary autopilot hardware and GPU-based processing, the proposed architecture introduces a fully embedded perception–action framework in which visual deviation directly influences flight control. Flight stabilization is achieved using cascaded PID control combined with complementary-filter sensor fusion of MPU6050 IMU data. Face detection is performed using MediaPipe, and recognition is implemented using the LBPH algorithm. Experimental results demonstrate stable hover within ±3° angular deviation, recognition accuracy of 80–90% at a distance of 3–5 m, and control latency below 50 ms. The proposed system offers a cost-effective and efficient solution for intelligent UAV surveillance and autonomous tracking applications.
Introduction
This research proposes a lightweight, low-cost vision-guided quadcopter UAV capable of detecting, recognizing, tracking, and following a predefined human face autonomously, while also transmitting the target's GPS location to the user. Unlike many existing UAV systems that rely on expensive proprietary autopilots and high-performance processors, the proposed design integrates an ESP32 flight controller and a Raspberry Pi 5 vision module to create a cost-effective closed-loop perception and control system.
Background and Motivation
UAVs are increasingly used in surveillance, inspection, disaster response, and security applications because they can operate in hazardous and inaccessible environments. Recent advances in computer vision have enabled UAVs to perform object detection and autonomous tracking. However, many current systems separate flight control from vision processing and rely on powerful GPUs, which increases cost, complexity, and power consumption.
Literature Review
Previous studies have shown the following:
PID-based controllers can effectively stabilize quadcopters but lack vision-based tracking capabilities.
CNN and deep learning-based tracking systems provide high accuracy but require expensive GPU hardware and high computational power.
ESP32 and Raspberry Pi platforms offer low-cost solutions for real-time control and vision processing.
MediaPipe and LBPH (Local Binary Patterns Histograms) provide efficient face detection and face recognition, respectively, and are suitable for embedded systems.
Existing GPS-enabled UAV tracking systems often rely on commercial autopilot boards rather than custom low-cost architectures.
The review identifies a research gap in integrating vision-based tracking and GPS geo-localization into a single lightweight embedded UAV platform.
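To make the efficiency argument for LBPH concrete, the following is a simplified pure-Python sketch of the LBPH idea, not the paper's implementation (which would more likely use a library such as OpenCV's contrib `cv2.face.LBPHFaceRecognizer_create`). It assumes 8 neighbours at radius 1, a 2x2 cell grid, and chi-square nearest-neighbour matching; these choices and all function names are illustrative.

```python
# Minimal LBPH (Local Binary Patterns Histograms) sketch in pure Python.
# Assumptions (not from the paper): 8 neighbours at radius 1, a 2x2 grid of
# cells, chi-square distance, and images given as lists of grey-level rows.
from typing import Dict, List

Image = List[List[int]]

# Neighbour offsets (dy, dx), clockwise from the top-left pixel.
_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_codes(img: Image) -> Image:
    """8-bit LBP code for every interior pixel; the one-pixel border is skipped."""
    codes = []
    for y in range(1, len(img) - 1):
        row = []
        for x in range(1, len(img[0]) - 1):
            centre = img[y][x]
            code = 0
            for bit, (dy, dx) in enumerate(_OFFSETS):
                if img[y + dy][x + dx] >= centre:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes


def lbph(img: Image, grid: int = 2) -> List[float]:
    """Concatenate normalised 256-bin histograms of LBP codes over grid x grid cells."""
    codes = lbp_codes(img)
    h, w = len(codes), len(codes[0])
    feature: List[float] = []
    for gy in range(grid):
        for gx in range(grid):
            hist = [0] * 256
            cell = [codes[y][x]
                    for y in range(gy * h // grid, (gy + 1) * h // grid)
                    for x in range(gx * w // grid, (gx + 1) * w // grid)]
            for c in cell:
                hist[c] += 1
            feature.extend(v / max(len(cell), 1) for v in hist)
    return feature


def chi_square(a: List[float], b: List[float]) -> float:
    """Chi-square histogram distance; smaller means more similar."""
    return sum((p - q) ** 2 / (p + q) for p, q in zip(a, b) if p + q > 0)


def predict(probe: Image, gallery: Dict[str, Image]) -> str:
    """Label of the gallery image whose LBPH feature is nearest to the probe's."""
    f = lbph(probe)
    return min(gallery, key=lambda name: chi_square(f, lbph(gallery[name])))
```

Each code depends only on the sign of neighbour differences, so a uniform brightness shift leaves the features unchanged; the whole pipeline uses integer comparisons and histogram counting, which is why LBPH suits CPU-only boards.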
System Architecture
The UAV consists of:
A lightweight quadcopter frame (~900 g total weight).
Four 2300 KV BLDC motors with tri-blade propellers.
ESP32 WROOM-32U for flight control.
MPU6050 IMU for attitude estimation.
Raspberry Pi 5 with a 5 MP camera for vision processing.
NEO-6M GPS module for location tracking.
SPI communication between the Raspberry Pi and ESP32.
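The paper does not specify the message format on the SPI link. As an illustration only, the sketch below assumes a hypothetical fixed-length command frame from the Raspberry Pi (SPI master) to the ESP32 (slave): a start byte, a mode byte, roll/pitch/yaw setpoints in centidegrees, a throttle value, and an XOR checksum. The layout, the `0xA5` start byte, and the function names are all assumptions.

```python
import struct
from typing import Tuple

# Hypothetical fixed-length command frame (little-endian), not from the paper:
#   start byte 0xA5 | mode uint8 | roll, pitch, yaw setpoints int16 (centidegrees)
#   | throttle uint16 (0-1000) | XOR checksum uint8
_BODY = struct.Struct("<BBhhhH")
START = 0xA5
FRAME_LEN = _BODY.size + 1  # 10-byte body plus 1 checksum byte


def _xor(data: bytes) -> int:
    """XOR of all bytes, used as a cheap integrity check."""
    c = 0
    for b in data:
        c ^= b
    return c


def pack_command(mode: int, roll_deg: float, pitch_deg: float,
                 yaw_deg: float, throttle: int) -> bytes:
    """Build one frame on the Pi side; angles are scaled to integer centidegrees."""
    body = _BODY.pack(START, mode, round(roll_deg * 100), round(pitch_deg * 100),
                      round(yaw_deg * 100), throttle)
    return body + bytes([_xor(body)])


def unpack_command(frame: bytes) -> Tuple[int, float, float, float, int]:
    """Validate and decode a frame, as the ESP32 side would."""
    if len(frame) != FRAME_LEN or frame[0] != START:
        raise ValueError("bad frame length or start byte")
    body, checksum = frame[:-1], frame[-1]
    if _xor(body) != checksum:
        raise ValueError("checksum mismatch")
    _, mode, roll, pitch, yaw, throttle = _BODY.unpack(body)
    return mode, roll / 100, pitch / 100, yaw / 100, throttle
```

On the Pi, such a frame could be clocked out with the `spidev` Python binding (for example `SpiDev.xfer2`). Because each SPI transaction is delimited by chip select, a fixed length plus a checksum is a simple way to reject corrupted frames; a frame that fails validation would be ignored and would count toward the communication-loss fail-safe.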
Software Design
Flight stabilization is achieved using a cascaded PID controller running at approximately 150 Hz.
Sensor fusion is implemented using a complementary filter combining gyroscope and accelerometer data.
Face detection is performed using MediaPipe, while face recognition uses the LBPH algorithm.
Tracking errors are calculated from the difference between the detected face position and image center, then translated into flight control adjustments.
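The stabilization loop described above can be sketched for a single axis (roll) as follows. This is an illustration under stated assumptions, not the flight firmware: it assumes the cascade is an outer angle loop feeding an inner angular-rate loop (a common arrangement; the paper does not state its loop structure), a filter coefficient α = 0.98, and illustrative gains. The `PID` class and helper names are hypothetical.

```python
import math
from dataclasses import dataclass

DT = 1.0 / 150.0  # loop period for the ~150 Hz control rate stated in the text


@dataclass
class PID:
    kp: float
    ki: float
    kd: float
    limit: float             # symmetric output clamp
    integral: float = 0.0
    prev_error: float = 0.0

    def update(self, error: float, dt: float = DT) -> float:
        self.integral += error * dt
        if self.ki:
            # Crude anti-windup: keep ki * integral within the output limit.
            bound = self.limit / abs(self.ki)
            self.integral = max(-bound, min(bound, self.integral))
        derivative = (error - self.prev_error) / dt  # first call sees a derivative kick
        self.prev_error = error
        out = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(-self.limit, min(self.limit, out))


def accel_roll_deg(ay: float, az: float) -> float:
    """Roll angle implied by the gravity vector seen by the accelerometer."""
    return math.degrees(math.atan2(ay, az))


def complementary_filter(angle_prev: float, gyro_dps: float, accel_deg: float,
                         alpha: float = 0.98, dt: float = DT) -> float:
    """Trust the integrated gyro short-term and the accelerometer long-term."""
    return alpha * (angle_prev + gyro_dps * dt) + (1.0 - alpha) * accel_deg


def cascaded_step(angle_sp: float, angle_est: float, gyro_dps: float,
                  outer: PID, inner: PID) -> float:
    """Outer loop: angle error -> rate setpoint. Inner loop: rate error -> torque command."""
    rate_sp = outer.update(angle_sp - angle_est)
    return inner.update(rate_sp - gyro_dps)
```

Each tick would read the MPU6050, update the filtered angle, and call `cascaded_step`; the resulting per-axis commands would then be mixed with throttle into four ESC outputs (the mixer is omitted here).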
Methodology
The system operates in three stages:
Flight Control and Stabilization – Maintains stable flight using PID control and IMU-based sensor fusion.
Face Detection and Recognition – Detects and identifies a predefined target face in real time using MediaPipe and LBPH.
Autonomous Tracking and GPS Reporting – Once the target is recognized, the UAV automatically follows it, keeps it centered in the camera frame, estimates distance from face size changes, and sends GPS coordinates to the user.
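The third stage can be sketched as three small steps: normalizing the face offset from the image center, estimating range from apparent face width with a pinhole-camera model, and extracting latitude and longitude from an NMEA GGA sentence of the kind the NEO-6M outputs. The assumed face width, stand-off distance, gains, sign conventions, and function names are hypothetical; the focal length in pixels would have to come from camera calibration. The GGA string in the usage check is a widely used sample sentence, not data from this system.

```python
from typing import Optional, Tuple

# Hypothetical parameters; the paper does not give them.
FACE_WIDTH_M = 0.15   # assumed average face width
TARGET_RANGE_M = 4.0  # assumed stand-off, inside the reported 3-5 m working range


def tracking_error(face_cx: float, face_cy: float, img_w: int, img_h: int) -> Tuple[float, float]:
    """Face-centre offset from the image centre, normalised to [-1, 1] (+x right, +y down)."""
    return (face_cx - img_w / 2) / (img_w / 2), (face_cy - img_h / 2) / (img_h / 2)


def estimate_range(face_w_px: float, focal_px: float) -> float:
    """Pinhole model: range = focal length (px) * real width (m) / image width (px)."""
    return focal_px * FACE_WIDTH_M / face_w_px


def follow_commands(ex: float, ey: float, range_m: float,
                    k_yaw: float = 60.0, k_climb: float = 1.0, k_fwd: float = 5.0):
    """Proportional mapping to (yaw rate deg/s, climb m/s, pitch deg); gains are illustrative."""
    yaw_rate = k_yaw * ex                        # turn toward the face
    climb = -k_climb * ey                        # face below centre -> descend
    pitch = k_fwd * (range_m - TARGET_RANGE_M)   # too far -> positive (assumed forward) pitch
    return yaw_rate, climb, pitch


def nmea_checksum_ok(sentence: str) -> bool:
    """XOR of the characters between '$' and '*' must equal the hex digits after '*'."""
    body, star, cs = sentence.strip().lstrip("$").partition("*")
    calc = 0
    for ch in body:
        calc ^= ord(ch)
    try:
        return star == "*" and calc == int(cs[:2], 16)
    except ValueError:
        return False


def _dm_to_deg(value: str, hemi: str, deg_digits: int) -> float:
    """Convert NMEA (d)ddmm.mmmm to signed decimal degrees."""
    deg = float(value[:deg_digits]) + float(value[deg_digits:]) / 60.0
    return -deg if hemi in ("S", "W") else deg


def parse_gga(sentence: str) -> Optional[Tuple[float, float]]:
    """(lat, lon) from a GGA sentence, or None if the checksum fails or there is no fix."""
    if not nmea_checksum_ok(sentence):
        return None
    f = sentence.strip().split("*")[0].split(",")
    if len(f) < 7 or not f[0].endswith("GGA") or f[6] in ("", "0") or not f[2] or not f[4]:
        return None
    return _dm_to_deg(f[2], f[3], 2), _dm_to_deg(f[4], f[5], 3)
```

For example, a 640x480 frame with the face centered at (480, 240) gives an error of (0.5, 0.0), which the proportional mapping turns into a yaw-rate command toward the face while leaving altitude unchanged.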
Safety Features
The UAV supports three operating modes:
Manual mode
Search mode
Target lock mode
Fail-safe mechanisms include:
Re-entering search mode when the target is lost.
Hovering during communication loss.
Controlled landing when battery levels become critical.
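The modes and fail-safes above can be expressed as a small supervisor evaluated once per control tick. The sketch assumes a priority order (battery, then link, then pilot switch, then vision), treats the communication-loss hover as its own state, and adds a short grace period before a lost target triggers search; all thresholds, including the 3S battery figure, and all names are hypothetical.

```python
from enum import Enum, auto


class Mode(Enum):
    MANUAL = auto()
    SEARCH = auto()
    TARGET_LOCK = auto()
    HOVER = auto()     # fail-safe hold; assumed here to be its own state
    LANDING = auto()   # controlled descent on critical battery


# Hypothetical thresholds, not taken from the paper.
CRITICAL_BATTERY_V = 10.5   # e.g. roughly 3.5 V per cell on a 3S pack
LINK_TIMEOUT_S = 0.5        # time since the last valid command frame
LOST_GRACE_FRAMES = 10      # frames a locked target may be missing before searching


def next_mode(mode: Mode, *, manual_switch: bool, frames_since_seen: int,
              link_age_s: float, battery_v: float) -> Mode:
    """One tick of the mode supervisor; checks run in priority order."""
    if mode is Mode.LANDING or battery_v <= CRITICAL_BATTERY_V:
        return Mode.LANDING            # landing is latched once entered
    if link_age_s > LINK_TIMEOUT_S:
        return Mode.HOVER              # communication loss: hold and wait
    if manual_switch:
        return Mode.MANUAL
    if frames_since_seen == 0:
        return Mode.TARGET_LOCK        # recognised target in the current frame
    if mode is Mode.TARGET_LOCK and frames_since_seen <= LOST_GRACE_FRAMES:
        return Mode.TARGET_LOCK        # brief occlusion: keep the lock
    return Mode.SEARCH                 # target lost: re-enter search mode
```

Latching the landing state prevents a voltage rebound under reduced load from cancelling the landing, and the grace period keeps a single missed detection from dropping the lock.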
Mathematical Modeling
The system is modeled using:
Newton–Euler equations for quadcopter dynamics.
PID control equations for attitude stabilization.
Complementary filter equations for sensor fusion.
Visual tracking error models for face-centered navigation.
GPS localization equations for target geo-tagging.
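The models above are named but not written out. For reference, the standard forms they usually take are sketched below; the notation is introduced here and is an assumption about the model, not a transcription of it. Here $\mathbf{p}$ is position in a world frame with $z$ up, $R$ the body-to-world rotation, $\mathbf{e}_3 = [0,0,1]^\top$, $T$ the total thrust with rotor thrust coefficient $k_f$ and rotor speeds $\Omega_i$, $J$ the inertia matrix, $\boldsymbol{\omega}$ the body angular velocity, and $\boldsymbol{\tau}$ the body torques. The PID and filter lines are written for a single axis (roll), with scalar $\omega$ the gyro rate about that axis, $\Delta t \approx 1/150$ s, and an assumed filter coefficient $\alpha$.

```latex
\begin{align*}
m\,\ddot{\mathbf{p}} &= R\,T\,\mathbf{e}_3 - m g\,\mathbf{e}_3, & T &= \textstyle\sum_{i=1}^{4} k_f\,\Omega_i^{2},\\
J\,\dot{\boldsymbol{\omega}} &= \boldsymbol{\tau} - \boldsymbol{\omega}\times J\boldsymbol{\omega},\\
\dot\theta_{\mathrm{sp}} &= K_{p,\theta}\,e_\theta + K_{i,\theta}\!\int e_\theta\,dt + K_{d,\theta}\,\dot e_\theta, & e_\theta &= \theta_{\mathrm{sp}} - \hat\theta,\\
u &= K_{p,\omega}\,e_\omega + K_{i,\omega}\!\int e_\omega\,dt + K_{d,\omega}\,\dot e_\omega, & e_\omega &= \dot\theta_{\mathrm{sp}} - \omega,\\
\hat\theta_k &= \alpha\,(\hat\theta_{k-1} + \omega_k\,\Delta t) + (1-\alpha)\,\theta_{\mathrm{acc},k}, & \theta_{\mathrm{acc}} &= \operatorname{atan2}(a_y, a_z).
\end{align*}
```

With $\Delta t = 1/150$ s and, for example, $\alpha = 0.98$, the filter's crossover time constant is $\alpha\,\Delta t/(1-\alpha) \approx 0.33$ s: gyro drift slower than this is corrected by the accelerometer, while faster motion is tracked by the gyro.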
Simulation and Integration
Simulation results show successful integration of:
ESP32 flight controller.
Raspberry Pi vision system.
GPS module.
IMU sensor.
Wireless joystick controller.
The UAV operates through a closed-loop perception-control framework, where real-time visual feedback continuously updates flight commands, enabling stable autonomous face tracking and GPS-based surveillance.
Conclusion
The proposed vision-guided quadcopter integrates an onboard computer vision pipeline with its embedded flight controller so that it can detect and track faces autonomously. A camera module connected to the Raspberry Pi performs real-time face detection with MediaPipe and face recognition with the LBPH algorithm, while the ESP32 microcontroller maintains flight stability using feedback from the MPU6050 IMU and reads position data from the GPS module. A cascaded PID controller drives the motors through Electronic Speed Controllers (ESCs), providing stable hover and responsive navigation. Experiments showed an average processing rate of 25 FPS and a face recognition accuracy of 80–90% at detection distances of 3–5 m. The proposed architecture offers an efficient, low-cost solution for intelligent UAV surveillance and autonomous tracking applications.