Abstract
This paper proposes a reinforcement learning (RL) framework for energy-optimal Unmanned Aerial Vehicle (UAV) trajectory planning. Unlike classical PID or graph-based planners, the proposed design explicitly incorporates physics-informed energy models into the reward structure. We formulate the trajectory generation problem as a Markov Decision Process (MDP) to minimize propulsion power consumption while maintaining flight stability. A theoretical comparative analysis suggests that this data-driven approach can overcome the limitations of static path planning by adapting to environmental disturbances such as wind. This framework provides a foundation for future empirical validation of energy-efficient autonomous flight.
Introduction
Unmanned Aerial Vehicles (UAVs) are increasingly used in delivery, surveillance, and inspection tasks, where energy efficiency directly affects mission range and duration. Traditional planning and control methods, such as A*-based path planning, PID control, and Model Predictive Control (MPC), typically optimize distance or time rather than actual energy consumption and can struggle under wind disturbances, nonlinear dynamics, or uncertain environments.
Reinforcement Learning (RL) provides a promising alternative by learning control policies through interaction with the environment, allowing UAVs to adapt to dynamic conditions without requiring exact analytical models. However, prior RL-based UAV studies mostly focus on mission completion or communication objectives, often overlooking energy usage.
This paper presents a physics-informed RL framework for energy-optimal UAV trajectory planning. The system integrates a simplified kinematic model, aerodynamic drag, and propulsion power calculations to directly link UAV trajectories, velocity profiles, and accelerations to energy consumption. Key elements include:
UAV Kinematic Model: Point-mass representation in 3D space with continuous acceleration and heading controls.
Aerodynamic and Propulsion Model: Drag force and propulsion power are modeled as functions of velocity, enabling velocity-dependent energy estimation (a representative power model is sketched after this list).
Energy Model: Total mission energy is computed by integrating propulsion power over the flight time.
Problem Formulation: Energy-optimal trajectory planning is posed as a sequential decision-making problem under uncertainty, considering kinematic limits, mission constraints, and environmental disturbances.
MDP Formulation: The trajectory problem is modeled as a Markov Decision Process whose state comprises position, velocity, and an implicit energy state; whose actions are continuous control commands; and whose reward encodes energy efficiency and flight smoothness.
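The list above names the models without reproducing them. As a point of reference, a commonly used rotary-wing propulsion power model has the form below; all symbols here (blade-profile power P_0, induced power P_i, rotor tip speed U_tip, mean induced velocity v_0, fuselage drag ratio d_0, rotor solidity s, disc area A, air density rho) are illustrative assumptions rather than quantities the paper defines:

```latex
% Propulsion power versus forward speed v for a rotary-wing UAV
% (a standard three-term model; all symbols are illustrative assumptions):
P(v) = P_0 \left( 1 + \frac{3 v^2}{U_{\mathrm{tip}}^2} \right)
     + P_i \left( \sqrt{1 + \frac{v^4}{4 v_0^4}} - \frac{v^2}{2 v_0^2} \right)^{1/2}
     + \frac{1}{2} d_0 \rho s A\, v^3

% Total mission energy integrates this power over the flight time T,
% which a simulator accumulates per timestep \Delta t:
E = \int_0^T P\big(v(t)\big)\, dt \;\approx\; \sum_{k} P(v_k)\, \Delta t
```

The discrete sum on the right is the form a simulation would accumulate per timestep, and it is what ties velocity profiles and accelerations to mission energy in the formulation above.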
The framework uses RL to optimize UAV trajectories for minimal energy consumption while maintaining flight feasibility and robustness to disturbances. This approach is positioned as an improvement over classical methods, enabling adaptive, energy-efficient control in dynamic and uncertain flight conditions.
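To make the MDP concrete, the following is a minimal simulation sketch of the formulation just described, reusing the power model sketched above. Everything in it is illustrative: the class name UAVEnv, the parameter values, the constant-wind disturbance, and the reward weights are assumptions chosen for the sketch, not details given in the paper.

```python
import numpy as np

# Rotary-wing power-model constants; the numerical values are illustrative
# assumptions, not parameters reported in the paper.
P0, PI = 79.86, 88.63        # blade-profile and induced power at hover [W]
U_TIP, V0 = 120.0, 4.03      # rotor tip speed, mean induced velocity [m/s]
D0, RHO, S, A = 0.6, 1.225, 0.05, 0.503  # drag ratio, air density, solidity, disc area

def propulsion_power(v):
    """Propulsion power [W] at speed v [m/s], per the three-term model above."""
    profile = P0 * (1.0 + 3.0 * v**2 / U_TIP**2)
    induced = PI * np.sqrt(np.sqrt(1.0 + v**4 / (4.0 * V0**4)) - v**2 / (2.0 * V0**2))
    parasite = 0.5 * D0 * RHO * S * A * v**3
    return profile + induced + parasite

class UAVEnv:
    """Point-mass MDP: state = (position, velocity); action = 3D acceleration."""

    def __init__(self, goal, dt=0.1, a_max=3.0, wind=(0.0, 0.0, 0.0)):
        self.goal = np.asarray(goal, dtype=float)
        self.dt, self.a_max = dt, a_max
        self.wind = np.asarray(wind, dtype=float)  # constant wind drift [m/s]

    def reset(self):
        self.pos = np.zeros(3)
        self.vel = np.zeros(3)
        return np.concatenate([self.pos, self.vel])

    def step(self, accel):
        accel = np.clip(np.asarray(accel, dtype=float), -self.a_max, self.a_max)
        self.vel = self.vel + accel * self.dt                   # kinematic update
        self.pos = self.pos + (self.vel + self.wind) * self.dt  # wind enters as drift
        energy = propulsion_power(np.linalg.norm(self.vel)) * self.dt  # J this step
        dist = float(np.linalg.norm(self.goal - self.pos))
        # Reward trades off energy, control smoothness, and goal progress; the
        # weights below are placeholders that an experiment would have to tune.
        reward = -1e-3 * energy - 1e-2 * float(np.linalg.norm(accel)) - 1e-2 * dist
        done = dist < 1.0
        return np.concatenate([self.pos, self.vel]), reward, done
```

A continuous-action policy-gradient agent would interact with reset() and step() as with any episodic environment; energy enters the return only through the reward, which is exactly the sense in which the MDP formulation above "encodes" energy efficiency.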
Conclusion
This paper presents a literature-grounded framework for energy-optimal UAV trajectory planning using reinforcement learning. By formulating trajectory optimization as a continuous-state, continuous-action stochastic optimal control problem and solving it with policy-gradient reinforcement learning, the approach explicitly accounts for propulsion energy, flight smoothness, and environmental disturbances.
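Written out under the assumption of a discounted episodic MDP, the objective this describes takes roughly the following form; the discount gamma, the weights lambda_1 and lambda_2, and the mission penalty c_mission are notational assumptions for illustration, not quantities fixed by the paper:

```latex
% Energy-optimal trajectory planning as stochastic optimal control:
% minimize expected discounted cost over trajectories \tau induced by policy \pi.
\min_{\pi}\; J(\pi) =
\mathbb{E}_{\tau \sim \pi}\!\left[
  \sum_{t=0}^{T} \gamma^{t} \Big(
      \underbrace{P(v_t)\,\Delta t}_{\text{propulsion energy}}
    + \underbrace{\lambda_1 \lVert \mathbf{a}_t \rVert^2}_{\text{smoothness}}
    + \underbrace{\lambda_2\, c_{\mathrm{mission}}(\mathbf{x}_t)}_{\text{mission constraints}}
  \Big)
\right]
```

A policy-gradient method then improves the policy parameters against -J(pi), so propulsion energy, smoothness, and constraint satisfaction are optimized jointly rather than through a surrogate objective.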
Through synthesis of prior studies and comparative analysis, the paper highlights how energy-aware reinforcement learning can address key limitations of classical UAV control and planning methods. In particular, reinforcement learning frameworks that directly incorporate energy consumption into the reward function are shown in the literature to produce smoother, more efficient trajectories and improved robustness to wind disturbances compared to controllers that optimize surrogate objectives.
While the framework is presented and analyzed in a simulation-based context, it provides a structured foundation for future empirical validation and real-world deployment. The proposed formulation and evaluation framework offer a clear pathway for integrating energy-aware learning into UAV trajectory planning, with potential applications in aerial delivery, surveillance, and long-endurance missions where energy efficiency is critical.
Overall, this work contributes a cohesive and principled perspective on energy-aware UAV trajectory optimization, bridging insights from optimal control and reinforcement learning while emphasizing practical considerations for robust and efficient autonomous flight.