A Survey on Agentic AI-Assisted UAV Swarm Systems Using Multi-Agent Deep Reinforcement Learning

Authors: Kamalakshi Naganna, Manasa G, Pragati Sharma, Soumyashree R A, Tejashwini S N

DOI Link: https://doi.org/10.22214/ijraset.2026.82488

Abstract

UAV swarm systems are now widely used for large-scale monitoring and surveillance, tasks that demand effective coordination. Multi-Agent Deep Reinforcement Learning methods such as MADDPG support decentralization by letting each agent act individually on the basis of its local environment, but purely learning-based coordination is often inefficient because exploration is redundant and coordination lacks structure. This review covers recent advances in UAV swarm systems, with an emphasis on the inefficiencies inherent in MARL approaches that make efficient coverage of the region of interest hard to achieve. It gives particular attention to hybrid approaches that combine MARL with spatial planning, specifically Voronoi partitioning.

Introduction

Unmanned Aerial Vehicle (UAV) swarm systems have become an important technology for large-scale surveillance, monitoring, disaster management, environmental observation, and search operations. Unlike single UAV systems, swarm-based approaches use multiple autonomous agents that cooperate to achieve better coverage, reliability, and fault tolerance. However, traditional UAV coordination methods based on fixed paths, optimization, and heuristic strategies struggle in dynamic environments due to limited adaptability and scalability.

To overcome these limitations, Reinforcement Learning (RL) and Multi-Agent Reinforcement Learning (MARL) have been introduced, enabling UAVs to learn optimal actions through interaction with the environment. Among MARL algorithms, Multi-Agent Deep Deterministic Policy Gradient (MADDPG) is widely used because it supports continuous control and allows centralized training with decentralized execution. However, existing MADDPG-based systems often depend heavily on reward design, which can cause redundant exploration, inefficient coverage, and poor task allocation.
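The split between centralized training and decentralized execution can be illustrated with a minimal sketch. It is not the MADDPG algorithm itself (there is no replay buffer, target network, or gradient update) and it is not taken from any of the surveyed papers; the linear actors and critics, the sizes, and all names are assumptions chosen only to show which inputs each component sees.

```python
import math
import random

random.seed(0)
N_AGENTS, OBS_DIM, ACT_DIM = 3, 4, 2  # hypothetical sizes, chosen for illustration
JOINT_DIM = N_AGENTS * (OBS_DIM + ACT_DIM)


def random_vector(n):
    """Small random weights standing in for a trained network."""
    return [random.gauss(0.0, 0.1) for _ in range(n)]


# One linear actor per UAV (ACT_DIM rows of OBS_DIM weights) and one linear critic per UAV.
actors = [[random_vector(OBS_DIM) for _ in range(ACT_DIM)] for _ in range(N_AGENTS)]
critics = [random_vector(JOINT_DIM) for _ in range(N_AGENTS)]


def act(agent_id, local_obs):
    """Decentralized execution: the actor reads only its own local observation."""
    return [math.tanh(sum(w * o for w, o in zip(row, local_obs)))
            for row in actors[agent_id]]


def centralized_q(agent_id, all_obs, all_actions):
    """Centralized training: the critic scores the joint observations and actions."""
    joint = [x for obs in all_obs for x in obs] + [x for a in all_actions for x in a]
    return sum(w * x for w, x in zip(critics[agent_id], joint))


observations = [[random.gauss(0.0, 1.0) for _ in range(OBS_DIM)] for _ in range(N_AGENTS)]
actions = [act(i, obs) for i, obs in enumerate(observations)]
q_values = [centralized_q(i, observations, actions) for i in range(N_AGENTS)]
```

In this sketch only `act` would be needed on board a UAV at deployment time; `centralized_q` depends on every agent's data and therefore belongs to the training phase.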

Recent research focuses on combining MARL with spatial planning techniques such as Voronoi partitioning, which divides operational areas among UAVs to reduce overlap and improve coverage efficiency. Additionally, emerging Agentic AI approaches introduce higher-level reasoning, planning, and adaptive decision-making capabilities for autonomous UAV coordination.
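As a rough illustration of how Voronoi partitioning divides an operational area, the sketch below assigns each cell of a grid to its nearest UAV. The grid size, the UAV positions, and the function name `voronoi_partition` are hypothetical; the surveyed works may use continuous or weighted variants instead.

```python
import math


def voronoi_partition(uav_positions, width, height):
    """Assign every grid cell to its nearest UAV (a discrete Voronoi partition)."""
    regions = {i: [] for i in range(len(uav_positions))}
    for x in range(width):
        for y in range(height):
            center = (x + 0.5, y + 0.5)  # use the cell center as the query point
            nearest = min(range(len(uav_positions)),
                          key=lambda i: math.dist(center, uav_positions[i]))
            regions[nearest].append((x, y))
    return regions


uav_positions = [(2.0, 2.0), (8.0, 2.0), (5.0, 8.0)]  # hypothetical UAV locations
regions = voronoi_partition(uav_positions, width=10, height=10)
for uav, cells in regions.items():
    print(f"UAV {uav}: {len(cells)} cells")
```

Because every cell has exactly one owner, the regions do not overlap, which is the property that reduces redundant exploration.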

The background study explains that UAV swarms depend on three major characteristics: autonomy, cooperation, and scalability. While decentralized systems improve robustness, they face challenges related to communication limits, battery constraints, and environmental uncertainty. RL models treat UAV coordination as a Markov Decision Process, where agents learn policies by maximizing rewards. However, multi-agent environments introduce challenges such as non-stationarity and partial observability, which are addressed through frameworks like Centralized Training with Decentralized Execution (CTDE).
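The reward-maximization objective mentioned above is commonly written as an expected discounted return. The notation below is standard RL notation rather than notation from the survey: agent i follows policy π_i, receives reward r_t^i at step t, and discounts future rewards by γ in [0, 1). Under partial observability, each policy maps the local observation o_t^i (not the full state) to an action a_t^i.

```latex
J_i(\pi_i) = \mathbb{E}\left[ \sum_{t=0}^{T} \gamma^{t}\, r^{i}_{t} \right],
\qquad
\pi_i : o^{i}_{t} \mapsto a^{i}_{t}
```

Non-stationarity arises because the expectation for agent i also depends on the other agents' policies, which change while they learn.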

The literature review highlights various approaches:

MADDPG-based methods improve UAV coordination, tracking, and path planning but remain dependent on reward engineering.
Voronoi-based approaches improve task distribution and reduce redundant exploration.
Graph Neural Network (GNN) and attention-based MARL models enhance communication and cooperation but increase computational complexity.
Transformer-based and Agentic AI models provide better reasoning and adaptability but require significant computational resources.

The comparative analysis shows that most existing UAV swarm studies focus on applications such as surveillance, target tracking, collision avoidance, and communication optimization. However, many approaches lack structured spatial coordination, causing inefficient coverage and scalability problems.

The major research gaps identified are:

Lack of explicit spatial coordination among UAV agents.
High dependency on carefully designed reward functions.
Scalability issues in large UAV swarms.
Limited integration of planning techniques with learning-based methods.
High computational requirements of advanced MARL architectures.
Insufficient focus on coverage efficiency and redundancy reduction.
Limited practical adoption of Agentic AI frameworks.
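Coverage efficiency and redundancy, which appear in the gaps above, can be made measurable in many ways. The sketch below uses one simple, assumed definition: coverage is the fraction of grid cells inside at least one UAV's circular sensing footprint, and redundancy is the fraction of covered cells seen by two or more UAVs. The sensing radius, positions, and function name are hypothetical.

```python
import math


def coverage_metrics(uav_positions, sensing_radius, width, height):
    """Return (coverage ratio, redundancy ratio) for circular footprints on a grid."""
    covered = 0      # cells seen by at least one UAV
    overlapped = 0   # cells seen by two or more UAVs
    for x in range(width):
        for y in range(height):
            center = (x + 0.5, y + 0.5)
            seen_by = sum(math.dist(center, p) <= sensing_radius for p in uav_positions)
            covered += seen_by >= 1
            overlapped += seen_by >= 2
    coverage = covered / (width * height)
    redundancy = overlapped / covered if covered else 0.0
    return coverage, redundancy


clustered = [(5.0, 5.0), (5.5, 5.0), (5.0, 5.5)]  # UAVs bunched together
spread = [(2.0, 2.0), (8.0, 2.0), (5.0, 8.0)]     # UAVs spread across the area
for name, team in [("clustered", clustered), ("spread", spread)]:
    cov, red = coverage_metrics(team, sensing_radius=2.5, width=10, height=10)
    print(f"{name}: coverage={cov:.2f}, redundancy={red:.2f}")
```

With these example positions the spread team covers more cells with no overlap, while the clustered team spends most of its sensing on cells another UAV already sees.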

Future research should focus on hybrid UAV swarm systems that combine spatial planning, MARL, and intelligent reasoning. Important directions include scalable coordination methods, dynamic task reallocation, lightweight MARL architectures, Agentic AI-based mission planning, and real-world deployment through sim-to-real transfer techniques.
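One way a hybrid of spatial planning and MARL might look is a reward that depends on region ownership. The sketch below is purely illustrative and is not a method from the surveyed papers: the fixed generator points, the penalty value, and the function names are all assumptions. An agent earns credit for newly observed cells inside its own Voronoi region and is penalized for cells that belong to another UAV's region.

```python
import math


def region_owner(cell, generators):
    """Index of the generator point closest to the cell (its Voronoi owner)."""
    return min(range(len(generators)), key=lambda i: math.dist(cell, generators[i]))


def shaped_reward(agent_id, newly_seen_cells, generators, overlap_penalty=0.5):
    """Hypothetical reward: +1 per new cell in the agent's own region,
    minus a penalty per new cell that belongs to another UAV's region."""
    reward = 0.0
    for cell in newly_seen_cells:
        if region_owner(cell, generators) == agent_id:
            reward += 1.0
        else:
            reward -= overlap_penalty
    return reward


generators = [(2.0, 2.0), (8.0, 2.0)]               # hypothetical region generator points
newly_seen = [(1.5, 1.5), (3.5, 2.5), (7.5, 2.5)]   # cells UAV 0 observed this step
print(shaped_reward(0, newly_seen, generators))      # prints 1.5
```

Two of the three cells lie in UAV 0's region and one lies in UAV 1's, so the reward is 1 + 1 - 0.5. A shaping term of this kind still relies on reward design, but it encodes the spatial partition explicitly instead of leaving agents to discover it.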

Conclusion

This survey examined recent advancements in UAV swarm coordination systems using Multi-Agent Reinforcement Learning (MARL), with particular emphasis on coordination efficiency, scalability, adaptability, and intelligent decision-making. The reviewed studies demonstrate that algorithms such as MADDPG, MAPPO, TD3, graph-based MARL, and hierarchical reinforcement learning have significantly improved decentralized control and cooperative behaviour among UAV agents operating in dynamic environments. These approaches enable UAV swarms to perform complex tasks such as surveillance, target tracking, exploration, and collision avoidance with minimal human intervention.

Despite these advancements, the survey identified several persistent limitations across existing research works. Most MARL-based approaches rely heavily on reward-driven learning to achieve cooperation among UAV agents. As a result, problems such as redundant exploration, inefficient area coverage, communication overhead, and high computational complexity continue to affect overall system performance. In addition, many existing frameworks lack explicit spatial coordination mechanisms, leading to overlapping exploration and suboptimal workload distribution among UAV agents.

The analysis further highlights that integrating lightweight spatial planning techniques with learning-based coordination frameworks offers a promising solution to these challenges. Spatial decomposition approaches such as Voronoi partitioning enable efficient region allocation and reduce redundant exploration from the initial stages of deployment. When combined with reinforcement learning algorithms, such hybrid frameworks can improve both coordination efficiency and adaptability in dynamic operational environments.

Furthermore, emerging research involving Agentic AI and intelligent planning frameworks indicates a growing shift toward higher-level autonomous reasoning in UAV swarm systems. By integrating structured planning, adaptive learning, and goal-driven coordination, future UAV swarm architectures can achieve more scalable, reliable, and efficient autonomous operations. In conclusion, future research should focus on unified hybrid frameworks that combine spatial planning, cooperative reinforcement learning, and adaptive reasoning mechanisms.

Copyright

Copyright © 2026 Kamalakshi Naganna, Manasa G, Pragati Sharma, Soumyashree R A, Tejashwini S N. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET82488

Publish Date : 2026-05-13

ISSN : 2321-9653

Publisher Name : IJRASET
