The paper addresses the challenge of efficient resource allocation in 5G wireless networks amid growing user demands and diverse application needs. Traditional static or rule-based methods struggle to adapt to dynamic network conditions and varying Quality of Service (QoS) requirements. To overcome this, the paper explores the use of reinforcement learning (RL) algorithms—Deep Q-Network (DQN), Proximal Policy Optimization (PPO), and Mean Field Q-learning (MFQ)—for intelligent, adaptive management of network resources.
A custom simulation environment is created using a multi-dimensional dataset reflecting real-world network scenarios across different times, device types, and usage domains (e.g., healthcare, public safety). The environment is compatible with OpenAI Gym, enabling RL agents to be trained and evaluated under dynamic conditions. The trained models are then served through a Django backend with a React frontend, allowing users to enter live network parameters and receive resource allocation decisions.
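The paper does not spell out the environment's interface, but a Gym-compatible slicing environment would typically expose reset and step methods over per-slice metrics. The sketch below is a minimal illustrative version: the class name, the state layout (load, delay, loss per slice), and the transition and reward dynamics are assumptions rather than the paper's implementation.

```python
import numpy as np
import gym
from gym import spaces

class SliceAllocationEnv(gym.Env):
    """Minimal sketch of a Gym-compatible 5G slice allocation environment.

    State: per-slice load, packet delay, and packet loss (normalized).
    Action: which slice (e.g., eMBB, URLLC, mMTC) receives the next resource block.
    All dimensions, bounds, and dynamics here are illustrative placeholders.
    """

    def __init__(self, num_slices=3):
        super().__init__()
        self.num_slices = num_slices
        # Observation: [load, delay, loss] per slice, normalized to [0, 1].
        self.observation_space = spaces.Box(low=0.0, high=1.0,
                                            shape=(num_slices * 3,),
                                            dtype=np.float32)
        # Action: index of the slice that gets the next resource block.
        self.action_space = spaces.Discrete(num_slices)
        self.state = None

    def reset(self):
        self.state = np.random.uniform(0.2, 0.8, size=self.observation_space.shape)
        return self.state.astype(np.float32)

    def step(self, action):
        # Allocating to a slice reduces its delay and loss; all metrics drift slightly.
        delay_idx = self.num_slices + action
        loss_idx = 2 * self.num_slices + action
        self.state[delay_idx] = max(0.0, self.state[delay_idx] - 0.1)
        self.state[loss_idx] = max(0.0, self.state[loss_idx] - 0.05)
        self.state += np.random.uniform(0.0, 0.02, size=self.state.shape)
        self.state = np.clip(self.state, 0.0, 1.0)
        # Reward: lower average delay and loss across slices is better.
        delays = self.state[self.num_slices:2 * self.num_slices]
        losses = self.state[2 * self.num_slices:]
        reward = -float(delays.mean() + losses.mean())
        done = False
        return self.state.astype(np.float32), reward, done, {}
```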
The study includes a comparative analysis of the three RL algorithms based on performance metrics like cumulative rewards, convergence speed, and decision quality. It highlights the advantages of RL approaches in managing complex 5G network slices, which support heterogeneous services such as enhanced Mobile Broadband (eMBB), Ultra-Reliable Low-Latency Communications (URLLC), and massive Machine-Type Communications (mMTC).
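As a concrete illustration of how such a comparison might be produced with the Matplotlib stack the paper mentions, the sketch below plots smoothed per-episode reward curves for several agents; the function name, the input format, and the smoothing window are illustrative assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_learning_curves(reward_histories, window=50):
    """Compare per-episode rewards of several agents with a moving average.

    reward_histories: dict mapping an algorithm name (e.g. "DQN", "PPO",
    "MFQ") to a list of episode rewards. The smoothing window is illustrative.
    """
    for name, rewards in reward_histories.items():
        rewards = np.asarray(rewards, dtype=float)
        kernel = np.ones(window) / window
        smoothed = np.convolve(rewards, kernel, mode="valid")
        plt.plot(smoothed, label=name)
    plt.xlabel("Episode")
    plt.ylabel(f"Reward (moving average, window={window})")
    plt.title("Cumulative reward comparison")
    plt.legend()
    plt.show()
```

Called with, for example, {"DQN": dqn_rewards, "PPO": ppo_rewards, "MFQ": mfq_rewards}, it overlays the three learning curves so convergence speed and final reward can be read off the same plot.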
The paper also details the system implementation, which uses Python libraries (Pandas, NumPy, Matplotlib, PyTorch) and a custom RL environment that simulates network states and actions. In this environment, RL agents learn effective allocation policies from rewards tied to reductions in packet delay and packet loss.
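The paper describes the reward as driven by reducing packet delay and loss but does not give its exact form; the following is one plausible shaping of such a reward, with hypothetical weights and normalized inputs.

```python
def compute_reward(prev_delay, prev_loss, delay, loss,
                   delay_weight=1.0, loss_weight=1.0):
    """Illustrative reward shaping: reward improvements in delay and loss.

    All arguments are normalized metrics for the current allocation step;
    the weights are hypothetical tuning knobs, not values from the paper.
    """
    delay_improvement = prev_delay - delay   # positive if delay went down
    loss_improvement = prev_loss - loss      # positive if loss went down
    return delay_weight * delay_improvement + loss_weight * loss_improvement
```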
Conclusion
In the network allocation environment, Mean Field Q-learning (MFQ) and Proximal Policy Optimization (PPO) achieved significantly higher cumulative rewards than Deep Q-Network (DQN). MFQ performed best, owing to its ability to model the average behavior of other agents, which yielded more stable learning and more effective decisions under variable network conditions. PPO followed closely, leveraging its policy-gradient approach to adapt smoothly to changing state variables such as delay and loss. DQN lagged behind, likely because its discrete action-value estimation copes poorly with continuous or noisy reward signals, resulting in weaker overall performance.
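To make "modeling average agent behavior" concrete: in mean-field Q-learning, each agent evaluates its own action against the mean action of its neighbors, collapsing the joint action space to pairwise interactions. The sketch below shows a simplified tabular version of that update; the function name, the indexing scheme, and the greedy next-state value (the original formulation uses a Boltzmann-weighted value) are illustrative assumptions.

```python
import numpy as np

def mfq_update(Q, state, action, mean_action_bin, reward,
               next_state, next_mean_action_bin, alpha=0.1, gamma=0.95):
    """Sketch of a tabular mean-field Q-learning update.

    Each agent's Q-value is conditioned on its own action and a discretized
    mean action of the other agents. Q maps a state to a
    (num_actions x num_mean_action_bins) array. All names and the
    discretization are illustrative, not the paper's code.
    """
    # Best own action in the next state, against the observed mean action.
    next_value = np.max(Q[next_state][:, next_mean_action_bin])
    td_target = reward + gamma * next_value
    td_error = td_target - Q[state][action, mean_action_bin]
    Q[state][action, mean_action_bin] += alpha * td_error
    return Q
```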