This paper presents the implementation of a semi-open world 3D role-playing game (RPG) featuring an adaptive non-player character (NPC) driven by reinforcement learning. The system is developed using the Unity game engine integrated with the Unity ML-Agents toolkit, enabling the antagonist to learn and evolve its behavior based on player interactions across multiple gameplay episodes. The game environment is structured into three interconnected biomes (a mansion, a forest, and a tunnel), each introducing progressively complex objectives and challenges. The NPC is designed to detect and pursue the player based on movement patterns, while also responding dynamically to environmental stimuli such as a flashlight-based defense mechanism that temporarily immobilizes it. Through iterative training, the NPC improves its decision-making and pursuit strategies, resulting in more intelligent and unpredictable gameplay. The implementation demonstrates the effectiveness of reinforcement learning in creating adaptive, context-aware, and behaviorally evolving game agents, contributing to more immersive and dynamic player experiences.
Introduction
This paper describes the development of a 3D semi-open world survival RPG that uses reinforcement learning (RL) to create an adaptive, intelligent non-player character (NPC) antagonist.
Traditional game AI systems rely on scripted rules and finite state machines, which make NPC behavior predictable and repetitive. To overcome this, the proposed system uses reinforcement learning, implemented through Unity ML-Agents, to train an NPC that learns and improves through interaction with the player.
The game is built in the Unity engine and includes three environments: a mansion, a forest, and a tunnel. The antagonist (a demon NPC) learns to detect, chase, and respond to the player’s movements. A key gameplay mechanic is a flashlight that can temporarily disable the NPC, which adds a strategic element to the game.
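The flashlight mechanic can be illustrated with a small timer-based sketch. This is not the game's Unity implementation: the class name StunnableNPC, its fields, and the 3-second stun duration are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class StunnableNPC:
    """Hypothetical NPC state that tracks a flashlight-induced stun."""
    stun_duration: float = 3.0   # seconds; assumed value, not from the paper
    stun_remaining: float = 0.0

    def hit_by_flashlight(self) -> None:
        # Restart the stun timer whenever the beam hits the NPC.
        self.stun_remaining = self.stun_duration

    def tick(self, dt: float) -> None:
        # Count the timer down once per frame, never below zero.
        self.stun_remaining = max(0.0, self.stun_remaining - dt)

    @property
    def can_move(self) -> bool:
        # The NPC may pursue the player only when no stun time is left.
        return self.stun_remaining == 0.0


npc = StunnableNPC()
npc.hit_by_flashlight()
npc.tick(1.0)
print(npc.can_move)   # the NPC is still stunned here
npc.tick(2.5)
print(npc.can_move)   # the stun has expired
```

Restarting the timer on every hit, rather than stacking durations, is one plausible design choice; it keeps the stun bounded no matter how long the player holds the beam on the NPC.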
The system architecture includes:
Player interaction module (movement and controls)
Environment module (multi-biome game world)
AI observation module (tracks player behavior and environment state)
Reinforcement learning agent (NPC trained using rewards)
Navigation system (NavMesh) for movement and pathfinding
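As a rough illustration of how the AI observation module might hand player and environment state to the learning agent, the sketch below flattens a few quantities into one feature vector. The function name build_observation and the choice of features are assumptions; the paper does not specify the actual observation layout.

```python
import math
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def build_observation(npc_pos: Vec3, player_pos: Vec3, player_vel: Vec3,
                      flashlight_on: bool, npc_stunned: bool) -> List[float]:
    """Flatten player and environment state into one feature vector."""
    # Player position relative to the NPC, component by component.
    offset = [p - n for p, n in zip(player_pos, npc_pos)]
    distance = math.sqrt(sum(c * c for c in offset))
    return [
        *offset,                          # where the player is
        distance,                         # how far away the player is
        *player_vel,                      # how the player is moving
        1.0 if flashlight_on else 0.0,    # environment stimulus
        1.0 if npc_stunned else 0.0,      # the NPC's own status
    ]


obs = build_observation((0.0, 0.0, 0.0), (3.0, 0.0, 4.0), (1.0, 0.0, 0.0),
                        flashlight_on=True, npc_stunned=False)
print(len(obs), obs[3])   # 9 features; the distance entry is 5.0
```

Using positions relative to the NPC, rather than absolute world coordinates, is a common choice because the same pursuit behavior can then transfer between locations in the map.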
The NPC learns using a reward-based system, where successful chasing is rewarded and inefficient behavior is penalized. Over time, this allows the antagonist to develop smarter and less predictable strategies, improving gameplay realism and difficulty.
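One way such a reward signal could be shaped is sketched below. All constants and the function name step_reward are illustrative assumptions, not the values used in the paper: catching the player gives a terminal reward, closing distance gives a small bonus, and wasted time or being stunned by the flashlight is penalized.

```python
def step_reward(prev_distance: float, distance: float,
                caught_player: bool, stunned: bool) -> float:
    """Per-step reward for the pursuing NPC (all constants are assumed)."""
    if caught_player:
        return 1.0                                 # successful chase
    reward = -0.001                                # time penalty discourages idling
    reward += 0.01 * (prev_distance - distance)    # progress toward the player
    if stunned:
        reward -= 0.05                             # being caught in the beam is costly
    return reward


print(step_reward(10.0, 9.0, caught_player=False, stunned=False) > 0)   # closing in
print(step_reward(9.0, 10.0, caught_player=False, stunned=False) < 0)   # falling behind
```

Because the distance term changes sign when the NPC falls behind, inefficient movement is penalized automatically without a separate rule for each kind of mistake.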
The research highlights key related work in:
Traditional game AI (rule-based systems with limited adaptability)
Reinforcement learning methods like DQN and PPO
Tools like Unity ML-Agents
Procedural content generation for improving replayability
The main research gap identified is that existing systems rarely combine reinforcement learning with real-time 3D gameplay in a unified, efficient way.
Conclusion
This paper presented the design and implementation of a semi-open world 3D role-playing game featuring an adaptive non-player character (NPC) driven by reinforcement learning. By integrating the Unity game engine with the Unity ML-Agents toolkit, the proposed system enables the antagonist to learn and evolve its behavior based on player interactions across multiple gameplay episodes.
The results demonstrate that reinforcement learning significantly enhances NPC intelligence, allowing for dynamic and context-aware behavior. The agent exhibited improved pursuit strategies, efficient navigation, and adaptability to changing player actions. The incorporation of gameplay mechanics such as procedural key generation and flashlight-based interaction further contributed to creating a challenging and engaging environment.
Compared to traditional scripted AI systems, the proposed approach offers greater flexibility and unpredictability, leading to a more immersive player experience. The successful implementation highlights the practical feasibility of applying reinforcement learning techniques in real-time game environments.
However, the system also presents certain limitations, including the computational cost associated with training and the sensitivity of performance to reward design and hyperparameter tuning. These factors indicate the need for further optimization and refinement.
Future work can focus on extending the system to multi-agent environments, improving the reward structure for more complex behaviors, and integrating advanced AI models such as vision-based perception or hybrid reinforcement learning approaches. Additionally, optimizing the system for deployment on resource-constrained platforms could further enhance its applicability.
Overall, this work demonstrates the potential of reinforcement learning in transforming game AI and provides a strong foundation for the development of intelligent, adaptive, and interactive NPCs in modern gaming systems.
References
[1] I. Millington and J. Funge, Artificial Intelligence for Games, 2nd ed. Boca Raton, FL, USA: CRC Press, 2009.
[2] G. N. Yannakakis and J. Togelius, Artificial Intelligence and Games. Cham, Switzerland: Springer, 2018.
[3] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd ed. Cambridge, MA, USA: MIT Press, 2018.
[4] V. Mnih et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[5] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, “Proximal Policy Optimization Algorithms,” arXiv preprint arXiv:1707.06347, 2017.
[6] A. Juliani et al., “Unity ML-Agents Toolkit,” 2018. [Online]. Available: https://github.com/Unity-Technologies/ml-agents
[7] D. Silver et al., “Mastering the game of Go with deep neural networks and tree search,” Nature, vol. 529, no. 7587, pp. 484–489, 2016.
[8] O. Vinyals et al., “Grandmaster level in StarCraft II using multi-agent reinforcement learning,” Nature, vol. 575, pp. 350–354, 2019.
[9] K. Cobbe et al., “Leveraging Procedural Generation to Benchmark Reinforcement Learning,” ICML, 2020.
[10] C. Zhang et al., “A Study on Overfitting in Deep Reinforcement Learning,” arXiv preprint arXiv:1804.06893, 2018.
[11] N. Shaker, J. Togelius, and M. J. Nelson, Procedural Content Generation in Games. Cham, Switzerland: Springer, 2016.
[12] J. Togelius, G. N. Yannakakis, K. O. Stanley, and C. Browne, “Search-based procedural content generation: A taxonomy and survey,” IEEE Trans. Comput. Intell. AI Games, vol. 3, no. 3, pp. 172–186, 2011.
[13] A. Summerville et al., “Procedural Content Generation via Machine Learning (PCGML),” IEEE Trans. Games, vol. 10, no. 3, pp. 257–270, 2018.
[14] OpenAI, “Learning dexterous in-hand manipulation,” arXiv preprint arXiv:1808.00177, 2018.
[15] M. G. Bellemare et al., “The Arcade Learning Environment: An evaluation platform for general agents,” J. Artif. Intell. Res., vol. 47, pp. 253–279, 2013.
[16] L. Han et al., “Comparison of Q-Learning and PPO for continuous and discrete environments,” 2023.
[17] D. Duncan, “Using reinforcement learning to train in-game non-player characters,” 2024.