Modern power grids, facing increasing electricity-market volatility and the growing deployment of battery energy storage systems (BESS), require intelligent energy management strategies to maximize economic benefit while preserving system performance. A reinforcement learning (RL) based energy management framework using Q-learning is presented to optimize BESS operation for peak demand reduction. The problem is formulated as a Markov Decision Process, and the Q-learning agent learns an optimal charging and discharging policy through interaction with fluctuating market prices under operational constraints. The proposed approach enables the BESS to perform energy arbitrage, maximizing economic benefit while maintaining the demand-supply balance. Simulation results demonstrate that the Q-learning framework achieves higher profitability and battery energy efficiency than a conventional rule-based optimization method. The results highlight the potential of reinforcement learning techniques, particularly Q-learning, for adaptive and autonomous energy management in complex and uncertain environments.
Introduction
The increasing use of variable renewable energy sources such as solar and wind, combined with unpredictable electricity demand and dynamic pricing, makes power grid management increasingly complex. These fluctuations can cause grid instability, frequency deviations, and financial losses. Traditional battery energy management systems (BEMS) based on fixed, rule-based controls are inadequate for such dynamic conditions.
To address this, reinforcement learning (RL), especially Q-learning, offers a promising adaptive control approach. Unlike supervised learning, RL learns optimal strategies through interaction with the environment and feedback via rewards or penalties. This research develops a Q-learning-based method to optimize battery charging and discharging schedules, factoring in renewable energy variability and real-time price fluctuations. The goal is to enhance energy utilization and economic returns compared to static control methods.
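To make the learning mechanism concrete, the sketch below shows a standard tabular Q-learning update of the kind assumed here; the state discretization, learning rate, and discount factor are illustrative placeholders rather than the settings used in this work.

```python
import numpy as np

# Hypothetical discretization: 24 hours x 5 price levels x 10 SOC levels
N_STATES = 24 * 5 * 10
N_ACTIONS = 3             # 0 = charge, 1 = discharge, 2 = idle
ALPHA, GAMMA = 0.1, 0.95  # assumed learning rate and discount factor

Q = np.zeros((N_STATES, N_ACTIONS))

def q_update(s, a, reward, s_next):
    """Tabular Q-learning update: move Q(s, a) toward the observed reward
    plus the discounted value of the best action in the next state."""
    td_target = reward + GAMMA * np.max(Q[s_next])
    Q[s, a] += ALPHA * (td_target - Q[s, a])
```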
The paper reviews related work on BEMS using Q-learning and highlights its limitations, such as slow convergence and limited adaptability to volatile markets. The proposed method improves upon these approaches by using hybrid Q-learning with adaptive exploration strategies, prioritized experience replay, and dynamic reward shaping, enabling faster learning and better performance.
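One common way to realize such an adaptive exploration strategy is an epsilon-greedy rule with a decaying exploration rate, sketched below; the decay schedule and constants are illustrative assumptions, not the exact configuration of the proposed method.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_action(Q, s, episode, eps_start=1.0, eps_min=0.05, decay=0.995):
    """Epsilon-greedy selection with an exponentially decaying exploration
    rate: explore widely in early episodes, exploit the learned policy later."""
    eps = max(eps_min, eps_start * decay ** episode)
    if rng.random() < eps:
        return int(rng.integers(Q.shape[1]))  # explore: random action
    return int(np.argmax(Q[s]))               # exploit: greedy action
```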
The methodology models the battery system as an RL agent operating in a Markov Decision Process, with states including time, price, battery charge level, and load demand. The agent chooses actions (charge, discharge, idle) to maximize cumulative profit over a 24-hour cycle. Simulations show this approach outperforms conventional rule-based strategies in profit and efficiency.
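A minimal sketch of such a simulation environment is given below, assuming hourly time steps, a single price signal, and simple charge/discharge limits; the battery capacity, power rating, efficiency, and price series are hypothetical values, and load demand is omitted for brevity.

```python
import numpy as np

CAPACITY_KWH = 100.0   # assumed usable battery capacity
POWER_KW = 25.0        # assumed maximum charge/discharge energy per hour
EFFICIENCY = 0.95      # assumed one-way conversion efficiency

def step(soc_kwh, action, price):
    """Advance the battery by one hour.  action: 0 = charge, 1 = discharge,
    2 = idle.  The reward is the hourly trading profit (revenue from selling
    discharged energy minus the cost of energy bought for charging)."""
    if action == 0:                                    # charge from the grid
        energy = min(POWER_KW, CAPACITY_KWH - soc_kwh)
        soc_kwh += energy * EFFICIENCY
        reward = -price * energy
    elif action == 1:                                  # discharge to the grid/load
        energy = min(POWER_KW, soc_kwh)
        soc_kwh -= energy
        reward = price * energy * EFFICIENCY
    else:                                              # idle
        reward = 0.0
    return soc_kwh, reward

# One 24-hour episode under a random policy, with an illustrative price profile
rng = np.random.default_rng(42)
prices = 0.10 + 0.15 * rng.random(24)   # $/kWh, placeholder values
soc, profit = 50.0, 0.0
for hour in range(24):
    soc, r = step(soc, int(rng.integers(3)), prices[hour])
    profit += r
```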
Conclusion
The proposed reinforcement learning (RL) based battery energy management system demonstrated significant improvements over the conventional heuristic baseline. Through the Q-learning approach, the RL agent effectively learned optimal charge and discharge strategies by responding dynamically to time-varying electricity prices, load demands, and the battery's state of charge. The system achieved a 6.09% higher monetary profit, indicating superior economic performance. Additionally, the RL policy achieved a battery energy efficiency 26.6% higher than the baseline in absolute terms, showing effective utilization of the battery with minimal losses; this efficiency may vary when additional operational constraints are considered. The enhanced energy throughput further highlights the system's ability to capitalize on market fluctuations more proactively than the baseline strategy. Overall, the results confirm that the RL-based approach is a promising solution for intelligent, adaptive battery energy management.
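For reference, the reported metrics can be computed as sketched below, assuming battery energy efficiency is measured as total discharged energy over total charged energy and the profit gain is taken relative to the baseline; these definitions are assumptions of this sketch, not necessarily the exact formulas used in the evaluation.

```python
def battery_energy_efficiency(charged_kwh, discharged_kwh):
    """Round-trip energy efficiency, assumed here as the ratio of total
    discharged energy to total charged energy over the evaluation period."""
    return discharged_kwh / charged_kwh if charged_kwh > 0 else 0.0

def relative_profit_gain(profit_rl, profit_baseline):
    """Relative profit improvement of the RL policy over the heuristic baseline."""
    return (profit_rl - profit_baseline) / abs(profit_baseline)
```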
Future work can focus on extending the RL framework to account for temperature effects, battery degradation, and more complex demand-response problems that better reflect real-world scenarios. Implementing advanced RL algorithms could improve learning efficiency and scalability to larger systems. Moreover, integrating renewable energy sources such as wind and solar with the BESS could yield highly effective, sustainable, and economical energy solutions. Implementing and testing these RL-based systems in real-world settings, such as physical microgrids under different operating conditions, would validate their practicality. Finally, exploring multi-agent reinforcement learning could optimize energy management across interconnected, decentralized energy resources.