This study presents a comparative evaluation of four predictive models for estimating the Remaining Useful Life (RUL) and State of Health (SoH) of lithium-ion batteries from distributed Battery Management System (BMS) data: Support Vector Regression (SVR), XGBoost, Random Forest, and a Hybrid Stacked Ensemble model. The models were assessed on Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), training time, and inference time, with a focus on real-time deployment feasibility. The results show that Random Forest is the most robust base model, but the Hybrid Stacked model, which combines the predictions of SVR, XGBoost, and Random Forest through a Random Forest meta-learner, performs best: it reduces MAE by 53% and RMSE by 55% relative to the best base model. Visual and statistical analysis further supports the Hybrid model's accuracy, stability, and applicability to real-time battery health management. The findings suggest that ensemble methods, particularly stacking, offer substantial improvements in predictive reliability and generalization.
Introduction
Accurate prediction of the Remaining Useful Life (RUL) and State of Health (SoH) of lithium-ion batteries is essential for sustainable energy systems. The Battery Management System (BMS) monitors battery performance, but its raw data are often noisy and complex, and they require extensive preprocessing before use in predictive models. This study evaluates four machine learning approaches for predicting battery RUL and SoH: Support Vector Regression (SVR), XGBoost, Random Forest, and a Hybrid Stacked Ensemble model. Preprocessing includes outlier removal, feature normalization, and class balancing with SMOTE, with the aim of improving prediction accuracy and stability. Models are evaluated by Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE), together with an analysis of computational efficiency. The results indicate that the hybrid stacked ensemble significantly outperforms the individual models.
The BMS dataset contains variables such as cell voltage, temperature, module current, module power, state of charge (SOC), SoH, anomaly scores, and latency. Correlation analysis revealed strong positive relationships between module current and power, and negative correlations between cell temperature and SOC, while latency showed weak dependence on other variables.
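A minimal sketch of this kind of correlation analysis is given here for illustration. The data are synthetic and the variable names (`module_current`, `module_power`, `latency`) are hypothetical stand-ins for the BMS signals; the sketch only mirrors the qualitative pattern reported above (current and power strongly correlated, latency weakly dependent) and does not reproduce the study's values.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins for BMS signals (assumed, not the study's data).
module_current = rng.normal(10.0, 2.0, n)                       # amperes
module_power = 48.0 * module_current + rng.normal(0.0, 5.0, n)  # watts, roughly V * I
latency = rng.normal(20.0, 3.0, n)                              # ms, generated independently

names = ["module_current", "module_power", "latency"]

# np.corrcoef treats each row as one variable and returns the Pearson matrix.
corr = np.corrcoef(np.vstack([module_current, module_power, latency]))

# Print the upper triangle: one coefficient per pair of variables.
for i, row_name in enumerate(names):
    for j in range(i + 1, len(names)):
        print(f"{row_name} vs {names[j]}: r = {corr[i, j]:+.2f}")
```

Pearson correlation captures only linear dependence, so a weak coefficient for latency does not by itself rule out a nonlinear relationship.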
Data preprocessing involved handling missing values, removing duplicates, detecting and adjusting outliers, feature engineering (e.g., rolling averages, rates of change), and standardizing features for machine learning. SMOTE was used to address class imbalance, particularly for rare failure events. The cleaned dataset was split into 80% training and 20% testing sets to ensure unbiased evaluation.
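The preprocessing steps listed above can be sketched as follows. This is an illustrative pipeline on synthetic data, not the study's code: the column names, the 1.5 × IQR clipping rule, the rolling window of 5, and the random seeds are all assumptions. SMOTE is omitted because it is provided by the separate imbalanced-learn package rather than scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 200

# Synthetic BMS-like frame; column names are hypothetical.
df = pd.DataFrame({
    "cell_voltage": rng.normal(3.7, 0.05, n),
    "cell_temp": rng.normal(30.0, 2.0, n),
    "soh": np.linspace(100.0, 80.0, n) + rng.normal(0.0, 0.3, n),
})
df.loc[5, "cell_temp"] = np.nan    # inject a missing reading
df.loc[10, "cell_voltage"] = 9.9   # inject an implausible spike

# 1. Handle missing values and duplicates.
df = df.dropna().drop_duplicates().reset_index(drop=True)

# 2. Adjust outliers: clip each feature to the 1.5 * IQR fences (assumed rule).
features = ["cell_voltage", "cell_temp"]
for col in features:
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    df[col] = df[col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)

# 3. Feature engineering: trailing rolling average and rate of change.
df["temp_roll_mean"] = df["cell_temp"].rolling(window=5, min_periods=1).mean()
df["voltage_rate"] = df["cell_voltage"].diff().fillna(0.0)

# 4. 80/20 split, then fit the scaler on the training part only.
X = df[features + ["temp_roll_mean", "voltage_rate"]]
y = df["soh"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

print(X_train_s.shape, X_test_s.shape)
```

Fitting the scaler on the training split alone keeps test-set statistics out of the model inputs. If a resampling step such as SMOTE is added, the same reasoning suggests applying it to the training split only.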
The evaluation used preprocessed real-time BMS data, with strict separation of training and testing data to prevent leakage. For the hybrid stacked ensemble, predictions from the base learners (SVR, XGBoost, Random Forest) were fed into a Random Forest meta-learner. Experiments were conducted on a workstation with an Intel i7 CPU, 32 GB of RAM, and an NVIDIA RTX 3060 GPU, using Python 3.10 and standard libraries (scikit-learn, XGBoost, NumPy, pandas, Matplotlib).
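The stacking arrangement described above can be illustrated with scikit-learn's `StackingRegressor`. The sketch below is not the study's implementation: the data are synthetic, the hyperparameters are arbitrary, and `GradientBoostingRegressor` is swapped in for XGBoost so that the example depends on scikit-learn alone.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic SoH-like target driven by four hypothetical features.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(400, 4))
y = 100.0 - 15.0 * X[:, 0] - 5.0 * X[:, 1] ** 2 + rng.normal(0.0, 0.5, 400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Base learners; the gradient boosting model is a stand-in for XGBoost.
base_learners = [
    ("svr", make_pipeline(StandardScaler(), SVR(C=10.0))),
    ("gbr", GradientBoostingRegressor(n_estimators=50, random_state=0)),
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
]

# The Random Forest meta-learner is trained on out-of-fold predictions of the
# base learners (cv=5), so it never sees predictions made on data a base
# learner was fitted on.
stack = StackingRegressor(
    estimators=base_learners,
    final_estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    cv=5,
)
stack.fit(X_train, y_train)

preds = stack.predict(X_test)
mae_stacked = mean_absolute_error(y_test, preds)
print(f"Stacked MAE on held-out data: {mae_stacked:.3f}")
```

Whether the stacked model beats its base learners depends on the data; the sketch shows only the wiring, and any comparison should be made on the held-out split.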
Performance metrics:
MAE measures the average absolute deviation between predicted and actual values, indicating overall model reliability.
RMSE penalizes larger errors more heavily, highlighting robustness against noise and abrupt changes in battery degradation.
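The difference between the two metrics can be made concrete with a small worked example. For n predictions, MAE is the mean of |actual - predicted| and RMSE is the square root of the mean of (actual - predicted)^2. The SoH values below are invented purely for illustration.

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error: square root of the mean squared error."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Invented SoH values (percent), for illustration only.
actual = [95.0, 94.0, 93.0, 92.0]
pred_even = [94.0, 95.0, 92.0, 93.0]   # four errors of size 1
pred_spike = [95.0, 94.0, 93.0, 88.0]  # a single error of size 4

print(mae(actual, pred_even), rmse(actual, pred_even))    # 1.0 1.0
print(mae(actual, pred_spike), rmse(actual, pred_spike))  # 1.0 2.0
```

Both prediction sets have the same MAE, but the one with a single large miss has twice the RMSE, which is why RMSE is the more sensitive indicator of abrupt prediction failures.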
The study demonstrates that careful preprocessing combined with ensemble modeling can significantly improve the accuracy and stability of battery RUL and SoH predictions, supporting safer and more efficient energy storage management.
Conclusion
This study demonstrated that ensemble learning techniques, particularly the Hybrid Stacked model, offer significant advantages over traditional single-model approaches in predicting the Remaining Useful Life (RUL) and State of Health (SoH) of lithium-ion batteries [13]. The Hybrid Stacked model outperformed the base models (SVR, XGBoost, and Random Forest) in both predictive accuracy and generalization ability, achieving a substantial reduction in MAE and RMSE. This performance reflects its ability to leverage the complementary strengths of the base learners, which reduces bias and improves robustness to noisy data. The Hybrid model also proved practical for real-time BMS deployment, with manageable training and inference times. However, limitations of the dataset, such as its focus on stationary degradation patterns, point to avenues for future work, including more diverse operating conditions and online learning techniques [14]. The findings confirm that Hybrid Stacked models are a powerful and scalable solution for battery health monitoring, supporting predictive maintenance, optimized charging strategies, and enhanced decision-making in EVs and energy storage systems.
References
[1] K. Bakirov, "Application of hybrid Model for Data Analysis in Hydroponic System," Technologies, 2025.
[2] A. Wilson, "Recent Advances in Thermal Imaging and its Applications Using Machine Learning: A Review," IEEE Sensors Journal, vol. 23, no. 4, 2023.1.10.
[3] A. R. Moumen, "Adaptive traffic lights based on traffic flow prediction using machine learning models," International Journal of Electrical and Computer Engineering, vol. 13, no. 5, p. 581, 2023.6.23.
[4] L. A. S. C. R. Riley, "Evaluation of clinical prediction models (part 2): how to undertake an external validation study," British Medical Journal, vol. 384, 2024.1.15.
[5] H. N. K. S. A. D. Aida Brankovic, "Explainable machine learning for real time deterioration alert prediction to guide pre emptive treatment," Scientific Reports, vol. 12, no. 1, 2022.7.11.
[6] S. M. A. S. L. Q. A. D. L. O. B. M. Matthias Heinrich, "Detection of cleaning interventions on photovoltaic modules with machine learning," Applied Energy, vol. 263, 2020.2.20.
[7] M. K. A. T. B. T. M. A. Ali Aldrees, "Evaluation of water quality indexes with novel machine learning and SHapley Additive ExPlanation (SHAP) approaches," Journal of Water Process Engineering, 2024.1.16.
[8] F. H. ?. Y. S. Mohammed Al Saleem, "Explainable machine learning methods for predicting water treatment plant features under varying weather conditions," Results in Engineering, vol. 21, 2024.3.1.
[9] O. A. Semra Taşabat, "Using Long-Short Term Memory Networks with Genetic Algorithm to Predict Engine Condition," Gazi University Journal of Science, vol. 35, no. 3, p. 1210, 2021.10.6.
[10] M. M. S. S. B. Nahar F. Alshammari, "Comprehensive Analysis of Multi-Objective Optimization Algorithms for Sustainable Hybrid Electric Vehicle Charging Systems," Mathematics, vol. 11, no. 7, p. 175, 2023.4.5.
[11] G. C. B. S. M. G. F. M. Leandro Masello, "Using contextual data to predict risky driving events: A novel methodology from explainable artificial intelligence," Accident Analysis and Prevention, vol. 184, p. 106, 2023.2.26.
[12] H. M. A. M. M. S. Shimaa Barakat, "Eco-Efficient Mobility: Comparative Optimization of PV-Wind EV Charging Solutions," The Economist, 2023.12.19.
[13] E. S. C. T. G. Thodoris Garefalakis, "Predicting risky driving behavior with classification algorithms: results from a large scale field trial and simulator experiment," European Transport Research Review, vol. 16, no. 1, 2024.11.21.
[14] F.-L. Luo, Machine Learning for Future Wireless Communications, Wiley, 2019.12.13.