RTAverse: A Machine Learning-Based Analysis and Forecasting of Road Traffic Accidents in Angeles City

Authors: Raven Y. Butial, Ivan T. David, Kyla Marie P. De Leon, Rianne Louisa R. Magno, Melissa M. Pantig

DOI Link: https://doi.org/10.22214/ijraset.2026.78253

Abstract

Road traffic accidents (RTAs) in Angeles City remain a persistent public-safety concern, yet operational planning is often reactive because risk signals are not transformed into actionable, forward-looking evidence. This study presents RTAverse, a private web-based decision-support system that operationalizes official accident records into spatiotemporal forecasting and hotspot-level risk mapping for authorized stakeholders (e.g., LGU and traffic enforcement units). Following CRISP-DM, RTA records from Camp Tomas J. Pepito (2015–2024) were consolidated and preprocessed through automated cleaning (canonicalized headers, spatiotemporal deduplication, and barangay-name normalization), temporal transformation (datetime parsing and engineered season/time attributes), and feature reduction for modeling. Seven learning algorithms (Decision Tree, Random Forest, AdaBoost, XGBoost, k-NN, Naive Bayes, and SVM) were screened using error-based forecasting metrics; Random Forest and XGBoost achieved the lowest initial errors (MAE ≈ 0.22–0.25). Under sequential time-series evaluation, XGBoost produced the most consistent performance, and a Poisson-objective XGBoost achieved cross-validation MAE scores of 0.39, 0.19, 0.18, and 0.13 (overall MAE = 0.22), reflecting improved suitability for count-based outcomes. A hybrid variant strengthened spatial utility by integrating time-of-day clustering into hotspot forecasts, yielding an absolute error of 0–1 for 20 of 26 hotspots in the final period. Feature importance analysis indicated late-night time clusters as the strongest predictors (nearly 70% of the importance score), followed by rolling temporal trends. Expert evaluation using ISO/IEC 25010 and TAM affirmed the dashboard’s usability and perceived usefulness. Overall, RTAverse demonstrates how privacy-preserving, localized accident data can be modeled as an evolving urban risk system and translated into practical forecasts that support preventive traffic-safety planning in Angeles City.

Introduction

This study addresses the challenge of road traffic accidents (RTAs) in urban areas, where risk is unevenly distributed across locations and time because of factors such as road design, human behavior, and traffic patterns. Traditional approaches rely on reactive reporting, which limits the ability to prevent accidents. Improving safety therefore calls for predictive, data-driven systems that can forecast high-risk areas and time periods.

To address this, the study introduces RTAverse, a machine learning–based forecasting and visualization system developed for Angeles City. The system uses historical accident data (2015–2024) to predict accident risks and identify hotspots. It includes a data pipeline, automated preprocessing, model retraining, and a dashboard for visualizing trends and risks, supporting decision-making by local authorities.
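The automated preprocessing mentioned here is described in the abstract as header canonicalization, spatiotemporal deduplication, barangay-name normalization, and derived season/time attributes. A minimal sketch of those steps follows, using only the Python standard library. It is not the authors' pipeline: the column names, the alias table, the time-of-day bins, and the wet/dry season split are all assumptions made for illustration.

```python
import re
from datetime import datetime

# Hypothetical spelling variants; the real normalization table is not given in the source.
BARANGAY_ALIASES = {"sto. rosario": "Santo Rosario", "sto rosario": "Santo Rosario"}

def canonical_header(name):
    """Lower-case a header and collapse runs of non-alphanumerics into underscores."""
    return re.sub(r"[^a-z0-9]+", "_", name.strip().lower()).strip("_")

def normalize_barangay(name):
    """Trim, collapse whitespace, and map known variants to one spelling."""
    key = " ".join(name.strip().lower().split())
    return BARANGAY_ALIASES.get(key, key.title())

def time_of_day(hour):
    """Assumed bins; the paper's time clusters may be defined differently."""
    if hour >= 22 or hour < 5:
        return "late_night"
    if hour < 12:
        return "morning"
    if hour < 18:
        return "afternoon"
    return "evening"

def season(month):
    """Assumed wet/dry split (June to November treated as wet)."""
    return "wet" if 6 <= month <= 11 else "dry"

def clean(rows):
    """Canonicalize headers, parse datetimes, and drop repeated (time, place) pairs."""
    seen, out = set(), []
    for raw in rows:
        row = {canonical_header(k): v for k, v in raw.items()}
        when = datetime.strptime(row["date_time"], "%Y-%m-%d %H:%M")
        brgy = normalize_barangay(row["barangay"])
        if (when, brgy) in seen:  # spatiotemporal duplicate
            continue
        seen.add((when, brgy))
        out.append({
            "datetime": when,
            "barangay": brgy,
            "time_of_day": time_of_day(when.hour),
            "season": season(when.month),
        })
    return out

# Made-up records for illustration only.
rows = [
    {"Date / Time": "2024-07-14 23:40", " Barangay ": "STO. ROSARIO"},
    {"Date / Time": "2024-07-14 23:40", " Barangay ": "Sto Rosario"},
    {"Date / Time": "2024-01-03 08:15", " Barangay ": "balibago"},
]
cleaned = clean(rows)
```

In this sketch the second record is dropped because it shares a timestamp and a normalized barangay with the first, which is one simple reading of "spatiotemporal deduplication".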

Key features of the system include:

Forecasting accident risks across time and location
Hotspot identification and risk categorization (low, moderate, high)
Interactive dashboard for data visualization and monitoring
Secure, private access for authorized administrators
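The low/moderate/high risk categorization can be pictured as a thresholding step applied to each hotspot's forecast count. The sketch below is only an illustration: the source does not state the cut-offs RTAverse uses, so the thresholds, the function name `risk_level`, and the hotspot labels are hypothetical.

```python
def risk_level(expected_count, moderate_at=1.0, high_at=3.0):
    """Map a forecast accident count to a risk label.

    The thresholds are placeholders, not values reported by the study.
    """
    if expected_count < 0:
        raise ValueError("a forecast count cannot be negative")
    if expected_count >= high_at:
        return "high"
    if expected_count >= moderate_at:
        return "moderate"
    return "low"

# Made-up forecasts for three unnamed hotspots.
forecasts = {"Hotspot A": 0.4, "Hotspot B": 1.6, "Hotspot C": 3.2}
levels = {name: risk_level(value) for name, value in forecasts.items()}
```

Keeping the thresholds as parameters would let an administrator tune them to local enforcement capacity without retraining the forecasting model.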

The methodology follows the CRISP-DM framework, involving data collection, preprocessing, feature engineering, and model evaluation. Various machine learning models were tested, with XGBoost (Poisson-based) performing best due to its accuracy in handling count data and temporal patterns.
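To make the appeal of a Poisson objective for count data concrete, the sketch below writes out the Poisson negative log-likelihood with a log link, together with its gradient and Hessian, and fits a constant-rate model by Newton steps. This is a from-scratch illustration in plain Python, not the authors' XGBoost configuration; the sample counts are made up.

```python
import math

def poisson_nll(y, log_mu):
    """Negative log-likelihood of count y under a Poisson with mean exp(log_mu)."""
    return math.exp(log_mu) - y * log_mu + math.lgamma(y + 1)

def grad_hess(y, log_mu):
    """First and second derivatives of poisson_nll with respect to log_mu."""
    mu = math.exp(log_mu)
    return mu - y, mu

def fit_constant(counts, steps=25):
    """Fit one shared rate by Newton's method on the summed loss."""
    log_mu = 0.0
    for _ in range(steps):
        g = sum(grad_hess(y, log_mu)[0] for y in counts)
        h = sum(grad_hess(y, log_mu)[1] for y in counts)
        log_mu -= g / h
    return math.exp(log_mu)

# Made-up sparse counts, typical of per-location accident tallies.
counts = [0, 1, 0, 2, 0, 0, 3, 0]
mu_hat = fit_constant(counts)
```

Because the model predicts `exp(log_mu)`, the forecast rate is always positive, and the fitted constant converges to the sample mean of the counts. Both properties suit sparse, non-negative accident tallies better than an unconstrained squared-error fit.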

Results show that ensemble models (especially XGBoost and Random Forest) outperform the other algorithms tested, with XGBoost giving the lowest and most consistent error in sequential time-series evaluation (overall MAE = 0.22 with the Poisson objective). Incorporating temporal features (such as time-of-day clusters and lagged values) further improves prediction reliability.
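Lag and rolling-trend features of the kind mentioned here can be built from a series of period counts using only values that precede the target period. The sketch below assumes a plain list of counts per period; the lag choices, the window length, and the feature names are hypothetical, not those of the study.

```python
def add_lag_features(counts, lags=(1, 2), window=3):
    """Build one feature row per period from strictly earlier counts."""
    rows = []
    start = max(max(lags), window)  # first period with enough history
    for t in range(start, len(counts)):
        row = {f"lag_{k}": counts[t - k] for k in lags}
        past = counts[t - window:t]  # excludes counts[t], so no target leakage
        row[f"rolling_mean_{window}"] = sum(past) / window
        row["target"] = counts[t]
        rows.append(row)
    return rows

# Made-up counts for seven consecutive periods.
period_counts = [2, 0, 1, 3, 1, 0, 4]
features = add_lag_features(period_counts)
```

The slice `counts[t - window:t]` stops before the target period, so the rolling mean never sees the value it is meant to help predict.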

Conclusion

RTAverse demonstrates how official accident records can be transformed into a privacy-preserving forecasting and visualization dashboard for Angeles City that supports authorized stakeholders in identifying temporal patterns and actionable hotspots. Across experiments, XGBoost and Random Forest consistently outperformed simpler baselines. XGBoost produced the most stable results under sequential evaluation, and a Poisson-objective configuration aligned better with count-based crash data, achieving an overall MAE of 0.22. While Random Forest tended to underpredict in high-frequency areas, the hybrid XGBoost variant improved hotspot-level interpretability and near-term spatial utility, supporting operational use beyond aggregate accuracy. The cleaned dataset contained 2,780 historical records and was sufficient for model validation, but remaining gaps motivate further enrichment.
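Sequential evaluation means every test fold lies strictly after the data used to train on it. The sketch below shows one common way to generate such expanding-window folds and to compute MAE. The fold sizes are hypothetical, since the source does not give the authors' split. The last lines average the four fold MAEs reported in the abstract. The result, 0.2225, is consistent with the stated overall MAE of 0.22, although the source does not say that the overall figure is an unweighted mean.

```python
def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def expanding_window_splits(n_samples, n_folds, test_size):
    """Yield (train_indices, test_indices); training data always precedes the test fold."""
    first_test = n_samples - n_folds * test_size
    if first_test <= 0:
        raise ValueError("not enough samples for the requested folds")
    for i in range(n_folds):
        start = first_test + i * test_size
        yield list(range(start)), list(range(start, start + test_size))

# Hypothetical sizes: 10 periods, 4 folds, 2 test periods per fold.
splits = list(expanding_window_splits(n_samples=10, n_folds=4, test_size=2))

# Fold scores as reported in the abstract.
reported_fold_mae = [0.39, 0.19, 0.18, 0.13]
mean_fold_mae = sum(reported_fold_mae) / len(reported_fold_mae)
```

With these sizes the first fold trains on periods 0 to 1 and tests on 2 to 3, and the last fold trains on periods 0 to 7 and tests on 8 to 9, so the training window grows while the test window moves forward in time.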

Copyright

Copyright © 2026 Raven Y. Butial, Ivan T. David, Kyla Marie P. De Leon, Rianne Louisa R. Magno, Melissa M. Pantig. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET78253

Publish Date : 2026-03-13

ISSN : 2321-9653

Publisher Name : IJRASET
