To get a realistic picture of cellular network Quality of Service (QoS), traffic patterns must be predicted and analyzed. Cellular network planners have several methods for predicting traffic, but when datasets are very large, traditional methods consume considerable time and resources. We propose AML-CTP (Adaptive Machine Learning-based Cellular Traffic Prediction), an algorithm that learns from a small, accurate dataset to make predictions more accurate and less computationally complex. We use Min-Max Scaler to normalize the data, Select-K-Best to select features, and PCA to reduce the number of dimensions. To locate highly similar training clusters, we employ DBSCAN and Kernel Density. We evaluate SVM, Linear Regression, Decision Tree, Light Gradient Boosting (LightGBM), and XGBoost on a cellular LTE dataset from an Egyptian enterprise. The Decision Tree obtained the highest R² score (96%) among the initial models, and the XGBoost model added as an extension reached 98%, which suggests that it may be better suited to predicting cellular traffic.
Introduction
The rapid growth of smartphone usage and data-heavy applications has led to a dramatic rise in cellular network traffic, creating challenges in maintaining Quality of Service (QoS) due to increased latency, reduced throughput, and higher congestion. Accurate traffic prediction is therefore essential for efficient network resource allocation. Traditional machine learning (ML) models—such as SVM, Linear Regression, and Decision Trees—have been used for traffic forecasting, but they struggle with scalability and accuracy when handling large, complex, and dynamic datasets.
To overcome these limitations, researchers have increasingly adopted data reduction and clustering techniques such as PCA, feature selection, DBSCAN, and kernel-density-based clustering. These methods help minimize redundancy, reduce computational load, and improve prediction accuracy. Building on these advances, the proposed Adaptive Machine Learning-Based Cellular Traffic Prediction (AML-CTP) system integrates data preprocessing, feature engineering, and modern ML models, including LightGBM and XGBoost, for more efficient and accurate cellular traffic prediction in 4G/5G networks.
Related Work
Past studies have explored reinforcement learning for scheduling, ML-based LTE performance analysis, energy-efficient relay deployment, expert systems for data analysis, smart handover prediction, LSTM-based deep learning models, fusion models for LTE forecasting, PCA-based dimensionality reduction, density-based clustering, seasonal SVR, ensemble clustering, and scalable big data systems. Collectively, this body of research highlights the growing shift toward advanced ML and clustering solutions for improved cellular traffic forecasting.
Methodology
The proposed system processes network traffic data through a structured pipeline involving:
Data preprocessing: Min-Max normalization, Select-K-Best feature selection, PCA dimensionality reduction, and clustering (DBSCAN, Kernel Density).
Model training: ML models including SVM, Linear Regression, Decision Tree, LightGBM, and XGBoost.
Evaluation: using R², RMSE, and MAPE to assess accuracy.
System features: secure admin login, data upload, processing modules, and real-time traffic prediction.
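The preprocessing stage of this pipeline can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' implementation: the synthetic data, the function name `preprocess`, and all parameter values (`k_features`, `n_components`, `eps`, `min_samples`, `bandwidth`) are assumptions chosen for the example. The source does not say how DBSCAN and Kernel Density are combined, so the sketch simply drops the points DBSCAN marks as noise and reports a kernel density estimate for every sample.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.neighbors import KernelDensity
from sklearn.preprocessing import MinMaxScaler


def preprocess(X, y, k_features=6, n_components=3,
               eps=0.3, min_samples=5, bandwidth=0.2):
    """Normalize, select features, reduce dimensions, then cluster.

    All default values are illustrative assumptions, not values from the paper.
    """
    # 1. Min-Max normalization: rescale every feature to [0, 1].
    X_scaled = MinMaxScaler().fit_transform(X)

    # 2. Select-K-Best: keep the k features most related to the target.
    selector = SelectKBest(score_func=f_regression, k=k_features)
    X_selected = selector.fit_transform(X_scaled, y)

    # 3. PCA: project the selected features onto fewer components.
    X_reduced = PCA(n_components=n_components).fit_transform(X_selected)

    # 4a. DBSCAN: label dense regions; noise points get the label -1.
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X_reduced)

    # 4b. Kernel density: log-density estimate for every sample.
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(X_reduced)
    log_density = kde.score_samples(X_reduced)

    # Assumption: train only on samples that DBSCAN assigns to a cluster.
    keep = labels != -1
    return X_reduced[keep], y[keep], labels, log_density


# Synthetic stand-in for an LTE traffic table (300 samples, 10 features).
rng = np.random.default_rng(0)
X = rng.random((300, 10))
y = 3.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(0.0, 0.05, size=300)

X_train, y_train, labels, log_density = preprocess(X, y)
print("samples kept:", X_train.shape[0], "of", X.shape[0])
print("reduced dimensionality:", X_train.shape[1])
```

In practice `eps` and `bandwidth` would need tuning against the real dataset, since both are sensitive to the scale of the PCA-projected features.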
Implementation
The system includes modules for dataset loading, preprocessing, feature selection, dimensionality reduction, clustering, visualization, model training/testing, admin authentication, and real-time prediction.
Algorithms
Five major algorithms (SVM, Linear Regression, Decision Tree, LightGBM, and XGBoost) are used, each tailored to handle traffic classification or forecasting with improved performance through feature reduction and clustering.
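A comparison of these five algorithms could be set up along the following lines. This is a hedged sketch: the synthetic data, the helper `candidate_models`, and every hyperparameter are assumptions for illustration, and the scores it prints say nothing about the results reported for the LTE dataset. LightGBM and XGBoost are separate packages, so the sketch adds them only if they can be imported.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor


def candidate_models():
    """Return the regressors to compare (hyperparameters are assumptions)."""
    models = {
        "SVM (SVR)": SVR(kernel="rbf"),
        "Linear Regression": LinearRegression(),
        "Decision Tree": DecisionTreeRegressor(max_depth=6, random_state=0),
    }
    # The boosting libraries are optional; skip them if unavailable.
    try:
        from lightgbm import LGBMRegressor
        models["LightGBM"] = LGBMRegressor(
            n_estimators=200, random_state=0, verbose=-1)
    except (ImportError, OSError):
        pass
    try:
        from xgboost import XGBRegressor
        models["XGBoost"] = XGBRegressor(n_estimators=200, random_state=0)
    except (ImportError, OSError):
        pass
    return models


# Synthetic stand-in for the preprocessed traffic features and target.
rng = np.random.default_rng(0)
X = rng.random((400, 5))
y = 4.0 * X[:, 0] + 2.0 * X[:, 1] ** 2 + rng.normal(0.0, 0.05, size=400)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Fit every available model and score it on the held-out split.
scores = {}
for name, model in candidate_models().items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))

for name, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{name:18s} R2 = {score:.3f}")
```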
Results
Decision Tree achieved an R² of 96%, while XGBoost delivered the best performance with an R² of 98%. LightGBM reached about 94%, and SVM and Linear Regression performed moderately (85–90%). PCA, Select-K-Best, and density-based clustering improved model efficiency without reducing accuracy. Overall, the system provided reliable real-time traffic prediction and improved QoS, indicating that it is suitable for next-generation cellular networks.
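For reference, the three evaluation metrics named in the methodology (R², RMSE, and MAPE) follow their standard definitions, sketched below in plain Python. The traffic values in the example are hypothetical and are not taken from the dataset; an R² of 0.98 corresponds to the 98% figure as reported here.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def mape(y_true, y_pred):
    """Mean absolute percentage error; undefined if any true value is 0."""
    ratios = [abs((t - p) / t) for t, p in zip(y_true, y_pred)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical observed and predicted traffic volumes.
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

r2_value = r2_score(observed, predicted)
rmse_value = rmse(observed, predicted)
mape_value = mape(observed, predicted)
print(f"R2 = {r2_value:.3f}, RMSE = {rmse_value:.2f}, MAPE = {mape_value:.2f}%")
```

R² measures the share of variance explained, while RMSE and MAPE express the error in absolute and relative terms, which is why the three are usually reported together.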
Conclusion
The AML-CTP system demonstrates that adaptive machine learning, combined with data reduction and clustering techniques, can accurately predict cellular network traffic while reducing computational complexity. Among the tested models, the XGBoost extension achieved the highest prediction accuracy (98% R²), highlighting the effectiveness of boosting methods. Overall, the system improves resource allocation, enhances Quality of Service (QoS), and provides a practical solution for real-time cellular traffic management.