IJRASET Journal for Research in Applied Science and Engineering Technology
Authors: Sahil Chougule, Harsh Patgave, Naimish Benade, Atharv Shinde, Prof. Jagdish Ingale
DOI Link: https://doi.org/10.22214/ijraset.2026.83520
The global sports prediction and analytics ecosystem represents a rapidly expanding domain, with the international sports betting market valued at USD 100.9 billion in 2024 and projected to reach USD 187.4 billion by 2030 (CAGR 11%). Simultaneously, the sports analytics market is forecast to grow at a CAGR of 15.6% through 2033, driven by the proliferation of data-driven decision-making in professional sports. Accurate pre-match outcome prediction remains a technically challenging problem due to the stochastic nature of competitive sports, the heterogeneity of relevant features across disciplines, and the scarcity of unified frameworks that generalise across multiple sports. Existing approaches predominantly target a single sport or tournament, rely on limited feature sets, and fail to provide systematic ablation evidence for their design choices. Statistical models and shallow machine learning methods achieve modest accuracy in the range of 54–67% for football and 58–78% for cricket pre-match prediction, while deep learning approaches frequently underperform on structured tabular sports data due to data scarcity and overfitting. This paper proposes a unified, feature-engineered prediction framework — denoted UniSportXGB — that integrates sport-specific domain knowledge with gradient-boosted decision trees (XGBoost) to predict pre-match outcomes across association football (English Premier League, Spanish La Liga, UEFA Champions League, German Bundesliga) and cricket (Indian Premier League, ICC Cricket World Cup). The framework systematically constructs 28 features per sport, including rolling form windows, home/away performance splits, head-to-head records, venue win rates, toss influence (cricket), and squad fatigue proxies. Four models — Logistic Regression, Random Forest, XGBoost, and a Multilayer Perceptron — are trained and compared under identical experimental conditions. 
UniSportXGB achieves 68.3% accuracy and F1-score of 0.67 on football outcome prediction, and 71.4% accuracy with F1-score of 0.70 on cricket, outperforming the strongest baseline (Random Forest) by 4.2 and 3.8 percentage points respectively. Statistical significance is confirmed via paired t-test (p < 0.01) across all comparisons. SHAP feature importance analysis reveals that rolling five-match form and venue win rate are the most discriminative predictors in both sports. UniSportXGB provides a reproducible, open-access prediction toolkit applicable to coaching analytics, fan engagement systems, and responsible sports intelligence platforms, with direct relevance to the growing sports-data industry.
This paper presents UniSportXGB, a unified machine-learning framework for predicting match outcomes in both football and cricket. The growing economic importance of sports, driven by massive global audiences, expanding sports analytics, and a rapidly growing sports betting industry, has increased demand for accurate pre-match prediction systems. While machine learning has become the dominant approach for sports forecasting, most existing studies focus on a single sport, limiting cross-sport applicability and comparison.
The study addresses this gap by developing a common prediction framework that can be applied to two structurally different sports—football and cricket—using the same feature-engineering and modelling pipeline. The objective is to predict match outcomes from pre-match data only, without using live or in-game information. For football, outcomes are classified as home win, draw, or away win, while for cricket, outcomes are classified as a win for either competing team.
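The pre-match constraint described above can be illustrated with a minimal sketch. It computes a rolling five-match form feature that reads only results recorded before the match being predicted, and it encodes the three football outcomes as class labels. The function name `rolling_form`, the numeric label codes, and the 3/1/0 points scheme are assumptions made for illustration and are not taken from the paper. A cricket version would use a binary label instead.

```python
from collections import defaultdict, deque

# Hypothetical label codes; the paper does not specify a numeric encoding.
FOOTBALL_LABELS = {"H": 0, "D": 1, "A": 2}  # home win, draw, away win

# Assumed points scheme (home_points, away_points) per result.
POINTS = {"H": (3, 0), "D": (1, 1), "A": (0, 3)}


def _mean_or_none(values):
    """Mean of the stored points, or None when a team has no history yet."""
    return sum(values) / len(values) if values else None


def rolling_form(matches, window=5):
    """Build (home_form, away_form, label) rows from ordered match results.

    `matches` is a chronologically ordered list of
    (home_team, away_team, result) tuples, with result in {"H", "D", "A"}.
    Form is the mean points per match over a team's previous `window` matches.
    """
    history = defaultdict(lambda: deque(maxlen=window))
    rows = []
    for home, away, result in matches:
        # Read features BEFORE recording this result, so the current
        # match never leaks into its own feature values.
        home_form = _mean_or_none(history[home])
        away_form = _mean_or_none(history[away])
        rows.append((home_form, away_form, FOOTBALL_LABELS[result]))

        # Only now update each team's history with the points earned.
        home_pts, away_pts = POINTS[result]
        history[home].append(home_pts)
        history[away].append(away_pts)
    return rows
```

Updating each team's history only after the feature row has been written is what keeps the features strictly pre-match. Teams with no prior matches receive `None`, which a downstream pipeline would need to impute or leave as a missing value.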
Three key research questions guide the study:
The authors identify four major limitations in previous sports prediction research:
To address these shortcomings, the paper introduces several contributions:
The literature review shows that XGBoost consistently performs well in both football and cricket prediction tasks, often outperforming deep-learning approaches on structured tabular datasets. Previous studies have demonstrated the importance of form-based metrics, head-to-head records, venue characteristics, and team-quality indicators, but none have provided a unified framework across multiple sports.
Methodologically, the framework treats match prediction as a supervised classification problem. Historical match data are converted into feature vectors, and models are trained using cross-entropy loss. The primary model, XGBoost, employs an ensemble of decision trees with regularization and gradient-boosting optimization, making it particularly effective for structured sports data.
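As a rough illustration of the cross-entropy loss mentioned above, the sketch below converts raw class scores into probabilities with a softmax and then averages the negative log-probability assigned to the true class. It is a plain-Python illustration under stated assumptions, not the authors' training code. The function names and the clipping constant `eps` are hypothetical. In practice a library such as XGBoost evaluates an equivalent loss internally, for example through its `multi:softprob` or `binary:logistic` objectives.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, labels, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    `probs` is a list of per-match probability vectors, and `labels` is a
    list of integer class indices (e.g. 0 = home win, 1 = draw, 2 = away win).
    """
    total = 0.0
    for p, y in zip(probs, labels):
        # Clip to avoid log(0) when the model assigns zero probability.
        total += math.log(max(p[y], eps))
    return -total / len(labels)
```

A model that spreads probability evenly over the three football outcomes scores a loss of ln 3 (about 1.10) per match, which gives a reference point for judging whether a trained classifier has learned anything beyond guessing.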
This paper addressed the problem of pre-match outcome prediction across two structurally distinct sports — association football and cricket — within a unified machine learning framework. The growing importance of sports analytics in a USD 100.9 billion global betting market, combined with the fragmented and sport-specific nature of prior prediction research, motivated the development of UniSportXGB: a domain-aware, gradient-boosted prediction system trained and evaluated across six tournaments spanning both sports.
Copyright © 2026 Sahil Chougule, Harsh Patgave, Naimish Benade, Atharv Shinde, Prof. Jagdish Ingale. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Paper Id : IJRASET83520
Publish Date : 2026-06-07
ISSN : 2321-9653
Publisher Name : IJRASET
