Machine learning (ML) has become a powerful tool in sports analytics, particularly for predicting cricket match outcomes. In this study, we compare the performance of five commonly used ML algorithms for predicting the results of white-ball cricket matches (ODIs and T20s): Random Forest, XGBoost, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Naïve Bayes. We used historical match data with pre-match features such as team names, toss winner, toss decision, and venue, with the match outcome as the prediction target. After preprocessing the data and training the models, we evaluated their performance using accuracy, precision, recall, F1-score, and AUC. XGBoost performed best overall, followed closely by Random Forest. These findings clarify which algorithms are best suited to cricket match prediction and may be useful to teams, analysts, and fans.
Introduction
The outcomes of One-Day International (ODI) cricket matches are difficult to predict because they depend on many fluctuating factors, such as team composition, toss results, and venue conditions. Advances in machine learning (ML) and the availability of structured match data have enabled the development of predictive models, although cricket’s inherent unpredictability still limits their accuracy.
This research compares five ML algorithms for predicting ODI outcomes: Random Forest, XGBoost, Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and Naïve Bayes. The models are built on a dataset drawn from ESPN and Kaggle that contains key pre-match features (the two teams, toss winner, toss decision, and venue) along with the final match result, which serves as the prediction target. Matches were filtered to retain only those with a clear outcome.
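The filtering and encoding steps described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the study’s pipeline: the field names (`team1`, `team2`, `toss_winner`, `toss_decision`, `venue`, `winner`) and the toy records are hypothetical, a "clear outcome" is taken to mean that one of the two listed teams won (ties and no-results are dropped), the target is coded as 1 when `team1` won, and the toss winner is reduced to a binary "team1 won the toss" flag.

```python
# Hypothetical field names and toy records; the real ESPN/Kaggle columns may differ.
CATEGORICAL = ["team1", "team2", "toss_decision", "venue"]

RAW_MATCHES = [
    {"team1": "Team A", "team2": "Team B", "toss_winner": "Team A",
     "toss_decision": "bat", "venue": "Venue X", "winner": "Team A"},
    {"team1": "Team C", "team2": "Team A", "toss_winner": "Team A",
     "toss_decision": "field", "venue": "Venue Y", "winner": "Team A"},
    {"team1": "Team B", "team2": "Team C", "toss_winner": "Team B",
     "toss_decision": "bat", "venue": "Venue X", "winner": None},   # no result
    {"team1": "Team B", "team2": "Team A", "toss_winner": "Team A",
     "toss_decision": "field", "venue": "Venue Y", "winner": "tie"},
]


def is_decisive(match):
    """Keep only matches won by one of the two listed teams (drops ties and no-results)."""
    return match["winner"] in (match["team1"], match["team2"])


def build_vocab(matches):
    """Collect the sorted set of categories observed for each categorical feature."""
    return {f: sorted({m[f] for m in matches}) for f in CATEGORICAL}


def encode(match, vocab):
    """One-hot encode the categorical features, then append a 'team1 won the toss' flag.

    Categories absent from vocab are encoded as all zeros.
    """
    row = []
    for f in CATEGORICAL:
        row.extend(1 if match[f] == value else 0 for value in vocab[f])
    row.append(1 if match["toss_winner"] == match["team1"] else 0)
    return row


def preprocess(raw_matches):
    """Filter to decisive results, then build feature rows X and labels y (1 = team1 won)."""
    matches = [m for m in raw_matches if is_decisive(m)]
    vocab = build_vocab(matches)
    X = [encode(m, vocab) for m in matches]
    y = [1 if m["winner"] == m["team1"] else 0 for m in matches]
    return X, y, vocab


X, y, vocab = preprocess(RAW_MATCHES)
print(len(X), "decisive matches; labels:", y)
```

In a real pipeline the category vocabulary should be built from the training split only, so that test-set categories do not leak into the encoding; unseen categories then encode as all zeros, as `encode` does here. Because the label depends on which side is listed as `team1`, that ordering should also be applied consistently across the dataset.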
Models were trained and evaluated using accuracy, precision, recall, F1-score, and AUC. XGBoost performed best (80% accuracy, 0.84 AUC), closely followed by Random Forest; both ensemble methods capture complex, nonlinear relationships in the data and are relatively robust to overfitting. Simpler models such as SVM, KNN, and Naïve Bayes performed moderately or worse, with KNN in particular struggling because of its sensitivity to noise.
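For reference, the five evaluation metrics named above can be computed from a model’s outputs as follows. This is a dependency-free sketch assuming binary labels where 1 means "team1 won" (a hypothetical convention). Accuracy, precision, recall, and F1 use hard 0/1 predictions, whereas AUC uses predicted probabilities and is computed here as the probability that a randomly chosen positive match is scored above a randomly chosen negative one. The labels and scores in the example are invented for illustration. Libraries such as scikit-learn provide equivalents in `sklearn.metrics` (`accuracy_score`, `precision_score`, `recall_score`, `f1_score`, `roc_auc_score`).

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def threshold_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 from hard 0/1 predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. Runs in O(P * N), fine for a few thousand matches.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented example: true outcomes and a model's predicted P(team1 wins).
Y_TRUE = [1, 1, 0, 0, 1, 0]
SCORES = [0.9, 0.8, 0.4, 0.7, 0.3, 0.1]
Y_PRED = [1 if s >= 0.5 else 0 for s in SCORES]  # 0.5 decision threshold

print(threshold_metrics(Y_TRUE, Y_PRED))
print("AUC:", roc_auc(Y_TRUE, SCORES))
```

Because AUC does not depend on a decision threshold, a model can rank matches well (here AUC = 7/9) while its accuracy at a fixed 0.5 threshold is lower (4/6), which is one reason the two are reported together.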
The study highlights ML’s potential to provide actionable insights for team strategy, performance evaluation, and sports analytics, while emphasizing the importance of selecting appropriate models for complex, variable-rich sports data.
Conclusion
This study explored the effectiveness of various machine learning models in predicting match outcomes in white-ball cricket, focusing on the ODI and T20 formats. Using a consistent set of input features such as team names, toss results, and venue details, it evaluated Random Forest, XGBoost, SVM, KNN, and Naïve Bayes. The results showed that the ensemble models, particularly Random Forest and XGBoost, offered superior accuracy and flexibility. Although simpler models such as SVM and Naïve Bayes are easier to interpret, they struggle with the complex, nonlinear nature of cricket data.
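One commonly cited reason Naïve Bayes struggles with such data is its assumption that features are conditionally independent given the outcome, so it cannot represent interactions such as "batting first helps at one venue but hurts at another". The sketch below is a minimal categorical Naïve Bayes with Laplace smoothing written from scratch (not the study’s implementation), applied to hypothetical toy data in which the result depends only on the venue × toss-decision interaction. Because venue and toss decision are each uninformative on their own, the model assigns identical scores to both classes and gets only half of the patterns right, whereas it recovers a pure venue effect without difficulty. Its per-feature likelihoods P(x_j | c) can also be read off directly, which is the kind of interpretability the paragraph above refers to.

```python
import math
from collections import Counter, defaultdict


class CategoricalNaiveBayes:
    """Naive Bayes for categorical features with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.class_counts = Counter(labels)
        self.n = len(labels)
        n_features = len(rows[0])
        # counts[j][c][v] = number of class-c rows whose feature j equals v
        self.counts = [defaultdict(Counter) for _ in range(n_features)]
        self.values = [set() for _ in range(n_features)]
        for row, c in zip(rows, labels):
            for j, v in enumerate(row):
                self.counts[j][c][v] += 1
                self.values[j].add(v)
        return self

    def log_scores(self, row):
        """Unnormalised log posterior per class: log P(c) + sum_j log P(x_j | c)."""
        scores = {}
        for c in self.classes:
            score = math.log(self.class_counts[c] / self.n)
            for j, v in enumerate(row):
                k = len(self.values[j])  # number of distinct values seen for feature j
                num = self.counts[j][c][v] + self.alpha
                den = self.class_counts[c] + self.alpha * k
                score += math.log(num / den)
            scores[c] = score
        return scores

    def predict(self, row):
        scores = self.log_scores(row)
        return max(scores, key=scores.get)  # on exact ties, the first (smallest) class wins


# Hypothetical toy data: (venue, toss_decision) -> 1 if the toss winner won.
# The outcome depends only on the venue x decision interaction (an XOR pattern).
PATTERNS = [("Venue X", "bat"), ("Venue X", "field"), ("Venue Y", "bat"), ("Venue Y", "field")]
XOR_LABELS = [1, 0, 0, 1]
xor_model = CategoricalNaiveBayes().fit(PATTERNS * 2, XOR_LABELS * 2)
xor_preds = [xor_model.predict(r) for r in PATTERNS]
xor_accuracy = sum(p == t for p, t in zip(xor_preds, XOR_LABELS)) / len(PATTERNS)

# Contrast: here only the venue matters, which the independence assumption handles.
VENUE_LABELS = [1, 1, 0, 0]
venue_model = CategoricalNaiveBayes().fit(PATTERNS, VENUE_LABELS)
venue_preds = [venue_model.predict(r) for r in PATTERNS]

print("XOR pattern:", xor_preds, "accuracy", xor_accuracy)
print("Venue-only pattern:", venue_preds)
```

Tree-based ensembles such as Random Forest and XGBoost can, in principle, represent this interaction by splitting on venue and then on toss decision, which is consistent with the nonlinear-relationship argument made for them above.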
The study underscores machine learning’s potential in sports analytics when backed by reliable data and relevant features. Future work could enhance these models by integrating player stats, form, real-time updates, and environmental factors, making predictions more valuable for teams, analysts, and audiences alike.