This project presents a novel framework for automated cricket commentary generation using a combination of deep learning, computer vision, and natural language processing techniques. The system is designed to analyze cricket match footage and generate relevant play-by-play commentary without human intervention. Leveraging Vision Transformers (ViT) for frame-level visual feature extraction, the framework accurately identifies key game events such as "Four", "Six", and "Bowled". For each detected event, the system retrieves or generates contextually appropriate commentary using pre-trained language models like GPT-2, enhanced with a curated commentary dataset. The commentaries are evaluated using precision, recall, and F1-score against ground truth data. The application includes a user-friendly Streamlit interface that enables users to upload videos, view extracted events, hear generated commentary via gTTS, and assess model performance. Designed for both professional and amateur-level cricket games—especially those lacking live commentary—this framework aims to enhance viewer engagement, accessibility, and post-game analysis through automated, intelligent commentary.
Introduction
The project focuses on developing an AI-driven system for automated cricket commentary, addressing the challenge that many local or amateur matches lack live expert commentators. By leveraging modern deep learning techniques, specifically Vision Transformers (ViT) for video feature extraction and GPT-2 for natural language generation, the system generates real-time, context-aware cricket commentary. It also converts text to speech using Google Text-to-Speech (gTTS), enabling live audio narration.
Key techniques include:
Vision Transformers (ViT): Extract rich spatio-temporal features from match videos for accurate event detection.
GPT-2: Produces fluent, diverse, and context-relevant commentary text based on detected events.
NLP Filtering: Removes repetitive or irrelevant commentary to enhance viewer experience.
gTTS: Converts text commentary into audio for real-time playback.
Evaluation Metrics: Uses precision, recall, and F1-score to assess the accuracy and relevance of generated commentary compared to ground truth annotations.
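The evaluation step above can be sketched as a per-event computation of precision, recall, and F1-score. This is a minimal illustration: the event labels and the predicted/ground-truth sequences below are made-up placeholders, not the project's actual annotations.

```python
# Minimal sketch of per-event precision/recall/F1 evaluation.
# Predicted and ground-truth labels here are illustrative placeholders.

def event_metrics(predicted, ground_truth, event):
    """Compute precision, recall, and F1 for one event class."""
    tp = sum(1 for p, g in zip(predicted, ground_truth) if p == event and g == event)
    fp = sum(1 for p, g in zip(predicted, ground_truth) if p == event and g != event)
    fn = sum(1 for p, g in zip(predicted, ground_truth) if p != event and g == event)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Illustrative detections vs. ground-truth annotations
pred = ["Four", "Six", "Bowled", "Four", "Four"]
truth = ["Four", "Six", "Bowled", "Six", "Four"]
p, r, f1 = event_metrics(pred, truth, "Four")
print(f"Four: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In practice a library routine such as scikit-learn's `precision_recall_fscore_support` would typically replace this hand-rolled count, but the explicit version makes the true-positive/false-positive bookkeeping visible.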
The system mimics a human commentator’s process, ensuring timely, engaging, and accurate descriptions of cricket events like Six, Four, and Bowled. Real-time processing and adaptive learning optimize performance and responsiveness.
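The detect-then-narrate flow described above can be sketched as a simple dispatch from a detected event label to a commentary line. The templates below are illustrative stand-ins: in the actual system, GPT-2 generates the text conditioned on the detected event rather than selecting from a fixed table.

```python
import random

# Illustrative commentary templates keyed by detected event;
# in the real pipeline a GPT-2 model generates this text instead.
TEMPLATES = {
    "Four": ["Cracked away for four!",
             "Finds the gap and races to the boundary for four."],
    "Six": ["That's huge! Six runs!",
            "Launched over the ropes for a maximum."],
    "Bowled": ["Bowled him! The stumps are shattered.",
               "Clean bowled by a beauty of a delivery."],
}

def narrate(event, rng=random):
    """Return one commentary line for a detected event, or neutral filler."""
    lines = TEMPLATES.get(event)
    return rng.choice(lines) if lines else "Play continues."

print(narrate("Six"))
```

The returned string would then be passed to gTTS for audio playback; keeping detection, text generation, and speech synthesis as separate stages is what lets each be swapped or upgraded independently.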
Evaluation results demonstrate strong performance, particularly high recall and balanced precision across key cricketing events, validating the approach’s effectiveness. The system is designed for scalability and can adapt to diverse match types, providing an accessible solution for automated sports commentary.
Conclusion
Based on the experimental evaluation of the automatic cricket commentary system, it can be concluded that the Vision Transformer (ViT)-based approach effectively identifies key cricketing events and generates context-aware commentary with consistently high recall and reliable precision.
The model consistently achieved high recall scores across all tested events—Batsman Action, Bowled, and Four—indicating its robust ability to detect relevant gameplay actions.
Among the evaluated events, the Bowled category exhibited the best performance, with an F1-score of 0.85, followed by Four with 0.84 and Batsman Action with 0.82. The consistently perfect recall (1.0) in all categories highlights the system's strength in capturing every relevant instance, while precision values ranging from 0.70 to 0.75 indicate reliable commentary generation with a manageable false-positive rate.
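These figures are internally consistent with the standard F1 definition: with recall fixed at 1.0, precisions within the stated 0.70–0.75 range reproduce the reported F1 scores. The specific precision values below are back-solved for illustration, not taken from the text.

```python
def f1(precision, recall):
    """Standard F1: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# With recall = 1.0, precisions in the reported 0.70-0.75 range
# yield the reported F1 scores (precisions here are illustrative):
for event, p in [("Bowled", 0.74), ("Four", 0.72), ("Batsman Action", 0.70)]:
    print(f"{event}: F1 = {f1(p, 1.0):.2f}")
```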
These findings validate the capability of Vision Transformers in learning meaningful spatio-temporal patterns from cricket videos, enabling accurate, real-time commentary generation. The approach proves to be scalable for extending commentary across diverse match scenarios and can be further enhanced with deeper semantic analysis, emotion modeling, and multilingual support in future work.