This paper presents a hybrid ensemble system for real-time sentiment analysis of YouTube comments that integrates VADER lexicon-based analysis with RoBERTa transformer-based deep learning. The proposed web application uses the YouTube Data API v3 to process up to 100,000 comments per session, including live stream chat, and classifies each comment into one of five categories: Very Positive, Positive, Neutral, Negative, and Very Negative. The ensemble prioritizes VADER compound scores and uses RoBERTa confidence as a contextual fallback, improving classification accuracy over single-model baselines. On a 250,000-comment benchmark, the ensemble reaches 83.7% accuracy and 0.812 macro-F1. The system summarizes sentiment distributions through visualizations such as pie charts and word clouds, making it practical for content creators, platform analysts, and brand monitoring. The Flask-based interface makes the tool accessible to non-technical users, bridging the gap between NLP research and real-world social media analysis.
Introduction
This paper describes a YouTube comment sentiment analysis system designed for large-scale, real-time opinion mining using a hybrid AI approach.
Given the massive volume of YouTube comments, manual analysis is infeasible. Sentiment analysis helps classify user opinions, but YouTube comments are difficult to analyze because of slang, sarcasm, emojis, and multilingual content. To address this, the study proposes a Flask-based web application that combines VADER (a lexicon-based model) and RoBERTa (a transformer-based deep learning model) in an ensemble to improve accuracy.
The system collects up to 100,000 comments per session through the YouTube Data API v3, preprocesses them (cleaning the text while retaining emojis), and scores each comment with both models. VADER handles simple, lexically clear text quickly, while RoBERTa captures deeper contextual meaning. An adaptive fusion step combines their outputs, giving priority to the VADER compound score and falling back on RoBERTa's confidence for context-dependent text, and assigns each comment one of five sentiment levels from Very Negative to Very Positive.
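The preprocessing and fusion steps above can be sketched in Python, the language of the Flask stack. This is a minimal sketch under stated assumptions, not the authors' implementation: the cleaning regexes, the hypothetical helper names `clean_comment`, `fuse`, and `to_category`, the thresholds `VADER_CLEAR = 0.5` and `ROBERTA_MIN_CONF = 0.6`, the five-level cut-offs, and the averaging used when neither model is decisive are all illustrative choices. It also assumes RoBERTa is a three-class (negative/neutral/positive) classifier whose probabilities are passed in as a dictionary.

```python
import html
import re

# Hypothetical cleaning patterns; the paper only states that text is
# cleaned while emojis are retained.
_TAG = re.compile(r"<[^>]+>")
_URL = re.compile(r"https?://\S+|www\.\S+")
_SPACE = re.compile(r"\s+")


def clean_comment(text: str) -> str:
    """Strip HTML tags, entities, URLs and extra whitespace; keep emojis."""
    text = _TAG.sub(" ", text)      # e.g. <br> or <a> tags in API display text
    text = html.unescape(text)      # &amp; -> &, &#39; -> '
    text = _URL.sub(" ", text)      # links carry no sentiment
    # Emoji code points are never matched above, so they pass through untouched.
    return _SPACE.sub(" ", text).strip()


# Assumed thresholds, not values reported in the paper.
VADER_CLEAR = 0.5        # |compound| at or above this: trust VADER alone
ROBERTA_MIN_CONF = 0.6   # top RoBERTa probability needed to use it as fallback


def fuse(compound: float, probs: dict[str, float]) -> float:
    """Return a fused polarity score in [-1, 1].

    compound: VADER compound score.
    probs: RoBERTa probabilities keyed 'negative', 'neutral', 'positive'.
    """
    if abs(compound) >= VADER_CLEAR:
        return compound                      # lexically clear: VADER has priority
    roberta_score = probs["positive"] - probs["negative"]
    if max(probs.values()) >= ROBERTA_MIN_CONF:
        return roberta_score                 # contextual fallback
    return 0.5 * (compound + roberta_score)  # neither is decisive: average


def to_category(score: float) -> str:
    """Map a fused score onto five levels (assumed cut-offs).

    The +/-0.05 boundaries mirror VADER's usual positive/negative thresholds.
    """
    if score >= 0.6:
        return "Very Positive"
    if score >= 0.05:
        return "Positive"
    if score > -0.05:
        return "Neutral"
    if score > -0.6:
        return "Negative"
    return "Very Negative"
```

In a full pipeline, `compound` would come from the `vaderSentiment` package (`SentimentIntensityAnalyzer().polarity_scores(text)["compound"]`) and `probs` from a RoBERTa sentiment model, for example one run through Hugging Face `transformers`. Keeping the fusion a pure function of the two scores makes its thresholds easy to tune on labeled YouTube data.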
The system also provides visualizations such as pie charts and word clouds, lets users export results for further analysis, and supports both regular videos and live-stream chat.
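The pie chart, word cloud, and export features reduce to simple aggregations over the labeled comments. The sketch below is one hedged way to compute them with the standard library only; the helper names, the tiny stop-word list, and the CSV columns are hypothetical, and it does not reflect whichever plotting or word-cloud library the system actually uses.

```python
import csv
import io
import re
from collections import Counter

CATEGORIES = ["Very Positive", "Positive", "Neutral", "Negative", "Very Negative"]
# Illustrative stop-word list (assumption); a real deployment needs a fuller one.
STOPWORDS = {"the", "a", "an", "and", "is", "it", "this", "to", "of", "i"}


def distribution(labels):
    """Percentage of comments per sentiment level, i.e. the pie-chart slices."""
    counts = Counter(labels)
    total = len(labels) or 1  # avoid division by zero on an empty session
    return {c: 100.0 * counts[c] / total for c in CATEGORIES}


def word_frequencies(texts, top_n=50):
    """Most frequent non-stop-words across comments, for a word cloud."""
    words = Counter()
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            if word not in STOPWORDS and len(word) > 1:
                words[word] += 1
    return dict(words.most_common(top_n))


def export_csv(rows):
    """Serialize per-comment results to CSV text for download."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["comment", "compound", "category"])
    writer.writeheader()
    writer.writerows(rows)  # fields containing commas are quoted automatically
    return buf.getvalue()
```

The percentages from `distribution` can feed any pie-chart routine, and the frequency dictionary has the shape accepted by, for example, the `wordcloud` package's `WordCloud.generate_from_frequencies`. The ASCII-only word pattern drops emojis and non-Latin scripts, which is tolerable for a word cloud but not for sentiment scoring.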
Experiments on 250,000 comments across multiple categories show that the hybrid ensemble outperforms both individual models, achieving 83.7% accuracy, a higher macro-F1 score (0.812), and balanced performance on both informal and complex text. The results also reveal a moderate positive correlation between comment sentiment and user engagement, measured by likes.
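The evaluation metrics can be made concrete with a few definitions. This sketch shows how accuracy, macro-F1 (the unweighted mean of per-class F1), and a sentiment/likes correlation could be computed; it does not reproduce the reported 83.7% or 0.812 figures. The paper does not state which correlation coefficient was used, so Pearson here is an assumption; for heavy-tailed like counts, a rank correlation such as Spearman's, or a log transform of likes, may be more appropriate.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of comments whose predicted label matches the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1, so rare classes weigh as much as common ones."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


def pearson(xs, ys):
    """Pearson correlation, e.g. between fused sentiment scores and like counts."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

In practice `labels` would hold all five sentiment levels. With scikit-learn available, `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` computes the same macro-F1.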
Conclusion
This paper presented a hybrid ensemble sentiment analysis system for YouTube comments, combining VADER's social-media-adapted lexicon scoring with RoBERTa's transformer-based contextual understanding. The proposed system achieves 83.7% classification accuracy and 0.812 macro-F1 on a 250,000-comment benchmark, outperforming individual models. The Flask-based web application provides scalable, accessible sentiment analysis supporting up to 100,000 comments per session, including live stream chat processing, with rich visualization outputs.
Future work will explore: (1) multilingual sentiment analysis using mBERT or XLM-RoBERTa; (2) sarcasm-aware models integrating auxiliary irony detection; (3) fine-tuning RoBERTa on YouTube-specific labeled data; (4) a real-time dashboard with streaming sentiment updates; and (5) integration of comment thread context for aspect-based sentiment analysis.