This paper presents a system for real-time opinion mining of trending Twitter topics that uses state-of-the-art NLP models to extract dynamic sentiment insights from social media. Tweets are fetched via Twitter API v2 using Tweepy, with advanced query filtering to ensure linguistic relevance and diversity. Sentiment classification is performed with the BERT-based model `nlptown/bert-base-multilingual-uncased-sentiment`, which maps each tweet to a star rating from 1 (extremely negative) to 5 (extremely positive). Data visualization through bar graphs, pie charts, boxplots, and word clouds reveals key public opinion patterns. Additionally, tweets are grouped by sentiment and summarized using Facebook’s `bart-large-cnn` model. The system enables dynamic extraction of insights for trending topics by integrating sentiment mining, engagement analysis, and abstractive summarization.
1. Introduction
This research proposes an end-to-end NLP framework for analyzing real-time Twitter data on trending topics. The system collects, classifies, visualizes, and summarizes tweets using transformer-based models like BERT (for sentiment analysis) and BART (for summarization). It supports fine-grained sentiment scoring (1–5 scale) and provides actionable insights through visualizations and engagement analysis.
2. Literature Survey
Past studies in Twitter sentiment analysis have progressed through three stages:
Rule-based and API-based systems, which are rigid and context-insensitive,
Traditional ML approaches such as Naive Bayes, SVM, MLP, and XGBoost, which are more flexible but weak at semantic understanding, and
Transformer-based models (BERT/BART), the current trend, valued for their deep contextual awareness and their scalability across languages and informal language patterns such as sarcasm.
Challenges across all prior systems include handling:
Informal language,
Sarcasm and idioms,
Class imbalance, and
Limited adaptability to evolving content.
3. Objectives
Build a real-time sentiment analysis pipeline using BERT for multilingual, context-aware classification on a 1–5 scale.
Generate abstractive summaries for positive and non-positive tweet clusters using BART.
Provide interactive visualizations and perform manual validation for benchmarking and improved reliability.
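The first two objectives can be illustrated with a minimal sketch. It assumes the classifier returns labels of the form "1 star" through "5 stars", and it assumes that a rating of 4 or higher counts as positive; that threshold is not stated in the paper. The helper names (`stars_from_label`, `group_by_sentiment`, `classify`, `summarize`) and the summary length limits are hypothetical and chosen for illustration only. The two functions that call Hugging Face `pipeline` need the `transformers` package and download the models on first use.

```python
import re

SENTIMENT_MODEL = "nlptown/bert-base-multilingual-uncased-sentiment"
SUMMARY_MODEL = "facebook/bart-large-cnn"


def stars_from_label(label: str) -> int:
    """Parse a label such as '1 star' or '4 stars' into an integer from 1 to 5."""
    match = re.match(r"\s*([1-5])\s+stars?\s*$", label)
    if match is None:
        raise ValueError(f"unexpected label: {label!r}")
    return int(match.group(1))


def group_by_sentiment(tweets, labels, positive_threshold=4):
    """Split tweets into 'positive' and 'non_positive' clusters.

    positive_threshold=4 is an assumption, not a value taken from the paper.
    """
    groups = {"positive": [], "non_positive": []}
    for tweet, label in zip(tweets, labels):
        is_positive = stars_from_label(label) >= positive_threshold
        groups["positive" if is_positive else "non_positive"].append(tweet)
    return groups


def classify(texts):
    """Return one star label per text (requires `transformers`)."""
    from transformers import pipeline  # imported lazily: heavy dependency

    classifier = pipeline("sentiment-analysis", model=SENTIMENT_MODEL)
    return [result["label"] for result in classifier(list(texts))]


def summarize(tweets, max_length=60, min_length=15):
    """Summarize one tweet cluster (requires `transformers`).

    The length limits are illustrative assumptions.
    """
    from transformers import pipeline  # imported lazily: heavy dependency

    summarizer = pipeline("summarization", model=SUMMARY_MODEL)
    text = " ".join(tweets)
    result = summarizer(
        text,
        max_length=max_length,
        min_length=min_length,
        do_sample=False,
        truncation=True,  # cut inputs that exceed the model's context window
    )
    return result[0]["summary_text"]
```

In use, one would call `classify` on the cleaned tweets, pass the tweets and labels to `group_by_sentiment`, and then call `summarize` once per cluster.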
4. Methodology
The pipeline consists of the following stages:
A) Tweet Fetching & Preprocessing
Tweets are fetched using Tweepy with Twitter API v2.
Preprocessing includes lemmatization and the removal of URLs, special characters, and mentions.
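The fetching and cleaning steps can be sketched as follows. This is a minimal illustration, not the authors' implementation. The query operators (`lang:`, `-is:retweet`), the use of the recent-search endpoint, the requested tweet fields, and all function names are assumptions. Lemmatization is omitted because the paper does not name the library used for it; in practice it would be applied to the cleaned text with a tool such as NLTK or spaCy.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
SPECIAL_RE = re.compile(r"[^\w\s]")  # anything that is not a word character or whitespace


def build_query(topic, lang="en"):
    """Build a search query that filters by language and drops retweets.

    The specific filters are assumptions; the paper only says that
    advanced query filtering is used.
    """
    return f"{topic} lang:{lang} -is:retweet"


def clean_tweet(text):
    """Remove URLs, mentions, and special characters, then collapse whitespace."""
    text = URL_RE.sub(" ", text)      # URLs first, before punctuation is stripped
    text = MENTION_RE.sub(" ", text)  # @user mentions
    text = SPECIAL_RE.sub(" ", text)  # punctuation, '#' signs, emoji
    return " ".join(text.split())


def fetch_tweets(bearer_token, topic, max_results=100):
    """Fetch recent tweets for a topic (requires `tweepy` and API credentials)."""
    import tweepy  # imported lazily so the cleaning helpers work without it

    client = tweepy.Client(bearer_token=bearer_token)
    response = client.search_recent_tweets(
        query=build_query(topic),
        max_results=max_results,  # the endpoint accepts 10 to 100 per request
        tweet_fields=["public_metrics", "lang"],
    )
    return [tweet.text for tweet in (response.data or [])]
```

For example, `clean_tweet("@user Loving the new phone!!! https://t.co/abc #tech")` returns `"Loving the new phone tech"`. Hashtag words are kept while the `#` sign is dropped, which preserves topical keywords for the word clouds.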
The sentiment-specific summaries effectively reflect opposing views (concerns vs. optimism).
The summaries were evaluated manually because no reference summaries were available.
Model Performance:
BERT significantly outperforms traditional ML models (e.g., SVM, Naive Bayes), achieving the highest accuracy (~78%) and macro-averaged F1-score (~0.50).
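The gap between accuracy and F1-score is typical of imbalanced data: accuracy is dominated by the majority class, whereas a macro-averaged F1-score weights every class equally. The sketch below shows the two metrics on a small hypothetical label set. The data are illustrative only and are not the paper's results, and the function names are chosen for this example.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: eight 5-star tweets, two 1-star tweets,
# and a classifier that always predicts 5 stars.
y_true = [5] * 8 + [1] * 2
y_pred = [5] * 10

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")  # 0.80
print(f"macro F1 = {macro_f1(y_true, y_pred):.2f}")  # 0.44
```

Here the classifier scores 0.80 accuracy while ignoring the minority class entirely, and the macro-averaged F1-score of about 0.44 exposes that failure. This is why both metrics are worth reporting for sentiment data with class imbalance.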
Conclusion
This paper presents a comprehensive, modular, and scalable pipeline for real-time sentiment-aware engagement analysis and abstractive summarization of trending Twitter topics. Using a multilingual BERT classifier, the system consistently outperformed traditional machine learning models in accuracy and macro-averaged F1-score, particularly on short, informal, and multilingual social media text. The BART-based summarization module condensed sentiment-specific tweet clusters into fluent, contextually relevant narratives, with quality assessed through manual, human-centric evaluation. A diverse set of visualizations, including sentiment distribution charts, engagement plots, and keyword-based word clouds, offered interpretable and actionable insights into public opinion and engagement dynamics. While challenges such as API rate limits, sarcasm misclassification, and potential information loss in summarization remain, the framework’s adaptability makes it suitable for academic research, policy analysis, brand monitoring, and media reporting. Future work will focus on integrating streaming-based tweet collection, geospatial sentiment mapping, interactive dashboards, and expanded multilingual datasets to further enhance analytical depth and real-world applicability.