A Real-Time Opinion Mining of Trending Twitter Topics Using NLP: Leveraging Social Media for Dynamic Sentiment Insights

Authors: Kudumula Venkat Sai Karthik Yadav, G. Praveen Babu

DOI Link: https://doi.org/10.22214/ijraset.2025.73692

Abstract

This paper presents a system for real-time opinion mining of trending Twitter topics using state-of-the-art NLP models. Tweets are fetched via Twitter API v2 using Tweepy, with advanced query filtering to ensure linguistic relevance and diversity. Sentiment classification is performed with the BERT-based model `nlptown/bert-base-multilingual-uncased-sentiment`, which maps each tweet to a star scale from 1 (extremely negative) to 5 (extremely positive). Bar graphs, pie charts, boxplots, and word clouds reveal key patterns in public opinion. Tweets are also grouped by sentiment and summarized using Facebook’s `bart-large-cnn` model. The system enables dynamic extraction of insights for trending topics by integrating sentiment mining, engagement analysis, and abstractive summarization.

1. Introduction

This research proposes an end-to-end NLP framework for analyzing real-time Twitter data on trending topics. The system collects, classifies, visualizes, and summarizes tweets using transformer-based models like BERT (for sentiment analysis) and BART (for summarization). It supports fine-grained sentiment scoring (1–5 scale) and provides actionable insights through visualizations and engagement analysis.

2. Literature Survey

Past studies in Twitter sentiment analysis progressed through three stages:

Rule-based and API-based systems (rigid, context-insensitive),
Traditional ML approaches such as Naive Bayes, SVM, MLP, and XGBoost (more flexible but weak at semantic understanding), and
Transformer-based models (BERT/BART), the current trend, valued for their deep contextual awareness and their scalability across languages and informal language patterns such as sarcasm.

Challenges common to all prior systems include:

Informal language,
Sarcasm and idioms,
Class imbalance, and
Limited adaptability to evolving content.

3. Objectives

Build a real-time sentiment analysis pipeline using BERT for multilingual, context-aware classification on a 1–5 scale.
Generate abstractive summaries for positive and non-positive tweet clusters using BART.
Provide interactive visualizations and perform manual validation for benchmarking and improved reliability.

4. Methodology

The pipeline consists of the following stages:

A) Tweet Fetching & Preprocessing

Tweets are fetched using Tweepy with Twitter API v2.
Preprocessing includes lemmatization and the removal of URLs, special characters, and mentions.
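
The fetching and cleaning stage can be sketched as follows. This is a minimal illustration, not the authors' code: the helper names `clean_tweet` and `fetch_tweets`, the regular expressions, and the query operators `-is:retweet lang:en` are assumptions chosen for the example. Lemmatization is left out because it needs an external library such as NLTK or spaCy.

```python
import re

# Patterns for the removal steps named in the text (assumed forms).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
SPECIAL_RE = re.compile(r"[^\w\s]")  # Unicode-aware: keeps non-Latin letters
SPACE_RE = re.compile(r"\s+")


def clean_tweet(text: str) -> str:
    """Remove URLs, mentions, and special characters, then tidy whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = SPECIAL_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def fetch_tweets(bearer_token: str, topic: str, limit: int = 100) -> list[dict]:
    """Fetch recent tweets on a topic via Twitter API v2 (hypothetical helper)."""
    import tweepy  # imported lazily so clean_tweet works without Tweepy

    client = tweepy.Client(bearer_token=bearer_token)
    response = client.search_recent_tweets(
        query=f"{topic} -is:retweet lang:en",  # assumed filter operators
        tweet_fields=["public_metrics", "lang"],
        max_results=max(10, min(limit, 100)),  # the endpoint accepts 10 to 100
    )
    # response.data is None when nothing matches the query.
    return [
        {"text": clean_tweet(t.text), "metrics": t.public_metrics}
        for t in (response.data or [])
    ]
```

Calling `fetch_tweets` requires a valid bearer token and network access; `clean_tweet` can be used on its own.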

B) Sentiment Classification (BERT)

Uses nlptown/bert-base-multilingual-uncased-sentiment.
Classifies tweets on a 1–5 scale, improving granularity and multilingual support.
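
A sketch of the classification step, assuming the Hugging Face `pipeline` API is used to load the model named above. The model returns labels of the form "1 star" to "5 stars"; the helpers `stars_from_label` and `classify` are hypothetical names introduced here for illustration.

```python
def stars_from_label(label: str) -> int:
    """Convert a model label such as '4 stars' or '1 star' to an integer 1-5."""
    stars = int(label.split()[0])
    if not 1 <= stars <= 5:
        raise ValueError(f"unexpected sentiment label: {label!r}")
    return stars


def classify(texts: list[str]) -> list[int]:
    """Return a 1-5 star rating for each text (downloads the model on first use)."""
    from transformers import pipeline  # lazy import: heavy dependency

    classifier = pipeline(
        "sentiment-analysis",
        model="nlptown/bert-base-multilingual-uncased-sentiment",
    )
    # truncation=True guards against inputs longer than the model's limit.
    results = classifier(texts, truncation=True)
    return [stars_from_label(r["label"]) for r in results]
```

In practice the pipeline object would be created once and reused across batches rather than rebuilt on every call.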

C) Engagement-Based Filtering

Filters tweets by likes, retweets, and replies to prioritize highly engaging content.
Periodic manual sampling ensures label accuracy and supports continuous model evaluation.
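
One possible form of the engagement filter is sketched below. The scoring rule (an unweighted sum of likes, retweets, and replies) and the threshold `min_score` are assumptions; the paper does not state how the three signals are combined. The metric keys follow the `public_metrics` field of Twitter API v2.

```python
def engagement(metrics: dict) -> int:
    """Unweighted engagement score (assumed rule): likes + retweets + replies."""
    return (
        metrics.get("like_count", 0)
        + metrics.get("retweet_count", 0)
        + metrics.get("reply_count", 0)
    )


def filter_by_engagement(tweets: list[dict], min_score: int = 10) -> list[dict]:
    """Keep tweets scoring at least min_score, most engaging first."""
    kept = [t for t in tweets if engagement(t["metrics"]) >= min_score]
    return sorted(kept, key=lambda t: engagement(t["metrics"]), reverse=True)


sample = [
    {"text": "a", "metrics": {"like_count": 3, "retweet_count": 1, "reply_count": 0}},
    {"text": "b", "metrics": {"like_count": 40, "retweet_count": 12, "reply_count": 5}},
    {"text": "c", "metrics": {"like_count": 9, "retweet_count": 2, "reply_count": 1}},
]
top = filter_by_engagement(sample, min_score=10)
```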

D) Visualization

Includes bar/pie charts, scatter plots, box plots, and word clouds to reflect sentiment distribution, tweet engagement, and popular keywords.
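
The data behind the distribution and engagement charts can be prepared as in the following sketch. The function names are hypothetical, and the plotting helper assumes Matplotlib is installed; only the bar chart is shown.

```python
from collections import Counter, defaultdict


def star_distribution(stars: list[int]) -> dict[int, int]:
    """Count tweets per star rating, including ratings with zero tweets."""
    counts = Counter(stars)
    return {s: counts.get(s, 0) for s in range(1, 6)}


def mean_engagement_by_star(records: list[tuple[int, float]]) -> dict[int, float]:
    """Average engagement per rating from (stars, engagement) pairs."""
    grouped = defaultdict(list)
    for stars, score in records:
        grouped[stars].append(score)
    return {s: sum(v) / len(v) for s, v in sorted(grouped.items())}


def plot_distribution(dist: dict[int, int], path: str = "sentiment_distribution.png") -> None:
    """Save a bar chart of the star distribution to a file."""
    import matplotlib

    matplotlib.use("Agg")  # render to file without a display
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar([str(s) for s in dist], list(dist.values()))
    ax.set_xlabel("Predicted stars")
    ax.set_ylabel("Number of tweets")
    fig.savefig(path)
    plt.close(fig)
```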

E) Grouped Sentiment Clustering

Tweets are split into positive (4–5 stars) and non-positive (1–3 stars) groups for targeted summarization.
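
The split described above amounts to a threshold on the predicted rating. A minimal sketch, with `split_by_sentiment` as a hypothetical helper name:

```python
def split_by_sentiment(scored: list[tuple[str, int]]) -> tuple[list[str], list[str]]:
    """Split (text, stars) pairs into positive (4-5) and non-positive (1-3) texts."""
    positive, non_positive = [], []
    for text, stars in scored:
        if stars >= 4:
            positive.append(text)
        else:
            non_positive.append(text)
    return positive, non_positive


pos, non_pos = split_by_sentiment(
    [("AI frees us from drudgery", 5), ("AI will take my job", 1), ("Not sure yet", 3)]
)
```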

F) Abstractive Summarization (BART)

Uses facebook/bart-large-cnn to generate summaries.
Summaries are manually evaluated for fluency and contextual relevance.
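
A sketch of how one sentiment group might be summarized, assuming the Hugging Face `pipeline` API. Because `facebook/bart-large-cnn` accepts a limited input length, the tweets are first packed into chunks; the character budget `max_chars`, the generation lengths, and both helper names are assumptions made for this example.

```python
def chunk_texts(texts: list[str], max_chars: int = 2000) -> list[str]:
    """Pack texts into space-joined chunks of at most max_chars characters.

    A single text longer than max_chars becomes its own oversized chunk.
    """
    chunks, current, size = [], [], 0
    for text in texts:
        if current and size + len(text) + 1 > max_chars:
            chunks.append(" ".join(current))
            current, size = [], 0
        current.append(text)
        size += len(text) + 1
    if current:
        chunks.append(" ".join(current))
    return chunks


def summarize_group(texts: list[str]) -> str:
    """Summarize one sentiment group chunk by chunk and join the results."""
    if not texts:
        return ""
    from transformers import pipeline  # lazy import: heavy dependency

    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    outputs = summarizer(
        chunk_texts(texts), max_length=60, min_length=15, truncation=True
    )
    return " ".join(o["summary_text"] for o in outputs)
```

Summarizing chunk by chunk keeps each input within the model's limit, at the cost of a summary that is a concatenation rather than a single pass over the whole group.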

G) Evaluation & Benchmarking

Manual ground truth labels are used to benchmark BERT against traditional models (Naive Bayes, SVM, XGBoost, MLP).
Metrics: Accuracy, Precision, Recall, F1-score using scikit-learn.
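
The metrics can be illustrated with a small pure-Python computation. It is intended to mirror what scikit-learn's `accuracy_score` and `precision_recall_fscore_support(average="macro")` compute, written out by hand so the arithmetic is visible; the toy labels are invented for the example. It also shows how, on imbalanced data, accuracy can be high while macro-averaged F1 stays low.

```python
def macro_scores(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy plus macro-averaged precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy imbalanced example: the classifier always predicts the majority class.
scores = macro_scores([1, 1, 1, 1, 5], [1, 1, 1, 1, 1])
```

Here four of five predictions are correct, yet the minority class is never predicted, so its per-class F1 is zero and pulls the macro average down.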

H) System Deployment

Built in Python using Hugging Face, Tweepy, and Google Colab.
Designed to be modular and scalable, supporting future tasks like topic modeling or geolocation tracking.

5. Results & Analysis

Key insights from applying the system to tweets on “AI replacing human jobs”:

Sentiment Distribution:
- The majority of tweets are extremely negative (1 star), followed by extremely positive (5 stars), indicating a polarized sentiment trend.
- Moderate opinions (2–4 stars) are less common.
Engagement Insights:
- 1-star tweets have the highest engagement (likes/retweets), suggesting that negative sentiment drives virality for this topic.
- Positive tweets receive significantly lower interaction.
Summarization:
- Sentiment-specific summaries effectively reflect opposing views (concerns vs. optimism).
- Evaluated manually due to lack of reference summaries.
Model Performance:
- BERT significantly outperforms the traditional ML models (e.g., SVM, Naive Bayes), achieving the highest accuracy (~78%) and macro-averaged F1-score (~0.50).

6. Conclusion

This paper delivers a comprehensive, modular, and scalable pipeline for real-time sentiment-aware engagement analysis and abstractive summarization of trending Twitter topics. Leveraging a multilingual BERT classifier, the system consistently outperformed traditional machine learning models in accuracy and macro-averaged F1-score, particularly in handling short, informal, and multilingual social media text. The BART-based summarization module effectively condensed sentiment-specific tweet clusters into fluent, contextually relevant narratives, with quality assured through human-centric evaluation. A diverse set of visualizations—including sentiment distribution charts, engagement plots, and keyword-based word clouds—offered interpretable and actionable insights into public opinion and engagement dynamics. While challenges such as API rate limits, sarcasm misclassification, and potential information loss in summarization remain, the framework’s adaptability makes it suitable for academic research, policy analysis, brand monitoring, and media reporting. Future work will focus on integrating streaming-based tweet collection, geospatial sentiment mapping, interactive dashboards, and expanded multilingual datasets to further enhance analytical depth and real-world applicability.


Copyright

Copyright © 2025 Kudumula Venkat Sai Karthik Yadav, G. Praveen Babu. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Paper Id : IJRASET73692

Publish Date : 2025-08-15

ISSN : 2321-9653

Publisher Name : IJRASET
