The Smart News Aggregator and Sentiment Analyzer is a system designed to address information overload and media bias using Natural Language Processing (NLP) and Machine Learning (ML). The platform aggregates news from multiple sources, categorizes content, and performs sentiment analysis to classify articles as positive, negative, or neutral. Using TF-IDF for topic detection and BERT or VADER for sentiment analysis, the system provides users with summarized, sentiment-labeled news. On a dataset of 1,000 news articles, BERT reached 92% accuracy, compared with 85% for the lexicon-based VADER. The platform aims to enhance media literacy and reduce misinformation by offering transparent, bias-aware news consumption.
Introduction
In the digital age, news consumers face information overload, media bias, and fake news. A Smart News Aggregator system addresses these issues by collecting news from multiple sources and using Natural Language Processing (NLP) to analyze sentiment and reduce bias.
Key Problems:
Too much irrelevant content (information overload).
Media outlets often have ideological bias.
Fake news spreads rapidly and reduces trust.
Objectives:
Build a real-time news aggregator.
Apply sentiment analysis to detect emotional tone.
Deliver summarized and bias-aware news.
Improve media literacy by showing sentiment trends and bias.
Literature Survey Highlights:
BERT, RoBERTa, and XLNet outperform older models in detecting subtle sentiment and sarcasm.
VADER and TextBlob are efficient for real-time or lightweight analysis but less accurate on complex texts.
Past systems used TF-IDF, topic modeling, and social media sentiment tracking, but lacked integrated bias detection.
Methodology:
System Architecture:
News Aggregator: Pulls real-time data from APIs (e.g., NewsAPI, RSS).
Preprocessing: Includes tokenization, stemming, and stopword removal.
Sentiment Analysis: Uses BERT for deep sentiment understanding or VADER for speed.
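The first two stages above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: the feed content in SAMPLE_RSS is made up, the stopword list is deliberately tiny, and crude_stem is a simple suffix stripper standing in for a real stemmer such as Porter. All function names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical RSS payload standing in for a live feed response.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <item>
      <title>Markets are rallying after strong earnings</title>
      <description>Investors cheered the results.</description>
    </item>
    <item>
      <title>Storm warnings issued for the coast</title>
      <description>Officials urged residents to prepare.</description>
    </item>
  </channel>
</rss>"""

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "are", "is", "for", "to", "after", "of", "and"}


def parse_rss(xml_text):
    """Return one {'title', 'description'} dict per <item> in the feed."""
    root = ET.fromstring(xml_text)
    articles = []
    for item in root.iter("item"):
        articles.append({
            "title": item.findtext("title", default=""),
            "description": item.findtext("description", default=""),
        })
    return articles


def crude_stem(token):
    """Strip a few common suffixes (assumed stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize on letters, drop stopwords, then stem each remaining token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


articles = parse_rss(SAMPLE_RSS)
tokens = preprocess(articles[0]["title"])
print(tokens)  # ['market', 'rally', 'strong', 'earning']
```

The token lists produced here are the kind of input that the topic-detection and sentiment stages would consume.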
Algorithms Used:
TF-IDF: For topic detection.
Sentiment Scoring (thresholds on a score in [-1, 1], such as VADER's compound score):
Positive: ≥ 0.05
Negative: ≤ -0.05
Neutral: Strictly between -0.05 and 0.05.
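Both algorithms can be illustrated in a few lines. The sketch below assumes one common TF-IDF variant (term frequency normalized by document length, idf = ln(N / df)); the source does not say which variant the system uses. The example documents and the function names tf_idf, top_terms, and label_sentiment are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    Assumed variant: tf = count / doc_length, idf = ln(N / df).
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def top_terms(doc_weights, k=2):
    """Return the k highest-weighted terms, ties broken alphabetically."""
    ranked = sorted(doc_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def label_sentiment(score, threshold=0.05):
    """Map a score in [-1, 1] to a label using the thresholds listed above."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"


# Made-up tokenized articles for illustration only.
docs = [
    ["election", "vote", "policy", "vote"],
    ["match", "goal", "team", "goal"],
    ["policy", "team", "budget"],
]
weights = tf_idf(docs)
print(top_terms(weights[0], k=1))  # ['vote']
print(label_sentiment(0.31))       # positive
print(label_sentiment(-0.02))      # neutral
```

Terms that appear in every document receive a weight of zero under this variant, which is why distinctive terms such as "vote" rise to the top as topic indicators.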
Technologies:
Frontend: ReactJS, Bootstrap
Backend: Node.js, Express
Database: MongoDB
NLP Libraries: Hugging Face (BERT), NLTK (VADER)
Results:
On a dataset of 1,000 news articles:
BERT achieved 92% accuracy and an F1-score of 0.91.
VADER achieved 85% accuracy and an F1-score of 0.83.
Real-time response latency was under 2 seconds.
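For readers unfamiliar with the two metrics, the sketch below shows how accuracy and F1 can be computed for a three-class sentiment task. The source does not state which F1 averaging was used, so macro-averaging is an assumption here, and the labels are toy values, not the 1,000-article dataset.

```python
LABELS = ("positive", "negative", "neutral")


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_for_label(y_true, y_pred, label):
    """F1 = harmonic mean of precision and recall for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels=LABELS):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    return sum(f1_for_label(y_true, y_pred, lb) for lb in labels) / len(labels)


# Toy labels for illustration only.
y_true = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
y_pred = ["positive", "neutral", "negative", "negative", "neutral", "positive"]

print(round(accuracy(y_true, y_pred), 3))  # 0.667
print(round(macro_f1(y_true, y_pred), 3))  # 0.667
```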
Conclusion
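The system is designed to mitigate information overload and media bias. On a dataset of 1,000 news articles, its BERT-based sentiment classifier reached 92% accuracy with response latency under 2 seconds. Future enhancements include:
Multilingual support for global applicability.
Blockchain integration for source authenticity.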
Ethical Implications: Promotes media literacy and combats misinformation.