The proliferation of social media platforms has dramatically accelerated the spread of both genuine and false information. Twitter, in particular, has emerged as a primary source of breaking news, but its open nature has made it equally prone to the circulation of misleading or fabricated stories. This research presents a web-based application for fake news detection using Natural Language Processing (NLP), Term Frequency–Inverse Document Frequency (TF-IDF) feature extraction, and the Passive Aggressive Classifier (PAC) for classification. The system takes textual input (either entered by the user or drawn as a random sample), cleans and tokenizes the text, converts it into numerical vectors, and predicts whether the content is FAKE or REAL. The model was evaluated on two datasets, the Kaggle Fake News Dataset and the Simon Fraser University (SFU) Fake News Dataset, achieving accuracies of about 97% and 92%, respectively. Developed using the Flask web framework, the application offers an intuitive interface for real-time fake news detection.
Introduction
1. Overview
Social media, especially Twitter, has become a key source of real-time news. However, its speed and openness make it vulnerable to fake news. Manual fact-checking is too slow, so automated detection systems are needed.
This project proposes a lightweight machine learning approach using:
TF-IDF for feature extraction
Passive Aggressive Classifier (PAC) for real-time classification
The system is built for fast, scalable, browser-based fake news detection.
2. Literature Survey
LSTM: High accuracy but computationally heavy.
Reinforcement Learning + Blockchain: Good security but slow and complex.
Weakly Supervised Learning: Reduces manual labeling but may add noise.
Proposed Method: TF-IDF + PAC offers a balance of speed, simplicity, and accuracy.
3. Objective
To build a real-time fake news detection system for social media by:
Cleaning and preprocessing text using NLP techniques.
Converting text into TF-IDF vectors.
Using PAC for fast online learning.
Deploying as a Flask web app for real-time user interaction.
4. Methodology
Data Source: Kaggle dataset with ~56,715 labeled articles ("FAKE" or "REAL").
Preprocessing:
Lowercasing, removing stopwords/symbols
Tokenization & stemming
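The preprocessing steps listed here can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's actual code: the `STOPWORDS` set, the suffix-stripping `naive_stem` helper, and the `preprocess` function are hypothetical stand-ins, and a real system would more likely rely on a full stopword list and an established stemmer.

```python
import re

# Tiny illustrative stopword list (an assumption; the source does not say
# which list was used).
STOPWORDS = {"a", "an", "the", "is", "are", "was", "in", "on", "of", "to", "and"}


def naive_stem(token: str) -> str:
    """Strip a few common suffixes; a crude stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    text = text.lower()                                  # lowercase
    text = re.sub(r"[^a-z\s]", " ", text)                # drop symbols and digits
    tokens = text.split()                                # whitespace tokenization
    tokens = [t for t in tokens if t not in STOPWORDS]   # remove stopwords
    return [naive_stem(t) for t in tokens]               # stem what is left


print(preprocess("BREAKING: Scientists are claiming the Moon landed on Jupiter!!!"))
# ['break', 'scientist', 'claim', 'moon', 'land', 'jupiter']
```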
Feature Extraction:
TF-IDF to highlight rare, meaningful words
Model:
PAC, which leaves its weights unchanged on confident correct predictions and updates them on incorrect (or low-margin) ones
Trained on 80% data, tested on 20%
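An 80/20 split like the one described can be sketched as a shuffle followed by a cut. The source does not say whether the split was random or stratified by label, so the shuffling, the fixed `seed`, and the `train_test_split` helper below are assumptions made for illustration only.

```python
import random


def train_test_split(samples, test_fraction=0.2, seed=42):
    """Shuffle a copy of the samples and hold out `test_fraction` for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)   # seeded so the split is reproducible
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# With 56,715 articles, 20% held out means 11,343 test and 45,372 training items.
train, test = train_test_split(range(56715))
print(len(train), len(test))
```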
Deployment:
Flask app for users to test tweets or headlines
5. Algorithms Used
TF-IDF:
TF: Frequency of a word in a document
IDF: Rarity of the word across all documents
Helps highlight key terms for classification
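To make the weighting concrete, the sketch below computes TF-IDF by hand using the textbook form, tf = count / document length and idf = log(N / df). This is an assumption for illustration: the source does not state which variant was used, and library implementations typically apply smoothing and normalization that change the exact numbers. The `tfidf` function and the toy documents are hypothetical, and documents are assumed to be non-empty token lists.

```python
import math
from collections import Counter


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one {term: weight} mapping per tokenized, non-empty document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


docs = [
    ["senate", "passes", "budget", "bill"],
    ["aliens", "endorse", "budget", "bill"],
    ["senate", "debates", "budget"],
]
weights = tfidf(docs)
print(weights[1])
```

In this toy corpus, "budget" appears in every document and gets weight 0, while "aliens" appears in only one and gets the highest weight in its document, which is the behavior the IDF term is meant to produce.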
PAC (Passive Aggressive Classifier):
Passive when the prediction is correct with sufficient margin (no change)
Aggressive when the prediction is wrong or falls inside the margin (updates the model just enough to correct it)
Efficient for real-time learning from streaming data
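The passive/aggressive behavior can be illustrated with a single update step. The sketch below follows the commonly described PA-I rule (hinge loss, with the step size capped by an aggressiveness parameter `C`); the source does not say which PA variant or parameter values were used, so the `pa_update` function, the sparse dict representation, and the choice of +1 for REAL and -1 for FAKE are assumptions. Note that in this formulation the model also updates on correct predictions that fall inside the margin.

```python
def pa_update(w: dict[str, float], x: dict[str, float], y: int, C: float = 1.0) -> float:
    """Apply one PA-I step to `w` in place; return the hinge loss before the step.

    w: sparse weight vector, x: sparse feature vector (e.g. TF-IDF weights),
    y: label, +1 (REAL) or -1 (FAKE) in this sketch.
    """
    score = sum(w.get(f, 0.0) * v for f, v in x.items())
    loss = max(0.0, 1.0 - y * score)          # hinge loss with margin 1
    if loss == 0.0:
        return 0.0                            # passive: confident and correct
    norm_sq = sum(v * v for v in x.values())
    if norm_sq == 0.0:
        return loss                           # empty input: nothing to update
    tau = min(C, loss / norm_sq)              # aggressive: capped step size
    for f, v in x.items():
        w[f] = w.get(f, 0.0) + tau * y * v
    return loss


w: dict[str, float] = {}
x = {"aliens": 1.0, "budget": 0.5}
first = pa_update(w, x, y=-1)    # model starts at zero, so this sample triggers an update
second = pa_update(w, x, y=-1)   # same sample again: margin is now met, weights stay put
print(first, second, w)
```

Because each step touches only the features present in the current sample, the cost per update is proportional to the number of non-zero features, which is what makes this style of learner attractive for streaming text.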
6. Results
System tested using real news headlines from the dataset
False Negatives (FN): 196 (REAL predicted as FAKE)
Performance Metrics:
Accuracy: 97.36%
Precision: 99.93%
Recall: 97.40%
F1-Score: 98.65%
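As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the two values above. The `f1_score` helper is a hypothetical name used only for this illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Reported precision (99.93%) and recall (97.40%) expressed as fractions.
f1 = f1_score(0.9993, 0.9740)
print(f"{f1:.2%}")  # 98.65%
```

The recomputed value agrees with the reported F1-score to two decimal places.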
Conclusion
The developed fake news detection system combines TF-IDF feature extraction with the Passive Aggressive Classifier to deliver accurate and efficient classification of news as real or fake. It achieved around 97% accuracy on the Kaggle dataset and 92% on the SFU dataset, with low false prediction rates. The model’s lightweight design enables fast processing, while its integration into a Flask-based web application makes it easy to use for real-time verification. These results suggest that well-tuned traditional machine learning methods can be effective against misinformation on online platforms.
References
[1] S. Sharma, M. Saraswat, and A. K. Dubey, “Fake News Detection on Twitter,” Int. J. Web Inf. Syst., 2022.
[2] T. R. M. et al., “Cyber Sleuth: Harnessing NLP and Blockchain for Twitter-Based Fake News Detection,” AIMLA, 2024.
[3] S. Helmstetter and H. Paulheim, “Weakly Supervised Learning for Fake News Detection on Twitter,” ASONAM, 2018.
[4] T. Bhatia and B. Manaskasemsak, “Detecting Fake News Sources on Twitter Using Deep Neural Networks,” ICET, 2023.