A Web-Based Application for Fake News Detection in Twitter with Machine Learning

Authors: Nagula Rishitha, Dr. D. Karunakar Reddy

DOI Link: https://doi.org/10.22214/ijraset.2025.74445

Abstract

The proliferation of social media platforms has dramatically accelerated the spread of both genuine and false information. Twitter, in particular, has emerged as a primary source of breaking news, but its open nature has made it equally prone to the circulation of misleading or fabricated stories. This research presents a web-based application for fake news detection using Natural Language Processing (NLP), Term Frequency–Inverse Document Frequency (TF-IDF) feature extraction, and the Passive Aggressive Classifier (PAC) for classification. The system takes textual input, either supplied by the user or drawn from random samples, cleans and tokenizes it, converts it into numerical vectors, and predicts whether the content is FAKE or REAL. The model was evaluated on two datasets, the Kaggle Fake News Dataset and the Simon Fraser University Fake News Dataset, achieving accuracy between 92% and 97%. Developed using the Flask web framework, the application offers an intuitive interface for real-time fake news detection.

Introduction

1. Overview

Social media, especially Twitter, has become a key source of real-time news. However, its speed and openness make it vulnerable to fake news. Manual fact-checking is too slow, so automated detection systems are needed.

This project proposes a lightweight machine learning approach using:

TF-IDF for feature extraction
Passive Aggressive Classifier (PAC) for real-time classification
The system is built for fast, scalable, browser-based fake news detection.

2. Literature Survey

LSTM: High accuracy but computationally heavy.
Reinforcement Learning + Blockchain: Good security but slow and complex.
Weakly Supervised Learning: Reduces manual labeling but may add noise.
Proposed Method: TF-IDF + PAC offers a balance of speed, simplicity, and accuracy.

3. Objective

To build a real-time fake news detection system for social media by:

Cleaning and preprocessing text using NLP techniques.
Converting text into TF-IDF vectors.
Using PAC for fast online learning.
Deploying as a Flask web app for real-time user interaction.

4. Methodology

Data Source: Kaggle dataset with ~56,715 labeled articles ("FAKE" or "REAL").
Preprocessing:
- Lowercasing, removing stopwords/symbols
- Tokenization & stemming
Feature Extraction:
- TF-IDF to highlight rare, meaningful words
Model:
- PAC, which updates its weights only when a prediction is wrong or falls inside the margin
- Trained on 80% data, tested on 20%
Deployment:
- Flask app for users to test tweets or headlines
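
The preprocessing steps listed above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the stopword list is a tiny hypothetical sample, and crude_stem is a simple suffix stripper standing in for a real stemmer such as Porter's. The function names preprocess and crude_stem are assumptions introduced for this example.

```python
import re

# Tiny illustrative stopword list (assumption); a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "of", "in", "on", "to", "and"}

# Suffixes removed by the crude stand-in stemmer, checked in this order.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(token):
    """Strip one common suffix, keeping at least three characters of the stem."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Lowercase, drop symbols and digits, tokenize, remove stopwords, stem."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # keep only letters and whitespace
    tokens = text.split()                  # whitespace tokenization
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


print(preprocess("The Senate is VOTING on the new bills!"))
```

The output tokens are what the feature extraction step would receive. A production pipeline would likely swap in an established stemmer and stopword list.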

5. Algorithms Used

TF-IDF:

TF: Frequency of a word in a document
IDF: Rarity of the word across all documents
Helps highlight key terms for classification
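
To make the weighting concrete, the sketch below computes TF-IDF in its textbook form, tf = count / document length and idf = ln(N / df). The paper does not state which variant it used, so this exact formula is an assumption; libraries such as scikit-learn apply a smoothed IDF and normalize each vector by default. The function name tfidf is hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vec = {}
        for term, count in counts.items():
            tf = count / len(doc)               # frequency within this document
            idf = math.log(n_docs / df[term])   # rarity across all documents
            vec[term] = tf * idf
        vectors.append(vec)
    return vectors


docs = [["senate", "vote"], ["senate", "hoax"]]
for vec in tfidf(docs):
    print(vec)
```

In this toy corpus "senate" appears in every document, so its weight is zero, while "vote" and "hoax" each receive a positive weight. That is the sense in which TF-IDF highlights rare terms.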

PAC (Passive Aggressive Classifier):

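Passive when the prediction is correct with a sufficient margin (no change)
Aggressive when the prediction is wrong or falls inside the margin (updates the model by the smallest step that corrects it)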
Efficient for real-time learning from streaming data
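
The passive and aggressive behaviour can be illustrated with a single online update step. The sketch below follows the PA-I variant of the algorithm, using hinge loss with an aggressiveness cap C. The paper does not say which variant or which value of C was used, so both are assumptions, as are the label encoding (+1 for REAL, -1 for FAKE) and the helper names dot, pa_update, and predict.

```python
def dot(w, x):
    """Dot product of two sparse vectors stored as {term: value} dicts."""
    return sum(w.get(term, 0.0) * value for term, value in x.items())


def pa_update(w, x, y, C=1.0):
    """One PA-I step. y is +1 (REAL) or -1 (FAKE). Mutates w and returns the loss."""
    loss = max(0.0, 1.0 - y * dot(w, x))       # hinge loss on this example
    sq_norm = sum(value * value for value in x.values())
    if loss > 0.0 and sq_norm > 0.0:
        tau = min(C, loss / sq_norm)           # step size, capped by C
        for term, value in x.items():          # aggressive: move toward the label
            w[term] = w.get(term, 0.0) + tau * y * value
    return loss                                # loss == 0 means passive: w unchanged


def predict(w, x):
    """Return +1 (REAL) or -1 (FAKE) from the sign of the score."""
    return 1 if dot(w, x) >= 0 else -1


w = {}
x = {"hoax": 1.0}
print(pa_update(w, x, -1), w)  # first pass: positive loss, weights change
print(pa_update(w, x, -1), w)  # second pass: zero loss, weights stay the same
```

Because each step touches only the terms present in the current example, the cost per update depends on the length of the tweet rather than the size of the vocabulary, which is what makes the method suitable for streaming input.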

6. Results

System tested using real news headlines from the dataset
Confusion Matrix:
- True Positives (TP): 7345 (REAL correctly predicted)
- True Negatives (TN): 55 (FAKE correctly predicted)
- False Positives (FP): 5 (FAKE predicted as REAL)
- False Negatives (FN): 196 (REAL predicted as FAKE)

Performance Metrics:

Accuracy: 97.36%
Precision: 99.93%
Recall: 97.40%
F1-Score: 98.65%
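
The reported metrics follow directly from the confusion matrix above, with REAL treated as the positive class. The short check below recomputes them; the function name classification_metrics is introduced here for illustration only.

```python
def classification_metrics(tp, tn, fp, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Counts taken from the confusion matrix reported in this section.
metrics = classification_metrics(tp=7345, tn=55, fp=5, fn=196)
for name, value in metrics.items():
    print(f"{name}: {value:.2%}")
```

Since REAL is the positive class, precision and recall here describe performance on REAL items. The same counts imply that 55 of the 60 FAKE items in the test set (TN + FP) were identified correctly.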

Conclusion

The developed fake news detection system combines TF-IDF feature extraction with the Passive Aggressive Classifier to deliver accurate and efficient classification of news as real or fake. It achieved around 97% accuracy on the Kaggle dataset and 92% on the SFU dataset, with low false prediction rates. The model’s lightweight design enables fast processing, while its integration into a Flask-based web application makes it easy to use for real-time verification. These results indicate that optimized traditional machine learning methods can be effective against misinformation on online platforms.

References

[1] S. Sharma, M. Saraswat, and A. K. Dubey, “Fake News Detection on Twitter,” Int. J. Web Inf. Syst., 2022.
[2] T. R. M. et al., “Cyber Sleuth: Harnessing NLP and Blockchain for Twitter-Based Fake News Detection,” AIMLA, 2024.
[3] S. Helmstetter and H. Paulheim, “Weakly Supervised Learning for Fake News Detection on Twitter,” ASONAM, 2018.
[4] T. Bhatia and B. Manaskasemsak, “Detecting Fake News Sources on Twitter Using Deep Neural Networks,” ICET, 2023.

Copyright

Copyright © 2025 Nagula Rishitha, Dr. D. Karunakar Reddy. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Paper Id : IJRASET74445

Publish Date : 2025-09-30

ISSN : 2321-9653

Publisher Name : IJRASET
