Fake News Detection using Machine Learning

Authors: Aiman Zaheer, Dr. Rohitashwa Pandey

DOI Link: https://doi.org/10.22214/ijraset.2026.81892

Abstract

The exponential growth of social media has created an unprecedented environment for the rapid dissemination of misinformation. Existing fake news detection systems predominantly rely on either textual features or propagation patterns in isolation, which limits their effectiveness against sophisticated misinformation campaigns. This paper proposes a multi-model framework that integrates two deep learning architectures, Bidirectional Long Short-Term Memory (BiLSTM) and a fine-tuned BERT (Bidirectional Encoder Representations from Transformers) model, alongside classical machine learning approaches: Support Vector Machines (SVM), K-Nearest Neighbors (KNN), Random Forest, and XGBoost. The models are then combined through an ensemble voting strategy to maximize classification accuracy. Experiments on the LIAR and FakeNewsNet benchmark datasets show that the proposed BERT-based ensemble achieves a peak accuracy of 96.4%, outperforming the individual classical models by a significant margin. Model explainability is incorporated through SHAP (SHapley Additive exPlanations) values, which make prediction outputs interpretable. The results support combining deep contextual embeddings with classical feature engineering for robust fake news detection.

Introduction

This paper presents an AI-based Fake News Detection System that uses Machine Learning (ML), Deep Learning (DL), and Explainable AI (XAI) techniques to identify and classify fake news on social media and online platforms. The rapid growth of social media has increased the spread of misinformation, making automated fake news detection essential. Traditional fact-checking and rule-based approaches are inadequate due to the massive volume and speed of online content.

The proposed framework compares several classification models, including KNN, SVM, Random Forest, XGBoost, BiLSTM, and BERT, using the LIAR and FakeNewsNet benchmark datasets. Text data undergoes preprocessing steps such as noise removal, tokenization, stop-word removal, lemmatization, and normalization. Classical ML models use TF-IDF features, while deep learning models employ GloVe embeddings and BERT contextual embeddings.
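
The preprocessing and TF-IDF steps described above can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the stop-word list, the regular expressions, and the function names `preprocess` and `tfidf` are assumptions, lemmatization is omitted because it needs an external lexicon, and the weighting uses the textbook tf * log(N / df) form rather than any particular library's variant.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real pipelines use a fuller one).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "that", "this"}


def preprocess(text):
    """Lowercase, strip URLs and non-letters, split on whitespace, drop stop words."""
    text = text.lower()                        # normalization
    text = re.sub(r"https?://\S+", " ", text)  # noise removal: URLs
    text = re.sub(r"[^a-z\s]", " ", text)      # noise removal: digits, punctuation
    tokens = text.split()                      # whitespace tokenization
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero for empty documents
        vectors.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Two made-up headlines, used only to exercise the functions.
headlines = ["The senate passed the bill", "The senate is shocking"]
vectors = tfidf(headlines)
```

A term that occurs in every document, such as "senate" here, receives a weight of zero, while terms unique to one headline receive positive weights. This is the property that lets classical models focus on distinguishing vocabulary.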

To improve performance, the study introduces an ensemble learning model that combines the predictions of the top-performing classifiers through weighted soft voting. Additionally, SHAP (SHapley Additive exPlanations) is integrated to provide transparent, word-level explanations for model decisions, enhancing interpretability and user trust.
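
Weighted soft voting averages each model's class probabilities, scaled by a per-model weight, and predicts the class with the highest averaged probability. The sketch below shows the idea in plain Python; the function name `weighted_soft_vote`, the example probabilities, and the weights are hypothetical and are not taken from the paper.

```python
def weighted_soft_vote(probabilities, weights):
    """Combine per-model class probabilities into one prediction.

    probabilities: one list per model, e.g. [[p_real, p_fake], ...]
    weights: one non-negative weight per model
    Returns (predicted_class_index, averaged_probabilities).
    """
    if len(probabilities) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_classes = len(probabilities[0])
    # Weighted average of each class probability across models.
    averaged = [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]
    predicted = max(range(n_classes), key=averaged.__getitem__)
    return predicted, averaged


# Hypothetical outputs of three models for one article: [P(real), P(fake)].
model_probs = [[0.30, 0.70], [0.60, 0.40], [0.45, 0.55]]
model_weights = [0.5, 0.2, 0.3]  # illustrative weights, not the paper's
label, probs = weighted_soft_vote(model_probs, model_weights)
```

In this example the second model leans toward "real", but the more heavily weighted first model pulls the averaged probability toward "fake" (index 1). Dividing by the weight total keeps the output a valid probability distribution even when the weights do not sum to one.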

Experimental results show that deep learning models outperform traditional ML methods, with BERT reaching 95.3% accuracy on the LIAR dataset. The proposed ensemble model achieves the highest accuracy, 96.4% on LIAR, along with improved precision, recall, and robustness across datasets. SHAP analysis indicates that hedging terms, emotional language, and a lack of credible sources are strong indicators of fake news.
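
Word-level attributions of the kind SHAP reports are based on Shapley values, which average each token's marginal contribution to the model score over all subsets of the other tokens. The sketch below computes them exactly by enumeration for a toy scorer. It does not use the SHAP library, and the tokens, the scoring rule, and its weights are invented for illustration; they are not the paper's model or findings.

```python
from itertools import combinations
from math import factorial


def shapley_values(n_tokens, score):
    """Exact Shapley value of each token position.

    score(kept) takes a frozenset of kept token positions and returns a number.
    All subsets are enumerated, so this is only practical for short inputs.
    """
    values = [0.0] * n_tokens
    for i in range(n_tokens):
        others = [j for j in range(n_tokens) if j != i]
        for size in range(n_tokens):
            # Shapley weight for coalitions of this size.
            weight = (factorial(size) * factorial(n_tokens - size - 1)
                      / factorial(n_tokens))
            for subset in combinations(others, size):
                kept = frozenset(subset)
                values[i] += weight * (score(kept | {i}) - score(kept))
    return values


tokens = ["shocking", "sources", "say"]  # hypothetical input


def toy_fake_score(kept):
    """Toy scorer: 0.4 for 'shocking', plus 0.2 only if 'sources say' is intact."""
    value = 0.4 if 0 in kept else 0.0
    if 1 in kept and 2 in kept:
        value += 0.2
    return value


attributions = shapley_values(len(tokens), toy_fake_score)
```

Here "shocking" receives the full 0.4 it contributes on its own, while the 0.2 earned by the phrase "sources say" is split evenly between its two words. The attributions sum to the difference between the score of the full input and the score of the empty input, which is the property that makes such explanations additive.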

Conclusion

This paper presented a comprehensive multi-model framework for fake news detection that systematically benchmarks classical machine learning approaches against state-of-the-art deep learning architectures. The proposed ensemble model, which combines BERT, BiLSTM, SVM, Random Forest, and XGBoost through weighted soft voting, achieved peak accuracies of 96.4% and 95.6% on the LIAR and FakeNewsNet datasets, respectively, surpassing all individual model baselines. The integration of SHAP-based explainability addresses a critical limitation of black-box deep learning models by providing interpretable, token-level justification for classification decisions. This capability is particularly valuable for deployment in sensitive domains such as electoral processes and public health communication. Future research directions include extending the framework to multilingual and code-mixed (Hinglish) fake news detection, incorporating multimodal features from news images and videos, and developing real-time detection pipelines capable of processing high-velocity social media streams. Applying continual learning strategies to handle concept drift in evolving misinformation patterns is another promising avenue for further investigation.

Copyright

Copyright © 2026 Aiman Zaheer, Dr. Rohitashwa Pandey. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET81892

Publish Date : 2026-05-04

ISSN : 2321-9653

Publisher Name : IJRASET
