The spread of false information on the internet and social media has become a major global problem, affecting politics, public health, the economy, and social harmony. False and manipulative content spreads quickly and can cause confusion, panic, and misinformed decisions. Traditional approaches to fake news identification rely largely on manual verification and rule-based algorithms, which cannot keep pace with the large and continuously growing volume of digital information. To address these limitations, this paper proposes an Explainable Machine Learning Framework for Fake News Detection using Natural Language Processing (NLP), Ensemble Learning, and Explainable Artificial Intelligence (XAI). The framework combines text preprocessing, TF-IDF feature extraction, feature optimization, Random Forest classification, and ensemble learning to classify news articles as fake or real. SHAP explainability analysis is incorporated to improve the transparency and interpretability of predictions. A Flask-based real-time interface was also developed to enable browser-based news verification. In experimental evaluation, the framework achieved 99.61% accuracy, 99.85% precision, 99.41% recall, and a 99.63% F1-score, indicating high detection accuracy, improved robustness, and few false positives. The proposed framework thus provides a scalable, explainable, and practical solution for modern fake news detection.
Introduction
The rapid growth of social media and online news platforms has accelerated the spread of fake news, which can harm public opinion, healthcare, politics, economic stability, and social harmony. Traditional detection methods based on manual verification and keyword filtering lack contextual understanding and are often ineffective against dynamically generated misinformation. This study presents an Explainable Machine Learning Framework for Fake News Detection that combines Natural Language Processing (NLP), Machine Learning (ML), Ensemble Learning, SHAP-based Explainable AI, and a Flask-based real-time interface.
The proposed framework addresses these limitations through a multi-stage process that includes dataset collection, text preprocessing, TF-IDF feature extraction, feature optimization using Chi-Square selection, machine learning classification, ensemble learning, SHAP-based explainability, and real-time detection. News articles are cleaned through tokenization, stopword removal, punctuation removal, and lowercase conversion before being transformed into numerical vectors for analysis.
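The cleaning and vectorization steps described above can be sketched in plain Python. This is a minimal illustration, not the framework's actual implementation: the stopword list, the function names `preprocess` and `tfidf_vectors`, the sample sentences, and the weighting tf × (ln(N/df) + 1) are all assumptions made for the example. Library implementations such as scikit-learn's `TfidfVectorizer` apply somewhat different smoothing and normalization, and the Chi-Square selection step is omitted here.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real pipelines use a fuller list).
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"}


def preprocess(text):
    """Lowercase, strip punctuation, tokenize on whitespace, drop stopwords."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # replace punctuation with spaces
    return [tok for tok in text.split() if tok not in STOPWORDS]


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) weighted as tf * (ln(N / df) + 1)."""
    tokenized = [preprocess(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vocabulary = sorted(doc_freq)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        vectors.append([
            (counts[term] / total) * (math.log(n_docs / doc_freq[term]) + 1.0)
            for term in vocabulary
        ])
    return vocabulary, vectors


# Hypothetical sample texts, not taken from the paper's datasets.
docs = [
    "Scientists confirm the vaccine is safe.",
    "SHOCKING: the vaccine secret they hide!",
]
vocab, vecs = tfidf_vectors(docs)
```

In this toy corpus the term "vaccine" appears in both documents, so it receives a lower weight than terms such as "confirm" or "shocking" that appear in only one. That is the property that lets TF-IDF emphasize words that distinguish one article from another.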
The system trains Decision Tree, Random Forest, and Logistic Regression classifiers and combines their predictions through ensemble voting to improve classification robustness and accuracy. A key feature of the framework is the integration of SHAP (SHapley Additive exPlanations), which identifies the textual features that most influence each classification decision, enhancing transparency and user trust.
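The text does not state whether the ensemble uses hard (majority) or soft (probability-averaging) voting, so the sketch below shows both under stated assumptions. The per-model probabilities, the 0.5 threshold, and the helper names `hard_vote` and `soft_vote` are hypothetical and chosen only to show how the two schemes can disagree.

```python
from collections import Counter


def hard_vote(labels):
    """Return the most common label among the base classifiers' predictions."""
    return Counter(labels).most_common(1)[0][0]


def soft_vote(fake_probabilities, threshold=0.5):
    """Average each model's P(fake) and compare the mean against a threshold."""
    mean_p = sum(fake_probabilities) / len(fake_probabilities)
    return ("fake" if mean_p >= threshold else "real"), mean_p


# Hypothetical P(fake) outputs for one article from the three base models.
p_fake = {"decision_tree": 1.0, "random_forest": 0.45, "logistic_regression": 0.40}

# Hard voting: each model first commits to a label, then the majority wins.
labels = ["fake" if p >= 0.5 else "real" for p in p_fake.values()]
hard_label = hard_vote(labels)

# Soft voting: confidence is pooled before any label is chosen.
soft_label, mean_p = soft_vote(list(p_fake.values()))
```

Here two of the three models lean toward "real", so hard voting returns "real", while the single confident "fake" prediction pulls the averaged probability above 0.5 and soft voting returns "fake". Which behaviour is preferable depends on how well calibrated the base models' probabilities are.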
Using publicly available datasets containing both fake and real news articles, the framework successfully detects misinformation while providing understandable explanations for its predictions. Overall, the proposed model offers an intelligent, accurate, explainable, and real-time solution for combating fake news in digital media environments.
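To make the idea behind these explanations concrete, the sketch below computes exact Shapley values by brute force for a small hypothetical scoring function. It illustrates the attribution principle that SHAP is built on, not the SHAP library's API or the framework's code. The scorer `fake_score`, its coefficients, the three example terms, and the all-zero baseline are assumptions made for the example, and the enumeration is exponential in the number of features, so it is only practical for toy inputs.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x` relative to `baseline`."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their real value; the rest stay at baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    phis = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        phi = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi += weight * (value(s | {i}) - value(s))
        phis.append(phi)
    return phis


# Hypothetical scorer over TF-IDF weights for ["shocking", "secret", "confirm"].
def fake_score(features):
    shocking, secret, confirm = features
    return 2.0 * shocking + 1.0 * secret - 1.5 * confirm + 4.0 * shocking * secret


x = [0.5, 0.5, 0.0]         # article mentions "shocking" and "secret"
baseline = [0.0, 0.0, 0.0]  # reference input: none of the terms present
phi = shapley_values(fake_score, x, baseline)
```

The attributions sum to the difference between the score at `x` and the score at the baseline, and the interaction term is split evenly between "shocking" and "secret". The absent term "confirm" receives zero attribution, which matches the intuition that a feature equal to its baseline value did not move the prediction.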
Conclusion
This paper proposed an Explainable Machine Learning Framework for Fake News Detection using NLP, Ensemble Learning, SHAP explainability analysis, and a Flask-based real-time detection interface. The framework identified fake news articles through textual feature analysis and explainable machine learning techniques. Experimental evaluation demonstrated high classification accuracy, few false positives, improved robustness, and enhanced transparency. The integration of SHAP improved the trustworthiness and interpretability of prediction decisions. The proposed framework provides a scalable, explainable, and practical solution for modern fake news detection and contributes toward combating digital misinformation.
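As a quick consistency check on the reported metrics, the F1-score can be recomputed from the precision and recall stated in the abstract, since F1 is the harmonic mean of the two. The helper name `f1_from` is hypothetical; the input values are the ones reported in this paper.

```python
def f1_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


precision = 0.9985  # reported in the abstract
recall = 0.9941     # reported in the abstract
f1 = f1_from(precision, recall)
print(f"F1 = {f1:.2%}")
```

The result rounds to 99.63%, which agrees with the F1-score reported in the abstract.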