Abstract
The digital era has ushered in unprecedented access to information but also a surge in fake news, undermining public trust, swaying political landscapes, and threatening societal stability. This research introduces a robust, AI-driven fake news detection system that integrates advanced Natural Language Processing (NLP) and machine learning to distinguish authentic news from fabricated content. The system leverages a fine-tuned DistilBERT transformer model for deep contextual text analysis, complemented by Logistic Regression and Random Forest classifiers, unified through a weighted ensemble that achieves 93.2% accuracy. Trained on a corpus of 100,000 articles drawn from the Kaggle, ISOT, and FakeNewsNet datasets and spanning politics, health, and technology, the system ensures broad applicability. A scalable web platform, built with Flask and FastAPI and styled with Tailwind CSS, offers an intuitive interface for users to submit articles, view real-time predictions, and explore interactive visualizations such as attention heatmaps and feature importance charts. Explainable AI (XAI) techniques enhance transparency, fostering trust among journalists, educators, policymakers, and the public. Deployed on a cloud-native architecture with PostgreSQL and Celery, the system complies with GDPR and India's IT Act, 2000, ensuring data privacy and scalability. This solution empowers stakeholders to combat misinformation effectively, bridging the gap between advanced AI and practical, user-centric applications.
Introduction
The rapid expansion of digital platforms has revolutionized information sharing but also led to the widespread issue of fake news—misleading content that threatens public trust and democratic processes. Traditional fact-checking is slow and limited, while existing automated detection systems face challenges like lack of transparency and difficulty handling diverse data.
This research introduces a comprehensive AI-driven fake news detection system combining a fine-tuned DistilBERT model with Logistic Regression and Random Forest classifiers, integrated via a weighted ensemble to enhance accuracy. The system features a user-friendly web interface with explainability tools (attention heatmaps, feature importance charts, SHAP, and LIME) to foster transparency and trust. It is scalable, cloud-native, GDPR-compliant, and designed for journalists, educators, researchers, and the public.
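A minimal sketch of this weighted soft-voting ensemble is shown below. The ensemble weights, the TF-IDF feature pipeline, and the helper names (get_distilbert_proba, ensemble_predict, lr_model, rf_model, tfidf) are illustrative assumptions, not the system's published configuration.

```python
# Sketch of weighted soft voting over DistilBERT, Logistic Regression, and Random Forest.
# Assumes lr_model, rf_model, and tfidf are already fitted scikit-learn objects.
import numpy as np
import torch
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification

tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
bert = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)  # in practice, load the fine-tuned fake-news checkpoint here

def get_distilbert_proba(text: str) -> np.ndarray:
    """Return [P(real), P(fake)] from the transformer branch."""
    inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        logits = bert(**inputs).logits
    return torch.softmax(logits, dim=-1).numpy()[0]

def ensemble_predict(text, lr_model, rf_model, tfidf, weights=(0.6, 0.2, 0.2)):
    """Combine the three classifiers' probabilities with fixed (assumed) weights."""
    x = tfidf.transform([text])                       # sparse TF-IDF features
    probs = np.vstack([
        get_distilbert_proba(text),
        lr_model.predict_proba(x)[0],
        rf_model.predict_proba(x)[0],
    ])
    combined = np.average(probs, axis=0, weights=weights)
    return ("fake" if combined[1] >= 0.5 else "real", combined)
```

In this sketch the transformer branch receives the largest weight, reflecting its stronger contextual signal, while the two statistical classifiers act as regularizing votes.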
The system’s architecture includes thorough data preprocessing, multi-model inference, explainability layers, and a backend built with Flask and FastAPI, supported by PostgreSQL and Celery for asynchronous processing. The frontend, developed using React.js and Tailwind CSS, offers real-time classification, interactive visualizations, report downloads, and accessibility features.
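The sketch below illustrates how a FastAPI endpoint could hand classification requests to a Celery worker for asynchronous processing, in the spirit of the backend described above. The broker URL, route paths, and the classify_article task body are assumptions introduced for illustration, not the production wiring.

```python
# Illustrative backend sketch: FastAPI enqueues work, a Celery worker classifies it.
from celery import Celery
from fastapi import FastAPI
from pydantic import BaseModel

celery_app = Celery("fake_news",
                    broker="redis://localhost:6379/0",
                    backend="redis://localhost:6379/1")
api = FastAPI()

class Article(BaseModel):
    text: str

@celery_app.task
def classify_article(text: str) -> dict:
    # Placeholder for the ensemble inference step sketched earlier.
    label, confidence = "real", 0.97
    return {"label": label, "confidence": confidence}

@api.post("/predict")
def predict(article: Article):
    """Queue the article for asynchronous classification and return a task id."""
    task = classify_article.delay(article.text)
    return {"task_id": task.id}

@api.get("/result/{task_id}")
def result(task_id: str):
    """Poll for the classification result once the worker has finished."""
    res = celery_app.AsyncResult(task_id)
    return {"status": res.status, "result": res.result if res.ready() else None}
```

Offloading inference to a worker queue in this way keeps the web endpoints responsive under the high-concurrency conditions discussed in the evaluation.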
Performance evaluation on 15,000 articles shows the ensemble achieves 93.2% accuracy with robust precision and recall, operating efficiently with GPU acceleration and supporting high concurrency. User testing reports high satisfaction with usability, explainability features, and report generation, confirming the system’s practical value in combating misinformation.
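As a rough illustration of how such metrics can be computed on a held-out test set, the snippet below uses scikit-learn's standard metrics; the toy labels are placeholders and stand in for the actual 15,000-article evaluation data.

```python
# Compute accuracy, precision, and recall for binary fake-news labels (toy data).
from sklearn.metrics import accuracy_score, classification_report

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = fake, 0 = real (ground truth)
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]   # ensemble predictions for the same articles

print(f"Accuracy: {accuracy_score(y_true, y_pred):.3f}")
print(classification_report(y_true, y_pred, target_names=["real", "fake"]))
```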
Conclusion
This fake news detection system offers a powerful, transparent, and user-centric solution to one of the most pressing challenges of the digital age: misinformation. By integrating DistilBERT’s advanced contextual analysis with the statistical rigor of Logistic Regression and Random Forest, the system achieves a robust 93.2% accuracy, outperforming standalone models. XAI components, including attention heatmaps, SHAP, and feature importance charts, make the system’s decisions transparent, fostering trust among journalists, educators, policymakers, and the public. Compliance with GDPR and India’s IT Act, 2000, ensures ethical deployment across diverse contexts, from media organizations to educational institutions.
The system is designed as a decision-support tool, augmenting human judgment rather than replacing it. It empowers users to verify news authenticity quickly, reducing the time and effort required for fact-checking. Its ability to handle diverse topics, from political propaganda to health misinformation, makes it a versatile asset in combating the global misinformation crisis.