The spread of online misinformation has made fake news detection a significant field of study. Machine learning algorithms such as Logistic Regression and Naïve Bayes are commonly used for this task because they are efficient and interpretable. This research compares the two algorithms on accuracy, precision, recall, F1-score, and execution time. While precision is vital in classification, execution time becomes important in real-time scenarios. The research evaluates how well each model handles large datasets and classifies news articles, and compares their computational cost to establish their suitability for large-scale applications. The results indicate fundamental trade-offs between speed and precision, underscoring the importance of careful model selection in combating misinformation. Future work may consider hybrid or deep learning methods to improve detection rates while preserving computational efficiency.
Introduction
Recent advancements in Artificial Intelligence (AI) and Natural Language Processing (NLP) have significantly improved the analysis of digital content, at a time when identifying misinformation in online news and social media has become increasingly critical. Fake news detection, a key task in text classification, aims to distinguish between genuine and deceptive news articles, supporting applications such as media regulation and automated fact-checking.
Two commonly used machine learning models for fake news detection are Logistic Regression and Naïve Bayes:
Logistic Regression is a discriminative, probabilistic model that estimates the likelihood of news being fake or real based on textual features. It is favored for its simplicity, interpretability, and effectiveness in binary classification tasks, especially with high-dimensional data transformed using techniques like TF-IDF.
Naïve Bayes is a generative model based on Bayes’ Theorem, assuming feature independence. Despite this often unrealistic assumption, it is computationally efficient and performs well with large text datasets. Multinomial Naïve Bayes combined with TF-IDF has shown high accuracy in fake news detection.
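The discriminative/generative distinction above can be made concrete with the standard textbook forms of the two models. The notation is an assumption introduced here for illustration and does not come from the study: x is a document's TF-IDF feature vector, w and b are learned weights and bias, V is the vocabulary, N_ic is the total weight of term t_i across training documents of class c, N_c is the sum of N_ic over all terms, and α is a smoothing constant.

```latex
% Logistic Regression (discriminative): models the class posterior directly
P(y = 1 \mid \mathbf{x}) = \sigma(\mathbf{w}^\top \mathbf{x} + b)
  = \frac{1}{1 + e^{-(\mathbf{w}^\top \mathbf{x} + b)}}

% Multinomial Naive Bayes (generative): class prior plus independent per-term likelihoods
\hat{y} = \arg\max_{c \in \{\text{fake},\, \text{real}\}}
  \left[ \log P(c) + \sum_{i=1}^{|V|} x_i \log P(t_i \mid c) \right],
\qquad
P(t_i \mid c) = \frac{N_{ic} + \alpha}{N_c + \alpha \, |V|}
```

The sum over terms in the Naïve Bayes rule is where the feature-independence assumption enters: each term contributes to the score separately, regardless of which other terms appear in the article.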
This study compares the accuracy and classification performance of these models. Both were trained and tested on a Kaggle dataset of real and fake news articles, preprocessed with standard text-cleaning steps (e.g., punctuation removal, lowercasing) and TF-IDF vectorization. The methodology was kept consistent: an 80/20 train-test split and identical preprocessing for both models.
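The preprocessing described above can be sketched with the Python standard library alone. This is a minimal illustration, not the study's actual code: the function names, the random seed, and the unsmoothed tf × log(N/df) weighting are all assumptions, and library implementations of TF-IDF typically use smoothed variants.

```python
import math
import random
import string
from collections import Counter


def clean_text(text):
    """Lowercase the text and strip ASCII punctuation."""
    table = str.maketrans("", "", string.punctuation)
    return text.lower().translate(table)


def train_test_split(items, test_fraction=0.2, seed=42):
    """Shuffle a copy of items and split it into (train, test)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document, weight = tf * log(N / df)."""
    tokenized = [clean_text(doc).split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors
```

In a real experiment the document frequencies would be computed on the training portion only and then reused for the test portion, so that no information from held-out articles leaks into the features.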
F1-score: The harmonic mean of precision and recall, giving a single balanced measure of the two.
A confusion matrix was used to further analyze performance through True Positives, True Negatives, False Positives, and False Negatives.
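As a sketch of how the four confusion-matrix counts translate into the reported metrics, the helper below applies the standard definitions. The function name and the example counts are hypothetical and are not results from this study; "positive" is assumed to mean the fake class.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: of the articles flagged as fake, how many really were fake.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of the truly fake articles, how many were caught.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Illustrative counts only (not taken from the experiments).
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
```

With these illustrative counts, recall (0.80) is lower than precision (about 0.89), which is the pattern that matters when missed fake articles (False Negatives) are the main concern.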
Key Takeaways:
Logistic Regression offers strong interpretability and works well with structured and balanced data but struggles with complex feature interactions.
Naïve Bayes is fast, simple, and surprisingly effective even with its independence assumption, making it suitable for large-scale applications.
Both models serve as foundational tools in fake news detection, and comparing them reveals trade-offs between interpretability, performance, and computational efficiency.
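Because computational efficiency is one axis of the comparison, it helps to be explicit about how execution time can be measured. The helper below is a minimal sketch using the standard-library wall-clock timer; the name `timed` is hypothetical, and the study does not state which timing method it used.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload; in an experiment this would wrap a model's
# training or prediction call instead.
result, seconds = timed(sorted, [3, 1, 2])
```

Timing training and prediction separately, and repeating each measurement several times, gives a fairer picture than a single run, since one-off measurements are sensitive to background load.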
Conclusion
The implications of fake news detection algorithms go beyond classification accuracy. Minimizing False Negatives is paramount, since undetected fake news can spread unchallenged and mislead readers. Although Logistic Regression provides greater recall, improving Naïve Bayes' sensitivity without loss of precision is an important challenge for future work.
Another use of such models is in automatic content moderation, where social media platforms would flag or filter false news articles before they reach large audiences. This would lower the cost of human moderation while improving real-time detection. However, ethical issues arise: excessive filtering can infringe on freedom of speech or introduce algorithmic bias, which calls for transparent AI development.
Computationally, machine learning models are resource-intensive, commonly using power-hungry infrastructure. To reduce the environmental footprint, firms must seek out power-efficient model training and carbon-free computing solutions. This research was performed on a solar-powered system, highlighting a possible sustainable solution for AI research.
Dataset bias remains a limitation: this research uses a single-source dataset, so its generalizability may be limited. Future experiments should use multi-source, multilingual, and real-time datasets to increase robustness. Scaling up to larger datasets and applying ensemble techniques such as Gradient Boosting, or Transformer-based architectures, could also improve performance.
Future research should also investigate improving real-time classification with hybrid models that integrate Logistic Regression and Naïve Bayes with deep learning. As explainable AI (XAI) continues to advance, making these models more understandable to policymakers and users can build trust in fake news detection systems and encourage their adoption.
References
[1] Kumar, Rahul, et al. "Fake News Detection Using a Logistic Regression Model and Natural Language Processing Techniques." ResearchGate, 2023, https://www.researchgate.net/publication/372374145_Fake_News_Detection_Using_a_Logistic_Regression_Model_and_Natural_Language_Processing_Techniques.
[2] Patwa, Parth, et al. "Fake News Detection Using Naïve Bayes Classifier." IEEE Xplore, 2017, https://ieeexplore.ieee.org/document/8100379.
[3] Chen, Jian, et al. "Fake News Detection Approach Based on Logistic Regression in Online Social Networks." SpringerLink, 2022, https://link.springer.com/chapter/10.1007/978-981-19-9304-6_6.
[4] Zhang, Wei, et al. "A Review of Machine Learning Approaches for Fake News Detection." arXiv, 2021, https://arxiv.org/abs/1904.05305.
[5] Nguyen, Thanh, et al. "A Comparative Analysis of Logistic Regression and Naïve Bayes in Fake News Classification." Elsevier, 2020, https://arxiv.org/abs/2009.13859.
[6] Bennato, Davide, et al. "A Classification Algorithm to Recognize Fake News Websites." arXiv, 2019, https://arxiv.org/abs/1904.05305.
[7] Riego, Neil Christian R., and Danny Bell Villarba. "Utilization of Multinomial Naïve Bayes Algorithm and Term Frequency Inverse Document Frequency (TF-IDF Vectorizer) in Checking the Credibility of News Tweet in the Philippines." arXiv, 30 May 2023, https://arxiv.org/abs/2306.00018.
[8] Ahmed, Rania Azad M. San, et al. "Fake News Detection Using Naïve Bayes and Long Short-Term Memory Algorithms." IAES International Journal of Artificial Intelligence (IJ-AI), vol. 11, no. 2, June 2022, pp. 748–754, https://www.researchgate.net/publication/358004087_Fake_News_Detection_Using_Naive_Bayes_and_Long_Short_Term_Memory_algorithms.
[9] Hussain, Md Gulzar, et al. "Detection of Bangla Fake News Using MNB and SVM Classifier." arXiv, 29 May 2020, https://arxiv.org/abs/2005.14627.
[10] Bisaillon, Clément. "Fake and Real News Dataset." Kaggle, 2018, https://www.kaggle.com/datasets/clmentbisaillon/fake-and-real-news-dataset.
[11] Shu, Kai, et al. "Fake News Detection on Social Media: A Data Mining Perspective." ACM SIGKDD Explorations Newsletter, vol. 19, no. 1, 2017, pp. 22–36.
[12] Vosoughi, Soroush, Deb Roy, and Sinan Aral. "The Spread of True and False News Online." Science, vol. 359, no. 6380, 2018, pp. 1146–1151.