The exponential growth of digital media and social networking platforms has transformed the way information is created and consumed. While these platforms provide rapid access to news, they have also facilitated the widespread dissemination of fake news. Fake news refers to intentionally false or misleading information presented as authentic news, which can influence public opinion, disrupt social harmony, and erode trust in media. Traditional manual fact-checking approaches are time-consuming and inadequate for handling the massive volume of online content. Therefore, automated fake news detection systems are essential.
This paper proposes a machine learning-based approach for detecting fake news using Natural Language Processing (NLP) techniques. The proposed system preprocesses textual data and extracts features using Bag of Words (BoW) and Term Frequency–Inverse Document Frequency (TF-IDF). Multiple machine learning classifiers, including Logistic Regression, Naïve Bayes, Random Forest, and Support Vector Machine (SVM), are trained and evaluated on accuracy, precision, recall, and F1-score. Experimental results indicate that classical machine learning models can effectively classify news articles as fake or real. The proposed system offers a lightweight and efficient solution for fake news detection and can be extended to real-time applications in the future.
Introduction
The rapid growth of the internet and social media has made information easily accessible, but it has also enabled the widespread circulation of fake news, that is, false or misleading information intended to deceive. Fake news poses serious risks to society by distorting public opinion, influencing elections, and undermining responses to health crises. Human fact-checking is too slow to keep pace with the massive volume of online content.
Recent advances in Machine Learning (ML) and Natural Language Processing (NLP) address this issue by enabling automated fake news detection based on linguistic patterns and textual features. This paper proposes a machine learning-based fake news detection system using classical ML algorithms. The methodology consists of data preprocessing, feature extraction using Bag of Words and TF-IDF, and training multiple classifiers to identify the most effective model.
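The preprocessing and feature extraction steps can be illustrated with a minimal sketch. It is not the implementation used in this work: it relies only on the Python standard library, the stop-word list and the function names (preprocess, bag_of_words, tf_idf) are hypothetical, and the smoothed TF-IDF weighting shown is one common variant chosen here as an assumption.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list for illustration only; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bag_of_words(docs):
    """Return the sorted vocabulary and one raw term-count vector per document."""
    tokenized = [preprocess(d) for d in docs]
    vocab = sorted({t for tokens in tokenized for t in tokens})
    counts = [Counter(tokens) for tokens in tokenized]
    vectors = [[c[term] for term in vocab] for c in counts]
    return vocab, vectors


def tf_idf(vectors):
    """Weight raw counts by a smoothed inverse document frequency.

    Assumed variant: tf = count / document length,
    idf = ln((1 + N) / (1 + df)) + 1.
    """
    n_docs = len(vectors)
    n_terms = len(vectors[0]) if vectors else 0
    # Document frequency: number of documents containing each term.
    df = [sum(1 for v in vectors if v[j] > 0) for j in range(n_terms)]
    idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    weighted = []
    for v in vectors:
        length = sum(v) or 1  # avoid division by zero for empty documents
        weighted.append([(count / length) * idf[j] for j, count in enumerate(v)])
    return weighted


# Made-up example headlines, not drawn from any dataset.
docs = [
    "Scientists confirm the vaccine is safe",
    "Shocking secret cure the doctors hide",
    "The vaccine trial results are published",
]
vocab, bow = bag_of_words(docs)
tfidf = tf_idf(bow)
```

In this toy example the term "vaccine" occurs in two of the three documents, so TF-IDF gives it a lower weight than a term such as "scientists" that occurs in only one, whereas Bag of Words assigns both the same count.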
The literature review shows that while deep learning models such as CNNs and LSTMs achieve high accuracy, they require large datasets and high computational resources. In contrast, classical ML models remain practical due to their simplicity, interpretability, and lower cost.
Experiments conducted on a benchmark dataset (e.g., the Kaggle Fake News dataset) with an 80:20 train-test split show that the Random Forest and Support Vector Machine classifiers deliver the best overall performance, with Logistic Regression also producing competitive results. Models are evaluated on accuracy, precision, recall, and F1-score.
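The evaluation protocol can be sketched as follows. This is an illustrative example only, under stated assumptions: labels are binary with 1 denoting fake news (the label encoding is not specified here), the random seed is arbitrary, and the function names train_test_split and evaluate are hypothetical helpers written with the standard library rather than the code used in the experiments.

```python
import random


def train_test_split(samples, test_ratio=0.2, seed=42):
    """Shuffle the samples and split them into train and test parts (80:20 by default)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def evaluate(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1-score for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# 100 placeholder sample ids split 80:20.
train_ids, test_ids = train_test_split(range(100))

# Made-up labels and predictions (1 = fake, 0 = real), not experimental results.
y_true = [1, 1, 1, 0, 0, 0, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0, 0, 1]
metrics = evaluate(y_true, y_pred)
```

With these made-up predictions there are four true positives, four true negatives, one false positive, and one false negative, so all four metrics come out at 0.8 up to floating-point rounding. On imbalanced data the four values generally differ, which is why accuracy is reported alongside precision, recall, and F1-score.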
Overall, the study concludes that classical machine learning techniques provide an efficient and reliable solution for fake news detection, making them suitable for both academic research and real-world applications when combined with proper preprocessing and feature engineering.
Conclusion
This paper presented a machine learning-based approach for fake news detection using NLP techniques. Classical machine learning models were trained and evaluated using TF-IDF and BoW features. Experimental results showed that Random Forest and SVM achieved the highest accuracy, while Logistic Regression offered a lightweight and efficient alternative.
The proposed system demonstrates that classical ML techniques are effective for fake news detection and suitable for academic and real-world applications. In the future, this work can be extended by incorporating deep learning models such as LSTM and BERT, using multilingual datasets, and deploying the system as a real-time web or mobile application.