Abstract
This study presents a comparative analysis of traditional machine learning models and a transformer-based approach for fake news detection. I investigate the performance of Logistic Regression and Support Vector Machines (SVM), utilizing Term Frequency–Inverse Document Frequency (TF-IDF) features, against a fine-tuned Bidirectional Encoder Representations from Transformers (BERT) model. Using a balanced dataset of 4,000 fake and 4,000 real news articles, I apply standard preprocessing and perform an 80/20 train-test split. Experimental results demonstrate that Logistic Regression and SVM achieve accuracies of 97.94% and 98.62%, respectively, while BERT significantly outperforms both with an accuracy of 99.94%. Each model is evaluated using metrics including precision, recall, F1-score, and ROC-AUC, as well as training time and inference latency. The results emphasize the trade-offs between computational cost and classification performance, with BERT offering the highest accuracy at the expense of increased resource demands. This research contributes to the development of effective and scalable solutions for combating misinformation in digital media.
Introduction
The rapid spread of misinformation ("fake news") on digital platforms threatens public trust and societal stability, making fast and accurate detection critical. Traditional rule-based systems are limited in handling evolving language, while machine learning (ML) methods offer scalable, adaptive solutions by learning linguistic patterns from data.
This research compares three ML models for fake news detection: Logistic Regression and Support Vector Machines (SVM) trained on TF-IDF features, and a fine-tuned BERT transformer model. All models were trained on a balanced dataset of 8,000 news articles (4,000 fake, 4,000 real) with an 80/20 train-test split, and evaluated on accuracy, precision, recall, F1-score, and ROC-AUC.
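A minimal sketch of the classical branch of this pipeline, using scikit-learn, is given below. It is illustrative rather than the study's exact configuration: the lists texts and labels are assumed to hold the loaded corpus, and settings such as the vocabulary cap and random seed are placeholders.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, roc_auc_score

# 80/20 stratified split, mirroring the evaluation protocol described above.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=42
)

# TF-IDF maps each article to a sparse vector of term weights.
vectorizer = TfidfVectorizer(stop_words="english", max_features=50000)
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

for name, clf in [("Logistic Regression", LogisticRegression(max_iter=1000)),
                  ("Linear SVM", LinearSVC())]:
    clf.fit(X_train_vec, y_train)
    preds = clf.predict(X_test_vec)
    print(name)
    print(classification_report(y_test, preds, digits=4))
    # decision_function gives a continuous score suitable for ROC-AUC.
    print("ROC-AUC:", roc_auc_score(y_test, clf.decision_function(X_test_vec)))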
Results show that Logistic Regression achieved 97.94% accuracy, SVM reached 98.62%, and BERT outperformed both with a near-perfect 99.94%. While BERT's deep contextual understanding yields superior classification performance, it demands far more computational resources than the fast, lightweight traditional models. TF-IDF analysis revealed that words related to journalistic integrity were strong indicators of real news, whereas emotional or sensational terms were linked to fake news.
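This indicative-term analysis can be reproduced from the fitted logistic regression weights. Continuing the sketch above (vectorizer, X_train_vec, and y_train assumed defined), and assuming labels are encoded as 0 = real and 1 = fake, the most negative coefficients point toward real news and the most positive toward fake news.

import numpy as np

lr = LogisticRegression(max_iter=1000).fit(X_train_vec, y_train)
terms = np.array(vectorizer.get_feature_names_out())
order = np.argsort(lr.coef_[0])  # ascending: real-leaning terms first

print("Strongest real-news indicators:", terms[order[:10]].tolist())
print("Strongest fake-news indicators:", terms[order[-10:]][::-1].tolist())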
Conclusion
I have demonstrated that transformer-based models significantly improve fake news detection at the cost of increased computational resources, while traditional models remain viable in resource-constrained settings. Future work includes knowledge distillation to reduce BERT's footprint (sketched below), ensemble methods combining TF-IDF and BERT, and cross-domain validation on social media streams.
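As one concrete shape the distillation direction could take, the standard soft-target objective blends a temperature-scaled KL term against the BERT teacher's logits with ordinary cross-entropy on the gold labels. The PyTorch sketch below is generic, not an implementation from this study; the temperature and mixing weight are arbitrary placeholders.

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-softened distributions,
    # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    # Hard targets: cross-entropy against the true labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard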