With the rapid expansion of digital communication and social media, the spread of fake news has become a growing concern. Detecting and filtering out fake news is crucial, yet it remains challenging due to the limited availability of suitable datasets and effective analysis techniques. This study presents a machine learning-based approach to detecting fake news. The system extracts textual features using Term Frequency-Inverse Document Frequency (TF-IDF) with bag-of-words and n-grams. A Support Vector Machine (SVM) classifier is then employed to differentiate between authentic and fake news. Additionally, a dataset containing both real and fake news articles is introduced for training the model. The results highlight the effectiveness of the proposed system in accurately identifying misinformation.
Introduction
Problem Context:
The rapid spread of fake news, fueled by social media, poses significant risks by manipulating public opinion, harming reputations, and spreading misinformation—especially on health-related topics like COVID-19. The WHO has even warned about an "infodemic" where the overload of both true and false information confuses the public.
Proposed Solution: Machine Learning Model for Fake News Detection
Key Steps in the Approach:
Text Preprocessing: Clean the data by removing stop words, punctuation, and special characters.
Text Representation: Use Bag-of-Words, N-Grams, and TF-IDF to convert text to numerical form.
Feature Extraction: Analyze metadata like source, author, date, and sentiment.
Classification: Use a Support Vector Machine (SVM) model to classify news as real or fake, assigning a confidence score instead of just a binary label.
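The four steps above can be sketched as a single text-classification pipeline. The snippet below is a minimal illustration using scikit-learn, which the paper does not name as its implementation library; the toy corpus and parameter values are placeholders, not the paper's data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy corpus standing in for the real dataset (1 = real, 0 = fake).
texts = [
    "officials confirm the report after an independent review",
    "shocking secret cure they do not want you to know",
    "reuters reports the senate passed the budget bill",
    "miracle trick eliminates all disease overnight",
]
labels = [1, 0, 1, 0]

# Preprocessing + representation: stop-word removal, bag-of-words with
# unigrams and bigrams, weighted by TF-IDF.
pipeline = make_pipeline(
    TfidfVectorizer(stop_words="english", ngram_range=(1, 2)),
    LinearSVC(C=1.0),
)
pipeline.fit(texts, labels)

# decision_function yields a signed margin usable as a confidence score:
# positive leans "real", negative leans "fake".
scores = pipeline.decision_function(["senate confirms the report"])
```

The signed margin is what makes the confidence-score output possible: instead of only returning a binary label, the system can report how far an article lies from the decision boundary.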
Related Work Overview:
Past studies used classifiers such as Naive Bayes and linear SVM (LSVM), with varying accuracies.
TF-IDF combined with LSVM yielded 92% accuracy, but LSVM struggles with complex, non-linear data.
Some methods included multimedia or social metadata but often ignored metadata like the author or source.
Researchers highlighted the need for confidence-based classification rather than binary labels.
System Design & Implementation:
A. Preprocessing:
Textual Data: Cleaned, stemmed, and numerically encoded.
Categorical Data: Sources and authors encoded for better pattern recognition.
Numerical Data: Date split into components; sentiment analysis performed.
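A minimal sketch of the metadata preprocessing described above, assuming a simple article record; the field names, integer code tables, and tiny sentiment lexicon are all illustrative placeholders (a real system might use an established sentiment tool such as VADER):

```python
from datetime import datetime

# Hypothetical raw metadata for one article; field names are illustrative.
article = {"source": "example-news.com", "author": "J. Doe", "date": "2020-03-15"}

# Categorical data: map each distinct source/author string to an integer code.
source_codes = {"example-news.com": 0, "another-site.org": 1}
author_codes = {"J. Doe": 0, "A. Smith": 1}

# Numerical data: split the date into year / month / day components.
d = datetime.strptime(article["date"], "%Y-%m-%d")
features = {
    "source": source_codes[article["source"]],
    "author": author_codes[article["author"]],
    "year": d.year,
    "month": d.month,
    "day": d.day,
}

# Toy lexicon-based sentiment score: positive hits minus negative hits.
POSITIVE, NEGATIVE = {"confirm", "official"}, {"shocking", "hoax"}
tokens = "officials confirm the shocking report".split()
features["sentiment"] = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
```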
B. Model Training & Validation:
SVM model trained and validated using cross-validation.
Classification based on a confidence score (positive = real, negative = fake).
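Training with cross-validation and reading the signed margin as a confidence score could look like the following scikit-learn sketch (the library choice, the toy eight-document corpus, and the 2-fold split are assumptions for illustration):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Tiny stand-in corpus; 1 = real, 0 = fake.
texts = [
    "officials confirm the new policy after review",
    "agency publishes audited figures for the quarter",
    "court releases the full ruling to the press",
    "ministry details the vaccination schedule",
    "shocking secret the elites are hiding from you",
    "miracle cure doctors refuse to reveal",
    "celebrity spotted with alien proof inside",
    "one weird trick erases all your debt instantly",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

model = make_pipeline(TfidfVectorizer(stop_words="english"), SVC(kernel="linear"))

# Cross-validated accuracy estimates generalization before deployment.
scores = cross_val_score(model, texts, labels, cv=2, scoring="accuracy")

# Fit on everything, then read the signed margin as the confidence score
# (positive = real, negative = fake).
model.fit(texts, labels)
margin = model.decision_function(["officials confirm the audited figures"])
```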
C. Optimization:
SVM parameters like cost, kernel type, gamma, and epsilon were fine-tuned to maximize accuracy.
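The fine-tuning step can be expressed as a grid search over the named parameters. This sketch assumes scikit-learn's GridSearchCV and a small synthetic feature matrix in place of the real TF-IDF features; note that epsilon belongs to the SVR (regression) formulation used for continuous scores, so it is omitted from this classifier grid.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Small synthetic feature matrix standing in for TF-IDF + metadata features.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = (X[:, 0] > 0).astype(int)

# Exhaustively try combinations of cost (C), kernel type, and gamma,
# keeping the combination with the best cross-validated accuracy.
grid = GridSearchCV(
    SVC(),
    {"C": [1, 100, 300], "kernel": ["linear", "rbf"], "gamma": [0.001, "scale"]},
    cv=3,
)
grid.fit(X, y)
best = grid.best_params_
```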
D. Deployment:
Once optimized, the model is used to classify new articles and provide a confidence score for reliability.
Experiments & Results:
Dataset Used:
Combined two datasets:
Fake news from 244 flagged websites (12,999 entries)
Real news from major outlets (e.g., CNN, NYT, Reuters)
Features included top words, N-grams, date, sentiment, source, author, and label.
Findings:
Bag-of-Words and 2-word N-Grams were most effective.
Sentiment score had limited impact.
Source, author, and date greatly improved model accuracy.
Best results came from encoding the author's name, achieving 100% accuracy.
Final Model Parameters:
Cost (C): 300
Epsilon (ε): 0.0001
Gamma (γ): 0.001
Linear and polynomial SVM kernels performed best.
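The presence of an epsilon parameter suggests an epsilon-SVR formulation whose continuous output is thresholded at zero into real/fake, as described earlier. A sketch with the reported parameter values, using scikit-learn's SVR on synthetic stand-in features (both assumptions, not the paper's code):

```python
import numpy as np
from sklearn.svm import SVR

# Reported parameters: C = 300, epsilon = 0.0001, gamma = 0.001.
model = SVR(kernel="rbf", C=300, gamma=0.001, epsilon=0.0001)

# Synthetic features; targets +1 = real, -1 = fake in this toy setup.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 4))
y = np.sign(X[:, 0])
model.fit(X, y)

scores = model.predict(X)                         # continuous confidence scores
predictions = np.where(scores >= 0, "real", "fake")
```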
Conclusion
Our study confirms that a Support Vector Machine (SVM) classifier is highly effective in identifying fake news. Key takeaways include:
1) The most crucial features for detection are text, author, source, and date.
2) N-Gram models outperform Bag-of-Words when analyzing larger datasets.
3) SVM provides superior accuracy while also assigning confidence scores to its classifications.
4) Future enhancements could involve expanding the dataset and implementing real-time updates for continuous learning.