Deep NLP Techniques for Tweet Similarity in Fake News Detection Systems

Authors: Cheruku Navya Sri, Chikatimalla ManMohan, D. Devendar, Mr. Sheik Riyaz UI Haq

DOI Link: https://doi.org/10.22214/ijraset.2026.78192

Abstract

Fake news detection has traditionally relied on professional fact-checkers because the fact-checking process is inherently uncertain. Building on advances in language models, this research proposes a Long Short-Term Memory (LSTM)-based network tailored to handle that uncertainty, using the LSTM's capability to capture long-range dependencies in textual data. The model is evaluated on LIAR, a well-established benchmark for fake news detection research, where it reaches an accuracy of 99%. Recognizing the limitations of the LIAR dataset, we also introduce LIAR2 as a new benchmark that incorporates insights from the academic community. We present detailed comparisons and ablation experiments on both LIAR and LIAR2 and establish our results as the baseline for LIAR2. The approach aims to improve understanding of dataset characteristics and thereby help refine fake news detection methods built on the LSTM architecture.

Introduction

This project focuses on fake news detection using Natural Language Processing (NLP) and Long Short-Term Memory (LSTM) networks. In the digital age, misinformation spreads rapidly through social media and online platforms, affecting public opinion and trust. Traditional manual fact-checking methods are slow and inefficient, so automated machine learning solutions are required.

The proposed system uses an LSTM-based deep learning model, which is well-suited for sequential text data because it can capture long-term dependencies and contextual relationships between words. The model processes news articles after applying NLP preprocessing techniques such as tokenization, stopword removal, lemmatization, and word embedding (Word2Vec/GloVe). These steps convert raw text into numerical representations suitable for training.
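
The preprocessing steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the stopword list, the reserved ids, and the function names (tokenize, build_vocab, encode) are all hypothetical. Lemmatization and the Word2Vec/GloVe lookup are left out because they need external resources; the integer ids produced here are what an embedding layer would consume.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); real pipelines usually
# take a fuller list from a library such as NLTK or spaCy.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "of", "to", "in", "on", "and", "that"}

PAD, UNK = 0, 1  # reserved ids for padding and out-of-vocabulary tokens


def tokenize(text):
    """Lowercase the text, split it into word tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def build_vocab(token_lists, min_count=1):
    """Map each sufficiently frequent token to an integer id, most frequent first."""
    counts = Counter(t for tokens in token_lists for t in tokens)
    vocab = {"<pad>": PAD, "<unk>": UNK}
    for token, count in counts.most_common():
        if count >= min_count:
            vocab[token] = len(vocab)
    return vocab


def encode(tokens, vocab, max_len):
    """Convert tokens to ids, then truncate or right-pad to exactly max_len."""
    ids = [vocab.get(t, UNK) for t in tokens][:max_len]
    return ids + [PAD] * (max_len - len(ids))


# Made-up example statements, not taken from LIAR.
statements = [
    "The senator voted to raise taxes on the middle class.",
    "Taxes are the highest in the history of the state.",
]
token_lists = [tokenize(s) for s in statements]
vocab = build_vocab(token_lists)
encoded = [encode(tokens, vocab, max_len=8) for tokens in token_lists]
```

Every statement ends up as a fixed-length list of ids, so a batch can be stacked into a single array before it reaches the embedding layer.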

The system is evaluated using the LIAR dataset, a benchmark dataset for fake news detection. To improve dataset diversity and overcome limitations, an enhanced version called LIAR2 is introduced. Comparative analysis between LIAR and LIAR2 helps evaluate model performance and generalization.
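
A comparison between LIAR and LIAR2 is only meaningful if the same metrics are computed the same way on both test sets. The sketch below shows one way to do that in plain Python. The source does not say which metrics are reported beyond accuracy, so macro-averaged F1 is an assumption here, as are the function names and the made-up labels.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so minority classes count equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical gold labels and predictions for a small, imbalanced test split.
y_true = ["fake", "fake", "real", "real", "real", "real"]
y_pred = ["fake", "real", "real", "real", "real", "real"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
```

Reporting macro F1 alongside accuracy helps when classes are imbalanced, because accuracy alone can look high for a model that mostly predicts the majority class.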

The project aims to develop a scalable, accurate, and automated fake news detection system that can classify news as real or fake. The system includes modules such as data collection, preprocessing, embedding, LSTM model building, training, evaluation, and model saving (H5 format).

The literature review highlights advancements in deep learning techniques like CNN, LSTM, Bi-LSTM, BERT, and hybrid models, showing improved performance over traditional machine learning methods such as Naive Bayes and Logistic Regression. However, challenges remain, including data imbalance, overfitting, and limited generalization.

Compared to the existing CNN-BiLSTM hybrid model, the proposed LSTM-based approach offers:

Better handling of long-term dependencies in text
Improved contextual understanding
Simpler architecture with reduced complexity
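
The claim about long-term dependencies rests on the LSTM cell update. For reference, the standard formulation is given below; it is the textbook version and not necessarily the exact variant used in this work. The symbols are assumptions introduced here: x_t is the embedded input token at step t, h_t the hidden state, c_t the cell state, W, U, and b the learned weights and biases, \sigma the logistic sigmoid, and \odot element-wise multiplication.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```

Because c_t is updated additively and gated by f_t, information from early tokens can persist across many steps when the forget gate stays close to 1, which is the mechanism behind the long-range behavior described above.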

Overall, the project aims to create an efficient, robust, and scalable fake news detection system to help combat misinformation in digital media.

Conclusion

In conclusion, this project demonstrates the potential of combining NLP techniques with sequence models such as LSTM for fake news detection, a critical issue in today’s digital age. By using text-based features and contextual understanding, the system identifies patterns that distinguish real news from fake, providing a valuable tool for combating misinformation. The proposed system's ability to analyze news articles through both feature extraction and sequence modeling offers improved accuracy compared to traditional methods. The project also highlights areas for future improvement, such as integrating multimodal data, enhancing real-time detection capabilities, and incorporating explainable AI methods to improve model transparency. As fake news continues to evolve, further enhancements such as multi-language support and continuous learning will help the system adapt to new challenges and remain relevant in the ongoing fight against misinformation. Ultimately, this work contributes to the broader field of fake news detection, offering a foundation for future research and development of more robust, scalable solutions.

Copyright

Copyright © 2026 Cheruku Navya Sri, Chikatimalla ManMohan, D. Devendar, Mr. Sheik Riyaz UI Haq. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Paper Id : IJRASET78192

Publish Date : 2026-03-11

ISSN : 2321-9653

Publisher Name : IJRASET
