This paper presents a real-time fake news detection system based on a Retrieval-Augmented Generation (RAG) framework integrated with Large Language Models (LLMs). Traditional fake news detection approaches rely on static datasets and require frequent retraining, which makes them inefficient in dynamic information environments. To address this limitation, the proposed system retrieves live data from Google News, enabling up-to-date, evidence-based verification. The system employs the all-MiniLM-L6-v2 SentenceTransformer model to generate semantic embeddings of retrieved news headlines and uses FAISS for efficient similarity search. By comparing a user-provided claim with the most relevant news evidence, the system determines whether the claim is real or fake. A caching mechanism stores previously retrieved results and model responses, reducing redundant computation and improving response time. To further enhance performance, Particle Swarm Optimization (PSO) is applied to tune the retrieval parameter (top-k), so that the system adaptively selects the most relevant evidence. Experimental results show that the system achieves high accuracy, particularly when operating with a small number of retrieved documents. The proposed approach is computationally efficient, scalable, and capable of delivering near real-time predictions, making it suitable for practical fake news detection applications.
Introduction
This paper presents a retrieval-augmented fake news detection system that verifies user-submitted claims against real-time web evidence rather than relying on statically trained classifiers.
The proposed approach addresses a key limitation of traditional machine learning methods, which struggle with evolving misinformation and require frequent retraining. The system retrieves recent Google News headlines as evidence, embeds them for semantic similarity search, and applies large language model reasoning to determine whether a claim is real or fake.
The pipeline works in several stages:
First, the input claim is simplified by removing punctuation and keeping only its key words. The simplified claim is then used as a query to fetch relevant headlines from Google News. Results are stored in a multi-level caching system (a news cache, a vector cache, and an LLM response cache) to improve speed and avoid repeated computation.
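The claim simplification and news-cache lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the system's actual implementation: the stopword list, the lowercasing step, the `TTLCache` class, its ten-minute expiry, and the injected `fetch` callable (standing in for the Google News request) are all hypothetical.

```python
import re
import time
from typing import Callable, Dict, List, Optional, Tuple

# Assumption: a tiny illustrative stopword list; the paper does not specify one.
STOPWORDS = {"a", "an", "the", "is", "are", "was", "were", "in", "on", "of",
             "to", "and", "or", "that", "this", "has", "have", "did", "do"}


def simplify_claim(claim: str) -> str:
    """Remove punctuation, lowercase, and keep only non-stopword tokens."""
    tokens = re.sub(r"[^\w\s]", " ", claim.lower()).split()
    return " ".join(t for t in tokens if t not in STOPWORDS)


class TTLCache:
    """Hypothetical in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 600.0) -> None:
        self.ttl = ttl_seconds
        self._store: Dict[str, Tuple[float, List[str]]] = {}

    def get(self, key: str) -> Optional[List[str]]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if time.monotonic() - stored_at > self.ttl:
            del self._store[key]  # entry is stale, drop it
            return None
        return value

    def set(self, key: str, value: List[str]) -> None:
        self._store[key] = (time.monotonic(), value)


news_cache = TTLCache()


def get_headlines(claim: str, fetch: Callable[[str], List[str]]) -> List[str]:
    """Return headlines for a claim, calling fetch only on a cache miss."""
    query = simplify_claim(claim)
    cached = news_cache.get(query)
    if cached is not None:
        return cached
    headlines = fetch(query)
    news_cache.set(query, headlines)
    return headlines
```

Keying the cache on the simplified query rather than the raw claim means that differently worded versions of the same claim can share one cached result. The vector cache and LLM response cache could follow the same pattern with different value types.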
Next, the retrieved headlines are converted into vector embeddings using the all-MiniLM-L6-v2 SentenceTransformer model and stored in a FAISS index for efficient semantic similarity search. The top-k most similar headlines are selected as evidence and passed, together with the claim, to an LLM (via OpenRouter), which outputs a binary verdict: real or fake.
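The retrieval step can be illustrated with a dependency-free sketch. Note the substitutions: a toy bag-of-words vector stands in for the all-MiniLM-L6-v2 embedding, and an exhaustive cosine-similarity ranking stands in for the FAISS index lookup. The function names and the prompt wording are hypothetical, and the OpenRouter call itself is omitted.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def embed(text: str) -> Dict[str, float]:
    """Toy stand-in for a sentence embedding: an L2-normalised bag of words."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {w: c / norm for w, c in counts.items()} if norm else {}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Inner product of two unit vectors, which equals their cosine similarity."""
    return sum(v * b.get(w, 0.0) for w, v in a.items())


def top_k_evidence(claim: str, headlines: List[str],
                   k: int = 3) -> List[Tuple[float, str]]:
    """Score every headline against the claim and keep the k most similar."""
    query = embed(claim)
    scored = [(cosine(query, embed(h)), h) for h in headlines]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(claim: str, evidence: List[Tuple[float, str]]) -> str:
    """Assemble a hypothetical verdict prompt from the claim and its evidence."""
    lines = [f"- {headline}" for _, headline in evidence]
    return ("Claim: " + claim + "\nEvidence:\n" + "\n".join(lines)
            + "\nAnswer with exactly one word: REAL or FAKE.")
```

With unit-length vectors, ranking by inner product and ranking by cosine similarity coincide, which is why an exact inner-product search over normalised embeddings is a common choice for this kind of headline retrieval.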
To further optimize performance, the system uses particle swarm optimization (PSO) to tune the retrieval parameter (top-k). The system is evaluated on accuracy, precision, recall, F1-score, the confusion matrix, and processing time.
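A minimal sketch of PSO over a single integer parameter is given below to make the top-k tuning concrete. The search bounds, swarm size, inertia and acceleration coefficients, and the fitness function are all assumptions chosen for illustration. In the described system the fitness would presumably be a validation score, such as accuracy, obtained when the pipeline runs with a given top-k.

```python
import random
from typing import Callable, Tuple


def pso_top_k(fitness: Callable[[int], float],
              k_min: int = 1, k_max: int = 20,
              n_particles: int = 10, n_iters: int = 30,
              inertia: float = 0.7, c1: float = 1.5, c2: float = 1.5,
              seed: int = 0) -> Tuple[int, float]:
    """Maximise fitness(k) over integers k in [k_min, k_max] with a basic PSO."""
    rng = random.Random(seed)

    def score(x: float) -> float:
        # Particles move in continuous space; round to evaluate an integer k.
        return fitness(int(round(x)))

    pos = [rng.uniform(k_min, k_max) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest = list(pos)                      # best position seen by each particle
    pbest_fit = [score(x) for x in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g], pbest_fit[g]  # best position seen by the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            vel[i] = (inertia * vel[i]
                      + c1 * r1 * (pbest[i] - pos[i])   # pull toward own best
                      + c2 * r2 * (gbest - pos[i]))     # pull toward swarm best
            pos[i] = min(max(pos[i] + vel[i], k_min), k_max)  # clip to bounds
            fit = score(pos[i])
            if fit > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[i], fit

    return int(round(gbest)), gbest_fit


def toy_fitness(k: int) -> float:
    """Hypothetical objective that peaks at a small k; not measured data."""
    return -abs(k - 3)


best_k, best_fit = pso_top_k(toy_fitness)
```

Because each fitness evaluation in the real pipeline would involve retrieval and LLM calls, caching the score for every integer k already evaluated would keep the cost of the search low.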