Multi-agent systems (MAS) offer an effective design pattern for tackling complex distributed problems by using multiple independent agents that work together towards a shared goal. In this work, the proposed MAS approach for attributed question answering (QA) comprises intelligent agents that work in tandem on retrieval and generation to produce precise, trustworthy, and contextually relevant responses. The framework targets two qualities: answer accuracy, measured by coverage and relevance, and answer faithfulness, a metric that quantifies how strongly answers are anchored to the retrieved documents. A hybrid retrieval technique that combines sparse (BM25) and dense (E5) approaches improves recall relative to baseline models that use only one type of retrieval model. Further, the solution uses a dual-LLM setup that lets users select either cloud-based (OpenAI GPT) or on-premise (Llama) inference, thereby addressing privacy concerns.
Introduction
This project proposes an Intelligent Multi-Agent Chatbot with a Privacy-Oriented Architecture that combines Retrieval-Augmented Generation (RAG), multiple specialized AI agents, and a dual-LLM framework to deliver accurate, context-aware, and privacy-preserving conversational AI. Unlike traditional single-agent chatbots, the proposed system can handle complex, multi-step tasks, retrieve information from external knowledge sources, automate productivity tasks, and protect sensitive user data through local processing.
The literature review highlights major advancements in large language models and their underlying architectures (Transformer, BERT, GPT) and in RAG-based systems such as MAIN-RAG, AU-RAG, LONGAGENT, and RAGentA. These systems improve retrieval accuracy, factual consistency, and reasoning by assigning specialized roles to different agents. However, most existing solutions rely on cloud processing, lack privacy-aware mechanisms, and focus mainly on question answering rather than practical task automation.
A comparison of existing RAG approaches shows that while multi-agent systems outperform single-agent models in retrieval quality and reasoning, they require greater computational resources and still face challenges related to privacy, scalability, and error propagation.
The study identifies four major research gaps:
Limited capability of single-agent RAG systems in handling complex, multi-step queries.
Lack of privacy-preserving processing, as most systems rely on cloud-based inference.
Insufficient support for real-world task automation, such as email and calendar management.
Limited real-world evaluation regarding user satisfaction, privacy, and automation effectiveness.
The proposed methodology introduces a modular multi-agent architecture consisting of:
An Intent Classifier to route user requests.
A RAG Agent for document-based question answering using BM25, E5 embeddings, and FAISS vector storage.
A Gmail Agent for drafting and sending emails.
A Calendar Agent for scheduling events using Google Calendar.
A Local FAISS Vector Database for secure document storage.
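The routing role of the Intent Classifier can be illustrated with a minimal sketch. The keyword lists, the handler functions, and the names `classify_intent` and `route` are assumptions made for illustration only; the source does not describe how the classifier is implemented, and a deployed system would more plausibly use a trained model or an LLM prompt.

```python
import re
from typing import Callable, Dict, Tuple

# Hypothetical agent handlers. Real agents would call the RAG pipeline,
# the Gmail API, or the Google Calendar API instead of returning strings.
def rag_agent(text: str) -> str:
    return f"[rag] answering from documents: {text}"

def gmail_agent(text: str) -> str:
    return f"[gmail] drafting email: {text}"

def calendar_agent(text: str) -> str:
    return f"[calendar] scheduling event: {text}"

# Illustrative keyword lists (an assumption, not the paper's classifier).
INTENT_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "gmail": ("email", "mail", "send", "draft"),
    "calendar": ("schedule", "meeting", "calendar", "remind"),
}

AGENTS: Dict[str, Callable[[str], str]] = {
    "rag": rag_agent,
    "gmail": gmail_agent,
    "calendar": calendar_agent,
}

def classify_intent(text: str) -> str:
    """Return the first intent whose keywords appear in the text, else 'rag'."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    for intent, keywords in INTENT_KEYWORDS.items():
        if words & set(keywords):
            return intent
    return "rag"  # default: treat the request as a document question

def route(text: str) -> str:
    """Dispatch the request to the agent selected by the classifier."""
    return AGENTS[classify_intent(text)](text)
```

Defaulting to the RAG agent keeps unrecognised requests on the question-answering path, so a request is only handed to an agent with side effects (sending mail, creating events) when it matches explicitly.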
The workflow begins with user input through a chatbot interface. An LLM switch determines whether requests are processed using OpenAI GPT (cloud) or a local Llama model depending on data sensitivity. Uploaded documents are converted into embeddings, stored locally, and retrieved using a hybrid search mechanism before generating grounded, citation-supported responses.
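The LLM switch described above might look like the following sketch. The regular expressions, the backend names, and the `force_local` flag are hypothetical; the source states only that the choice depends on data sensitivity and that users can select the backend, not how sensitivity is detected.

```python
import re
from dataclasses import dataclass
from typing import List

# Illustrative sensitivity patterns (assumptions, not the paper's rules).
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),                 # SSN-like number
    re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),              # email address
    re.compile(r"\b(password|confidential|salary)\b", re.IGNORECASE),
]

@dataclass(frozen=True)
class Backend:
    name: str
    is_local: bool

CLOUD = Backend(name="openai-gpt", is_local=False)   # cloud inference
LOCAL = Backend(name="local-llama", is_local=True)   # on-device inference

def is_sensitive(text: str) -> bool:
    """True if any sensitivity pattern matches the text."""
    return any(p.search(text) for p in SENSITIVE_PATTERNS)

def select_backend(query: str, context_chunks: List[str],
                   force_local: bool = False) -> Backend:
    """Pick the local model if the user asks for it or if the query or
    any retrieved chunk looks sensitive; otherwise use the cloud model."""
    if force_local:
        return LOCAL
    if is_sensitive(query) or any(is_sensitive(c) for c in context_chunks):
        return LOCAL
    return CLOUD
```

Checking the retrieved chunks as well as the query matters because the prompt sent to the model contains both; a harmless question over a confidential document would otherwise still send that document's text to the cloud.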
A hybrid retrieval strategy combines BM25 for keyword matching and E5 for semantic similarity, improving retrieval accuracy by leveraging both lexical and contextual information.
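One common way to merge a lexical ranking with a semantic ranking is reciprocal rank fusion (RRF). The source does not say which fusion rule the system uses, so RRF is an assumption here, as are the document IDs and the constant `k = 60`. The sketch takes two already-ranked lists of document IDs, one from BM25 and one from the E5 dense retriever.

```python
from collections import defaultdict
from typing import Dict, List

def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by document ID so the output
    is deterministic.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical rankings from the two retrievers.
bm25_ranking = ["d3", "d1", "d7"]    # keyword matches
dense_ranking = ["d1", "d5", "d3"]   # semantic matches

fused = reciprocal_rank_fusion([bm25_ranking, dense_ranking])
print(fused)  # ['d1', 'd3', 'd5', 'd7']
```

Documents found by both retrievers (`d1`, `d3`) rise above documents found by only one, which is the intuition behind the recall gain: each retriever contributes candidates the other misses, while agreement between them is rewarded.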
The project's key innovation is its Dual-LLM Privacy Architecture, which lets users choose between cloud-based processing for general queries and fully local inference for sensitive information. When local inference is selected, confidential data does not leave the user's device.
A second contribution is natural language automation of email and calendar tasks.
Experimental evaluation using a FineWeb-based question-answering dataset demonstrates that the hybrid retrieval approach achieved 12.5% higher Recall@20, while the multi-agent framework improved response faithfulness by 10.7% and answer correctness by 1.1% compared to standard RAG systems. Although the multi-agent architecture requires additional computational resources, it significantly enhances factual accuracy, transparency through inline citations, context understanding, and user privacy.
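For reference, Recall@k can be computed as in the sketch below. It uses the standard definition (the fraction of relevant documents that appear in the top k retrieved results); the function names and the toy data are illustrative, and the source does not specify its exact evaluation code.

```python
from typing import List, Set, Tuple

def recall_at_k(retrieved: List[str], relevant: Set[str], k: int = 20) -> float:
    """Fraction of relevant documents found in the top-k retrieved list."""
    if not relevant:
        return 0.0  # convention chosen here for queries with no relevant docs
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)

def mean_recall_at_k(runs: List[Tuple[List[str], Set[str]]], k: int = 20) -> float:
    """Average Recall@k over (retrieved, relevant) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(recall_at_k(ret, rel, k) for ret, rel in runs) / len(runs)
```

For example, `recall_at_k(["a", "b", "c"], {"a", "c"}, k=2)` evaluates to 0.5, because only one of the two relevant documents appears in the top two results.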
Conclusion
This paper proposed a Multi-Agent RAG framework that improves the reliability of attributed question answering through hybrid retrieval, collaboration among multiple agents, and privacy-aware dual-LLM processing. Results showed improvements of 10.7% in faithfulness and 12.5% in Recall@20 over baseline RAG models. What distinguishes the framework is that it goes beyond QA: its Gmail and Google Calendar agents enable task automation through conversational natural language understanding.
References
[1] T. B. Brown et al., "Language Models are Few-Shot Learners," in Advances in Neural Information Processing Systems (NeurIPS), vol. 33, 2020, pp. 1877–1901. [Online]. Available: https://arxiv.org/pdf/2005.14165v4
[2] A. Vaswani et al., "Attention is All You Need," in Advances in Neural Information Processing Systems (NIPS), vol. 30, 2017, pp. 5998–6008. [Online]. Available: https://arxiv.org/pdf/1706.03762
[3] P. Lewis et al., "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks," in NeurIPS, vol. 33, 2020, pp. 9459–9474.
[4] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding," in Proc. NAACL-HLT, 2019, pp. 4171–4186.
[5] M. Berchansky, D. Fleischer, M. Wasserblat, and P. Izsak, "CoTAR: Chain-of-Thought Attribution Reasoning with Multi-level Granularity," in Findings of EMNLP, 2024, pp. 236–246. [Online]. Available: https://aclanthology.org/2024.findings-emnlp.13/
[6] C.-Y. Chang et al., "MAIN-RAG: Multi-Agent Filtering Retrieval-Augmented Generation," 2024. [Online]. Available: https://arxiv.org/pdf/2501.00332
[7] S. Es, J. James, L. Espinosa Anke, and S. Schockaert, "RAGAs: Automated Evaluation of Retrieval Augmented Generation," in Proc. EACL System Demonstrations, 2024, pp. 150–158.
[8] K. Singhal et al., "Toward expert-level medical question answering with large language models," Nature Medicine, vol. 31, no. 3, pp. 943–950, Mar. 2025.
[9] R. Taylor et al., "Galactica: A Large Language Model for Science," 2022. [Online]. Available: https://arxiv.org/pdf/2211.09085
[10] T. Zhang et al., "Benchmarking Large Language Models for News Summarization," Transactions of the Association for Computational Linguistics, vol. 12, pp. 39–57, 2024.
[11] J. Zhu, L. Yan, H. Shi, D. Yin, and L. Sha, "ATM: Adversarial Tuning Multi-agent System Makes a Robust Retrieval-Augmented Generator," in Proc. EMNLP, 2024, pp. 10902–10919.
[12] J. Jang and W.-S. Li, "AU-RAG: Agent-based Universal Retrieval Augmented Generation," in Proc. ACM SIGIR-AP, 2024, pp. 2–11.
[13] J. S. R. Mosquera, C. R. De La Rosa Peredo, and M. Garrido Córdoba, "A Hybrid Approach to Information Retrieval and Answer Generation for Regulatory Texts," in Proc. RegNLP, 2025, pp. 31–35.
[14] J. Zhao et al., "LONGAGENT: Achieving Question Answering for 128k-Token-Long Documents through Multi-Agent Collaboration," in Proc. EMNLP, 2024, pp. 16310–16324.
[15] T. Gao, H. Yen, J. Yu, and D. Chen, "Enabling Large Language Models to Generate Text with Citations," in Proc. EMNLP, 2023, pp. 6465–6488.
[16] C. Huang, Z. Wu, Y. Hu, and W. Wang, "Training Language Models to Generate Text with Citations via Fine-grained Rewards," in Proc. ACL (Long Papers), 2024, pp. 2926–2949.
[17] J. Huang and K. Chang, "Citation: A Key to Building Responsible and Accountable Large Language Models," in Findings of NAACL, 2024, pp. 464–473.
[18] I. Besrour, T. M. F. Schreieder, J. He, and M. Färber, "RAGentA: Multi-Agent Retrieval-Augmented Generation for Attributed Question Answering," 2025. [Online]. Available: https://arxiv.org/pdf/2506.16988