RAG-Based AI Chatbot for Student and Institutional Assistance

Authors: Nisanth P, Arohan A R, Adhithyan P C, Muhammed Suhail, Shahzad Bin Muhammed, Linsa V U

DOI Link: https://doi.org/10.22214/ijraset.2025.73970

Abstract

In contemporary higher education, where students are more engaged and expect immediate information, institutions face the challenge of delivering effective and convenient customer service. The steady flow of questions about admissions, academics, and campus services overwhelms support personnel and calls for responsive solutions that improve the student experience. To fill this gap, the “RAG-Based AI Chatbot for College Customer Support” enables students, parents, and staff to receive instant support and information. Built on Retrieval-Augmented Generation (RAG) combined with Large Language Models (LLMs), the chatbot draws relevant information from reliable sources such as college databases, web pages, and documents. Upon receiving a user query, the system identifies and ranks pertinent information, enabling the LLM to generate accurate, context-aware responses. The solution offers 24/7 support, streamlines operational processes, and reduces the workload on support teams, fostering a more efficient and satisfying college experience.

Introduction

The rapid advancement of AI is reshaping interactions in educational institutions, with AI chatbots emerging as key tools for handling queries from students, staff, and applicants. However, many existing chatbots rely on broad pre-trained models that are not grounded in an institution’s current documents and policies, so they may produce inaccurate or non-specific answers.

To address this, the College Support Chatbot project employs a Retrieval-Augmented Generation (RAG) architecture, integrating local open-source models and tools such as Ollama, ChromaDB, and Streamlit. The system processes institutional documents into semantic embeddings, stores them in a searchable vector database, retrieves and re-ranks relevant text chunks for user queries, and uses a local large language model (Gemma 3:4B) to generate context-specific, factually accurate responses grounded strictly in uploaded college data.
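The retrieve, re-rank, and generate flow described above can be illustrated with a minimal sketch. It is not the project's implementation: the bag-of-words `embed` function stands in for the embedding model, an in-memory list stands in for ChromaDB, the term-coverage `rerank` function stands in for a cross-encoder, and the final prompt would be passed to the local LLM. All function names and the sample chunks are hypothetical and exist only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text):
    """Toy bag-of-words vector; an assumed stand-in for a real embedding model."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=3):
    """First stage: return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def rerank(query, candidates, top_n=2):
    """Second stage: re-score each (query, chunk) pair jointly.

    Assumed stand-in for a cross-encoder: the score is the fraction of
    query terms that also occur in the chunk.
    """
    q_terms = set(tokenize(query))

    def score(chunk):
        if not q_terms:
            return 0.0
        return len(q_terms & set(tokenize(chunk))) / len(q_terms)

    return sorted(candidates, key=score, reverse=True)[:top_n]


def build_prompt(query, context_chunks):
    """Assemble a prompt that restricts the model to the retrieved context."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return (
        "Answer using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


# Made-up sample chunks, for illustration only.
chunks = [
    "The admission deadline for undergraduate programs is 30 June.",
    "The library is open from 8 am to 8 pm on weekdays.",
    "Hostel fees are payable at the start of each semester.",
]
query = "What is the admission deadline?"
top = rerank(query, retrieve(query, chunks, k=3), top_n=1)
prompt = build_prompt(query, top)
```

Keeping retrieval and re-ranking as separate stages lets the first stage stay cheap over the whole collection while the second stage spends more effort on a handful of candidates.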

This approach improves reliability over traditional chatbots by reducing hallucinations and ensuring answers directly reference verified institutional knowledge. The project highlights the advantages of dynamic document ingestion, local model execution for data privacy, and context re-ranking to enhance factual accuracy and relevance.

Development involved setting up dependencies, creating a knowledge base from official college documents, embedding and indexing text data, building the RAG pipeline, and developing a user-friendly web interface with Streamlit. Extensive testing and user feedback informed ongoing optimization.
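Before text can be embedded and indexed, it is usually split into chunks. The source does not state the chunking strategy or sizes used, so the following is only a sketch of one common approach, word-based chunks with overlap; the function name `chunk_text` and the default values of `chunk_size` and `overlap` are assumptions.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into chunks of up to chunk_size words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    boundary still appears whole in at least one chunk. Both defaults are
    illustrative assumptions, not values from the paper.
    """
    if chunk_size <= 0 or overlap < 0 or overlap >= chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


# Tiny usage example with placeholder words w0..w9.
sample = " ".join(f"w{i}" for i in range(10))
pieces = chunk_text(sample, chunk_size=4, overlap=1)
```

Each resulting chunk would then be embedded and stored together with a reference to its source document.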

The final system enables users to upload college-specific documents and interactively query them, receiving precise answers grounded in those materials. This makes the chatbot a robust, adaptable, and privacy-conscious tool for college customer support, offering significant improvements in factual correctness and user experience compared to generic AI chatbots.

Conclusion

This research set out to develop a Retrieval-Augmented Generation (RAG) based AI chatbot and to test its feasibility for delivering accurate, context-specific assistance from documents in a collegiate setting. By combining local open-source models with vector storage, we sought to build a system whose answers are grounded strictly in validated institutional sources, thereby reducing the hallucination commonly seen in general-purpose large language models (LLMs). The workflow involved processing varied college files (PDF, Excel), generating semantic embeddings with nomic-embed-text through Ollama, indexing them in a local ChromaDB instance, and running a RAG pipeline within a Streamlit app. The pipeline comprised retrieval, CrossEncoder-based re-ranking, and answer generation by a locally served Gemma 3:4B model constrained to the retrieved context.

The qualitative analysis, based on manual checking against source documents, showed that the chatbot can provide responses that are factually consistent with the given knowledge base. The re-ranking step was observed to improve the informativeness of the context provided to the LLM. In addition, the project demonstrated the effectiveness of using entirely locally hosted models and databases (Ollama, ChromaDB) to build a domain-specific AI assistance tool, with possible advantages for data privacy and control.

Some areas nonetheless need further investigation and improvement. One major limitation is the static nature of the knowledge base. The chatbot’s accuracy depends directly on how current the processed documents are; it cannot reflect real-time changes unless the underlying vector store is updated. Future work should develop an automated pipeline that detects changes in source documents and updates the ChromaDB embeddings accordingly.

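One way the change-tracking pipeline proposed as future work could begin is by comparing content hashes of the current source files against the hashes recorded at the last indexing run. The sketch below is an assumption about how such a step might look, not part of the described system; `fingerprint`, `plan_updates`, and the file names are hypothetical.

```python
import hashlib


def fingerprint(content):
    """Return a SHA-256 hex digest of a document's raw bytes."""
    return hashlib.sha256(content).hexdigest()


def plan_updates(current_docs, indexed_hashes):
    """Decide which documents need re-embedding or removal.

    current_docs maps a document name to its current bytes.
    indexed_hashes maps a document name to the hash stored when it was
    last embedded. Returns (to_embed, to_delete) as sorted name lists.
    """
    to_embed = sorted(
        name
        for name, content in current_docs.items()
        if indexed_hashes.get(name) != fingerprint(content)  # new or changed
    )
    to_delete = sorted(name for name in indexed_hashes if name not in current_docs)
    return to_embed, to_delete


# Hypothetical state: one file changed, one added, one removed.
indexed = {"fees.pdf": fingerprint(b"v1"), "old.pdf": fingerprint(b"x")}
current = {"fees.pdf": b"v2", "calendar.xlsx": b"new"}
to_embed, to_delete = plan_updates(current, indexed)
```

Only the documents in `to_embed` would need to be re-chunked and re-embedded, and the vectors belonging to `to_delete` would be removed from the store, which avoids rebuilding the whole index on every change.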
Another area for improvement is the assessment methodology. The existing assessment depended on manual verification. Automated evaluation, possibly adapting metrics from tools such as RAGAS that emphasize faithfulness, answer relevance, and context accuracy, would offer more objective and scalable performance measures. In addition, formal user testing is required to collect feedback on usability and perceived effectiveness from the intended student and staff users.

Furthermore, although feasibility was demonstrated, the performance (latency) and scalability of the locally hosted models on standard institutional hardware may be challenging under heavy load. Subsequent versions may explore optimisation techniques such as model quantisation, hardware acceleration, or alternative local model-serving environments.

In summary, despite these limitations, this project illustrates the potential of a RAG architecture built from locally controlled, open-source modules for constructing robust, institution-specific AI support systems. The proposed chatbot framework contributes a practical solution based on available tools and points a way towards trustworthy AI assistants that reduce hallucination by adhering strictly to curated knowledge sources in the education domain.
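As a rough illustration of what an automated faithfulness check might measure, the sketch below computes the share of answer sentences whose words are mostly found in the retrieved context. This is a simple lexical proxy and an assumption for illustration only; it is not how RAGAS computes its metrics, and the function name `support_ratio` and the `threshold` value are hypothetical.

```python
import re


def tokens(text):
    """Return the set of lowercase alphanumeric tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def support_ratio(answer, context, threshold=0.8):
    """Fraction of answer sentences lexically supported by the context.

    A sentence counts as supported when at least `threshold` of its
    distinct tokens also occur in the context. The threshold is an
    illustrative assumption.
    """
    context_terms = tokens(context)
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if not sentences:
        return 0.0
    supported = 0
    for sentence in sentences:
        terms = tokens(sentence)
        if terms and len(terms & context_terms) / len(terms) >= threshold:
            supported += 1
    return supported / len(sentences)


# Made-up example: the second sentence is not backed by the context.
context = "The library is open from 8 am to 8 pm on weekdays."
answer = "The library is open on weekdays. It also offers free parking."
ratio = support_ratio(answer, context)
```

A lexical proxy like this misses paraphrases and cannot detect contradictions, so it would complement manual checking and model-based evaluation rather than replace them.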

Copyright

Copyright © 2025 Nisanth P, Arohan A R, Adhithyan P C, Muhammed Suhail, Shahzad Bin Muhammed, Linsa V U. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET73970

Publish Date : 2025-08-31

ISSN : 2321-9653

Publisher Name : IJRASET
