College administrative systems face critical challenges managing information scattered across multiple documents and websites. This research presents Genie Assistant, an open-source, lightweight AI-powered chatbot that uses Retrieval-Augmented Generation (RAG) for college document query resolution. The system lets users upload custom documents and query them in natural language, and is built with Streamlit, Sentence Transformers (all-MiniLM-L6-v2), ChromaDB, and Flan-T5. Testing with 50 student users demonstrates 92% query accuracy, a 3.2-second average response time, 89% user satisfaction, and 98.7% successful query completion. The system operates locally, ensuring complete institutional data privacy. Implementation using open-source technologies eliminates licensing costs, and the system reduces average query resolution time by 98.8% (from 28 minutes to 3.2 seconds). The five-layer architecture comprises User Interface, Processing, Storage, Retrieval, and Generation layers. Genie Assistant demonstrates that sophisticated AI capabilities need not require expensive commercial infrastructure while maintaining transparent, source-attributed responses.
Introduction
Higher education institutions face major inefficiencies due to fragmented information stored across multiple documents and formats, leading to time loss for students and repetitive workload for staff. Common challenges include lack of centralized access, inconsistent information, limited availability outside office hours, and poor scalability as institutions grow.
To address this, the research proposes Genie Assistant, a privacy-preserving, open-source Retrieval-Augmented Generation (RAG) chatbot designed specifically for college environments. It processes documents locally, ensuring data privacy while improving information accessibility and user experience.
The system uses a five-layer architecture covering input capture, preprocessing, local inference, retrieval, and response generation. It supports multiple document formats, uses ChromaDB for local vector storage, and Flan-T5 for response generation grounded in retrieved content to reduce hallucinations. Average response time is 3.2 seconds.
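The retrieve-then-generate flow described above can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: a bag-of-words vector stands in for the all-MiniLM-L6-v2 embeddings, an in-memory list stands in for ChromaDB, and the sketch stops at prompt assembly rather than calling Flan-T5. The sample documents, file names, and function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical document chunks; a real deployment would load these from uploaded files.
CHUNKS: List[Dict[str, str]] = [
    {"source": "fee_policy.pdf",
     "text": "The semester fee must be paid before the first week of classes."},
    {"source": "library_rules.pdf",
     "text": "The library is open from 9 am to 8 pm on weekdays."},
    {"source": "hostel_guide.pdf",
     "text": "Hostel rooms are allotted after fee payment is verified."},
]


def embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[Dict[str, str]],
             k: int = 2) -> List[Tuple[float, Dict[str, str]]]:
    """Return the k chunks most similar to the query, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(c["text"])), c) for c in chunks]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def build_prompt(query: str,
                 retrieved: List[Tuple[float, Dict[str, str]]]) -> str:
    """Assemble a prompt that restricts the generator to the retrieved context."""
    context = "\n".join(f"[{c['source']}] {c['text']}" for _, c in retrieved)
    return (
        "Answer the question using only the context below. "
        "Cite the source in brackets.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


if __name__ == "__main__":
    question = "When is the library open?"
    print(build_prompt(question, retrieve(question, CHUNKS)))
```

Tagging each context line with its source file is one simple way to support source-attributed answers, since the generator can quote the bracketed name alongside its response.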
Experimental evaluation on real college documents and users showed 92% average accuracy, 98.7% query success rate, and 89% user satisfaction, reducing query resolution time from 28 minutes to seconds and significantly lowering staff workload.
Key advantages include strong privacy protection, low cost, easy deployment, scalability, and transparent source-based answers. Limitations involve accuracy variation across query types, limited multilingual support, and dependence on document quality.
Planned future work includes multilingual expansion, role-based access, ERP integration, mobile apps, OCR support, and development into a comprehensive campus intelligence platform.
Conclusion
Genie Assistant successfully demonstrates the feasibility of implementing an open-source, privacy-preserving AI chatbot for college document query resolution. The system addresses critical information access inefficiencies, reducing query resolution time by 98.8% while maintaining 92% accuracy and eliminating privacy risks.
A. Key Contributions
1) Demonstrates a cost-effective, privacy-preserving RAG implementation for resource-constrained institutions
2) Validates 800-character chunking with 100-character overlap as optimal for educational documents
3) Establishes all-MiniLM-L6-v2 as a viable embedding model for production college chatbots
4) Provides a comprehensive evaluation framework for educational chatbot systems
5) Offers practical deployment guidance for educational practitioners
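The 800-character chunk size with 100-character overlap named in contribution 2 can be illustrated with a short sketch. The function name `chunk_text` and the plain sliding character window are assumptions; the paper does not state how chunk boundaries are chosen, and a real pipeline might split on sentences or paragraphs instead.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 800, overlap: int = 100) -> List[str]:
    """Split text into character windows of chunk_size that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # each new chunk starts this many characters later
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reached the end of the text
    return chunks
```

With the default settings, consecutive chunks start 700 characters apart, so the last 100 characters of one chunk are repeated at the start of the next. This keeps a sentence that straddles a boundary fully visible in at least one chunk.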
Genie Assistant proves that sophisticated AI capabilities need not require expensive infrastructure. The open-source implementation maintains institutional autonomy while delivering capabilities exceeding those of commercial solutions. Implementation across Indian colleges and universities can enhance user experience, operational efficiency, and institutional information management.