This project proposes an AI-driven, bilingual, voice-enabled health chatbot that aims to enhance healthcare access in Karnataka's rural and semi-urban zones. The system was initially developed with a fine-tuned GPT-2 model; however, it generated unpredictable and sometimes unrelated answers, particularly for advanced or bilingual inputs. To address these constraints, the model was replaced with Mixtral-8x7B-Instruct integrated through LangChain, enabling Retrieval-Augmented Generation (RAG) to produce more precise, context-specific responses from a carefully curated health QA dataset. Supporting both Kannada and English, real-time voice interaction, and a friendly Gradio interface, the chatbot offers inclusive, voice-to-voice health support tailored to users with low literacy and limited digital exposure.
Introduction
Despite rapid digitization in healthcare, rural and semi-urban communities in Karnataka face barriers due to language, literacy, and limited digital access. Most health chatbots only support English text, excluding Kannada speakers, low-literacy users, women, and the elderly. Existing rule-based systems are often rigid and culturally insensitive.
To address these gaps, the project introduces a bilingual AI-driven health chatbot with voice-first interaction in both English and Kannada. The initial GPT-2-based version suffered from inconsistent, hallucinated responses and poor bilingual handling. A switch to Mixtral-8x7B-Instruct, combined with LangChain and Retrieval-Augmented Generation (RAG) using a FAISS vector database, significantly improved accuracy and reliability. Speech recognition (Google API), real-time translation, and voice synthesis (gTTS) allow full voice-to-voice communication, and a Gradio interface ensures usability for those with minimal digital skills.
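To make the retrieval step concrete, the Python sketch below wires a pre-built FAISS index of curated health content to Mixtral-8x7B-Instruct through LangChain's RetrievalQA chain. It is a minimal illustration rather than the project's exact code: the index path, the HuggingFaceHub hosting route, and the generation parameters are assumptions.

# Minimal RAG sketch (assumed configuration): FAISS retrieval grounds the
# Mixtral-8x7B-Instruct answer in the curated health corpus.
# Requires HUGGINGFACEHUB_API_TOKEN to be set for the hosted model.
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.llms import HuggingFaceHub
from langchain.chains import RetrievalQA

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2")

# Load a previously built index of health passages (path is illustrative).
vector_db = FAISS.load_local("health_faiss_index", embeddings,
                             allow_dangerous_deserialization=True)

llm = HuggingFaceHub(
    repo_id="mistralai/Mixtral-8x7B-Instruct-v0.1",
    model_kwargs={"temperature": 0.2, "max_new_tokens": 256},
)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vector_db.as_retriever(search_kwargs={"k": 3}),
    return_source_documents=True,  # keep retrieved passages for source attribution
)

result = qa_chain.invoke({"query": "What are the early symptoms of dengue?"})
print(result["result"])
for doc in result["source_documents"]:
    print("Source:", doc.metadata.get("source"))

Returning the retrieved passages alongside the answer is what allows the chatbot to show source-backed responses, which the literature review below identifies as a gap in existing systems.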
Literature Review & Identified Gaps
The literature reveals rapid progress in AI health chatbots but with key shortcomings:
Poor multilingual and voice support
Lack of regional language coverage
Limited transparency and source attribution
Studies reviewed:
BioMistral (2024) [8]: Strong in biomedical NLP, but weak in multilingual and voice support.
Mokmin and Ibrahim (2021) [3]: Demonstrated the usefulness of chatbots for student health literacy, but limited to English text and educated users.
Vignesh and Amirneni (2025) [11]: Offers multilingual voice support, but lacks Kannada coverage and source transparency.
Singhania et al. (2024) – Medibuddy [16]: Deployed in real-world use, but English-only and without source attribution.
Prashanth et al. (2023) [5]: COVID-19 chatbot with good triage, but lacking context handling and local-language support.
Project Contributions Over Existing Work
This project uniquely addresses the multilingual (Kannada-English), voice-first, culturally sensitive, and source-transparent needs unmet by previous solutions:
Supports both text and voice input/output
Targets low-literacy and rural users
Uses verified medical sources for reliable answers
Enables trust through source-backed responses
Methodology Overview
1. Dataset Preparation
Curated medical PDFs were processed into clean, overlapping chunks using RecursiveCharacterTextSplitter.
Embedded using sentence-transformers/all-MiniLM-L6-v2.
Stored in FAISS for efficient semantic search during query retrieval (a preparation sketch follows this list).
Bilingual content alignment ensures accurate Kannada-English interactions.
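A minimal preparation sketch is shown below, assuming the curated PDFs sit in a local medical_pdfs/ folder; the chunk size, overlap, and output path are illustrative values rather than the project's exact settings.

# Dataset preparation sketch: load curated medical PDFs, split them into
# overlapping chunks, embed the chunks, and persist a FAISS index.
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

documents = PyPDFDirectoryLoader("medical_pdfs/").load()

# Overlapping chunks preserve context across chunk boundaries.
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(documents)

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2")

# Build and save the index that the retrieval module loads at query time.
vector_db = FAISS.from_documents(chunks, embeddings)
vector_db.save_local("health_faiss_index")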
2. System Architecture
Gradio Interface: Simple GUI for voice/text queries.
Speech Recognition: Supports Kannada and English with ambient noise handling.
Language Detection & Translation: Ensures Kannada queries are semantically translated to English.
FAISS Retrieval Module: Matches user queries against the vectorized medical content to fetch grounded answers (an end-to-end sketch follows this list).
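The end-to-end voice flow can be sketched as follows. This is an illustrative assembly rather than the deployed code: langdetect is assumed for language detection, deep_translator for Google-backed translation, and qa_chain refers to the RetrievalQA chain sketched in the Introduction.

# Voice pipeline sketch: speech in (Kannada or English), grounded answer out,
# spoken back in the user's language.
import gradio as gr
import speech_recognition as sr
from langdetect import detect
from deep_translator import GoogleTranslator
from gtts import gTTS

recognizer = sr.Recognizer()

def voice_chat(audio_path):
    # 1. Speech-to-text with ambient-noise calibration (Google Web Speech API).
    with sr.AudioFile(audio_path) as source:
        recognizer.adjust_for_ambient_noise(source)
        audio = recognizer.record(source)
    try:
        query = recognizer.recognize_google(audio, language="kn-IN")
    except sr.UnknownValueError:
        query = recognizer.recognize_google(audio, language="en-IN")

    # 2. Detect the language; translate Kannada queries to English for retrieval.
    is_kannada = detect(query) == "kn"
    query_en = GoogleTranslator(source="kn", target="en").translate(query) if is_kannada else query

    # 3. Grounded answer from the FAISS-backed RAG chain (qa_chain, defined earlier).
    answer = qa_chain.invoke({"query": query_en})["result"]

    # 4. Reply in the user's language and synthesize speech with gTTS.
    if is_kannada:
        answer = GoogleTranslator(source="en", target="kn").translate(answer)
    gTTS(text=answer, lang="kn" if is_kannada else "en").save("reply.mp3")
    return answer, "reply.mp3"

gr.Interface(
    fn=voice_chat,
    inputs=gr.Audio(type="filepath", label="Ask in Kannada or English"),
    outputs=[gr.Textbox(label="Answer"), gr.Audio(label="Spoken answer")],
).launch()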
Conclusion
The development of the AI-Driven Health Companion has been a transformational process that helped us build the technical expertise and user-centered design skills needed to create inclusive digital health solutions. We initially built the system on a fine-tuned GPT-2 model but found its responses limited in accuracy and contextual reliability. We therefore switched to the Mixtral-8x7B-Instruct model with LangChain and a FAISS knowledge base. This change enabled contextual, fact-based responses grounded in a curated medical QA dataset and improved the quality of the answers delivered. Over the course of the project we gained hands-on exposure to speech recognition, translation, and text-to-speech technologies while adapting Gradio to build a multilingual, accessible user interface. Challenges such as inference latency and multilingual integration arose, but we were able to assemble a system whose interface demonstrated clear technical promise. In the future we aim to improve speed through asynchronous processing, keep the interface design user-driven, and expand the dataset with more diverse, domain-specific medical data. Additional goals include image-based QA using computer vision, multi-turn conversation, clearer audio handling for diverse accents, secure data storage for user personalization, and full deployment of the interface. Together, these additions are intended to extend the system's scalability, inclusiveness, and utility across a broader range of uses, notably in under-served areas.
References
[1] Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., Sutskever, I. (2019). "Language Models are Unsupervised Multitask Learners." OpenAI Technical Report.
[2] Abacha, A. B., Mrabet, Y., Demner-Fushman, D. (2019). "A Question-Entailment Approach to Question Answering." BMC Bioinformatics, 20(1), 1–11. DOI: 10.1186/s12859-019-3119-4
[3] Mokmin, N. A. M., Ibrahim, N. A. (2021). "The Evaluation of Chatbot as a Tool for Health Literacy Education Among Undergraduate Students." Education and Information Technologies, 26(5), 6033–6049. DOI: 10.1007/s10639-021-10542-y
[4] Denecke, K., May, R., Rivera-Romero, O. (2024). "Transformer Models in Healthcare: A Survey and Thematic Analysis of Potentials, Shortcomings and Risks." Journal of Medical Systems, 48(1), 23. DOI: 10.1007/s10916-024-02043-5
[5] Prashanth, Mallellu Reddy, P. Swapna, Mudrakola. (2023). "AI Enabled ChatBot for COVID-19." DOI: 10.1007/978-3-031-27524-1_68
[6] Laymouna, M., Ma, Y., Lessard, D., Schuster, T., Engler, K., Lebouche, B. (2024). "Roles, Users, Benefits, and Limitations of Chatbots in Health Care: Rapid Review." JMIR Preprints, 56930.
[7] Bhatt, A., Vaghela, N. (2024). "Med-Bot: An AI-Powered Assistant to Provide Accurate and Reliable Medical Information." arXiv preprint. DOI: 10.48550/arXiv.2411.09648
[8] Labrak, Y., Bazoge, A., Morin, E., Gourraud, P.-A., Rouvier, M., Dufour, R. (2024). "BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains." arXiv preprint. DOI: 10.48550/arXiv.2402.10373
[9] Vinod, V., Agrawal, S., Gaurav, V., Pallavi, R., Choudhary, S. (2021). "Multilingual Medical Question Answering and Information Retrieval for Rural Health Intelligence Access." arXiv preprint. DOI: 10.48550/arXiv.2106.01251
[10] Bao, Q., Ni, L., Liu, J. (2020). "HHH: An Online Medical Chatbot System Based on Knowledge Graph and Hierarchical Bi-Directional Attention." DOI: 10.1145/3373017.3373049
[11] Vignesh, U., Amirneni, A. (2025). "Breaking Language Barriers in Healthcare: A Voice-Activated Multilingual Health Assistant." DOI: 10.28945/5455
[12] Bartle, V., Lyu, J., El Shabazz-Thompson, F., Oh, Y., Chen, A. A., Chang, Y.-J., Holstein, K., Dell, N. (2022). ""A Second Voice": Investigating Opportunities and Challenges for Interactive Voice Assistants to Support Home Health Aides." DOI: 10.1145/3491102.3517683
[13] Liu, L., Perez-Concha, O., Nguyen, A., Bennett, V., Jorm, L. (2022). "De-identifying Australian Hospital Discharge Summaries: An End-to-End Framework Using Ensemble of Deep Learning Models." DOI: 10.1016/j.jbi.2022.104215
[14] Yang, Z., Xu, X., Yao, B., Rogers, E., Zhang, S., Intille, S., Shara, N., Gao, G. G., Wang, D. (2024). "Talk2Care: An LLM-Based Voice Assistant for Communication between Healthcare Providers and Older Adults." DOI: 10.1145/3659625
[15] Kowatsch, T., Nißen, M. K., Shih, C.-H. I. (2017). "Text-based Healthcare Chatbots Supporting Patient and Health Professional Teams: Preliminary Results of a Randomized Controlled Trial on Childhood Obesity." DOI: 10.3929/ethz-b-000218776
[16] Singhania, R., Badagan, S., Reddy, D., Sai Teja, K. T., Jett, C. (2024). "Medibuddy – A Healthcare Chatbot Using AI." International Journal of Soft Computing and Engineering (IJSCE), 14(3), Article G9902. DOI: 10.35940/ijsce.G9902.14030724