Bridging the Multilingual Health Literacy Gap: A Hybrid Transformer-Based Framework for Automated Clinical Documentation and Patient-Centric Reporting

Authors: Prathamesh Chavan, Rushil Dhube, Tushar Dayma, Riddhi Tumpalliwar, Kirti Randhe

DOI Link: https://doi.org/10.22214/ijraset.2026.83462

Abstract

The global healthcare system faces a critical challenge in health literacy, particularly pronounced in resource-constrained environments such as the Indian subcontinent, where a staggering doctor-to-patient ratio of 1:1,511 exacerbates communication barriers between healthcare providers and patients. This paper presents a novel hybrid transformer-based framework that integrates Automatic Speech Recognition (ASR), Named Entity Recognition (NER), and intelligent report generation to bridge the multilingual health literacy gap. Building upon the foundational architecture of the HSUIT AI Healthcare Platform, we propose a comprehensive end-to-end system that processes multilingual audio inputs through advanced preprocessing pipelines, employs state-of-the-art transformer models for clinical entity extraction, and generates patient-centric reports in multiple languages. Our methodology incorporates Voice Activity Detection (VAD), RNNoise preprocessing, Whisper-based multilingual ASR, BioClinicalBERT for domain-specific NER, and T5/BART models for abstractive summarization. Experimental evaluations demonstrate Word Error Rates (WER) below 8% for clinical transcription and F1-scores exceeding 0.92 for entity recognition across six Indian languages. The system addresses critical regulatory requirements including HIPAA, GDPR, and India’s Digital Personal Data Protection (DPDP) Act, while maintaining data sovereignty and algorithmic fairness. Our contributions include: (1) a multilingual clinical documentation pipeline achieving state-of-the-art performance, (2) comprehensive risk assessment frameworks for AI-driven healthcare systems, and (3) practical deployment strategies for resource-limited healthcare settings. This research demonstrates significant potential for democratizing healthcare access and improving patient outcomes through intelligent automation.

Introduction

This paper addresses the growing challenge of health literacy and proposes an AI-driven multilingual healthcare documentation and communication system to improve healthcare accessibility, efficiency, and patient understanding, particularly in linguistically diverse countries such as India.

1. Background and Problem Statement

Health literacy refers to an individual's ability to obtain, understand, and use health information effectively. Poor health literacy leads to:

Medication errors
Higher hospital readmission rates
Increased healthcare costs
Worse patient outcomes
Greater health inequalities

The problem is especially severe in India due to:

A shortage of healthcare professionals.
A doctor-to-patient ratio of 1:1,511, below recommended standards.
Extreme linguistic diversity, with 22 official languages and hundreds of dialects.
Heavy reliance on English-based medical documentation, which excludes many patients.

Patients who cannot fully understand medical information are significantly more likely to experience adverse health outcomes.

2. Need for AI-Based Healthcare Automation

Recent advances in Artificial Intelligence (AI), Natural Language Processing (NLP), and transformer models provide new opportunities to address healthcare communication challenges.

Key technologies include:

Whisper for multilingual speech recognition.
BioClinicalBERT for medical entity extraction.
Transformer-based summarization models for generating patient-friendly reports.

The paper builds upon the HSUIT AI Healthcare Platform, which combines speech recognition, clinical text analysis, and report generation into a unified healthcare workflow.

3. Research Motivation

The study is motivated by three major challenges:

A. Linguistic Diversity

Most healthcare systems operate primarily in English, making medical information inaccessible to a large percentage of patients. A multilingual AI system can bridge this communication gap.

B. Physician Workload

Doctors spend approximately 35–40% of their time on documentation rather than patient care. Automating transcription and report generation can reduce this burden and improve efficiency.

C. Patient Empowerment

Providing medical reports in local languages and simplifying medical terminology helps patients better understand their conditions and participate in healthcare decisions.

4. Major Contributions

The paper proposes:

Comprehensive AI Pipeline

An end-to-end healthcare documentation framework integrating:

Voice Activity Detection (VAD)
Audio enhancement
Speech recognition
Clinical entity extraction
Medical text summarization
Multilingual report generation
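
The stages listed above can be read as a sequential pipeline in which each stage consumes the output of the previous one. The following is a minimal sketch of that chaining only, not the HSUIT platform's implementation. The `Pipeline` class, the stage functions, and the tiny lexicon are hypothetical names introduced for illustration, and each placeholder stands in for a real model (VAD, ASR, NER).

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical types for illustration; not part of any published API.
Context = Dict[str, Any]
Stage = Callable[[Context], Context]


@dataclass
class Pipeline:
    """Runs named stages in order, handing each one the previous result."""
    stages: List[Tuple[str, Stage]] = field(default_factory=list)

    def add(self, name: str, stage: Stage) -> "Pipeline":
        self.stages.append((name, stage))
        return self

    def run(self, context: Context) -> Context:
        trace: List[str] = []
        for name, stage in self.stages:
            # Pass a shallow copy so a stage cannot rebind the caller's keys.
            context = stage(dict(context))
            trace.append(name)
        return {**context, "trace": trace}


def detect_speech(ctx: Context) -> Context:
    # Placeholder for VAD: keep only the frames flagged as speech.
    ctx["frames"] = [f for f in ctx["frames"] if f["is_speech"]]
    return ctx


def transcribe(ctx: Context) -> Context:
    # Placeholder for ASR: a real system would run a speech model here.
    ctx["transcript"] = " ".join(f["text"] for f in ctx["frames"])
    return ctx


def extract_entities(ctx: Context) -> Context:
    # Placeholder for NER: lookup against a tiny, made-up lexicon.
    lexicon = {"fever": "SYMPTOM", "paracetamol": "MEDICATION"}
    words = ctx["transcript"].lower().split()
    ctx["entities"] = [(w, lexicon[w]) for w in words if w in lexicon]
    return ctx


def build_pipeline() -> Pipeline:
    return (
        Pipeline()
        .add("vad", detect_speech)
        .add("asr", transcribe)
        .add("ner", extract_entities)
    )


if __name__ == "__main__":
    frames = [
        {"is_speech": True, "text": "Patient has fever"},
        {"is_speech": False, "text": ""},
        {"is_speech": True, "text": "prescribed paracetamol"},
    ]
    result = build_pipeline().run({"frames": frames})
    print(result["transcript"])
    print(result["entities"])
```

Keeping each stage as a plain function over a shared context makes it straightforward to swap a placeholder for a real model, or to add the summarization and report-generation stages, without changing the runner.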

Performance Evaluation

The system is tested across six Indian languages:

Hindi
Bengali
Tamil
Telugu
Marathi
Gujarati
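
Transcription quality in these evaluations is reported as Word Error Rate (WER). WER is conventionally defined as (S + D + I) / N, where S, D, and I are the numbers of substituted, deleted, and inserted words and N is the number of words in the reference transcript. The sketch below shows one common way to compute it with a word-level edit distance. It assumes whitespace tokenization and no text normalization, which may differ from the evaluation setup used in the paper, and the function name is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word ("a") out of six reference words.
    print(word_error_rate("the patient has a mild fever",
                          "the patient has mild fever"))
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words.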

Regulatory Compliance

The framework incorporates privacy and legal compliance with:

HIPAA
GDPR
DPDP Act

Risk Management

The paper develops methods to identify and mitigate:

Algorithmic bias
Data privacy risks
Security vulnerabilities

Deployment Strategies

The system is designed for:

Resource-constrained healthcare environments
Edge computing
Offline functionality
Efficient model compression

5. Literature Review Findings

Automatic Speech Recognition (ASR)

Healthcare ASR has evolved from traditional statistical models to deep learning and transformer-based approaches.

Traditional Methods

Hidden Markov Models (HMMs)
Gaussian Mixture Models (GMMs)

These systems struggled with:

Accents
Medical terminology
Noisy clinical environments

Deep Learning Advances

Neural networks and LSTMs improved transcription accuracy significantly.

Transformer-Based ASR

Whisper achieves near-human transcription quality across multiple languages and performs particularly well in multilingual healthcare settings where code-switching between English and local languages is common.

Medical Language Models

BERT-Based Models

The development of specialized medical NLP models improved understanding of clinical text:

BioBERT
ClinicalBERT
BioClinicalBERT

These models perform tasks such as:

Disease identification
Clinical entity recognition
Risk prediction
Medical information extraction

Medical Named Entity Recognition (NER)

Modern NER systems identify:

Diseases
Symptoms
Medications
Procedures
Laboratory results

Using transformer models with specialized architectures, these systems achieve high accuracy in extracting structured clinical information from unstructured medical notes.
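
Token-classification models of this kind commonly emit one BIO tag per token, and entity-level F1 is then computed over the decoded spans. The sketch below illustrates that evaluation step under stated assumptions: the tag labels (`MED`, `SYM`) are hypothetical, matching is exact on span boundaries and label, and an `I-` tag with no preceding `B-` of the same label is ignored. The paper's own scoring protocol may differ.

```python
from typing import List, Set, Tuple

# (start_token, end_token_exclusive, label)
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Decode a BIO tag sequence into a set of labelled spans."""
    spans: Set[Span] = set()
    start, label = None, None
    # The trailing "O" is a sentinel that closes an entity ending the sequence.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold_tags: List[str], pred_tags: List[str]) -> float:
    """Exact-match entity-level F1 between gold and predicted tags."""
    gold = bio_to_spans(gold_tags)
    pred = bio_to_spans(pred_tags)
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = ["B-MED", "I-MED", "O", "B-SYM"]
    pred = ["B-MED", "I-MED", "O", "O"]
    # Precision is 1.0 and recall is 0.5, so F1 is 2/3.
    print(round(entity_f1(gold, pred), 3))
```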

Clinical Text Summarization

The paper reviews two approaches:

Extractive Summarization

Selects important sentences directly from medical documents.
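
As a rough illustration of the extractive idea, the sketch below scores each sentence by the average corpus frequency of its non-stopword terms and keeps the top-scoring sentences in their original order. This is a simple word-frequency heuristic chosen for brevity, not the method used in the paper. The stopword list, the regex-based sentence splitter, and the function name are all assumptions made for this example.

```python
import re
from collections import Counter
from typing import List

# Deliberately tiny stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "was",
             "for", "with", "on", "has"}


def extractive_summary(text: str, max_sentences: int = 2) -> List[str]:
    """Return the highest-scoring sentences, kept in document order."""
    # Naive splitter: break after ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [
        [w for w in re.findall(r"[a-z]+", s.lower()) if w not in STOPWORDS]
        for s in sentences
    ]
    frequency = Counter(w for words in tokenized for w in words)

    def score(words: List[str]) -> float:
        # Average term frequency, so long sentences are not favoured by length.
        return sum(frequency[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(tokenized[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]


if __name__ == "__main__":
    note = ("The patient reported chest pain. "
            "Chest pain worsened with exertion. "
            "The weather was sunny.")
    for sentence in extractive_summary(note, max_sentences=2):
        print(sentence)
```

Because the output is copied verbatim from the input, an extractive summary cannot introduce facts that are absent from the source note, but it also cannot rephrase clinical terminology into simpler language.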

Abstractive Summarization

Uses transformer models to generate concise summaries in natural language.

Important models include:

T5
BART
Pegasus

These models help create understandable patient reports from complex clinical data.

Multilingual Medical NLP

Multilingual models such as:

Multilingual BERT
XLM-RoBERTa

allow knowledge transfer across languages, enabling healthcare AI systems to function even in languages with limited training data.

However, Indian languages remain challenging due to:

Complex grammar
Rich morphology
Limited medical datasets

Optical Character Recognition (OCR)

Many healthcare records exist as scanned documents.

Modern OCR systems such as:

Tesseract
PaddleOCR

convert medical documents into machine-readable text for further AI processing.

Conclusion

This research presents a comprehensive hybrid transformer-based framework addressing the critical global challenge of health literacy in multilingual Indian healthcare. Key contributions include:

1) State-of-the-Art Performance: WER below 8% for clinical transcription and F1-scores exceeding 0.92 for entity recognition across six Indian languages.
2) Comprehensive Regulatory Compliance: Detailed analysis and architectural design addressing HIPAA, GDPR, and India’s DPDP Act.
3) Risk Assessment Methodology: Systematic approaches to evaluating and mitigating algorithmic bias and data privacy vulnerabilities.
4) Practical Deployment Strategies: Model optimization techniques and edge computing architectures enabling deployment in resource-limited settings.
5) Patient Empowerment: Patient-centric report generation in vernacular languages with simplified terminology directly addresses health literacy gaps.

Copyright

Copyright © 2026 Prathamesh Chavan, Rushil Dhube, Tushar Dayma, Riddhi Tumpalliwar, Kirti Randhe. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET83462

Publish Date : 2026-06-05

ISSN : 2321-9653

Publisher Name : IJRASET
