Multi-Modal Assistive Communication System: Real-Time American Sign Language Recognition, Lip Reading, and Morse Code Translation Using Browser-Based AI
Authors: V. Laxmi, M. Jagadeesh, M. Balaji, M. Vershith
Communication barriers significantly impede social participation for the estimated 466 million individuals worldwide with disabling hearing loss and the additional millions affected by speech and motor impairments. Existing assistive technologies are characterised by three persistent deficiencies: single-modality coverage that obliges users to manage multiple fragmented tools, dependence on cloud-based inference that compromises user privacy, and prohibitive acquisition costs that restrict access in low-resource settings. This paper presents the Multi-Modal Assistive Communication System (MMACS), a unified browser-based artificial intelligence platform that simultaneously addresses all three deficiencies. MMACS integrates American Sign Language (ASL) gesture recognition (30 signs, ~10–15% of the full lexicon), AI-driven lip reading through viseme detection, and interactive Morse code translation within a single web application requiring no software installation. The system employs MediaPipe for real-time hand and facial landmark detection, TensorFlow.js for client-side neural network inference, and the Web Speech API for voice synthesis, all executing locally on the user's device to ensure data privacy. Empirical evaluation with 50 participants (18 with hearing impairments, 12 with speech impairments, 8 with motor impairments, 12 neurotypical controls) demonstrated ASL recognition accuracy of 94.2% (±2.1%), lip reading accuracy of 87.8% (±3.4%), and sub-200 ms end-to-end latency across all modalities.
The mobile-first responsive architecture ensures compatibility across all major browser engines on desktop, tablet, and mobile form factors. A post-study satisfaction survey recorded a 92% positive rating, indicating strong perceived utility in controlled settings. MMACS establishes a reproducible, open-source benchmark for privacy-preserving assistive communication technology and furnishes a principled framework for future multi-modal human-computer interaction research.
Introduction
This paper describes the development of MMACS (Multi-Modal Assistive Communication System), a browser-based AI platform designed to help individuals with hearing, speech, or motor impairments communicate more effectively. Existing assistive technologies are often expensive, limited to a single communication method, or dependent on cloud services that raise privacy concerns. MMACS addresses these issues by combining American Sign Language (ASL) recognition, lip reading, and Morse code translation into a single, privacy-preserving web application that runs entirely on the user’s device using TensorFlow.js and MediaPipe.
The system follows a four-layer architecture consisting of input acquisition, machine learning processing, translation, and presentation layers. It uses modern web technologies such as React, TypeScript, TensorFlow.js, and MediaPipe to provide cross-platform accessibility without requiring installation or specialized hardware. All biometric data processing occurs locally, which supports compliance with privacy regulations such as GDPR and HIPAA.
The ASL recognition module uses MediaPipe Hands and a multilayer perceptron classifier to recognize 30 gesture classes with real-time speech synthesis. A smart gesture state management mechanism prevents repetitive voice outputs during continuous gestures. The lip-reading module tracks facial landmarks using MediaPipe Face Mesh and classifies visemes through a bidirectional LSTM model to reconstruct spoken words. Adaptive thresholding improves performance under varying lighting conditions. The Morse code module supports keyboard, mouse, and touch input, translating Morse signals into text and speech with real-time audio-visual feedback.
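Two of the mechanisms described above lend themselves to a short illustration: the gesture state management that suppresses repeated speech output, and the translation of Morse symbols into text. The following TypeScript is a minimal sketch under stated assumptions, not the MMACS implementation. The names GestureSpeechGate and decodeMorse, the five-frame stability window, the 0.8 confidence threshold, and the use of "/" as a word separator are all hypothetical choices made for this example.

```typescript
// Hypothetical sketch; names and thresholds are assumptions, not MMACS source code.

/** Speaks a gesture once when it becomes stable, then stays silent while it is held. */
class GestureSpeechGate {
  private lastSpoken: string | null = null;
  private candidate: string | null = null;
  private stableFrames = 0;

  constructor(
    private readonly minStableFrames = 5, // assumed debounce window, in frames
    private readonly minConfidence = 0.8, // assumed classifier confidence threshold
  ) {}

  /** Returns the label to speak for this frame, or null to stay silent. */
  update(label: string | null, confidence: number): string | null {
    if (label === null || confidence < this.minConfidence) {
      // No reliable gesture: reset, so the same sign can be spoken again later.
      this.candidate = null;
      this.stableFrames = 0;
      this.lastSpoken = null;
      return null;
    }
    if (label !== this.candidate) {
      this.candidate = label;
      this.stableFrames = 1;
    } else {
      this.stableFrames += 1;
    }
    if (this.stableFrames >= this.minStableFrames && label !== this.lastSpoken) {
      this.lastSpoken = label;
      return label; // first stable frame of a new gesture
    }
    return null; // still stabilising, or already spoken
  }
}

// International Morse code for letters and digits.
const MORSE_TO_CHAR: Readonly<Record<string, string>> = {
  ".-": "A", "-...": "B", "-.-.": "C", "-..": "D", ".": "E", "..-.": "F",
  "--.": "G", "....": "H", "..": "I", ".---": "J", "-.-": "K", ".-..": "L",
  "--": "M", "-.": "N", "---": "O", ".--.": "P", "--.-": "Q", ".-.": "R",
  "...": "S", "-": "T", "..-": "U", "...-": "V", ".--": "W", "-..-": "X",
  "-.--": "Y", "--..": "Z",
  "-----": "0", ".----": "1", "..---": "2", "...--": "3", "....-": "4",
  ".....": "5", "-....": "6", "--...": "7", "---..": "8", "----.": "9",
};

/** Decodes space-separated Morse symbols; "/" separates words (assumed input format). */
function decodeMorse(signal: string): string {
  return signal
    .trim()
    .split(/\s*\/\s*/)
    .map((word) =>
      word
        .split(/\s+/)
        .filter((symbol) => symbol.length > 0)
        .map((symbol) => MORSE_TO_CHAR[symbol] ?? "?") // unknown symbols become "?"
        .join(""),
    )
    .join(" ");
}
```

In this sketch, a gate constructed with a three-frame window and fed the label "A" at confidence 0.9 for five consecutive frames returns "A" only on the third frame, so the speech synthesizer is triggered once rather than on every frame. Resetting the gate on a low-confidence frame is a deliberate simplification: it lets a user repeat the same sign after lowering the hand, at the cost of possible re-triggering on a noisy frame.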
The user interface follows the Web Content Accessibility Guidelines (WCAG), featuring keyboard navigation, screen reader support, real-time visual feedback, and an analytics dashboard that stores usage statistics locally for privacy. The system was evaluated with 50 participants, including users with hearing, speech, and motor impairments, using tasks involving gesture recognition, lip reading, and Morse transcription. Results showed ASL recognition accuracy of 94.2%, lip reading accuracy of 87.8%, end-to-end latency below 200 ms across all modalities, and a System Usability Scale score of 81.4.
Conclusion
This paper presented MMACS, a Multi-Modal Assistive Communication System that unifies ASL gesture recognition, AI-driven lip reading, and Morse code translation within a single browser-based platform executing entirely on the user's device. The system addresses three unresolved challenges in assistive communication technology (fragmented single-modality solutions, privacy-compromising cloud dependency, and prohibitive acquisition costs) through a cohesive, open-source, zero-installation architecture. Experimental evaluation with 50 diverse participants demonstrated ASL recognition accuracy of 94.2%, lip reading accuracy of 87.8%, Morse code character error rate below 0.5%, end-to-end latency below 200 milliseconds across all modalities, and a System Usability Scale score of 81.4, all achieved without server-side processing or specialized hardware. The mathematical formulations presented for softmax classification (Eq. 1), accuracy computation (Eq. 2), and mouth aspect ratio (Eq. 3) provide a transparent and reproducible basis for future comparative studies. MMACS demonstrates that production-grade, privacy-preserving, multi-modal communication AI is achievable within the constraints of a modern web browser, a finding with broad implications for the design of inclusive assistive technologies in resource-limited settings worldwide. By providing a replicable open-source benchmark, this work lays a principled foundation for the next generation of multi-modal, human-centered communication systems.
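For orientation, the three formulations named above are commonly written as shown below. The equations themselves are not reproduced in this section, so these are assumed standard forms rather than the paper's exact Eq. 1 to 3. In particular, the mouth aspect ratio has several variants in the literature, and the landmark symbols p_top, p_bottom, p_left, and p_right are illustrative names for the inner-lip landmarks, not identifiers taken from the paper.

```latex
% Eq. 1 (assumed standard softmax): probability of class i from logits z_1..z_K,
% with K = 30 gesture classes in the ASL module.
\[
P(y = i \mid \mathbf{x}) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}
\]

% Eq. 2 (assumed standard accuracy): share of correctly classified samples.
\[
\text{Accuracy} = \frac{N_{\text{correct}}}{N_{\text{total}}} \times 100\%
\]

% Eq. 3 (one common form of the mouth aspect ratio; an assumption):
% vertical lip opening divided by horizontal mouth width.
\[
\text{MAR} = \frac{\lVert p_{\text{top}} - p_{\text{bottom}} \rVert}
                  {\lVert p_{\text{left}} - p_{\text{right}} \rVert}
\]
```

Under this form, MAR is dimensionless, so it is unaffected by how far the face is from the camera, which is one reason ratios of landmark distances are often preferred over raw pixel distances.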