Abstract
The growing incidence of psychological stress and safety risks in emergency environments necessitates intelligent systems capable of providing timely and context-aware assistance. This work investigates an emotion-aware conversational framework that leverages Speech Emotion Recognition (SER) to analyze vocal characteristics associated with stress, anxiety, and emotional distress. Acoustic features extracted from speech signals are utilized to classify emotional states into positive, neutral, and negative categories, enabling the generation of empathetic and adaptive conversational responses that support user well-being. The framework is further extended to aquatic safety scenarios, where behavioral patterns and panic-related vocal cues are analyzed to identify potential drowning risks. Upon detecting high-risk conditions, the system provides calming guidance and simultaneously triggers real-time alerts to caregivers or lifeguards for rapid intervention. The integration of emotion recognition with safety monitoring demonstrates the potential of AI-driven conversational systems to enhance mental health support and strengthen emergency response mechanisms in critical situations.
Introduction
Mental health issues such as stress, anxiety, and depression are increasing due to modern lifestyle pressures, while access to professional support remains limited. Advances in Artificial Intelligence (AI), Natural Language Processing (NLP), and affective computing have enabled the development of intelligent systems that can analyze human emotions and provide automated mental health support.
Key technologies include Speech Emotion Recognition (SER), which detects emotions from voice features like pitch and tone, and NLP techniques that analyze emotional content from text. AI-based systems can also monitor behavior, detect risks, and provide real-time support, contributing to early detection and intervention in mental health care.
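To make the text-analysis side concrete, the following is a minimal sketch of mapping a user message onto the positive/neutral/negative scale using NLTK's general-purpose VADER sentiment scorer. VADER is a sentiment lexicon, not a clinical screening instrument, and the 0.05 thresholds below are illustrative conventions rather than validated cutoffs.

```python
# Sketch: mapping free text onto positive/neutral/negative with NLTK's VADER.
# VADER is a general-purpose sentiment tool, not a clinical instrument; the
# +/-0.05 thresholds are illustrative defaults, not validated cutoffs.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
sia = SentimentIntensityAnalyzer()

def classify_text(message: str) -> str:
    compound = sia.polarity_scores(message)["compound"]  # score in [-1, 1]
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"

print(classify_text("I feel completely overwhelmed and anxious today."))
```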
Mental health AI systems are classified into text-based, speech-based, chatbot-based, and multimodal systems. Multimodal systems, which combine data from speech, text, facial expressions, and physiological signals, offer more accurate and comprehensive emotional analysis.
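One common way to combine modalities is decision-level (late) fusion, where each modality's model produces a probability distribution over the same label set and the distributions are merged. The sketch below assumes hypothetical per-modality outputs and illustrative weights; real systems would tune both.

```python
# Sketch: decision-level (late) fusion of per-modality emotion predictions.
# Each model is assumed to output a probability distribution over the same
# label set; the modality weights here are illustrative, not tuned values.
import numpy as np

LABELS = ["positive", "neutral", "negative"]

def late_fusion(modality_probs: dict, weights: dict) -> str:
    fused = np.zeros(len(LABELS))
    for modality, probs in modality_probs.items():
        fused += weights.get(modality, 0.0) * probs
    fused /= fused.sum()  # renormalize into a distribution
    return LABELS[int(np.argmax(fused))]

# Example: hypothetical outputs from speech, text, and facial-expression models.
prediction = late_fusion(
    {"speech": np.array([0.2, 0.3, 0.5]),
     "text":   np.array([0.1, 0.2, 0.7]),
     "face":   np.array([0.3, 0.4, 0.3])},
    weights={"speech": 0.4, "text": 0.4, "face": 0.2},
)
print(prediction)  # -> "negative"
```

A design note: late fusion keeps each modality's model independent, so a missing signal (e.g., no camera feed) degrades gracefully by dropping its term from the weighted sum.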
The development of these systems relies on high-quality datasets and on acoustic features such as Mel-Frequency Cepstral Coefficients (MFCCs), pitch, and energy. Machine learning models (e.g., SVM, decision trees) and deep learning models (e.g., CNN, RNN, LSTM) are used to classify emotional states, with deep learning approaches generally achieving higher accuracy.
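As a minimal sketch of such a pipeline, the code below extracts MFCC, pitch, and energy features with librosa and fits a scikit-learn SVM. The file paths and labels are placeholders; a real pipeline requires a labeled speech corpus and proper train/test splits.

```python
# Sketch: extracting MFCC, pitch, and energy features from a speech clip and
# classifying the emotion with an SVM. File paths and labels are placeholders.
import librosa
import numpy as np
from sklearn.svm import SVC

def extract_features(wav_path: str) -> np.ndarray:
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # (13, n_frames)
    f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)       # frame-wise pitch
    rms = librosa.feature.rms(y=y)                      # frame-wise energy
    # Summarize frame-level features by their means to get a fixed-size vector.
    return np.concatenate([mfcc.mean(axis=1), [np.nanmean(f0)], [rms.mean()]])

# Hypothetical labeled clips; 0 = negative, 1 = neutral, 2 = positive.
X = np.stack([extract_features(p) for p in ["clip1.wav", "clip2.wav", "clip3.wav"]])
y_labels = np.array([0, 1, 2])

clf = SVC(kernel="rbf").fit(X, y_labels)
print(clf.predict(X[:1]))
```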
Existing research highlights the effectiveness of AI chatbots and emotion recognition systems in providing accessible, cost-effective, and non-invasive mental health support. However, challenges remain, including variability in human emotions, data quality issues, language differences, privacy concerns, and the lack of human-like empathy in AI systems.
Research gaps include limited diverse datasets, reliance on single data modalities, and reduced performance in real-world conditions. Future directions focus on multimodal systems, advanced deep learning models, larger datasets, and stronger privacy protections, as well as integration with professional healthcare services.
Conclusion
Artificial Intelligence has become an important technology for supporting mental health monitoring and emotion analysis. This survey reviewed recent developments in AI-based systems designed to detect and analyze human emotions, particularly through speech emotion recognition, natural language processing, and conversational agents. By examining both acoustic and linguistic features present in human communication, these systems are capable of identifying emotional states such as stress, anxiety, happiness, and sadness. Such capabilities enable the development of intelligent systems that can assist individuals by providing emotional insights and supportive responses.
The study also discussed the key components involved in building emotion recognition systems, including datasets, feature extraction methods, and machine learning models. Features such as MFCCs, pitch, energy, and spectral characteristics play a vital role in representing emotional information in speech signals. Traditional machine learning techniques, along with modern deep learning models such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks, have significantly improved the performance of emotion classification systems. These approaches allow models to learn complex emotional patterns from large datasets and provide more accurate predictions.
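In the spirit of the CNN and LSTM architectures surveyed above, the following is a compact, hedged sketch of a CNN+LSTM classifier over MFCC frame sequences. The input shape, layer sizes, and four-class output are illustrative assumptions, not a reference configuration from the reviewed literature.

```python
# Sketch: a compact CNN+LSTM emotion classifier over MFCC frame sequences.
# Input shape, layer sizes, and the four-class output are assumptions.
import tensorflow as tf
from tensorflow.keras import layers

NUM_FRAMES, NUM_MFCC, NUM_CLASSES = 100, 13, 4  # e.g. happy/sad/angry/neutral

model = tf.keras.Sequential([
    layers.Input(shape=(NUM_FRAMES, NUM_MFCC)),
    layers.Conv1D(32, kernel_size=5, activation="relu"),  # local spectral patterns
    layers.MaxPooling1D(pool_size=2),
    layers.LSTM(64),                                      # temporal dynamics
    layers.Dropout(0.3),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```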
Despite these advancements, several challenges still exist in the practical deployment of AI-based mental health systems. Variability in emotional expression, limited dataset diversity, background noise in speech recordings, and privacy concerns related to sensitive mental health data can affect system reliability. Furthermore, AI systems cannot replace the empathy and professional expertise of trained mental health practitioners.
Future research should focus on developing multimodal emotion recognition systems and improving model robustness, enabling AI technologies to play a supportive role in enhancing mental health awareness and early emotional distress detection.