Bridging the Gap Between Multimodal Sentiment Analysis and Explainable AI: A Conceptual Framework

Authors: Akshatha Rithesh, Divya M O, Neethu K, Sreeshma Mohan

DOI Link: https://doi.org/10.22214/ijraset.2026.79064

Abstract

The rapid growth of social media has made sentiment analysis increasingly important for understanding user emotions. Recent work in multimodal sentiment analysis combines features such as text, emojis, and hashtags to improve predictive accuracy. However, most existing multimodal sentiment analysis models rely on complex deep learning and language models that behave as black boxes, which makes it difficult for users to trust their output. Explainable Artificial Intelligence (XAI) has been developed to improve the transparency of such systems, but its integration with multimodal sentiment analysis is still in its infancy. This paper presents a conceptual framework that bridges the gap between multimodal sentiment analysis and XAI. The proposed framework combines text, emojis, and hashtags with XAI techniques, with the aim of improving the transparency and usability of multimodal sentiment analysis systems. Research challenges and implementation considerations for the proposed approach are also discussed.

Introduction

Sentiment analysis plays a crucial role in understanding user opinions and emotions from digital platforms such as social media, online reviews, and forums. While traditional approaches mainly relied on text, recent research has shown that incorporating emojis and hashtags significantly improves sentiment detection by providing additional emotional and contextual information. Transformer-based deep learning models have further enhanced prediction accuracy through better contextual understanding, leading to the emergence of multimodal sentiment analysis. However, these models often function as black-box systems, limiting transparency and user trust.

To address this limitation, Explainable Artificial Intelligence (XAI) has gained importance by making AI decisions more transparent and interpretable. Although multimodal sentiment analysis and XAI have individually advanced, they are generally studied as separate research areas. This paper proposes a unified, human-centric framework that combines text, emojis, and hashtags with explainability techniques such as SHAP, LIME, attention visualization, and rule-based explanations. The framework consists of six layers: multimodal input, preprocessing, feature extraction, sentiment classification, explainability, and output. It not only predicts sentiment (positive, negative, or neutral) but also provides clear explanations for its decisions. By integrating multimodal learning with explainable AI, the proposed framework improves both prediction accuracy and interpretability, making it suitable for applications such as mental health monitoring, social media analytics, and customer feedback systems.
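As a rough illustration of how the six layers could fit together, the following minimal Python sketch walks one post through the pipeline. It is not the authors' implementation: the small lexicons, the score threshold, and every function name (preprocess, extract_features, classify, explain, analyze) are assumptions made for this example, and a simple additive lexicon scorer stands in for the transformer-based classifier so that the rule-based explanation stays easy to follow.

```python
import re
from dataclasses import dataclass

# Toy lexicons: illustrative assumptions, not taken from the paper.
WORD_SCORES = {"love": 1.0, "great": 1.0, "happy": 1.0,
               "hate": -1.0, "awful": -1.0, "sad": -1.0}
EMOJI_SCORES = {"😊": 1.0, "😍": 1.0, "😢": -1.0, "😡": -1.0}
HASHTAG_SCORES = {"#blessed": 1.0, "#win": 1.0, "#fail": -1.0, "#worstday": -1.0}


@dataclass
class Result:
    label: str
    score: float
    explanation: list


def preprocess(post):
    """Layers 1-2: take the raw post and split it into three modalities."""
    lowered = post.lower()
    hashtags = re.findall(r"#\w+", lowered)
    emojis = [ch for ch in post if ch in EMOJI_SCORES]
    words = re.findall(r"[a-z']+", re.sub(r"#\w+", " ", lowered))
    return {"text": words, "emoji": emojis, "hashtag": hashtags}


def extract_features(tokens):
    """Layer 3: keep tokens with a known sentiment weight, tagged by modality."""
    lexicons = {"text": WORD_SCORES, "emoji": EMOJI_SCORES, "hashtag": HASHTAG_SCORES}
    return [
        (modality, tok, lexicons[modality][tok])
        for modality, toks in tokens.items()
        for tok in toks
        if tok in lexicons[modality]
    ]


def classify(features, threshold=0.5):
    """Layer 4: sum the weights and map the total to a sentiment label."""
    score = float(sum(weight for _, _, weight in features))
    if score > threshold:
        return "positive", score
    if score < -threshold:
        return "negative", score
    return "neutral", score


def explain(features):
    """Layer 5: rule-based explanation listing each contributing feature."""
    ranked = sorted(features, key=lambda f: abs(f[2]), reverse=True)
    return [f"{modality} '{tok}' contributed {weight:+.1f}"
            for modality, tok, weight in ranked]


def analyze(post):
    """Layer 6: return the prediction together with its explanation."""
    features = extract_features(preprocess(post))
    label, score = classify(features)
    return Result(label, score, explain(features))


result = analyze("I love this phone 😊 #win")
print(result.label, result.score)
for line in result.explanation:
    print(" -", line)
```

In a fuller system, the lexicon scorer would presumably be replaced by a trained multimodal model and the explain step by a technique such as SHAP, LIME, or attention visualization; the layer boundaries would stay the same.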

Conclusion

This paper presented a conceptual framework aimed at bridging the gap between multimodal sentiment analysis and explainable artificial intelligence. Although recent approaches have significantly improved sentiment prediction accuracy through deep learning and multimodal integration, they often operate as complex black-box systems, limiting transparency and interpretability [2]–[5], [8]–[10]. At the same time, explainable AI techniques have made considerable progress in improving the understanding of model decisions; however, these methods are largely focused on text-based systems and do not fully address multimodal scenarios [11]–[14]. To overcome these limitations, the proposed framework integrates textual data, emojis, and hashtags with explainability mechanisms to provide both accurate and interpretable sentiment predictions. By introducing a dedicated explainability layer, the framework enables the identification of feature contributions and generates human-understandable explanations, thereby enhancing user trust and system usability. This unified approach contributes to the development of transparent and human-centric sentiment analysis systems, in line with recent advancements in explainable and responsible AI [6], [15].
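One simple way to picture "identification of feature contributions" at the modality level is occlusion: remove one modality at a time and record how much the model's score changes. The Python sketch below is only an illustration under stated assumptions. The function toy_model is a hypothetical stand-in for a trained classifier, its weights are invented, and occlusion is a simpler perturbation method than SHAP or LIME, so its values are not Shapley values.

```python
def toy_model(parts):
    """Hypothetical black-box scorer returning a sentiment score in [-1, 1]."""
    weights = {"good": 0.4, "bad": -0.4, "😊": 0.3, "😡": -0.3,
               "#win": 0.2, "#fail": -0.2}
    total = sum(weights.get(tok, 0.0) for toks in parts.values() for tok in toks)
    return max(-1.0, min(1.0, total))


def modality_contributions(model, parts):
    """Occlusion: score drop observed when each modality is emptied in turn."""
    baseline = model(parts)
    contributions = {}
    for name in parts:
        occluded = {k: ([] if k == name else v) for k, v in parts.items()}
        contributions[name] = baseline - model(occluded)
    return contributions


# One post, already split into the three modalities.
parts = {"text": ["not", "bad"], "emoji": ["😊", "😊"], "hashtag": ["#win"]}
contributions = modality_contributions(toy_model, parts)
for name, value in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name}: {value:+.2f}")
```

Because the method only calls the model on modified inputs, it would apply unchanged to any classifier that accepts the same three-part input, which is one reason perturbation-based explanations are often used with black-box models.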

References

[1] Felbo, B., Mislove, A., Søgaard, A., Rahwan, I., & Lehmann, S. (2017). Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm. Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing (EMNLP).
[2] Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT).
[3] Li, S., & Okada, S. (2023). Interpretable multimodal sentiment analysis based on textual modality descriptions using large-scale language models. arXiv preprint.
[4] Yu, Y., Zhao, M., Qi, S., Sun, F., Wang, B., Guo, W., Wang, X., Yang, L., & Niu, D. (2023). ConKI: Contrastive knowledge injection for multimodal sentiment analysis. arXiv preprint.
[5] Miah, M. S. U., et al. (2024). A multimodal approach to cross-lingual sentiment analysis using transformers and large language models. Scientific Reports.
[6] Mabokela, K. R., et al. (2024). Explainable pre-trained language models for sentiment analysis. Big Data and Cognitive Computing.
[7] Diwali, A. (2024). Sentiment analysis meets explainable artificial intelligence: A survey. IEEE Transactions.
[8] Chen, F., Huang, P., Ge, X., Huang, J., & Bao, Z. (2024). Multimodal sentiment analysis based on causal reasoning. arXiv preprint.
[9] Hill, C. (2025). An analytical assessment of sentiment analysis trends and applications (2012–2024). ScienceDirect.
[10] Dolhopolov, S., Riabchun, Y., Delembovskyi, M., & Molodid, O. (2026). Explainable artificial intelligence for multimodal sentiment analysis in revitalization project management.
[11] Ribeiro, M. T., Singh, S., & Guestrin, C. (2016). “Why should I trust you?” Explaining the predictions of any classifier. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD).
[12] Lundberg, S. M., & Lee, S. I. (2017). A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems (NeurIPS).
[13] Adadi, A., & Berrada, M. (2018). Peeking inside the black-box: A survey on explainable artificial intelligence. IEEE Access.
[14] Arrieta, A. B., Díaz-Rodríguez, N., et al. (2020). Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion.
[15] Longo, L. (2024). Explainable artificial intelligence (XAI) 2.0: Open challenges and future directions. Information Fusion.

Copyright

Copyright © 2026 Akshatha Rithesh, Divya M O, Neethu K, Sreeshma Mohan. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET79064

Publish Date : 2026-03-30

ISSN : 2321-9653

Publisher Name : IJRASET
