Survey on Personality Detection on Multilingual Dataset using Machine Learning and Explainable AI

Authors: Riya Bhaskar, Arunima Jaiswal, Harshita Bhakhand, Divya Goyal, Shreya Ramhans

DOI Link: https://doi.org/10.22214/ijraset.2025.71767

Abstract

Personality detection is a key area in AI, with significant impact on psychological profiling, behavioral analysis, and personalized recommendations. This review addresses the new frontier of multilingual personality detection using machine learning (ML) and Explainable AI (XAI). We study state-of-the-art research on deep learning and ensemble methods, and highlight the contributions of XAI frameworks such as SHAP and LIME to model interpretability. By surveying various multilingual datasets, we analyze use cases and evaluation metrics with fairness and transparency in view, in order to assess how ML methods generalize across languages and cultural settings. Our findings highlight the significance of XAI in bridging AI predictions with human understanding, making AI-driven personality detection more interpretable and ethically responsible. This review contributes to personality computing by synthesizing advancements in ML-based personality detection, discussing challenges, and identifying future research directions to develop fair, accurate, and explainable AI-driven psychological assessments across languages.

Introduction

Personality detection plays a crucial role in AI and human-computer interaction, with applications in psychological assessment, personalized recommendations, and recruitment. Recent advances in machine learning (ML) and Explainable AI (XAI) have enhanced the ability to predict personality traits from text, but most existing models focus on monolingual data. This research addresses the challenge of multilingual personality detection, aiming to build robust, interpretable models that generalize across languages and cultures.

The study leverages diverse multilingual datasets and combines various ML methods—including deep learning, ensemble models, and transformers—with XAI techniques like SHAP and LIME to ensure transparency. It highlights the importance of interpretability, ethical responsibility, and practical deployment challenges.

A thorough literature review showcases a range of techniques and datasets used in personality detection, noting that while accuracy has improved, limitations remain regarding multilingual generalization, dataset availability, model interpretability, and ethical concerns.

Methodologically, the research outlines a pipeline involving data collection from social media and linguistic sources, preprocessing, feature extraction (TF-IDF, Word2Vec, BERT, LIWC), model training with supervised and unsupervised methods, evaluation using metrics like accuracy and F1-score, and model optimization through hyperparameter tuning. Diverse datasets—ranging from social media posts to facial images—support the analysis.
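Two stages of this pipeline, TF-IDF feature extraction and evaluation with accuracy and F1-score, can be sketched in a few lines of standard-library Python. This is a minimal illustration and not the implementation used in any surveyed study: the whitespace tokenizer, the smoothed IDF variant, the toy documents, and the label vectors are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document (term frequency * smoothed IDF)."""
    # Assumption: lowercase whitespace tokenization; real pipelines use richer preprocessing.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in counts.items()
        })
    return vectors


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy corpus; label 1 could stand for "high extraversion".
docs = ["i love big parties", "i prefer quiet evenings", "parties are fun"]
vectors = tfidf(docs)

# Hypothetical true labels and classifier predictions, for the metrics only.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
acc = accuracy(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
```

In the first document, "love" occurs in only one document while "parties" occurs in two, so "love" receives the higher weight. That is the intended effect of the inverse document frequency term: rarer words are treated as more informative.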

The paper also reviews a broad spectrum of techniques from traditional ML algorithms (Logistic Regression, SVM, XGBoost) to deep learning models (CNNs, RNNs, transformers) and emphasizes the integration of XAI for model transparency.
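To make the idea of XAI-based transparency concrete, the sketch below computes per-feature attributions for a linear model such as Logistic Regression. For a linear score with features treated as independent, the SHAP value of feature i reduces to w_i (x_i - E[x_i]), so the attributions sum to the difference between the model output for one input and the output at the background mean. The example does not call the SHAP library; the feature names, weights, bias, and background data are hypothetical values chosen only for illustration.

```python
def linear_attributions(weights, x, background_mean):
    """Attribution of each feature to a linear score: w_i * (x_i - mean_i)."""
    return [w * (xi - mu) for w, xi, mu in zip(weights, x, background_mean)]


def linear_score(weights, bias, x):
    """Linear model output (the log-odds in the case of logistic regression)."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


# Hypothetical model: TF-IDF weights of three words predicting extraversion.
feature_names = ["party", "quiet", "friends"]
weights = [1.5, -2.0, 0.8]
bias = -0.1

# Hypothetical background sample used to estimate the expected feature values.
background = [[0.2, 0.1, 0.3], [0.0, 0.4, 0.1], [0.4, 0.1, 0.2]]
background_mean = [sum(col) / len(background) for col in zip(*background)]

# One input to explain.
x = [0.5, 0.0, 0.3]
attributions = linear_attributions(weights, x, background_mean)
base_value = linear_score(weights, bias, background_mean)
prediction = linear_score(weights, bias, x)

# Rank features by the magnitude of their contribution.
ranked = sorted(zip(feature_names, attributions), key=lambda p: abs(p[1]), reverse=True)
for name, value in ranked:
    print(f"{name}: {value:+.3f}")
```

The attributions are expressed in the units of the linear score, so for logistic regression they explain the log-odds and not the probability. Non-linear models such as CNNs or transformers need the approximation methods that SHAP and LIME provide.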

Finally, the field is evolving toward using multilingual datasets beyond English-centric data, expanding personality detection capabilities across languages and cultures, with ongoing challenges in ethical use, dataset diversity, and computational efficiency.

Conclusion

Personality detection, the process of identifying an individual's personality traits, has gained significant attention. The goal of this survey is to provide an overview of recent advancements in personality detection methods, and to that end we reviewed a range of research studies. The Big Five personality traits are the most widely used framework for evaluating personality. We studied different deep learning approaches such as CNN, LSTM models, and K-S sets, comparing their performance across different datasets. A detailed analysis of datasets and their sources is also provided to give a full understanding of the data used in personality detection. While the survey provides a broad overview, it lacks in-depth analysis of specific methods such as LSTM models. We also recognize the need to include more research on personality detection. Future iterations should therefore consider incorporating additional personality detection methods. Expanding the scope to include approaches beyond deep learning, including hybrid methods, could offer a more complete perspective on the current state of personality detection.

Copyright

Copyright © 2025 Riya Bhaskar, Arunima Jaiswal, Harshita Bhakhand, Divya Goyal, Shreya Ramhans. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET71767

Publish Date : 2025-05-28

ISSN : 2321-9653

Publisher Name : IJRASET
