This research introduces a novel method of optimizing cultural fit assessment in corporate recruitment by combining natural language processing (NLP) methods with semi-supervised self-training models. Starting from an unlabeled dataset of tweets, we applied preprocessing techniques and word2vec embeddings to capture semantic relationships in the text. K-means clustering was then applied, with the optimal number of clusters found to be k=4. In cooperation with psychology professionals, we labeled 10% of the dataset according to the DISC model of psychology to facilitate model training. A hybrid model integrating Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks was created to leverage both labeled and unlabeled data via semi-supervised learning. This study aims to simplify cultural fit evaluation during recruitment, making it possible to evaluate candidates quickly and accurately.
Introduction
The modern recruitment process faces major challenges due to the surge in job applications, which overwhelms traditional face-to-face interviews and cultural fit evaluations. Previous studies have explored personality prediction from social media data but had limitations, such as relying on simpler models or focusing only on the Big Five personality traits.
This research proposes a more advanced approach using semi-supervised learning and the DISC personality model to automate personality assessment from social media behavior, specifically Twitter data. The study uses natural language processing (NLP) techniques like Word2Vec to convert tweets into numerical data, applies K-means clustering to identify patterns, and involves psychology experts labeling a small portion of data. A hybrid neural network combining CNN and LSTM processes both labeled and unlabeled data iteratively to improve personality classification accuracy.
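The iterative use of labeled and unlabeled data described above can be illustrated with a minimal self-training loop. This is a sketch under stated assumptions, not the study's implementation: a nearest-centroid classifier stands in for the CNN-LSTM network, 2-D points stand in for Word2Vec tweet vectors, and the function names, the 0.8 confidence threshold, and the two DISC labels used ("D", "I") are hypothetical choices made for illustration.

```python
import math

def fit_centroids(points, labels):
    """Stand-in classifier (assumption): one mean vector per class."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict_proba(centroids, p):
    """Softmax over negative distances to each class centroid."""
    scores = {y: math.exp(-math.dist(p, c)) for y, c in centroids.items()}
    total = sum(scores.values())
    return {y: s / total for y, s in scores.items()}

def self_train(labeled, unlabeled, threshold=0.8, max_rounds=10):
    """Repeatedly pseudo-label confident unlabeled points and refit."""
    points = [p for p, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    for _ in range(max_rounds):
        centroids = fit_centroids(points, labels)
        remaining = []
        for p in pool:
            proba = predict_proba(centroids, p)
            best = max(proba, key=proba.get)
            if proba[best] >= threshold:
                # Confident prediction: promote to the labeled set.
                points.append(p)
                labels.append(best)
            else:
                remaining.append(p)
        if len(remaining) == len(pool):
            break  # nothing new was pseudo-labeled, so stop early
        pool = remaining
    return fit_centroids(points, labels), pool

# Toy data (hypothetical): two expert-labeled points, five unlabeled ones.
labeled = [((0.0, 0.0), "D"), ((5.0, 5.0), "I")]
unlabeled = [(0.5, 0.2), (4.8, 5.1), (1.0, 0.5), (4.0, 4.5), (2.5, 2.5)]
model, leftover = self_train(labeled, unlabeled)
```

Points the model is unsure about stay in `leftover` instead of receiving a forced label, which is the usual safeguard against reinforcing early mistakes in self-training.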
This method aims to enhance recruitment by providing more precise and efficient candidate screening based on personality traits, offering a sophisticated alternative to conventional cultural fit tests.
The related works section reviews prior methods:
Using social media features and random forest regression for personality prediction.
Naive Bayes and KNN algorithms applied to Big Five traits from Twitter data.
Hybrid CNN-RNN models improving accuracy in personality detection.
NLP techniques like SciBERT and K-means for scientific text classification.
The methodology details data preprocessing (cleaning and lemmatization of tweets), model architecture (Word2Vec embeddings, K-means clustering with elbow method to find optimal clusters, and a semi-supervised CNN-LSTM model), and evaluation metrics (Elbow method, Silhouette score, Davies-Bouldin index for clustering quality, and categorical cross-entropy for classification loss).
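The clustering quality measures named above can be made concrete with a small, dependency-free sketch. It is only an illustration: the toy 2-D points stand in for Word2Vec tweet vectors, the deterministic farthest-first seeding is an assumption chosen to keep the example reproducible, and all function names are hypothetical. In practice a library implementation would normally be used.

```python
import math

def kmeans(points, k, iters=100):
    """Plain Lloyd's algorithm with deterministic farthest-first seeding."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            updated.append(tuple(sum(axis) / len(members) for axis in zip(*members))
                           if members else centers[j])
        if updated == centers:
            break
        centers = updated
    return labels

def groups(points, labels):
    """Collect points into one list per cluster label."""
    out = {}
    for p, lab in zip(points, labels):
        out.setdefault(lab, []).append(p)
    return list(out.values())

def centroid(members):
    return tuple(sum(axis) / len(members) for axis in zip(*members))

def inertia(clusters):
    """Within-cluster sum of squared distances (the elbow-method curve)."""
    return sum(math.dist(p, centroid(c)) ** 2 for c in clusters for p in c)

def silhouette(clusters):
    """Mean of (b - a) / max(a, b); higher is better. Needs >= 2 clusters."""
    scores = []
    for i, own in enumerate(clusters):
        for p in own:
            if len(own) == 1:
                scores.append(0.0)  # convention for singleton clusters
                continue
            a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
            b = min(sum(math.dist(p, q) for q in other) / len(other)
                    for j, other in enumerate(clusters) if j != i)
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

def davies_bouldin(clusters):
    """Mean worst-case ratio of cluster spread to separation; lower is better."""
    cents = [centroid(c) for c in clusters]
    spread = [sum(math.dist(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)]
    worst = [max((spread[i] + spread[j]) / math.dist(cents[i], cents[j])
                 for j in range(len(clusters)) if j != i)
             for i in range(len(clusters))]
    return sum(worst) / len(worst)

# Toy data (hypothetical): four well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (0, 10), (0, 11), (1, 10),
          (10, 0), (10, 1), (11, 0), (10, 10), (10, 11), (11, 10)]
results = {}
for k in range(1, 7):
    clusters = groups(points, kmeans(points, k))
    results[k] = {"inertia": inertia(clusters)}
    if k > 1:
        results[k]["silhouette"] = silhouette(clusters)
        results[k]["davies_bouldin"] = davies_bouldin(clusters)
best_k = max(range(2, 7), key=lambda k: results[k]["silhouette"])
```

The elbow method reads the `inertia` values across k and looks for the point where further increases in k stop paying off, while the silhouette score and Davies-Bouldin index give a per-k quality number that can confirm the choice.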
Conclusion
This research illustrates the seamless integration of computational rigor, language analysis, psychological understanding, and sophisticated methodologies. From the early clustering with K-means and NLP to building a sound hybrid model integrating CNNs and RNNs, every phase was carefully designed to reveal the complex patterns of Twitter user behavior. The results highlight the revolutionary potential of multidisciplinary methods in massive-scale social media analysis, opening doors to future breakthroughs in understanding and forecasting online personalities. Future studies can investigate the seamless fusion of text and image data, crossing platform boundaries to record users' holistic online trace. Multimodal neural networks can potentially improve prediction accuracy and sentiment analysis. In addition, placing emphasis on privacy-preserving methods and adaptable fusion mechanisms will be crucial for realizing accurate model applicability in various digital environments.