Insider threats remain one of cybersecurity's most critical problems. They are invisible amid normal user patterns, buried under what looks like ordinary activity but carries hidden intent. Most anomaly detection systems monitor system behavior alone (logins, file transfers, network use), yet they miss something essential: the emotional undertone that often appears before a real incident unfolds. This study introduces the Sentiment-Augmented Random Forest (SARF), a compact, explainable model built to fuse behavioral data with emotional context drawn from user communications. SARF does not just watch what people do; it listens to how they sound when they do it. By combining activity metrics with measures of sentiment polarity and subjectivity, it captures the subtle psychological drift that can signal risk before it turns into damage.
We generated a synthetic dataset of 2,000 simulated enterprise users, representing both normal and insider-like behavior, to test the model. SARF reached an accuracy of 96.4% and an ROC-AUC of 0.974. The results held up under statistical scrutiny, showing clear gains over baseline models. Feature importance plots make the model's decisions transparent, giving analysts a clear view of what drove each prediction. Beyond its technical results, SARF connects two worlds that rarely meet: cybersecurity and human psychology. It is a step toward defense systems that do not just monitor data but understand emotion, turning cybersecurity from a purely technical shield into something more perceptive and more human.
Introduction
The text presents SARF, an insider threat detection framework that addresses a key weakness of traditional security models: their focus on technical anomalies while ignoring human emotional signals. Insider threats are difficult to detect because insiders are trusted users with legitimate access, and early warning signs often appear as subtle behavioral or emotional changes in everyday communication.
SARF combines behavioral data (such as logins, file access, USB usage, and emails sent) with sentiment features (polarity and subjectivity) extracted from internal messages. This fusion allows the model to detect not only unusual actions but also emotional cues like stress or frustration that may precede malicious behavior.
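As a concrete illustration, the fusion step can be sketched as building one numeric vector per user from behavioral counts plus two aggregated sentiment scores. The tiny keyword lexicon and the scorer below are purely hypothetical stand-ins; the paper derives polarity and subjectivity from internal messages with a proper sentiment analyzer, not from this toy scheme.

```python
# Sketch: fuse behavioral metrics with message sentiment into one feature vector.
# The lexicon-based scorer is an illustrative stand-in for a real polarity/
# subjectivity analyzer; only the fusion structure mirrors the paper's idea.

NEGATIVE = {"unfair", "angry", "quit", "hate"}
POSITIVE = {"great", "happy", "thanks", "good"}
SUBJECTIVE = NEGATIVE | POSITIVE | {"feel", "think", "believe"}

def sentiment_scores(message: str) -> tuple[float, float]:
    """Return (polarity in [-1, 1], subjectivity in [0, 1]) for one message."""
    words = message.lower().split()
    if not words:
        return 0.0, 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    subj = sum(w in SUBJECTIVE for w in words)
    polarity = (pos - neg) / max(pos + neg, 1)
    subjectivity = subj / len(words)
    return polarity, subjectivity

def fuse_features(behavior: dict, messages: list[str]) -> list[float]:
    """Concatenate behavioral counts with mean sentiment over a user's messages."""
    scores = [sentiment_scores(m) for m in messages] or [(0.0, 0.0)]
    mean_pol = sum(p for p, _ in scores) / len(scores)
    mean_subj = sum(s for _, s in scores) / len(scores)
    return [behavior["logins"], behavior["file_accesses"],
            behavior["usb_events"], behavior["emails_sent"],
            mean_pol, mean_subj]

vec = fuse_features(
    {"logins": 14, "file_accesses": 220, "usb_events": 3, "emails_sent": 40},
    ["I feel this policy is unfair", "thanks for the great help"],
)
```

The resulting six-dimensional vector is what a downstream classifier would consume: the first four slots carry activity metrics, the last two carry the emotional context.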
To evaluate SARF, the authors created a realistic synthetic dataset of 2,000 users with a 10% insider ratio. A Random Forest model was trained using both behavioral and sentiment features and compared against Isolation Forest, Autoencoder, and SVM baselines. SARF achieved the best overall performance, with higher accuracy, F1-score, and ROC-AUC, while also offering strong interpretability through feature importance analysis.
The key contributions include multimodal feature fusion, improved explainability, a robust synthetic dataset, superior and balanced detection performance, and practical deployment readiness. Results show that sentiment features are among the most influential predictors, confirming that integrating emotional and technical signals significantly improves insider threat detection.
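A minimal end-to-end sketch of this evaluation setup, using scikit-learn, appears below. The synthetic generator here (Gaussian feature shifts for insiders, 10% insider ratio) is an illustrative assumption; the figures the paper reports (96.4% accuracy, 0.974 ROC-AUC) come from the authors' own dataset and will not be reproduced exactly by this toy setup.

```python
# Sketch: Random Forest over fused behavioral + sentiment features on a
# synthetic population of 2,000 users with a 10% insider ratio. The way
# insiders are simulated (shifted means, lower polarity) is an assumption
# made for illustration, not the paper's exact generator.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, insider_ratio = 2000, 0.10
y = (rng.random(n) < insider_ratio).astype(int)

# Columns: logins, file_accesses, usb_events, emails_sent, polarity, subjectivity
normal = rng.normal([10, 150, 1, 30, 0.2, 0.4], [3, 40, 1, 8, 0.2, 0.1], (n, 6))
shift = np.array([5, 120, 3, 15, -0.6, 0.2])  # insiders drift on every feature
X = normal + y[:, None] * shift

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

acc = accuracy_score(y_te, clf.predict(X_te))
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])

# Feature importances provide the interpretability the paper emphasizes:
# a ranked view of which behavioral and sentiment signals drove detection.
names = ["logins", "file_accesses", "usb_events", "emails_sent",
         "polarity", "subjectivity"]
ranking = sorted(zip(names, clf.feature_importances_), key=lambda t: -t[1])
```

Inspecting `ranking` after training is the analyst-facing step: if the sentiment columns rank highly, that supports the paper's claim that emotional signals carry real predictive weight alongside technical ones.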
Conclusion
This study introduces SARF, a new approach to detecting insider threats that combines how people act with how they feel, built on an interpretable Random Forest framework. Rather than relying on technical red flags alone, it weaves in sentiment polarity and subjectivity drawn from messages, linked to real-world actions through feature fusion. The model was rigorously evaluated on realistic synthetic data designed to mimic genuine enterprise insider behavior, where it clearly outperformed Isolation Forest, Autoencoder, and SVM baselines, reaching 96.4% accuracy.

Beyond the strong metrics, SARF reveals which features drive its decisions, giving security analysts something they rarely get: honest visibility. Instead of merely raising alerts, it spells out the reasons behind them. This transparency turns raw predictions into actionable guidance, letting analysts catch suspicious activity before it spirals. SARF points toward tools that blend reasoning with awareness, picking up quiet danger signals in mood and small shifts in how users act.

Even so, synthetic data can only go so far. Real insider behavior is far messier, loaded with shifting emotions, complex motives, and communication styles that change without warning. Future work will evaluate SARF on real enterprise data, add time-based tracking to spot developing threats earlier, and strengthen the language analysis with NLP that handles sarcasm, irony, and mood shifts in real situations. Adding further data sources, such as network logs or device activity, could also boost its ability to catch threats.

Overall, this approach shows that behavioral insight and machine learning fit together well. By pairing data-driven accuracy with human understanding, SARF shifts security from passive watching to acting ahead: systems that do not merely record user actions but grasp the reasons behind them.