Keyloggers are among the most severe forms of malware: they silently record keystrokes to steal sensitive user information such as passwords, banking credentials, and personal data. As malware grows more sophisticated, traditional signature-based antivirus solutions are often ineffective against stealthy and zero-day keylogger attacks. To address this concern, this project presents an Advanced Keylogger Detection System based on behavioral analysis using Isolation Forest and Long Short-Term Memory (LSTM) models.
The proposed system continuously monitors low-level system activity, including keystroke dynamics, process execution behavior, system API calls, file access operations, and network communication patterns. Isolation Forest, an unsupervised anomaly detection algorithm, identifies deviations from normal system behavior, which allows it to flag unknown or previously unseen keylogger activity. To further improve detection accuracy, an LSTM-based deep learning model analyzes sequential, time-series behavioral data to identify persistent and stealthy keylogging activity that evolves over extended periods.
By integrating anomaly detection with temporal behavior analysis, the system distinguishes legitimate applications from malicious keylogging processes. Upon detecting suspicious activity, the system generates real-time alerts and initiates automated response mechanisms such as process termination and detailed activity logging for forensic analysis. Experimental evaluation demonstrates that the proposed system achieves improved detection accuracy with reduced false positives, providing a robust and proactive defense mechanism suitable for modern endpoint security environments.
Introduction
This paper presents an Advanced Keylogger Detection System designed to protect endpoint devices from stealthy malware that records keystrokes and steals sensitive data such as passwords and financial information. It addresses the limitations of traditional signature-based antivirus systems, which often fail to detect zero-day and behavior-mimicking keyloggers.
The proposed solution uses a hybrid machine learning approach combining:
Isolation Forest for unsupervised anomaly detection of abnormal system behavior (CPU usage, keystroke rate, process activity, etc.)
LSTM (Long Short-Term Memory) networks for analyzing sequential and time-dependent behavior patterns to detect persistent or low-intensity keylogging activity
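The first stage listed above can be illustrated with a minimal sketch, assuming scikit-learn's IsolationForest and a two-feature telemetry vector (keystroke rate and CPU usage). The synthetic data, the feature scaling, and the contamination value are illustrative assumptions, not the configuration used in this work.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic "normal" telemetry (assumed values, for illustration only).
# Columns: keystrokes per second, CPU usage in percent.
normal = np.column_stack([
    rng.normal(4.0, 1.0, 500),
    rng.normal(15.0, 4.0, 500),
])

# Fit the baseline on normal behavior only; contamination is an assumed setting.
model = IsolationForest(n_estimators=100, contamination=0.01, random_state=0)
model.fit(normal)

samples = np.array([
    [4.2, 14.0],   # close to the baseline
    [40.0, 95.0],  # far outside the baseline
])

labels = model.predict(samples)            # 1 = inlier, -1 = outlier
scores = model.decision_function(samples)  # lower values are more anomalous

for sample, label, score in zip(samples, labels, scores):
    print(sample, "anomalous" if label == -1 else "normal", round(float(score), 3))
```

In this sketch the model never sees labeled keylogger samples. It only learns what typical telemetry looks like, so any sufficiently unusual observation receives a negative decision score.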
The system collects real-time behavioral telemetry (keystroke rate, CPU usage, timing intervals, and network activity), preprocesses it, and evaluates it through a two-stage detection pipeline. A hybrid decision engine combines both model outputs to classify activity as normal, medium risk, or high risk.
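One possible form of such a hybrid decision engine is sketched below. It assumes that both model outputs have already been normalized to anomaly scores in [0, 1] and that they are combined by a weighted average. The function name `classify`, the weight, and the two thresholds are hypothetical choices made for illustration and are not taken from the evaluated system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical cut-offs on the combined score."""
    medium: float = 0.4
    high: float = 0.7


def classify(iforest_score: float,
             lstm_score: float,
             weight_iforest: float = 0.5,
             thresholds: Thresholds = Thresholds()) -> str:
    """Combine two anomaly scores in [0, 1] into a three-level risk label."""
    for score in (iforest_score, lstm_score):
        if not 0.0 <= score <= 1.0:
            raise ValueError("scores must be normalized to [0, 1]")

    # Weighted average of the two model outputs (an assumed fusion rule).
    combined = weight_iforest * iforest_score + (1.0 - weight_iforest) * lstm_score

    if combined >= thresholds.high:
        return "high risk"
    if combined >= thresholds.medium:
        return "medium risk"
    return "normal"


print(classify(0.1, 0.2))
print(classify(0.5, 0.5))
print(classify(0.9, 0.8))
```

A weighted average is only one option. A rule that escalates whenever either model exceeds its own threshold would trade more alerts for earlier detection.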
When suspicious behavior is detected, the system triggers real-time alerts, process termination, and forensic logging, all visualized through a dashboard built using FastAPI and WebSockets.
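The forensic logging step can be sketched as a structured alert record serialized to one JSON line per event. The field names and the policy of recommending termination only for high-risk events are assumptions for illustration. The sketch does not terminate any process and does not include the FastAPI or WebSocket layer.

```python
import json
from datetime import datetime, timezone


def build_alert(pid: int, process_name: str, risk: str, combined_score: float) -> dict:
    """Build an alert record; field names are hypothetical."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "pid": pid,
        "process": process_name,
        "risk": risk,
        "score": round(combined_score, 3),
        # Assumed policy: recommend termination only for high-risk events.
        "action": "terminate" if risk == "high risk" else "log_only",
    }


def to_log_line(alert: dict) -> str:
    """Serialize an alert as a single JSON line suitable for an append-only log."""
    return json.dumps(alert, sort_keys=True)


alert = build_alert(4321, "example.exe", "high risk", 0.8512)
print(to_log_line(alert))
```

Because each record is a plain dictionary, the same payload could be written to a log file and pushed to a dashboard client without a second format.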
Experimental results show that:
Normal system behavior is correctly classified with low false positives
The hybrid system effectively distinguishes normal, anomalous, and malicious patterns
It operates in real time with stable performance and low overhead
Overall, the approach improves endpoint security by combining anomaly detection and temporal behavior modeling, enabling more accurate and adaptive detection of modern keyloggers compared to traditional rule-based systems.
Conclusion
This paper presented an Advanced Behavioral Keylogger Detection System based on a hybrid machine learning framework combining Isolation Forest and Long Short-Term Memory (LSTM) networks. The proposed system focuses on behavioral analysis rather than direct keystroke interception, ensuring user privacy while enabling effective detection of stealthy keylogging activities.
Experimental results demonstrated that the Isolation Forest model accurately established a baseline of normal system behavior and efficiently identified anomalous deviations. The LSTM model complemented this approach by capturing temporal patterns and sequential dependencies [15] indicative of automated or malicious activity. The hybrid decision engine successfully integrated the outputs of both models, enabling reliable classification of system behavior into low, medium, and high threat levels.
Real-time evaluation using a streaming telemetry pipeline confirmed that the system operates with low latency and high stability. The interactive dashboard provided clear visualization of behavioral trends, anomaly scores, and threat assessments, supporting interpretability and ease of analysis. The system also demonstrated strong false-positive control by correctly identifying benign system processes and suppressing unnecessary alerts. Overall, the proposed approach proves effective for academic demonstration and research in behavior-based malware detection.
References
[1] F. T. Liu, K. M. Ting, and Z.-H. Zhou, “Isolation Forest,” in Proceedings of the 2008 IEEE International Conference on Data Mining, Pisa, Italy, 2008, pp. 413–422.
[2] S. Hochreiter and J. Schmidhuber, “Long Short-Term Memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, Nov. 1997.
[3] M. Egele, T. Scholte, E. Kirda, and C. Kruegel, “A Survey on Automated Dynamic Malware Analysis Techniques and Tools,” ACM Computing Surveys, vol. 44, no. 2, pp. 1–42, Mar. 2012.
[4] A. A. Cárdenas, P. K. Manadhata, and S. P. Rajan, “Big Data Analytics for Security Intelligence,” IEEE Security & Privacy, vol. 11, no. 6, pp. 74–76, Nov.–Dec. 2013.
[5] N. Moustafa and J. Slay, “UNSW-NB15: A Comprehensive Data Set for Network Intrusion Detection Systems,” in Proceedings of the Military Communications and Information Systems Conference (MilCIS), Canberra, Australia, 2015, pp. 1–6.
[6] I. Goodfellow, Y. Bengio, and A. Courville, Deep Learning. Cambridge, MA, USA: MIT Press, 2016.
[7] E. Bertino and N. Islam, “Botnets and Internet of Things Security,” Computer, vol. 50, no. 2, pp. 76–79, Feb. 2017.
[8] J. Brownlee, Deep Learning for Time Series Forecasting. Melbourne, Australia: Machine Learning Mastery, 2018.
[9] T. Chen and C. Guestrin, “XGBoost: A Scalable Tree Boosting System,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 2016, pp. 785–794.
[10] A. Khraisat, I. Gondal, P. Vamplew, and J. Kamruzzaman, “Survey of Intrusion Detection Systems: Techniques, Datasets and Challenges,” Cybersecurity, vol. 2, no. 20, pp. 1–22, Dec. 2019.
[11] D. E. Denning, “An Intrusion-Detection Model,” IEEE Transactions on Software Engineering, vol. SE-13, no. 2, pp. 222–232, Feb. 1987.
[12] S. Forrest, S. A. Hofmeyr, A. Somayaji, and T. A. Longstaff, “A Sense of Self for Unix Processes,” in Proceedings of the IEEE Symposium on Security and Privacy, Oakland, CA, USA, 1996, pp. 120–128.
[13] P. Malhotra, L. Vig, G. Shroff, and P. Agarwal, “Long Short-Term Memory Networks for Anomaly Detection in Time Series,” in Proceedings of the European Symposium on Artificial Neural Networks (ESANN), Bruges, Belgium, 2015, pp. 89–94.
[14] N. Shone, T. N. Ngoc, V. D. Phai, and Q. Shi, “A Deep Learning Approach to Network Intrusion Detection,” IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 2, no. 1, pp. 41–50, Feb. 2018.
[15] A. L. Buczak and E. Guven, “A Survey of Data Mining and Machine Learning Methods for Cyber Security Intrusion Detection,” IEEE Communications Surveys & Tutorials, vol. 18, no. 2, pp. 1153–1176, Second Quarter 2016.
[16] M. Conti, T. Dargahi, A. Dehghantanha, and M. Conti, “A Survey on Security and Privacy Issues of Internet of Things,” IEEE Communications Surveys & Tutorials, vol. 20, no. 3, pp. 2127–2162, Third Quarter 2018.
[17] R. Mitchell and I.-R. Chen, “A Survey of Intrusion Detection Techniques for Cyber-Physical Systems,” ACM Computing Surveys, vol. 46, no. 4, pp. 1–29, Mar. 2014.