Malware Behavior Classification Using XGBoost with MITRE ATT&CK Technique Mapping

Authors: Dr. J. S. Kanchana, Monesh B., Amirthaganesh S., Visish B.

DOI Link: https://doi.org/10.22214/ijraset.2026.78798

Abstract

This project proposes a web-based automated malware analysis system that identifies adversarial behaviors using the MITRE ATT&CK framework. Users upload suspicious samples through a secure web interface, and each sample is then executed in an isolated, sandboxed Windows environment to prevent host compromise. During execution, Windows event logs and enhanced telemetry data are collected to capture detailed runtime behavior, including process creation, command execution, file and registry modifications, and network activity. These logs are processed and transformed into structured behavioral features that represent the actions performed by the malware. A machine learning–based multi-label classification approach using a binary relevance model maps the extracted features to the corresponding MITRE ATT&CK techniques: an independent Extreme Gradient Boosting (XGBoost) model is trained to detect the presence of each individual attack technique, so multiple techniques can be identified from a single malware execution. In addition, the system extracts relevant Indicators of Compromise (IOCs) such as malicious file paths, process names, and network endpoints. The proposed framework enables automated, explainable malware behavior classification and provides a systematic method for classifying malware activities based on standardized adversary techniques.

Introduction

Malware is increasingly sophisticated, making traditional signature-based detection methods inadequate. To address this, the research proposes a web-based automated malware analysis system that focuses on behavior-based detection using the MITRE ATT&CK framework. Users can safely upload suspicious files, which are executed in an isolated Windows sandbox. During execution, the system captures detailed runtime telemetry, including process creation, file and registry changes, network activity, and command-line usage.
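As a rough illustration of how captured telemetry might be turned into structured features, the following Python sketch counts events per category and derives two command-line features. The event schema (`category`, `command_line`, and so on) and the feature names are assumptions made for this example; the paper does not publish its log format or feature set.

```python
from collections import Counter

# Hypothetical event categories; the paper's actual schema is not given.
CATEGORIES = ("process_create", "file_write", "registry_set", "network_connect")


def extract_features(events):
    """Turn a list of event dicts into a flat dict of numeric features."""
    counts = Counter(e.get("category") for e in events)
    features = {f"n_{c}": counts.get(c, 0) for c in CATEGORIES}

    # Illustrative command-line features taken from process-creation events.
    cmdlines = [
        e.get("command_line", "").lower()
        for e in events
        if e.get("category") == "process_create"
    ]
    features["uses_powershell"] = int(any("powershell" in c for c in cmdlines))
    features["max_cmdline_len"] = max((len(c) for c in cmdlines), default=0)
    return features


# Made-up telemetry for one sandbox run.
sample_events = [
    {"category": "process_create", "command_line": "powershell.exe -enc AAAA"},
    {"category": "file_write", "path": r"C:\Users\Public\a.tmp"},
    {"category": "network_connect", "dest": "203.0.113.5:443"},
    {"category": "registry_set", "key": r"HKCU\Software\Example\Run"},
    {"category": "file_write", "path": r"C:\Users\Public\b.tmp"},
]
features = extract_features(sample_events)
print(features)
```

Keeping the feature keys in a fixed order means every sample yields a vector of the same shape, which is what a downstream classifier would need.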

These raw data are transformed into structured behavioral features, which are analyzed using machine learning, specifically a multi-label classification approach with XGBoost. This allows mapping a single malware sample to multiple ATT&CK techniques simultaneously. The system also extracts Indicators of Compromise (IOCs) such as suspicious file paths, process names, and network endpoints, aiding threat hunting and incident response.
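The binary relevance idea can be sketched in a few lines of Python: one independent binary model is trained per ATT&CK technique, and a sample receives every technique whose model fires. In the paper the per-technique models are XGBoost classifiers; to keep this sketch dependency-free, a trivial decision stump (`Stump`) stands in for them. The feature layout, the toy data, and the three technique IDs are illustrative assumptions, not the authors' configuration.

```python
class Stump:
    """Stand-in base learner (not XGBoost): thresholds one feature."""

    def fit(self, X, y):
        best = (-1, 0, 0.0, 1)  # (correct count, feature index, threshold, polarity)
        for j in range(len(X[0])):
            for t in sorted({row[j] for row in X}):
                for pol in (1, -1):
                    preds = [int(pol * (row[j] - t) >= 0) for row in X]
                    correct = sum(p == yi for p, yi in zip(preds, y))
                    if correct > best[0]:
                        best = (correct, j, t, pol)
        _, self.j, self.t, self.pol = best
        return self

    def predict(self, X):
        return [int(self.pol * (row[self.j] - self.t) >= 0) for row in X]


class BinaryRelevance:
    """Trains one independent binary model per label."""

    def __init__(self, labels, make_model):
        self.labels = list(labels)
        self.make_model = make_model  # factory returning an object with fit/predict
        self.models = {}

    def fit(self, X, Y):
        for k, label in enumerate(self.labels):
            y_k = [row[k] for row in Y]  # column k is the target for this label
            self.models[label] = self.make_model().fit(X, y_k)
        return self

    def predict(self, X):
        per_label = {l: m.predict(X) for l, m in self.models.items()}
        return [[l for l in self.labels if per_label[l][i]] for i in range(len(X))]


# Assumed feature order: [n_process_create, n_registry_set, n_network_connect].
TECHNIQUES = ["T1059", "T1112", "T1071"]  # illustrative ATT&CK technique IDs
X_train = [[5, 0, 0], [0, 3, 0], [4, 2, 1], [0, 0, 6], [0, 0, 0]]
Y_train = [[1, 0, 0], [0, 1, 0], [1, 1, 1], [0, 0, 1], [0, 0, 0]]

clf = BinaryRelevance(TECHNIQUES, Stump).fit(X_train, Y_train)
predictions = clf.predict([[6, 0, 2], [0, 0, 0], [1, 4, 0]])
print(predictions)
```

Because the wrapper only relies on `fit` and `predict`, a stronger base learner could be passed as `make_model` without changing the surrounding logic, subject to that learner's input format requirements. Binary relevance treats labels as independent, so correlations between techniques are not modeled.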

The system architecture comprises the following stages:

Malware intake and sandbox execution for safe observation.
Windows event log collection for detailed activity monitoring.
Feature extraction and preprocessing to produce structured data.
MITRE ATT&CK mapping for standardized behavior classification.
XGBoost-based multi-label classification for accurate detection.
IOC extraction and output to support security operations.
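The IOC extraction stage listed above could look roughly like the following Python sketch, which pulls file paths, process names, and network endpoints out of structured events. The event fields (`image`, `path`, `dest_ip`, `dest_port`) are hypothetical, and skipping loopback addresses is an illustrative noise filter rather than something the paper describes.

```python
import ipaddress
import ntpath  # parses Windows-style paths on any host OS


def extract_iocs(events):
    """Collect de-duplicated IOCs from a list of event dicts."""
    iocs = {"file_paths": set(), "process_names": set(), "network_endpoints": set()}
    for event in events:
        kind = event.get("category")
        if kind == "process_create" and event.get("image"):
            # Keep only the executable name, lower-cased for de-duplication.
            iocs["process_names"].add(ntpath.basename(event["image"]).lower())
        elif kind == "file_write" and event.get("path"):
            iocs["file_paths"].add(event["path"])
        elif kind == "network_connect":
            try:
                ip = ipaddress.ip_address(event.get("dest_ip", ""))
            except ValueError:
                continue  # drop malformed addresses
            if not ip.is_loopback:
                iocs["network_endpoints"].add(f"{ip}:{event.get('dest_port', 0)}")
    return {name: sorted(values) for name, values in iocs.items()}


# Made-up events for one sandbox run.
events = [
    {"category": "process_create", "image": r"C:\Users\Public\Dropper.exe"},
    {"category": "file_write", "path": r"C:\ProgramData\cache\payload.dll"},
    {"category": "network_connect", "dest_ip": "203.0.113.5", "dest_port": 443},
    {"category": "network_connect", "dest_ip": "127.0.0.1", "dest_port": 8080},
    {"category": "network_connect", "dest_ip": "not-an-ip", "dest_port": 80},
]
iocs = extract_iocs(events)
print(iocs)
```

Returning sorted lists rather than sets keeps the output stable between runs, which makes it easier to diff reports or feed them into other tooling.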

In the performance evaluation, the system reaches an accuracy of about 95.6% and an F1-score of about 0.889, which compares favorably with a traditional logistic regression baseline and indicates reliable malware detection and classification.
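For readers who want to reproduce this kind of measurement, the Python sketch below computes accuracy and F1 for multi-label predictions. It assumes micro-averaging over all (sample, technique) cells; the paper does not state which averaging scheme it used, so the reported figures may have been computed differently. The label matrices are made-up toy data.

```python
def multilabel_scores(y_true, y_pred):
    """Micro-averaged scores over every (sample, label) cell."""
    tp = fp = fn = tn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t and p:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
            else:
                tn += 1
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Rows are samples, columns are techniques (1 = technique present).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 0]]
y_pred = [[1, 0, 0], [0, 1, 0], [1, 1, 1], [0, 0, 0]]
scores = multilabel_scores(y_true, y_pred)
print(scores)
```

In multi-label settings, per-cell accuracy tends to look high because most techniques are absent from most samples, so F1 is usually the more informative of the two numbers.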

Overall, the system automates malware analysis, reduces manual effort, provides interpretable insights into attack behavior, and supports real-time cybersecurity operations with a scalable, modular architecture.

Conclusion

The developed system provides an efficient method for analyzing malware using sandbox execution and behavior-based detection. It reduces manual effort and enables faster identification of malicious activities through automated log analysis and machine learning models. The system also helps in understanding attacker techniques by mapping behaviors to known attack patterns. Future improvements can focus on enhancing detection accuracy and supporting more advanced and evolving threats.

Copyright

Copyright © 2026 Dr. J. S. Kanchana, Monesh B., Amirthaganesh S., Visish B. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET78798

Publish Date : 2026-03-26

ISSN : 2321-9653

Publisher Name : IJRASET
