Intelligent Malware Classification Using PE File Metadata and Machine Learning Techniques

Authors: J Cypto, G Srikanth, K Surya Prakash, B Gunal

DOI Link: https://doi.org/10.22214/ijraset.2025.69729

Abstract

This study presents a machine learning-based approach to enhance malware detection by analyzing structural and statistical features extracted from Portable Executable (PE) files. Utilizing the ClaMP_Integrated-5184.csv dataset—which includes metadata from PE headers, entropy values, and packer-related information—the research aims to distinguish between benign and malicious software effectively. Traditional signature-based detection methods often fail to detect modern threats due to evasion techniques like obfuscation and polymorphism. In contrast, machine learning offers a more adaptive and intelligent solution. This work focuses on feature selection, model training, and performance evaluation using various machine learning algorithms. The results demonstrate that these techniques can significantly improve the accuracy and reliability of malware classification, highlighting their potential in advancing cybersecurity defenses.

Introduction

This paper reviews machine learning (ML) and deep learning techniques for detecting malware and other cyber threats, covering Android malware, network intrusions, phishing, and zero-day attacks. It highlights the limitations of traditional signature-based detection against evolving and obfuscated malware. The studies reviewed propose several approaches: federated learning for privacy preservation, adaptive incremental learning to handle concept drift, hardware-assisted detection using hardware performance counters, and hybrid models that combine heuristic and ML techniques.

Multiple ML algorithms—Random Forest, XGBoost, CNN-LSTM, SVM, ensemble methods, and reinforcement learning—are evaluated across different datasets, showing high accuracy (often above 90%) and emphasizing the strengths of deep learning and ensemble learning in malware classification.

The proposed solution is a robust, scalable, and modular malware detection framework built on comprehensive feature preprocessing, dimensionality reduction (e.g., LDA), and a diverse set of classifiers combined through voting and stacking ensembles. It relies on automated hyperparameter tuning and cross-validation to optimize performance and generalization. The framework is designed to be resistant to obfuscation techniques, computationally efficient, and adaptable to emerging threats.
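
The framework described above can be sketched with scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the data is synthetic (generated with make_classification, not the ClaMP dataset), and the choice of base learners, the parameter grid, the fold count, and the helper name build_pipeline are all hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    StackingClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in for PE-derived features, not the ClaMP data.
X, y = make_classification(
    n_samples=400, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# ASSUMPTION: an illustrative set of base classifiers.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
    ("lr", LogisticRegression(max_iter=1000)),
]


def build_pipeline(ensemble):
    """Chain preprocessing, LDA reduction, and an ensemble classifier."""
    return Pipeline([
        ("scale", StandardScaler()),
        # With two classes, LDA can project onto at most one component.
        ("lda", LinearDiscriminantAnalysis(n_components=1)),
        ("clf", ensemble),
    ])


voting = VotingClassifier(estimators=base_learners, voting="soft")
stacking = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
    cv=3,
)

# Cross-validated grid search over one hyperparameter of the voting ensemble.
search = GridSearchCV(
    build_pipeline(voting),
    param_grid={"clf__rf__n_estimators": [25, 50]},
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, y_train)

stacked = build_pipeline(stacking).fit(X_train, y_train)

scores = {
    "voting": search.score(X_test, y_test),
    "stacking": stacked.score(X_test, y_test),
}
print(search.best_params_, scores)
```

Keeping the scaler and LDA inside the pipeline means they are refit on each training fold, so no information from the validation folds leaks into preprocessing during the search.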

Finally, deployment considerations include real-time readiness, model update mechanisms to counter new malware variants, and monitoring systems for continuous performance evaluation, ensuring the solution remains effective in dynamic cybersecurity environments.
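
One way to picture the monitoring and update mechanism is a rolling-window accuracy check that flags when a retrain may be due. The sketch below is an assumption for illustration only: the class name RollingAccuracyMonitor, the window size, and the threshold are hypothetical and do not come from the paper.

```python
from collections import deque


class RollingAccuracyMonitor:
    """Track accuracy over the most recent labelled predictions."""

    def __init__(self, window: int = 100, threshold: float = 0.9):
        # ASSUMPTION: window and threshold values are illustrative.
        self.outcomes = deque(maxlen=window)
        self.threshold = threshold

    def record(self, predicted: int, actual: int) -> None:
        """Store whether one prediction matched its verified label."""
        self.outcomes.append(predicted == actual)

    @property
    def accuracy(self) -> float:
        """Fraction of correct predictions in the current window."""
        if not self.outcomes:
            return 1.0  # no evidence of degradation yet
        return sum(self.outcomes) / len(self.outcomes)

    def needs_update(self) -> bool:
        """Flag a retrain once a full window falls below the threshold."""
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.accuracy < self.threshold


monitor = RollingAccuracyMonitor(window=5, threshold=0.8)
for _ in range(5):
    monitor.record(predicted=1, actual=1)
healthy = monitor.needs_update()

# Two misses push the five-sample window down to 3/5 correct.
for _ in range(2):
    monitor.record(predicted=0, actual=1)
degraded = monitor.needs_update()

print(healthy, degraded)  # False True
```

Requiring a full window before raising the flag avoids triggering a retrain on the first few samples after deployment.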

Conclusion

This research highlights the effectiveness of a machine learning-based approach in detecting malware by analyzing the structural and statistical features of Portable Executable (PE) files. By using a comprehensive dataset with entropy values, PE header attributes, and packer-related information, the proposed models successfully identified malicious software with high accuracy and reliability. Ensemble models such as Random Forest and Extra Trees demonstrated superior performance due to their ability to manage high-dimensional and complex feature interactions. Entropy-based and packer-related features played a key role in detecting disguised or encrypted threats. Moreover, the system exhibits strong potential for integration into real-world applications such as endpoint protection, cloud-based threat intelligence, and malware triaging systems. Overall, this study presents a scalable and adaptive solution that significantly improves malware detection in modern cybersecurity environments.
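
To illustrate why entropy helps flag packed or encrypted content, the sketch below computes Shannon entropy in bits per byte. It assumes the common byte-level definition; the paper does not state how the dataset's entropy values were derived, and the function name shannon_entropy is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return the Shannon entropy of `data` in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # H = sum over byte values of p * log2(1 / p), with p = count / total.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Every byte value equally likely: the maximum, as in encrypted-looking data.
uniform = bytes(range(256)) * 4
# A single repeated byte: the minimum, as in zero padding.
constant = b"\x00" * 1024

print(shannon_entropy(uniform))   # 8.0
print(shannon_entropy(constant))  # 0.0
```

Compressed or encrypted sections tend to score close to 8 bits per byte, while plain code and padding score lower, which is why a high section entropy is often treated as a hint that a file is packed.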

Copyright

Copyright © 2025 J Cypto, G Srikanth, K Surya Prakash, B Gunal. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET69729

Publish Date : 2025-04-25

ISSN : 2321-9653

Publisher Name : IJRASET
