The rapid increase in cyberattacks poses a significant threat to digital infrastructure and highlights the need for intelligent, timely threat detection. This paper presents a machine learning-based approach for classifying cybersecurity threats from network traffic data. Several supervised learning models, including Logistic Regression, Random Forest, Support Vector Machine (SVM), and XGBoost, are applied and evaluated on their ability to detect and categorize network-based attacks. The system processes key features such as packet length, port usage, and protocol behavior to identify malicious patterns and determine attack types. A user-friendly interface developed with Streamlit supports data upload, preprocessing, prediction, and visualization. Experimental results demonstrate that the proposed models, particularly ensemble methods, classify threats with high accuracy, offering an effective and scalable solution for proactive cybersecurity defense.
Introduction
In today’s digital era, cyber threats are increasingly complex and frequent, challenging traditional intrusion detection systems (IDS) that rely on static rules and signatures. To overcome these challenges, integrating machine learning (ML) techniques into cybersecurity offers a more adaptive and accurate approach for detecting and classifying threats.
This paper proposes a two-stage ML-based system that uses structured network traffic features to first detect whether activity is malicious (binary classification) and then identify the specific type of attack (multiclass classification) such as DDoS, PortScan, or Ransomware. Several ML algorithms—Logistic Regression, Random Forest, Support Vector Machine (SVM), and XGBoost—are implemented and evaluated using standard metrics.
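The two-stage flow described above can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: the class name TwoStageClassifier, the helper make_synthetic_traffic, the three-column feature layout (packet length, destination port, protocol number), and the synthetic data are all assumptions made for illustration, and Random Forest is used for both stages only to keep the example short.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

BENIGN = "Benign"  # assumed label for normal traffic


class TwoStageClassifier:
    """Stage 1 flags malicious rows; stage 2 names the attack type for flagged rows."""

    def __init__(self, random_state=0):
        self.detector = RandomForestClassifier(n_estimators=50, random_state=random_state)
        self.categorizer = RandomForestClassifier(n_estimators=50, random_state=random_state)

    def fit(self, X, labels):
        X = np.asarray(X, dtype=float)
        labels = np.asarray(labels)
        is_malicious = labels != BENIGN
        # Stage 1 learns benign (0) versus malicious (1) on all rows.
        self.detector.fit(X, is_malicious.astype(int))
        # Stage 2 learns attack types from the malicious rows only.
        self.categorizer.fit(X[is_malicious], labels[is_malicious])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        flagged = self.detector.predict(X) == 1
        out = np.full(len(X), BENIGN, dtype=object)
        if flagged.any():
            # Only rows flagged by stage 1 are passed to stage 2.
            out[flagged] = self.categorizer.predict(X[flagged])
        return out.astype(str)


def make_synthetic_traffic(n_per_class=100, seed=0):
    """Hypothetical toy data: columns are packet length, destination port, protocol number."""
    rng = np.random.default_rng(seed)

    def block(length_mean, port_low, port_high, protocol):
        return np.column_stack([
            rng.normal(length_mean, 10.0, n_per_class),
            rng.integers(port_low, port_high, n_per_class),
            np.full(n_per_class, protocol),
        ])

    X = np.vstack([
        block(800.0, 443, 444, 6),   # benign-like rows
        block(60.0, 80, 81, 17),     # DDoS-like rows
        block(300.0, 1, 1024, 6),    # PortScan-like rows
    ])
    y = np.repeat([BENIGN, "DDoS", "PortScan"], n_per_class)
    return X, y


X, y = make_synthetic_traffic()
order = np.random.default_rng(1).permutation(len(X))
train, test = order[:240], order[240:]

model = TwoStageClassifier().fit(X[train], y[train])
predictions = model.predict(X[test])
accuracy = float(np.mean(predictions == y[test]))
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

Because the toy classes are deliberately well separated, the printed accuracy says nothing about performance on real traffic. The sketch only shows how stage 2 is trained on, and applied to, the rows that stage 1 marks as malicious.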
A user-friendly web interface built with Streamlit allows easy dataset upload, preprocessing, prediction, and visualization, making the system accessible for both researchers and cybersecurity professionals. The study shows that ensemble models like Random Forest and XGBoost perform best in classifying cyber threats.
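A comparison of this kind can be set up along the following lines. This is a hedged sketch rather than the evaluation code behind the reported results: the data come from scikit-learn's make_classification instead of real network traffic, the metric set (accuracy and macro-averaged precision, recall, and F1) is an assumed reading of "standard metrics", and XGBoost is left out so that the example depends only on scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic three-class data standing in for labelled traffic features (assumption).
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=5, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Linear and kernel models are scaled first; tree ensembles do not need scaling.
models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}


def evaluate(model):
    """Fit on the training split and score on the held-out split."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="macro", zero_division=0
    )
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


results = {name: evaluate(model) for name, model in models.items()}
for name, scores in results.items():
    print(name, {metric: round(float(value), 3) for metric, value in scores.items()})
```

The ranking printed for synthetic data should not be read as evidence for or against any model on real traffic. An XGBoost model could be added to the models dictionary in the same way, assuming the xgboost package is installed.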
The system addresses the limitations of traditional IDS by offering adaptability, detailed threat categorization, and modularity for future upgrades, such as real-time monitoring and integration with automated response tools. It is implemented in Python using widely adopted libraries, emphasizing usability, scalability, and practical deployment in real-world cyber defense scenarios.
Conclusion
This paper presents a machine learning-based approach for classifying cyber threats using structured network traffic data. The system uses a two-stage classification process: it first determines whether traffic is malicious and then categorizes the type of attack, which improves the accuracy and reliability of threat detection. Random Forest, Support Vector Machine, Logistic Regression, and XGBoost were evaluated for performance and integrated into a Streamlit-based interface for ease of use. The results demonstrate that machine learning models can effectively enhance cybersecurity measures, offering a scalable and accessible solution for detecting a wide range of network threats.
References
[1] M. Tavallaee, E. Bagheri, W. Lu and A. A. Ghorbani, "A Detailed Analysis of the KDD CUP 99 Data Set," Proceedings of the IEEE Symposium on Computational Intelligence for Security and Defense Applications (CISDA), 2009, pp. 1–6.
[2] A. Javaid, Q. Niyaz, W. Sun and M. Alam, "A Deep Learning Approach for Network Intrusion Detection System," Proceedings of the 9th EAI International Conference on Bio-inspired Information and Communications Technologies, 2016, pp. 21–26.
[3] M. R. Ahmed, A. Naser Mahmoud and H. A. Mahmoud, "Machine Learning Approaches for Detecting Cyber Attacks in IoT Systems: A Survey," IEEE Access, vol. 10, pp. 18478–18494, 2022.
[4] S. M. Bridges and R. B. Vaughn, "Fuzzy data mining and genetic algorithms applied to intrusion detection," Proceedings of the National Information Systems Security Conference, 2000, pp. 13–31.
[5] N. Moustafa and J. Slay, "UNSW-NB15: A Comprehensive Data Set for Network Intrusion Detection Systems," 2015 Military Communications and Information Systems Conference (MilCIS), Canberra, Australia, 2015, pp. 1–6.
[6] W. Wang, M. Zhu, J. Wang, X. Zeng and Z. Sheng, "Malware Traffic Classification Using Convolutional Neural Network for Representation Learning," 2017 International Conference on Information Networking (ICOIN), 2017, pp. 712–717.
[7] A. L. Buczak and E. Guven, "A Survey of Data Mining and Machine Learning Methods for Cyber Security Intrusion Detection," IEEE Communications Surveys & Tutorials, vol. 18, no. 2, pp. 1153–1176, 2016.
[8] H. Hindy et al., "A Taxonomy of Network Threats and the Effect of Current Datasets on Intrusion Detection Systems," IEEE Access, vol. 8, pp. 104650–104675, 2020.
[9] N. Shone, T. N. Ngoc, V. D. Phai and Q. Shi, "A Deep Learning Approach to Network Intrusion Detection," IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 2, no. 1, pp. 41–50, 2018.
[10] N. Doshi, D. Apte and A. Merchant, "Intrusion Detection Using Machine Learning: A Comparative Study," 2019 International Conference on Advances in Computing, Communication and Control (ICAC3), 2019, pp. 1–6.
[11] T. T. Nguyen and G. Armitage, "A Survey of Techniques for Internet Traffic Classification using Machine Learning," IEEE Communications Surveys & Tutorials, vol. 10, no. 4, pp. 56–76, 2008.