Cyber Threat Detection Using Machine Learning: A Performance Evaluation Approach

Authors: Dr. Levina Tukaram, Khadanand Chaudhary, Nitesh Jha, Bibek Pamgeni, Vision Gaihre

DOI Link: https://doi.org/10.22214/ijraset.2025.73531

Abstract

The present-day world has become heavily dependent on cyberspace for every aspect of daily living, and the use of cyberspace is increasing day by day. People spend more time on the Internet than ever before. As a result, the risks of cyber threats and cybercrimes are rising. The term 'cyber threat' refers to illegal activity performed using the Internet. Cybercriminals change their techniques over time to get past the wall of protection, and conventional techniques are not capable of detecting zero-day attacks and sophisticated attacks. Thus far, many machine learning techniques have been developed to detect cybercrimes and combat cyber threats. The objective of this research work is to present an evaluation of some of the widely used machine learning techniques for detecting some of the most threatening cyber threats to cyberspace. Three primary machine learning techniques are mainly investigated: deep belief network, decision tree, and support vector machine. We present a brief exploration to gauge the performance of these machine learning techniques in spam detection, intrusion detection, and malware detection, based on frequently used standard datasets.

1. Introduction

Cyber threat detection is critical for identifying harmful activities in systems and networks before damage occurs. Traditional rule-based detection systems are limited:

Can’t detect unknown or zero-day attacks
Have high false positive rates
Lack adaptability to new threats

2. Literature Survey

Key research contributions include:

Tavallaee et al. (2009): Introduced the improved NSL-KDD dataset
Amor et al. (2004): Compared Decision Trees, SVM, Naïve Bayes
Shone et al. (2018): Hybrid deep learning with autoencoders and random forests
Vinayakumar et al. (2019): Used CNNs for network intrusion detection
Kumar & Singhal (2021): Applied ensemble methods like Random Forest + XGBoost

3. Problem Definition

Traditional methods fail to detect novel threats
High false alarm rates
Lack of learning/adaptability to evolving attack patterns

4. Proposed System

A machine learning–based Intelligent Intrusion Detection System (IDS) that:

Learns from labeled network data
Detects both known and unknown threats
Is automated, scalable, and adaptive

Key Features:

Uses ML models: Random Forest, SVM, KNN, Logistic Regression
Random Forest used for final classification
Applies PCA for dimensionality reduction
Implements a Flask-based dashboard for real-time alerts

5. Dataset

NSL-KDD dataset is used for training/testing
It mitigates the redundancy and imbalance issues of KDD'99 by removing duplicate records
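
To illustrate why redundancy matters, the following is a minimal sketch (not the actual NSL-KDD construction procedure) of how removing duplicate records changes the class distribution a model sees. The records and field values are made up for illustration.

```python
from collections import Counter

# Hypothetical toy records: (protocol, service, src_bytes, label).
# Six identical "dos" rows mimic the heavy duplication found in KDD'99.
records = [("tcp", "http", 0, "dos")] * 6 + [
    ("tcp", "http", 215, "normal"),
    ("udp", "domain_u", 44, "normal"),
    ("icmp", "eco_i", 8, "probe"),
]

def label_share(rows):
    """Return the fraction of rows carrying each label (last field)."""
    counts = Counter(row[-1] for row in rows)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

# dict.fromkeys keeps the first occurrence of each record and preserves order.
deduplicated = list(dict.fromkeys(records))

print("before:", label_share(records))
print("after: ", label_share(deduplicated))
```

In this toy data the duplicated attack rows dominate before deduplication, so a classifier would be rewarded mainly for recognizing that one repeated record.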

6. System Architecture

Step-by-step flow:

Collect network traffic
Preprocess data (cleaning, encoding, scaling)
Select features (IP, ports, packet size, protocol)
Train model (Random Forest, SVM, etc.)
Tune hyperparameters
Deploy and monitor in real-time
Generate alerts when anomalies are detected
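
The last two steps of the flow above (monitoring and alerting) could look roughly like the sketch below. It is an illustration under stated assumptions: `Alert`, `monitor`, `toy_score`, the `packets_per_second` feature, and the 0.5 threshold are all hypothetical, and `toy_score` is a stand-in for the probability output of a trained classifier.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator

@dataclass
class Alert:
    flow_id: int   # position of the flow in the incoming stream
    score: float   # estimated probability that the flow is malicious

def monitor(flows: Iterable[dict],
            score_fn: Callable[[dict], float],
            threshold: float = 0.5) -> Iterator[Alert]:
    """Score each incoming flow and yield an alert when the score reaches the threshold."""
    for flow_id, features in enumerate(flows):
        score = score_fn(features)
        if score >= threshold:
            yield Alert(flow_id, score)

def toy_score(features: dict) -> float:
    """Stand-in for a trained model: a higher packet rate gives a higher score."""
    return min(1.0, features["packets_per_second"] / 1000.0)

# Hypothetical stream of already-preprocessed flows.
stream = [
    {"packets_per_second": 20},
    {"packets_per_second": 900},
    {"packets_per_second": 300},
    {"packets_per_second": 1500},
]

alerts = list(monitor(stream, toy_score))
for alert in alerts:
    print(f"ALERT flow={alert.flow_id} score={alert.score:.2f}")
```

Writing `monitor` as a generator means alerts are produced as flows arrive instead of after the whole batch has been scored.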

7. Implementation

ML models used: Decision Tree (DT), SVM, KNN, Artificial Neural Network (ANN), Random Forest, etc.
Random Forest preferred due to:
- High accuracy
- Robustness against overfitting
- Built-in feature importance scoring

Data Preprocessing Steps:

Clean irrelevant/missing data
Encode categorical variables
Standardize features
Split dataset (80% training, 20% testing)
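
The four preprocessing steps above can be sketched with the standard library alone. This is a minimal illustration, not the paper's code: the toy flows, the helper names, and the fixed seed are assumptions. The scaling statistics are computed from the training split only, so that no information from the test split leaks into training.

```python
import random
import statistics

# Hypothetical toy flows: (protocol, duration_seconds, label). Values are made up.
RAW = [
    ("tcp", 1.0, "normal"), ("udp", 3.0, "normal"), ("tcp", 5.0, "attack"),
    ("icmp", 7.0, "attack"), ("tcp", 9.0, "normal"), ("udp", 2.0, "attack"),
    ("tcp", 4.0, "normal"), ("icmp", 6.0, "normal"), ("udp", 8.0, "attack"),
    ("tcp", 10.0, "attack"), ("tcp", None, "normal"),  # row with a missing value
]

def clean(rows):
    """Step 1: drop rows that contain missing values."""
    return [r for r in rows if all(v is not None for v in r)]

def split(rows, test_fraction=0.2, seed=0):
    """Step 4: shuffle, then hold out test_fraction of the rows for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def one_hot(value, categories):
    """Step 2: encode a categorical value as a 0/1 vector."""
    return [1.0 if value == c else 0.0 for c in categories]

def encode(rows, categories, mean, std):
    """Steps 2 and 3: one-hot encode the protocol and standardize the duration."""
    X = [one_hot(p, categories) + [(d - mean) / std] for p, d, _ in rows]
    y = [1 if label == "attack" else 0 for _, _, label in rows]
    return X, y

rows = clean(RAW)
train_rows, test_rows = split(rows)
categories = sorted({p for p, _, _ in rows})   # category vocabulary
train_durations = [d for _, d, _ in train_rows]
mean = statistics.fmean(train_durations)
std = statistics.pstdev(train_durations) or 1.0  # guard against a constant column
X_train, y_train = encode(train_rows, categories, mean, std)
X_test, y_test = encode(test_rows, categories, mean, std)
print(len(X_train), "training rows,", len(X_test), "test rows")
```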

Feature Selection:

Helps improve accuracy, reduce overfitting, and lower training time

Model Training (Random Forest):

Initialized with 100 trees
Tuned via GridSearchCV
Trained on PCA-reduced data
Achieved 96.78% accuracy, with a low error rate and good generalization
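
A training setup of this kind might be sketched with scikit-learn as follows. The synthetic data from `make_classification`, the number of PCA components, and the parameter grid are assumptions chosen for illustration; the source does not state them, and the accuracy printed here says nothing about the 96.78% reported on NSL-KDD.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for preprocessed network features (not NSL-KDD).
X, y = make_classification(n_samples=400, n_features=20, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Scale, reduce with PCA, then classify with a 100-tree Random Forest.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=10)),  # assumed component count
    ("forest", RandomForestClassifier(n_estimators=100, random_state=0)),
])

# Assumed search grid; the double underscore addresses a step's parameter.
param_grid = {
    "forest__max_depth": [None, 10],
    "forest__min_samples_leaf": [1, 3],
}

search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print("best parameters:", search.best_params_)
print("held-out accuracy:", round(test_accuracy, 3))
```

Placing the scaler and PCA inside the pipeline means they are refit on each cross-validation training fold, so the validation folds do not influence the learned projection.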

8. Experimental Setup

Dataset: NSL-KDD (Train+ and Test+)
Cross-validation: 5-fold
Metrics: Accuracy, Precision, Recall, F1-score, ROC-AUC
Designed to scale for real-time monitoring in live environments
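
For reference, the listed metrics can be computed from predictions as in the sketch below. The labels and scores are made up, the 0.5 decision threshold is an assumption, and ROC-AUC is computed as the fraction of (malicious, benign) pairs in which the malicious flow receives the higher score.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = malicious)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

def roc_auc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical ground truth and model scores for ten flows.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(metrics, round(auc, 4))
```

In an intrusion detection setting, recall tracks missed attacks and precision tracks false alarms, which is why accuracy alone is not reported.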

9. Observations

Distribution of flow duration is fairly even
Benign and malicious traffic share similar median idle time
No extreme outliers in flow duration data
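
Observations like these can be checked with simple summary statistics. The sketch below uses made-up flows and a conventional 1.5 × IQR rule for outliers; the field layout and the rule are assumptions, since the source does not state how the observations were derived.

```python
import statistics

# Hypothetical flows: (label, flow_duration, idle_time). Values are made up.
flows = [
    ("benign", 10, 1.0), ("benign", 30, 2.0), ("benign", 50, 3.0), ("benign", 70, 4.0),
    ("malicious", 20, 1.5), ("malicious", 40, 2.5), ("malicious", 60, 2.5), ("malicious", 80, 3.5),
]

def median_idle(rows, label):
    """Median idle time of the flows with the given label."""
    return statistics.median(idle for lab, _, idle in rows if lab == label)

def iqr_outliers(values, k=1.5):
    """Values lying more than k * IQR below the first or above the third quartile."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if v < low or v > high]

durations = [duration for _, duration, _ in flows]
print("median idle (benign):   ", median_idle(flows, "benign"))
print("median idle (malicious):", median_idle(flows, "malicious"))
print("duration outliers:      ", iqr_outliers(durations))
```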

Conclusion

The increasing sophistication and frequency of cyberattacks have made network security a top priority. This paper focused on detecting cyber threats by classifying network traffic as benign or malicious using several machine learning algorithms: Logistic Regression, Random Forest, Support Vector Machine (SVM), and K-Nearest Neighbors (KNN). After data preprocessing and feature selection, relevant network traffic features such as flow duration, packet counts, and statistical measures were used to train and evaluate the models. Among the algorithms tested, Random Forest emerged as the most effective, offering high accuracy and robustness in classifying threats. The paper demonstrated that machine learning models, when properly trained and tuned, can play a significant role in the early detection of cyber threats. It also emphasized the importance of a balanced and comprehensive dataset for improving detection reliability and reducing false positives and false negatives. While the results are promising, the paper has limitations: real-time implementation, adaptability to new and unknown threats, and scalability in high-speed networks remain challenges. Future improvements may include integrating deep learning models, using real-time data streams, and expanding the feature set for more accurate threat detection. Overall, this paper provides a foundational approach to applying machine learning in cybersecurity and highlights the potential of data-driven models in strengthening network defense systems.

Copyright

Copyright © 2025 Dr. Levina Tukaram, Khadanand Chaudhary, Nitesh Jha, Bibek Pamgeni, Vision Gaihre. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET73531

Publish Date : 2025-08-03

ISSN : 2321-9653

Publisher Name : IJRASET