Intrusion Detection System: Comparative Analysis of Supervised and Unsupervised Techniques

Authors: Anshita Singh, Manish Choudhary, Abhishek Singh

DOI Link: https://doi.org/10.22214/ijraset.2025.68447

Abstract

With advancements in technology, the rapid growth in cybercrime poses crucial challenges to maintaining the security and integrity of computer networks. Traditional Intrusion Detection Systems (IDS) rely on signature-based techniques and predefined rules, which are inadequate for handling emerging cyber threats. This paper presents a comparative analysis of supervised and unsupervised machine learning techniques for intrusion detection and reviews how they address the limitations of traditional IDS approaches. Overcoming these limitations of conventional IDS is the central aim of this paper. For supervised learning we consider Support Vector Machine, Decision Tree and Random Forest; for unsupervised learning we consider Principal Component Analysis, K-Means and DBSCAN. This work discusses the implications of adopting machine learning in intrusion detection and suggests potential areas for future research, such as the integration of deep learning techniques and the development of adaptive intrusion detection systems that evolve with emerging threats. The outcomes are intended to support the development of more sophisticated and effective intrusion detection systems capable of handling novel cyber-attacks.

Introduction

In the digital age, as internet connectivity grows, cyber-attacks pose significant threats to personal and organizational data. Traditional cybersecurity measures like firewalls and antivirus software are insufficient against sophisticated, modern attacks. This has led to the development of automated security systems, particularly Machine Learning (ML)-based Intrusion Detection Systems (IDS), to detect and respond to cyber threats more effectively.

Intrusion Detection Systems (IDS) Overview

IDS monitor network traffic and detect suspicious activities to protect information systems.
Two main types:
- Signature-based IDS: Detects known attacks using predefined patterns but fails with unknown threats.
- Anomaly-based IDS: Detects deviations from normal behavior, making it effective against zero-day attacks.
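The contrast between the two types can be sketched in a few lines of Python. This is a toy illustration, not a production IDS: the signature strings, the requests-per-second baseline and the z-score threshold of 3 are all hypothetical choices made for the example.

```python
import statistics

# Hypothetical signature database: substrings tied to known attack types.
SIGNATURES = {
    "sql_injection": "' OR '1'='1",
    "path_traversal": "../../",
}


def signature_match(payload: str) -> list:
    """Signature-based check: return names of known patterns found in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]


def anomaly_score(value: float, baseline: list) -> float:
    """Anomaly-based check: z-score of an observation against normal-traffic history."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    return abs(value - mean) / stdev


# Requests per second observed during a period assumed to be attack-free.
baseline_rates = [98.0, 102.0, 100.0, 97.0, 103.0, 101.0, 99.0]

print(signature_match("GET /login?user=admin' OR '1'='1"))  # ['sql_injection']
print(signature_match("GET /index.html"))                    # []
# A novel flood matches no signature but deviates sharply from the baseline.
print(anomaly_score(450.0, baseline_rates) > 3.0)            # True
```

The signature check misses anything absent from `SIGNATURES`, while the z-score flags unfamiliar volume but cannot say what kind of attack it is, mirroring the trade-off described above.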

Machine Learning in IDS

ML enables IDS to analyze vast amounts of network data, recognize patterns, and distinguish between normal and malicious activity. The study evaluates both:

Supervised Learning: Requires labeled data for training.
Unsupervised Learning: Works with unlabeled data to find patterns or anomalies.

Supervised Learning Techniques

Support Vector Machine (SVM):
- Good for classification in high-dimensional spaces.
- Accurate with small datasets, but sensitive to noise and computationally intensive on large datasets.
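A linear SVM can be trained with stochastic sub-gradient descent on the regularized hinge loss (the Pegasos algorithm). The sketch below uses that approach rather than the kernel solvers found in SVM libraries, and it assumes the two features (a hypothetical scaled connection rate and scaled failed-login count) are already centered, with the two classes on opposite sides of the origin, so the bias term can be dropped as in the basic Pegasos formulation. Label +1 marks an attack and -1 normal traffic.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos-style stochastic sub-gradient descent for a linear SVM.

    Minimizes (lam / 2) * ||w||^2 + mean(max(0, 1 - y_i * <w, x_i>)).
    No bias term: features are assumed to be centered beforehand.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for _ in range(len(X)):
            t += 1
            i = rng.randrange(len(X))
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Step for the L2 regularizer: shrink all weights.
            w = [(1.0 - eta * lam) * wj for wj in w]
            # Hinge-loss sub-gradient step, only when the margin is violated.
            if margin < 1.0:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    """Return +1 (attack) or -1 (normal) from the sign of the decision function."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


# Hypothetical standardized features: [connection rate, failed logins].
X_train = [[1.2, 0.9], [0.8, 1.1], [1.0, 1.3], [1.4, 1.0],
           [-1.1, -0.9], [-0.9, -1.2], [-1.3, -1.0], [-1.0, -0.8]]
y_train = [1, 1, 1, 1, -1, -1, -1, -1]

w = train_linear_svm(X_train, y_train)
print([predict(w, x) for x in X_train])  # [1, 1, 1, 1, -1, -1, -1, -1]
```

Each update touches only one sample, which keeps the linear case cheap; kernel SVM solvers, needed for non-linear boundaries, generally scale super-linearly with the number of training samples, which is the scalability limit noted above.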
Decision Tree (DT):
- Simple, interpretable model.
- Effective for real-time detection and large datasets, but prone to overfitting.
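The overfitting trade-off shows up directly in the depth limit of a CART-style tree grown with the Gini impurity. In this hypothetical dataset (features: failed logins and kilobytes sent), one connection with few failed logins but a very large upload is labeled an attack. A depth-1 tree ignores it; a deeper tree carves out a dedicated branch for it, which is exactly what overfits when such a point is label noise rather than a real attack. The function names and data are illustrative, not taken from the paper.

```python
def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels)) if n else 0.0


def best_split(X, y):
    """Exhaustively search (feature, threshold) pairs for the lowest weighted Gini."""
    best = None  # (weighted_gini, feature, threshold)
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr)
    return best


def build_tree(X, y, max_depth, depth=0):
    """Grow a tree as nested tuples (feature, threshold, left, right); leaves are labels."""
    majority = max(sorted(set(y)), key=y.count)  # sorted() makes tie-breaking deterministic
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    split = best_split(X, y)
    if split is None:
        return majority
    _, f, thr = split
    left = [i for i, row in enumerate(X) if row[f] <= thr]
    right = [i for i, row in enumerate(X) if row[f] > thr]
    return (f, thr,
            build_tree([X[i] for i in left], [y[i] for i in left], max_depth, depth + 1),
            build_tree([X[i] for i in right], [y[i] for i in right], max_depth, depth + 1))


def predict_tree(tree, x):
    """Walk from the root to a leaf label."""
    while isinstance(tree, tuple):
        f, thr, left, right = tree
        tree = left if x[f] <= thr else right
    return tree


# Hypothetical records: [failed logins, kilobytes sent].
X = [[0, 1.2], [1, 0.8], [0, 2.5], [2, 1.0], [7, 0.9], [9, 1.1], [8, 30.0], [1, 45.0]]
y = ["normal"] * 4 + ["attack"] * 4

shallow = build_tree(X, y, max_depth=1)
deep = build_tree(X, y, max_depth=3)
print(shallow)  # (0, 2, 'normal', 'attack')
print(deep)     # (0, 2, (1, 2.5, 'normal', 'attack'), 'attack')
print(predict_tree(shallow, [1, 45.0]), predict_tree(deep, [1, 45.0]))  # normal attack
```

The printed tuples are the interpretable rules the bullet points refer to, for example "if failed logins <= 2 then normal".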
Random Forest (RF):
- Ensemble of decision trees, offering higher accuracy and resistance to overfitting.
- Requires more processing power but performs well with complex datasets.
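Random Forest combines two sources of randomness: each tree is trained on a bootstrap sample of the rows and sees only a random subset of features, and predictions are decided by majority vote. To stay short, the sketch below uses depth-1 trees (decision stumps) and draws the feature subset once per tree; both are simplifications of the full algorithm, which grows deeper trees and resamples features at every split. The data and parameter values are hypothetical.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0


def fit_stump(X, y, features):
    """Depth-1 tree: best single-threshold split over the given feature subset."""
    majority = Counter(y).most_common(1)[0][0]
    best = None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # bootstrap sample had no usable split: predict the majority
        return (None, None, majority, majority)
    return best[1:]  # (feature, threshold, left_label, right_label)


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bagging: bootstrap sample of the training rows (drawn with replacement).
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        # Feature bagging: this tree only considers a random subset of features.
        feats = rng.sample(range(len(X[0])), max_features)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict_forest(forest, x):
    """Majority vote over all stumps."""
    votes = Counter()
    for f, thr, left_label, right_label in forest:
        if f is None:
            votes[left_label] += 1
        else:
            votes[left_label if x[f] <= thr else right_label] += 1
    return votes.most_common(1)[0][0]


# Hypothetical records with two informative features.
X = [[0.2, 0.5], [0.4, 0.1], [0.9, 0.3], [0.6, 0.8],
     [5.1, 5.6], [5.5, 5.2], [5.9, 5.8], [5.3, 5.4]]
y = ["normal"] * 4 + ["attack"] * 4

forest = fit_forest(X, y, n_trees=25, max_features=1, seed=0)
print(predict_forest(forest, [0.0, 0.0]))  # normal
print(predict_forest(forest, [6.5, 6.5]))  # attack
```

Individual stumps can vote wrongly when their bootstrap sample is unrepresentative; averaging many of them is what gives the ensemble its resistance to overfitting, at the price of training and evaluating every tree.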

Unsupervised Learning Techniques

K-Means Clustering:
- Groups similar data, flags anomalies by identifying outliers.
- Fast and efficient, but requires the number of clusters to be fixed in advance, assumes roughly spherical clusters and struggles with novel attacks.
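The K-Means-based flagging described above can be sketched as Lloyd's algorithm followed by a distance-to-centroid test. The data, the fixed initialization (chosen for reproducibility; practical implementations usually use k-means++ or several restarts) and the outlier threshold of 3.0 are hypothetical. Note that the outlier is still assigned to a cluster and pulls that cluster's centroid toward itself, a small instance of the fixed-structure limitation mentioned above.

```python
import math


def kmeans(X, init, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = [list(c) for c in init]
    assign = None
    for _ in range(max_iter):
        new_assign = [min(range(len(centroids)), key=lambda k: math.dist(x, centroids[k]))
                      for x in X]
        if new_assign == assign:  # converged: assignments no longer change
            break
        assign = new_assign
        for k in range(len(centroids)):
            members = [x for x, a in zip(X, assign) if a == k]
            if members:  # keep the previous centroid if a cluster empties out
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def flag_outliers(X, centroids, assign, threshold):
    """Indices of points farther than `threshold` from their own centroid."""
    return [i for i, (x, a) in enumerate(zip(X, assign))
            if math.dist(x, centroids[a]) > threshold]


# Two hypothetical clusters of normal traffic plus one unusual record (index 8).
X = [[1.0, 1.0], [1.2, 0.8], [0.8, 1.1], [1.1, 1.2],
     [5.0, 5.2], [5.3, 4.9], [4.8, 5.0], [5.1, 5.1],
     [9.0, 1.0]]
centroids, assign = kmeans(X, init=[X[0], X[4]])
print(assign)  # [0, 0, 0, 0, 1, 1, 1, 1, 1]
print(flag_outliers(X, centroids, assign, threshold=3.0))  # [8]
```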
DBSCAN (Density-Based Spatial Clustering of Applications with Noise):
- Identifies dense clusters and labels outliers as anomalies.
- Effective for irregular threats and does not require predefining clusters but struggles with high-dimensional data.
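A compact DBSCAN implementation makes the noise-labeling behavior concrete: a point with fewer than `min_pts` neighbors within radius `eps` (counting itself) that is not reachable from a dense region keeps the label -1. This is a straightforward O(n²) sketch without a spatial index, and `eps`, `min_pts` and the data are illustrative values only.

```python
import math

NOISE = -1


def dbscan(X, eps, min_pts):
    """Return a cluster id per point; NOISE (-1) marks anomalies."""
    labels = [None] * len(X)  # None = not yet visited

    def neighbors(i):
        return [j for j in range(len(X)) if math.dist(X[i], X[j]) <= eps]

    cluster = 0
    for i in range(len(X)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        labels[i] = cluster  # i is a core point: start a new cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:  # j is also a core point: expand through it
                queue.extend(j_nbrs)
        cluster += 1
    return labels


X = [[1.0, 1.0], [1.1, 1.0], [1.0, 1.1], [1.1, 1.1],
     [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1],
     [3.0, 9.0]]
print(dbscan(X, eps=0.5, min_pts=3))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

The number of clusters is never specified; it falls out of the density parameters. The pairwise distance computations in `neighbors` are also where the cost grows quickly on large datasets.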
Principal Component Analysis (PCA):
- Reduces dimensionality by projecting the data onto a small number of components that capture most of its variance.
- Enhances efficiency when combined with other models but is less effective with non-linear data.
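As an illustration of the dimensionality reduction step, the sketch below centers the data, builds the sample covariance matrix, extracts the first principal component by power iteration and projects each record onto it, reducing two features to one. It assumes the leading eigenvalue is strictly dominant; library implementations typically use a singular value decomposition instead. The data are hypothetical and nearly collinear, so one component retains almost all of the variance.

```python
import math


def first_principal_component(X, iters=100):
    """Return (component, explained_variance_ratio, 1-D scores) for the top component."""
    n, d = len(X), len(X[0])
    means = [sum(col) / n for col in zip(*X)]
    centered = [[x - m for x, m in zip(row, means)] for row in X]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeated multiplication by cov converges to the dominant
    # eigenvector, assuming the start vector is not orthogonal to it.
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue (variance along v).
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    explained = eigval / sum(cov[i][i] for i in range(d))
    scores = [sum(c * vi for c, vi in zip(row, v)) for row in centered]
    return v, explained, scores


# Hypothetical records with two strongly correlated features.
X = [[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8], [5.0, 10.0]]
v, explained, scores = first_principal_component(X)
print(v, explained)  # component roughly parallel to (1, 2); ratio close to 1
print(scores)        # one value per record, usable as a reduced feature
```

Because the projection is linear, data whose structure is curved would need more components (or a non-linear method) to retain the same information, which is the limitation noted above.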

Results & Discussion

Supervised Methods:
- Random Forest shows the best overall performance in accuracy and generalization.
- Decision Trees excel in speed and usability but may overfit.
- SVM performs well with structured data but lacks scalability.
Unsupervised Methods:
- K-Means is useful for pattern recognition but may miss rare or new attacks.
- PCA aids in simplifying datasets and enhances classifier performance.
- DBSCAN is well suited to anomaly detection but can be computationally expensive on large datasets.
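Comparisons like these, and the false positive concern for unsupervised methods, are usually expressed through confusion-matrix metrics. The paper's evaluation protocol is not detailed here, so the following is a generic sketch with hypothetical labels, treating "attack" as the positive class; the function name `ids_metrics` is illustrative.

```python
def ids_metrics(y_true, y_pred, positive="attack"):
    """Accuracy, detection rate (recall), false positive rate and precision."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "detection_rate": tp / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }


y_true = ["attack", "attack", "normal", "normal", "normal", "attack", "normal", "normal"]
y_pred = ["attack", "normal", "normal", "attack", "normal", "attack", "normal", "normal"]
print(ids_metrics(y_true, y_pred))
# {'accuracy': 0.75, 'detection_rate': 0.666..., 'false_positive_rate': 0.2, 'precision': 0.666...}
```

Accuracy alone can hide a high false positive rate when normal traffic dominates, which is why detection rate and false positive rate are usually reported side by side.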

Conclusion

This review explored and compared the effectiveness of supervised and unsupervised machine learning techniques in Intrusion Detection Systems (IDS). The analysis shows that both families of methods offer distinct strengths. Among the supervised techniques, Support Vector Machines (SVM) perform well with small labeled datasets, while Decision Tree and Random Forest methods produce better results with large training samples. Although these supervised methods are effective at recognizing known threats, they may be less effective against zero-day attacks. In contrast, unsupervised methods such as K-Means, Principal Component Analysis (PCA) and DBSCAN are effective for detecting unusual patterns and anomalies that may indicate unknown or evolving threats. Because they do not require labeled data, they are particularly useful in dynamic network environments where labeling every instance is impractical. However, each technique also has its own limitations. Because supervised algorithms depend on labeled training data, their reliability decreases significantly when the test data contain attack types that were absent from training. Unsupervised methods, while adaptable, often yield higher false positive rates, so they typically need additional filtering or combination with other methods to improve accuracy. Overall, this comparative analysis underscores the importance of selecting machine learning algorithms according to the specific requirements and constraints of the security environment. To meet the growing demands of modern cybersecurity, future research may focus on improving hybrid models or on developing new techniques that mitigate the limitations of purely supervised or purely unsupervised approaches.

Copyright

Copyright © 2025 Anshita Singh, Manish Choudhary, Abhishek Singh. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET68447

Publish Date : 2025-04-07

ISSN : 2321-9653

Publisher Name : IJRASET
