Authors: Arun Pandey, Dr. Shishir Sharma
The rapid development of computer technologies has changed the way data and information are stored. With this new paradigm of data access comes the risk of data being exposed to unauthorized users. Numerous systems have been created that examine data for deviations from a user's or system's typical behavior, or for a known signature in the data. These systems are called Intrusion Detection Systems (IDS). They use a variety of approaches, including machine learning algorithms and statistical methods. With the massive rise in network-based services and information sharing on networks, network security has become a fundamental concern. Intrusion seriously compromises the integrity, confidentiality, and availability of computer and network resources and poses a severe risk to network security. Manual classification of network audit data is costly, time-consuming, and laborious. An IDS is one tool used to find anomalies and attacks on a network. Network intrusion detection systems have made extensive use of data mining techniques to extract valuable information from vast amounts of network data. This work proposes a hybrid model that combines two distinct intrusion detection techniques: anomaly-based and signature-based. The model is divided into two stages. Intrusion detection systems make use of audit data produced by network devices, operating systems, and application software. These sources generate enormous databases that contain tens of millions of records. Data mining, the process of extracting meaningful patterns from a large amount of information, is used to analyze this data. This paper deals with the role and applicability of data mining techniques in designing and developing IDS.
I. INTRODUCTION TO INTRUSION DETECTION SYSTEM
Due to the rapid expansion of networked computer resources in recent years, numerous network-based applications have been created to offer services in a range of sectors, including social media, banking, government, and e-commerce. As more machines are networked, unauthorized activity has increased, both from external threats and from inside threats such as persons gaining access beyond their privileges for personal gain. Intrusion detection systems (IDS) detect unauthorized intrusions into computer networks and systems. Incidents can involve malware attacks (such as worms or viruses), attackers accessing the system without authorization via the Internet, or users acquiring root access to the system without authorization. Similar to a network sniffer, an IDS gathers network log data and monitors a computer system's network activity. An intrusion detection model or approach is used to analyze the gathered network data and look for rule violations. The IDS raises an alarm to notify the network administrator of any rule violations. Fig. 1 demonstrates a generic model of an Intrusion Detection System.
Intrusion is another term for malicious online activity. Any behavior that contravenes the network's security rules is considered an intrusion. To fill the gaps left by firewalls and antivirus programs, intrusion detection systems combine hardware and software to detect unauthorized use of, or attacks against, computers or telecommunications networks. The capabilities of an IDS include monitoring and analyzing user and system activity, auditing system configuration and vulnerabilities, evaluating the integrity of important system and data files, statistically analyzing activity patterns by comparison with known attacks, analyzing anomalous activity, and conducting system audits. One benefit of an IDS is its capacity to record an intrusion or threat to an organization, producing system logs that can serve as a basis for educating the public about the most recent attack patterns.
II. WHAT DATA MINING IS
The practice of obtaining knowledge or insights from massive amounts of data through a variety of statistical and computational techniques is known as data mining. The data can be kept in a variety of formats, including databases, data warehouses, and data lakes. It can also be semi-structured, unstructured, or structured. Finding hidden patterns and relationships in the data that may be utilized to generate forecasts or well-informed judgements is the main objective of data mining. This entails analyzing the data using a variety of methods, including anomaly detection, regression analysis, association rule mining, clustering, and classification.
Numerous industries, including marketing, banking, network security, healthcare, and telecommunications, have used data mining extensively. For instance, data mining in marketing can be used to pinpoint target audiences for advertising campaigns, and in healthcare it can be used to pinpoint illness risk factors and create individualized treatment regimens. Similarly, data mining provides an effective way to detect intrusions in a computer network and protect against them. An IDS based on data mining can effectively locate the data of interest to the user and forecast outcomes for future use. Both the IT industry and the general public have shown a great deal of interest in data mining, also known as knowledge discovery in databases. Data mining has been used to extract meaningful information from massive amounts of noisy, erratic, and dynamic data. A data mining based IDS is positioned at the centre of the network to collect all packets that are sent across it. After being gathered, the data are sent for pre-processing to eliminate noise, fill in missing features, and remove irrelevant ones. Subsequently, the preprocessed data are analyzed and classified according to their severity metrics. If a record is normal, no further action is needed; if not, a report is generated with warnings. Alarms are raised according to the state of the data so that the administrator can take proactive measures.
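The collect, pre-process, classify, and alert flow described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Record` fields, the default used to fill a missing feature, and the single-threshold severity rule are hypothetical stand-ins, not the classifier used in this work.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    src: str                     # source address of the captured traffic
    bytes_sent: Optional[int]    # None models a missing feature
    failed_logins: int


def preprocess(records, default_bytes=0):
    """Fill in missing features so later stages see complete records."""
    cleaned = []
    for r in records:
        size = default_bytes if r.bytes_sent is None else r.bytes_sent
        cleaned.append(Record(r.src, size, r.failed_logins))
    return cleaned


def classify(record, max_failed_logins=3):
    """Toy severity rule (an assumption, not a real IDS classifier)."""
    return "intrusion" if record.failed_logins > max_failed_logins else "normal"


def run_pipeline(records):
    """Return one warning per record that is not classified as normal."""
    alerts = []
    for r in preprocess(records):
        if classify(r) != "normal":
            alerts.append(f"WARNING: suspicious activity from {r.src}")
    return alerts


sample = [Record("10.0.0.5", 512, 0), Record("10.0.0.9", None, 7)]
print(run_pipeline(sample))
```

In a real deployment the `classify` step would be replaced by a trained model, while the surrounding stages keep the same shape.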
A. Stages involved in Data Mining Process
There are some stages applicable during the data mining. These stages are:
B. Techniques Applied in Data Mining
Data mining applies certain techniques as per the requirements of system or model. These data mining techniques are:
III. DATA MINING IN IDS
For intrusion detection, a wide range of data mining approaches are available, each with its own set of benefits. The choice among them depends largely on the nature and purpose of the intrusion detection model to be developed.
The aim of intrusion detection is to find security breaches in information systems. Because intrusion detection monitors information systems and raises an alarm when security breaches are found, it is a passive method of security. Two examples of security breaches are the misuse of privileges and attacks that exploit flaws in software or protocols. Intrusion detection systems are traditionally divided into two main categories: anomaly detection and misuse detection. Misuse detection looks for signs or patterns of well-known attacks. Evidently, this method of detection is limited to known attacks that leave distinctive evidence. In contrast, anomaly detection uses a model of typical user or system behaviour and marks notable departures from this model as possibly harmful. This representation of typical behaviour is called the user or system profile. One strength of anomaly detection is its capacity to identify as-yet-unknown attacks. IDSs are also classified by the type of input data they examine, which leads to the distinction between network-based and host-based intrusion detection systems. Host-based IDSs examine audit sources that are specific to the host, such as application logs, system logs, and operating system audit trails. Network-based IDSs analyze packets captured from the network. Data mining is the process of sifting through data to find trends and build connections. Data mining tasks include forecasting, classification, association, sequence analysis, and clustering. The data preparation phase is one of the most crucial components of any data mining system: approximately eighty percent of the time in a typical real-world data mining effort is spent on data preparation.
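The contrast between misuse detection and anomaly detection drawn above can be sketched in a few lines of Python. The signature strings, the login counts used as the "normal" profile, and the three-standard-deviation threshold are all illustrative assumptions, not values taken from this work.

```python
from statistics import mean, stdev

# Hypothetical signatures: substrings that mark known attacks.
SIGNATURES = {"/etc/passwd", "' OR 1=1"}


def misuse_detect(payload):
    """Misuse detection: flag a payload that contains a known signature."""
    return any(sig in payload for sig in SIGNATURES)


def build_profile(normal_values):
    """Anomaly detection, step 1: summarise typical behaviour."""
    return mean(normal_values), stdev(normal_values)


def anomaly_detect(value, profile, threshold=3.0):
    """Step 2: flag values far from the profile mean (in standard deviations)."""
    mu, sigma = profile
    return abs(value - mu) > threshold * sigma


# Assumed history of logins per hour for one user (the "user profile").
profile = build_profile([10, 12, 11, 9, 13, 10, 11])

print(misuse_detect("GET /index.php?id=' OR 1=1"))  # known pattern
print(misuse_detect("GET /index.html"))             # no known pattern
print(anomaly_detect(11, profile))                  # close to the profile
print(anomaly_detect(60, profile))                  # far from the profile
```

The misuse detector can only report what is already in `SIGNATURES`, whereas the anomaly detector can flag behaviour it has never seen, at the cost of depending on how well the profile captures normal activity.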
Inadequate data quality can lead to meaningless data mining results that have to be discarded. Data preprocessing includes the selection, assessment, cleaning, enrichment, and transformation of the data. Neural networks, which model computation on the brain, offer a methodical approach to building automated solutions. Compared with more conventional approaches, this newer style of computing also degrades more gracefully under system overload. A neural network is an interconnected collection of artificial neurons that processes information using a mathematical or computational model based on a connectionist approach to computing.
Although a neural network does not start out with domain knowledge, it can be trained to make decisions from sample pairs of input vectors and output vectors, with its weights estimated so that each input example is mapped approximately to the corresponding output example (Hecht-Nielsen, 1988).
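As a minimal illustration of estimating weights from input/output example pairs, the following Python sketch trains a single artificial neuron with the classic perceptron learning rule. It is a deliberately tiny stand-in for a full neural network; the two binary features and their labels are hypothetical.

```python
def predict(weights, bias, x):
    """Step activation: output 1 if the weighted sum is positive, else 0."""
    total = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=10, lr=1):
    """Adjust weights so each input vector maps to its example output."""
    weights = [0] * len(samples[0][0])
    bias = 0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            # Move the weights toward the target only when the guess is wrong.
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


# Hypothetical features: (privileged_command, off_hours_login).
# Label 1 (intrusion) only when both are present.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(samples)
print([predict(weights, bias, x) for x, _ in samples])
```

A single neuron can only learn linearly separable mappings such as this one; multi-layer networks are needed for more complex decision boundaries.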
IV. APPLICATIONS OF DATA MINING TECHNIQUES IN IDS SYSTEMS
Modern network technologies require strong security controls to guarantee secure and reliable information exchange between a client and a user. The purpose of an IDS is to safeguard the system in the event that conventional technologies fail. Data mining is the process of extracting relevant information from a vast amount of data, and it supports both supervised and unsupervised learning techniques. Since intrusion detection is essentially a data-centric process, data mining techniques can help an IDS identify anomalous activity, learn from previous intrusions, and improve its performance through experience. By improving segmentation, data mining helps cope with the substantial growth of the database and retain only reliable information, enabling organizations to plan in real time and save time. It can be used for a number of purposes, including identifying suspicious activity, fraud and abuse, and terrorist activity, as well as lie detection in criminal investigations.
For the IDS Systems, the following applications of data mining techniques can be discussed.
A. Data Stream Analysis
Data stream analysis refers to the analysis of continuously arriving data. Because data mining requires sophisticated calculations and lengthy processing times, it has primarily been applied to static data. The dynamic nature of malicious attacks and breaches makes intrusion detection in the data stream setting all the more important. Furthermore, an event that appears normal on its own may be deemed malicious when seen as part of a larger sequence of events. It is therefore critical to identify sequential patterns, look for outliers, and determine which sequences of events frequently occur together. Real-time intrusion detection also requires data mining techniques for finding evolving clusters and building dynamic classification models in data streams.
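The point that a single event may look normal while a sequence of events does not can be shown with a small sliding-window sketch in Python. The class name, the 60-second window, and the limit of three events are assumptions chosen for illustration.

```python
from collections import defaultdict, deque


class SlidingWindowDetector:
    """Flag a source whose events within a time window exceed a limit."""

    def __init__(self, window_seconds=60, max_events=3):
        self.window = window_seconds
        self.max_events = max_events
        self.events = defaultdict(deque)  # source -> timestamps in the window

    def observe(self, source, timestamp):
        """Record one event and report whether the recent sequence is suspicious."""
        recent = self.events[source]
        recent.append(timestamp)
        # Drop events that have fallen out of the window.
        while recent and timestamp - recent[0] > self.window:
            recent.popleft()
        return len(recent) > self.max_events


detector = SlidingWindowDetector(window_seconds=60, max_events=3)
# Five failed logins from one source; timestamps are in seconds.
stream = [("10.0.0.9", t) for t in (0, 5, 10, 15, 200)]
flags = [detector.observe(src, ts) for src, ts in stream]
print(flags)
```

Each event is processed once as it arrives and only the timestamps inside the window are kept, which is what makes this style of analysis usable on a continuous stream.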
B. Devising Innovative IDS Model
An IDS model built on data mining techniques can achieve a lower false alarm rate and higher efficiency. Data mining techniques are applicable to anomaly-based as well as signature-based detection. In signature-based detection, training data is labelled as "normal" or "intrusion", and a classifier is then derived to detect known intrusions. Research in this area has used cost-sensitive modeling, association rule mining, and classification algorithms. Anomaly-based detection builds models of typical behavior and automatically identifies significant departures from them. Methods include statistical techniques, classification algorithms, clustering, and outlier analysis. The solutions employed must be effective, scalable, and capable of handling very large, multidimensional, and heterogeneous volumes of network data.
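Claims about false alarm rate and detection efficiency are usually checked against labelled data. The sketch below computes the two standard measures, detection rate TP / (TP + FN) and false alarm rate FP / (FP + TN), in Python. The label strings and the six example records are made up for illustration.

```python
def detection_metrics(y_true, y_pred, positive="intrusion"):
    """Return (detection_rate, false_alarm_rate) for labelled predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    false_alarm_rate = fp / (fp + tn) if fp + tn else 0.0
    return detection_rate, false_alarm_rate


# Hypothetical ground truth and classifier output for six records.
y_true = ["intrusion", "intrusion", "normal", "normal", "normal", "normal"]
y_pred = ["intrusion", "normal", "normal", "intrusion", "normal", "normal"]
print(detection_metrics(y_true, y_pred))
```

Here one of two intrusions is caught and one of four normal records is wrongly flagged, so the detection rate is 0.5 and the false alarm rate is 0.25.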
C. Distributed Data Mining
Distributed data mining is used to analyze data that is naturally dispersed across many databases, which makes integrated data processing challenging. Attacks can originate from many different locations and target many different locations. To find such distributed attacks, network data from several network locations can be examined using distributed data mining techniques.
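One simple way to picture this is to let each network location summarise its own log and have a coordinator merge the summaries. The Python sketch below does that with per-source event counts; the site logs, function names, and threshold are hypothetical, and real distributed data mining methods are considerably more elaborate.

```python
from collections import Counter


def local_summary(site_log):
    """Each site counts events per source address from its own log."""
    return Counter(site_log)


def merge_summaries(summaries):
    """The coordinator adds up the per-site counts."""
    total = Counter()
    for summary in summaries:
        total.update(summary)
    return total


def global_suspects(summaries, threshold):
    """Sources whose combined count reaches the threshold."""
    total = merge_summaries(summaries)
    return sorted(src for src, count in total.items() if count >= threshold)


# Assumed logs: one source address per suspicious event seen at each site.
site_a = ["1.1.1.1", "2.2.2.2", "1.1.1.1"]
site_b = ["1.1.1.1", "3.3.3.3", "1.1.1.1"]
site_c = ["1.1.1.1", "2.2.2.2"]
summaries = [local_summary(log) for log in (site_a, site_b, site_c)]
print(global_suspects(summaries, threshold=5))
```

No single site sees more than two events from `1.1.1.1`, so a per-site threshold of five would miss it; only the merged view reveals the distributed pattern. Sharing summaries rather than raw logs also keeps the amount of data exchanged small.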
D. Visualization Tool
Visualization tools display the data as graphs, which makes it easier for the user to interpret the data visually. They can also be used to view the anomalous patterns that are found. These tools may include the ability to view outliers, clusters, relationships, and discriminative patterns. Intrusion detection systems also need a graphical user interface so that security analysts can query the network data or the intrusion detection findings.
Given that the IDS is a long-standing technology, it will inevitably face certain issues that are incompatible with the contemporary IT landscape. Because it has been around for so long, malevolent actors have developed evasion strategies to fool intrusion detection systems into missing attacks. Some of the major evasion techniques are:
Planning a modern computer system involves considering network security. In computer network security, intrusion detection is the most crucial problem to solve. Based on the methods used for detection, existing IDS can be categorized into two groups: misuse detection and anomaly detection. Anomalies, misuse, and misconfiguration can be identified in a number of ways. Applying data mining techniques greatly increases the capabilities of IDS. The k-means algorithm is sensitive to its initial values and yields different clustering outcomes depending on the value of k. In light of these flaws, an enhanced version of the k-means algorithm is presented in this work. An appropriate value of k is generated automatically from the data set during clustering, so k does not need to be known in advance. Once the clustering centres are determined they do not need to be modified, and only one scan of the entire data set is required. As a result, the enhanced k-means algorithm improves both clustering quality and efficiency.
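The properties listed above (no k supplied in advance, centres fixed once chosen, a single scan of the data) are shared by simple threshold-based single-pass clustering, sometimes called leader clustering. The Python sketch below shows that technique only as an illustration of those properties; it is not the enhanced k-means algorithm itself, and the `radius` parameter and sample points are assumptions.

```python
import math


def single_pass_cluster(points, radius):
    """Assign each point to the nearest existing centre, or start a new cluster.

    The number of clusters is not fixed in advance: a point farther than
    `radius` from every centre becomes a new centre. Centres are never
    moved afterwards, and the data set is scanned exactly once.
    """
    centres = []
    labels = []
    for p in points:
        best, best_dist = None, float("inf")
        for i, c in enumerate(centres):
            d = math.dist(p, c)
            if d < best_dist:
                best, best_dist = i, d
        if best is None or best_dist > radius:
            centres.append(p)
            labels.append(len(centres) - 1)
        else:
            labels.append(best)
    return centres, labels


points = [(0, 0), (0, 1), (10, 10), (10, 11), (0, 0.5)]
centres, labels = single_pass_cluster(points, radius=3.0)
print(len(centres), labels)
```

The trade-off is that the result depends on the chosen `radius` and on the order in which points arrive, so the sensitivity to k is exchanged for sensitivity to these two factors.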
Basant Agarwal, Namita Mittal, "Hybrid Approach for Detection of Anomaly Network Traffic using Data Mining Techniques", 2nd International Conference on Communication, Computing & Security, Procedia Technology, ScienceDirect, Elsevier Publication, 2012, DOI: 10.1016/j.protcy.2012.10.121
Bischof, H., Leonardis, A., and Selb, A. MDL principle for robust vector quantisation. Pattern Analysis and Applications. 2:59-72, 1999.
SANS Institute. Understanding Intrusion Detection System. 2001.
K Bala, N Kumar, AK Singh, Performance Evaluation of Advanced Intrusion Detection System, International Journal of Research in Engineering and Technology, Volume 7, Issue (10), pp. 10-15, 2018.
Shankar Kumar, Dr. Nandeshwar Pd. Singh, Dr. Narendra Kumar. "Mechanism, Tools and Techniques to Mitigate Distributed Denial of Service Attacks", Volume 11, Issue I, International Journal for Research in Applied Science and Engineering Technology (IJRASET), Page No: 855-861, ISSN : 2321-965
Curtis A. Carver, Jr., Jeffrey W. Humphries, and Udo W. Pooch, "Adaptation Techniques for Intrusion Detection and Intrusion Response Systems"
Eduardo Mosqueira-Rey, Amparo Alonso-Betanzos, Belen Baldonedo Del Rio, and Jesus Lago Pineiro, "A Misuse Detection Agent for Intrusion Detection in a Multi-agent Architecture". Springer-Verlag Berlin Heidelberg, 2007.
Jiawei, H. and Micheline, K. Data Mining Concepts and Techniques, second edition, China Machine Press, pp. 296-303. 2006.
Sabhani, M., and Serpen, G. Why Machine Learning Algorithms Fail in Misuse Detection on KDD Intrusion Detection Dataset. Intelligent Data Analysis, vol 6. (June 2004).
Siddiqui, M.K., and Naahid, S. Analysis of KDD CUP 99 Dataset using Clustering based Data Mining. International Journal of Database Theory and Application, Vol. 6, No. 5, pp. 23-24. 2013.
Copyright © 2024 Arun Pandey, Dr. Shishir Sharma. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.