Considering the historical context of Deep Neural Networks (DNNs), it can be inferred that DNNs have had a promising history that has evolved alongside various philosophical viewpoints. Modeling DNNs has become easier with the generation and availability of large amounts of data. Moreover, with improved hardware and software infrastructure, DNN models have grown in size. DNN models are capable of addressing complex application problems with accuracy that improves over time. The art and science of DNNs is built on the foundation of neural networks. Thus, this paper aims at discussing the fundamentals of neural networks and how they work. The paper includes a brief discussion of the functionality of neural networks, the role of activation functions, the backpropagation algorithm, loss function calculations, and optimizers for neural networks. Further, the paper also discusses generalization in neural networks and the parameters of neural network architectures. The paper also includes a case study on building an intrusion detection model using a neural network. Demystifying the neural network is the first stride towards understanding DNNs.
Introduction
1. Neural Network Inspiration from the Human Brain
Neural networks are inspired by how the human brain functions, with neurons as the fundamental units.
In the brain, neurons are highly interconnected and transmit signals through electrical discharges.
Connections between neurons can be strong or weak, depending on usage, and this concept is replicated in artificial neural networks through weights.
2. Structure of a Neural Network Neuron
An artificial neuron receives inputs (like dendrites), multiplies them with weights, adds a bias, and applies an activation function to produce an output.
This mimics the biological neuron, where inputs are processed and then transmitted to other neurons.
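As an illustration, below is a minimal Python/NumPy sketch of a single artificial neuron; the input values, weights, bias, and the choice of a sigmoid activation are assumptions made only for illustration, not anything prescribed here.

```python
import numpy as np

def neuron(x, w, b):
    """Weighted sum of inputs plus bias, passed through a sigmoid activation."""
    z = np.dot(w, x) + b             # pre-activation: w·x + b
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid squashes the result into (0, 1)

x = np.array([0.5, -1.2, 3.0])   # example inputs (analogous to dendrite signals)
w = np.array([0.4, 0.1, -0.7])   # connection weights
b = 0.2                          # bias
print(neuron(x, w, b))
```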
3. Neural Network Architecture
Neural networks consist of layers:
Input Layer: Takes raw data (no computation).
Hidden Layers: Perform computations.
Output Layer: Produces the final prediction.
Each neuron in one layer connects to all neurons in the next, forming a feedforward structure.
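The sketch below shows how such a feedforward pass might look in Python/NumPy; the layer sizes, random weights, and use of ReLU in every layer are arbitrary choices for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, layers):
    """Feedforward pass: each layer is a (weights, bias) pair, and every neuron
    in one layer connects to all neurons in the next via the weight matrix."""
    a = x
    for W, b in layers:
        a = relu(W @ a + b)
    return a

rng = np.random.default_rng(0)
# 4 inputs -> hidden layer of 5 neurons -> output layer of 2 (sizes are arbitrary)
layers = [(rng.normal(size=(5, 4)), np.zeros(5)),
          (rng.normal(size=(2, 5)), np.zeros(2))]
print(forward(np.array([1.0, 0.5, -0.5, 2.0]), layers))
```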
4. Activation Functions
Activation functions introduce non-linearity and decide whether a neuron should activate.
Common types:
Sigmoid: Smooth output (0–1), but suffers from vanishing gradient.
Tanh: Outputs between -1 and 1; zero-centered but still suffers from vanishing gradient.
ReLU: Fast convergence, but may cause "dying ReLU".
Leaky ReLU and Parametric ReLU: Solve ReLU limitations by allowing small negative outputs.
Softmax: Used in classification to convert logits to probabilities.
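A compact NumPy sketch of these activation functions is given below; the input vector and the default leak factor of 0.01 for Leaky ReLU are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):       # smooth output in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):          # zero-centered output in (-1, 1)
    return np.tanh(z)

def relu(z):          # fast to compute; zero for negative inputs ("dying ReLU" risk)
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):   # small negative slope keeps gradients alive
    return np.where(z > 0, z, alpha * z)

def softmax(logits):  # converts logits into a probability distribution
    e = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), tanh(z), relu(z), leaky_relu(z), softmax(z), sep="\n")
```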
6. Parameters vs. Hyperparameters
Parameters (e.g., weights, biases):
Learned during training.
Affect model predictions directly.
Hyperparameters (e.g., learning rate, number of layers):
Manually set before training.
Influence how the model learns.
Feature        | Parameters                  | Hyperparameters
Role           | Prediction & classification | Optimization & learning efficiency
Estimation     | Learned from data           | Set manually or optimized
Adjustability  | Automatic during training   | Manual tuning or algorithmic search
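To make the distinction concrete, the sketch below trains a toy one-dimensional linear model with plain gradient descent: the learning rate and epoch count are hyperparameters fixed by hand, while the weight and bias are parameters updated automatically from the data. The toy data and settings are assumptions for illustration only.

```python
import numpy as np

# Hyperparameters: chosen manually before training.
learning_rate = 0.1
n_epochs = 100

# Parameters: a weight and a bias learned from data during training.
rng = np.random.default_rng(0)
w, b = rng.normal(), 0.0

# Toy data for a 1-D linear fit: y ≈ 2x + 1.
x = np.linspace(-1, 1, 50)
y = 2 * x + 1

for _ in range(n_epochs):
    y_hat = w * x + b
    grad_w = np.mean(2 * (y_hat - y) * x)   # d(MSE)/dw
    grad_b = np.mean(2 * (y_hat - y))       # d(MSE)/db
    w -= learning_rate * grad_w             # parameters updated automatically
    b -= learning_rate * grad_b

print(w, b)   # approaches 2 and 1
```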
7. Generalization in Neural Networks
Generalization: The ability of a model to perform well on unseen data.
Underfitting: Model performs poorly on both training and test data.
Overfitting: Model performs well on training but poorly on test data.
Good fit: Balanced performance on both.
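As a rough illustration of these three regimes, the sketch below uses simple polynomial regression (a stand-in for a neural network, chosen only to keep the example short) and compares training and test error for an underfit, a reasonably fit, and an overfit model; the synthetic data and polynomial degrees are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 1, 20))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 20)
x_test = np.sort(rng.uniform(0, 1, 20))
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, 20)

def errors(degree):
    coeffs = np.polyfit(x_train, y_train, degree)   # fit on training data only
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_err, test_err

for d in (1, 3, 15):   # degree 1: underfit, 3: good fit, 15: tends to overfit
    print(d, errors(d))
```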
8. Regularization Techniques for Generalization
Dropout: Randomly deactivates neurons during training to prevent co-adaptation.
Noise Injection: Adds noise to training data to improve robustness.
Early Stopping: Stops training when validation performance ceases to improve.
Batch Normalization: Normalizes input to each layer for faster and more stable training.
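The sketch below shows, under assumed toy settings, how dropout (in its inverted form) and early stopping can be implemented; noise injection and batch normalization follow the same spirit but are omitted here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p=0.5, training=True):
    """Inverted dropout: randomly zero activations during training and rescale
    the survivors, so no change is needed at inference time."""
    if not training:
        return activations
    mask = rng.random(activations.shape) > p
    return activations * mask / (1.0 - p)

def should_stop(val_losses, patience=3):
    """Early stopping: stop once the validation loss has not improved
    for `patience` consecutive epochs."""
    best = np.argmin(val_losses)
    return len(val_losses) - 1 - best >= patience

print(dropout(np.ones(8), p=0.5))
print(should_stop([0.9, 0.7, 0.65, 0.66, 0.67, 0.68]))  # True: no improvement for 3 epochs
```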
Conclusion
Neural network techniques have been widely used in various application domains such as Intrusion Detection Systems (IDS). The characteristic properties of neural networks, such as end-to-end learning and automated feature learning, have increased their usage across application domains. In this paper, we have discussed the fundamentals of neural networks for building models for learning and prediction. The paper discusses the functionality of neural networks, activation functions, the backpropagation technique, loss functions, optimizers, and the parameters and hyperparameters of neural networks. Moreover, we have also discussed the need to generalize neural network models and the techniques used to achieve generalization. Further, we have also discussed a case study that describes the functional components that can be considered while building an IDS using neural networks.