Considering the historical context of Deep Neural Networks (DNNs), it can be inferred that DNNs have had a promising history that has evolved alongside various philosophical viewpoints. Modeling DNNs has become easier with the generation and availability of large amounts of data. Moreover, with improved hardware and software infrastructure, DNN models have grown in size. DNN models are capable of addressing complex application problems with accuracy that improves over time. The art and science of DNNs is built on the foundation of neural networks. Thus, this paper aims at discussing the fundamentals of neural networks and how they work. The paper includes a brief discussion of the functionality of neural networks, the role of activation functions, the backpropagation algorithm, loss function calculations, and optimizers for neural networks. Further, the paper also discusses generalization in neural networks and the parameters of neural network architectures. The paper also includes a case study on building an intrusion detection model using a neural network. Demystifying the neural network is the first stride towards understanding DNNs.
Introduction
1. Neural Network Inspiration from the Human Brain
Neural networks are inspired by how the human brain functions, with neurons as the fundamental units.
In the brain, neurons are highly interconnected and transmit signals through electrical discharges.
Connections between neurons can be strong or weak, depending on usage, and this concept is replicated in artificial neural networks through weights.
2. Structure of a Neural Network Neuron
An artificial neuron receives inputs (like dendrites), multiplies them with weights, adds a bias, and applies an activation function to produce an output.
This mimics the biological neuron, where inputs are processed and then transmitted to other neurons.
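As an illustration, below is a minimal Python/NumPy sketch of a single artificial neuron; the input values, weights, bias, and the choice of a sigmoid activation are assumptions made only for illustration, not anything prescribed here.

```python
import numpy as np

def neuron(x, w, b):
    """Weighted sum of inputs plus bias, passed through a sigmoid activation."""
    z = np.dot(w, x) + b             # pre-activation: w·x + b
    return 1.0 / (1.0 + np.exp(-z))  # sigmoid squashes the result into (0, 1)

x = np.array([0.5, -1.2, 3.0])   # example inputs (analogous to dendrite signals)
w = np.array([0.4, 0.1, -0.7])   # connection weights
b = 0.2                          # bias
print(neuron(x, w, b))
```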
3. Neural Network Architecture
Neural networks consist of layers:
Input Layer: Takes raw data (no computation).
Hidden Layers: Perform computations.
Output Layer: Produces the final prediction.
Each neuron in one layer connects to all neurons in the next, forming a feedforward structure.
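The sketch below shows how such a feedforward pass might look in Python/NumPy; the layer sizes, random weights, and use of ReLU in every layer are arbitrary choices for illustration.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def forward(x, layers):
    """Feedforward pass: each layer is a (weights, bias) pair, and every neuron
    in one layer connects to all neurons in the next via the weight matrix."""
    a = x
    for W, b in layers:
        a = relu(W @ a + b)
    return a

rng = np.random.default_rng(0)
# 4 inputs -> hidden layer of 5 neurons -> output layer of 2 (sizes are arbitrary)
layers = [(rng.normal(size=(5, 4)), np.zeros(5)),
          (rng.normal(size=(2, 5)), np.zeros(2))]
print(forward(np.array([1.0, 0.5, -0.5, 2.0]), layers))
```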
4. Activation Functions
Activation functions introduce non-linearity and decide whether a neuron should activate.
Common types:
Sigmoid: Smooth output (0–1), but suffers from vanishing gradient.
Tanh: Outputs between -1 and 1; zero-centered but still suffers from vanishing gradient.
ReLU: Fast convergence, but may cause "dying ReLU".
Leaky ReLU and Parametric ReLU: Solve ReLU limitations by allowing small negative outputs.
Softmax: Used in classification to convert logits to probabilities.
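A compact NumPy sketch of these activation functions is given below; the input vector and the default leak factor of 0.01 for Leaky ReLU are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):       # smooth output in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):          # zero-centered output in (-1, 1)
    return np.tanh(z)

def relu(z):          # fast to compute; zero for negative inputs ("dying ReLU" risk)
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):   # small negative slope keeps gradients alive
    return np.where(z > 0, z, alpha * z)

def softmax(logits):  # converts logits into a probability distribution
    e = np.exp(logits - np.max(logits))  # subtract max for numerical stability
    return e / e.sum()

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z), tanh(z), relu(z), leaky_relu(z), softmax(z), sep="\n")
```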
6. Parameters vs. Hyperparameters
Parameters (e.g., weights, biases):
Learned during training.
Affect model predictions directly.
Hyperparameters (e.g., learning rate, number of layers):
Manually set before training.
Influence how the model learns.
Feature        | Parameters                  | Hyperparameters
Role           | Prediction & classification | Optimization & learning efficiency
Estimation     | Learned from data           | Set manually or optimized
Adjustability  | Automatic during training   | Manual tuning or algorithmic search
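To make the distinction concrete, the sketch below trains a toy one-dimensional linear model with plain gradient descent: the learning rate and epoch count are hyperparameters fixed by hand, while the weight and bias are parameters updated automatically from the data. The toy data and settings are assumptions for illustration only.

```python
import numpy as np

# Hyperparameters: chosen manually before training.
learning_rate = 0.1
n_epochs = 100

# Parameters: a weight and a bias learned from data during training.
rng = np.random.default_rng(0)
w, b = rng.normal(), 0.0

# Toy data for a 1-D linear fit: y ≈ 2x + 1.
x = np.linspace(-1, 1, 50)
y = 2 * x + 1

for _ in range(n_epochs):
    y_hat = w * x + b
    grad_w = np.mean(2 * (y_hat - y) * x)   # d(MSE)/dw
    grad_b = np.mean(2 * (y_hat - y))       # d(MSE)/db
    w -= learning_rate * grad_w             # parameters updated automatically
    b -= learning_rate * grad_b

print(w, b)   # approaches 2 and 1
```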
7. Generalization in Neural Networks
Generalization: The ability of a model to perform well on unseen data.
Underfitting: Model performs poorly on both training and test data.
Overfitting: Model performs well on training but poorly on test data.
Good fit: Balanced performance on both.
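As a rough illustration of these three regimes, the sketch below uses simple polynomial regression (a stand-in for a neural network, chosen only to keep the example short) and compares training and test error for an underfit, a reasonably fit, and an overfit model; the synthetic data and polynomial degrees are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 1, 20))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 20)
x_test = np.sort(rng.uniform(0, 1, 20))
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, 20)

def errors(degree):
    coeffs = np.polyfit(x_train, y_train, degree)   # fit on training data only
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_err, test_err

for d in (1, 3, 15):   # degree 1: underfit, 3: good fit, 15: tends to overfit
    print(d, errors(d))
```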
8. Regularization Techniques for Generalization
Dropout: Randomly deactivates neurons during training to prevent co-adaptation.
Noise Injection: Adds noise to training data to improve robustness.
Early Stopping: Stops training when validation performance ceases to improve.
Batch Normalization: Normalizes input to each layer for faster and more stable training.
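The sketch below shows, under assumed toy settings, how dropout (in its inverted form) and early stopping can be implemented; noise injection and batch normalization follow the same spirit but are omitted here for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p=0.5, training=True):
    """Inverted dropout: randomly zero activations during training and rescale
    the survivors, so no change is needed at inference time."""
    if not training:
        return activations
    mask = rng.random(activations.shape) > p
    return activations * mask / (1.0 - p)

def should_stop(val_losses, patience=3):
    """Early stopping: stop once the validation loss has not improved
    for `patience` consecutive epochs."""
    best = np.argmin(val_losses)
    return len(val_losses) - 1 - best >= patience

print(dropout(np.ones(8), p=0.5))
print(should_stop([0.9, 0.7, 0.65, 0.66, 0.67, 0.68]))  # True: no improvement for 3 epochs
```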
Conclusion
Neural network techniques have been widely used in various application domains such as Intrusion Detection Systems (IDS). The characteristic properties of neural networks, such as end-to-end learning and automated feature learning, have increased their usage across application domains. In this paper, we have discussed the fundamentals of neural networks for building models for learning and prediction. The paper discusses the functionality of neural networks, activation functions, the backpropagation technique, loss functions, optimizers, and the parameters and hyperparameters of neural networks. Moreover, we have also discussed the need to generalize neural network models and the techniques used to achieve generalization. Further, we have also discussed a case study that describes the functional components that can be considered while building an IDS using neural networks.