Cyber threats such as malware, phishing, and DDoS attacks are becoming increasingly sophisticated, necessitating advanced detection mechanisms. This paper presents an AI-driven cybersecurity system that integrates machine learning models for real-time detection of cyber threats, network intrusions, phishing URLs, and email phishing. The system employs NLP for malware analysis, anomaly detection for intrusion detection, and classification models for phishing prevention. Developed using FastAPI for real-time inference and SQLite for secure logging, the system ensures efficient threat identification and response. Security measures such as SQL injection protection, API authentication, and data encryption further enhance its robustness. Experimental results show high detection accuracy, with intrusion detection at 96% and email phishing detection at 97%.
Introduction
Problem Context:
Cyber threats are growing in complexity and frequency. Traditional rule-based systems are increasingly ineffective, particularly against zero-day attacks, phishing, and malware. To combat this, there is a shift toward AI and machine learning-based systems that can adapt, detect, and respond to threats in real time.
Solution Overview:
The paper presents an AI-powered cybersecurity system designed to detect and mitigate:
Malware
Network intrusions (including DDoS)
Phishing URLs
Email phishing
The system employs four specialized AI models, each targeting a specific threat category, and operates in real-time using FastAPI for inference and SQLite for secure threat logging.
Model Architecture:
| Threat Type | Model Used | Key Features |
|---|---|---|
| Cyber Threat Detection | RandomForest + MultiOutputClassifier | Classifies multiple threats from text-based logs using TF-IDF and BERT embeddings |
| Intrusion Detection (IDS) | XGBoost | Detects anomalies and DDoS using network flow stats with entropy-based feature selection |
| Phishing URL Detection | RandomForest | Uses lexical analysis and domain features (e.g., URL length, domain age, HTTPS) |
| Email Phishing Detection | BERT | Uses NLP to detect phishing via contextual understanding of email content |
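To make the lexical URL features concrete, the sketch below derives a few of them from the URL string alone. It is an illustration under stated assumptions, not the paper's feature extractor: the function name `extract_url_features` and the exact feature set are hypothetical, and domain age is left out because it needs an external WHOIS lookup.

```python
from urllib.parse import urlparse


def extract_url_features(url: str) -> dict:
    """Compute simple lexical features from a URL string (no network access)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "uses_https": parsed.scheme == "https",
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        # An "@" in a URL is a common trick to hide the real destination.
        "has_at_symbol": "@" in url,
        # Crude IPv4 check: four dot-separated groups made only of digits.
        "host_is_ip": host.count(".") == 3 and host.replace(".", "").isdigit(),
    }


# Hypothetical suspicious URL, used only to exercise the function.
features = extract_url_features("http://192.168.0.1/login@secure-update.example.com")
```

A feature dictionary like this would then be turned into a numeric vector and passed to the RandomForest classifier.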
Datasets Used:
Text-based Cyber Threat Dataset (Kaggle)
CIC-DDoS2019 for intrusion detection
Phishing Domain Dataset for URL classification
Phishing Mail Dataset (Torch-RoBERTa) for email phishing detection
Each dataset is preprocessed with appropriate techniques (e.g., tokenization, normalization, HTML stripping, embedding extraction) to enhance model accuracy.
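The preprocessing steps named above (HTML stripping, normalization, tokenization) might look roughly like the following for an email body. This is a simplified sketch using only the Python standard library. The helper names are hypothetical, and a BERT-based model would use its own subword tokenizer rather than the word-level split shown here.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops all tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(raw: str) -> str:
    """Return the text content of an HTML fragment."""
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    return " ".join(parser.parts)


def normalize(text: str) -> str:
    """Lowercase and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def tokenize(text: str) -> list[str]:
    """Split normalized text into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text)


def preprocess_email(raw_html: str) -> list[str]:
    return tokenize(normalize(strip_html(raw_html)))


tokens = preprocess_email("<p>Verify your <b>ACCOUNT</b> now!</p>")
```

Note that this extractor keeps the contents of script and style elements as text, so a production pipeline would need to filter those separately.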
Key Technologies:
FastAPI – Enables real-time API-based inference
SQLite – Stores logs with AES encryption
Security Measures – SQL injection protection, API authentication, secure logging
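Two of the security measures listed above can be illustrated with the standard library alone: parameterized queries for SQL injection protection, and a constant-time key comparison for API authentication. The table name `threat_log`, the key value, and the function names are assumptions made for this sketch. AES encryption of the log is not shown, since Python's built-in `sqlite3` module does not provide it.

```python
import hmac
import sqlite3

# Hypothetical key for illustration; a real deployment would load it from a secret store.
EXPECTED_API_KEY = "demo-key-change-me"


def is_authorized(presented_key: str) -> bool:
    """Constant-time comparison avoids leaking key contents through timing."""
    return hmac.compare_digest(presented_key.encode(), EXPECTED_API_KEY.encode())


def init_db(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS threat_log ("
        "id INTEGER PRIMARY KEY, source TEXT NOT NULL, threat_type TEXT NOT NULL)"
    )


def log_threat(conn: sqlite3.Connection, source: str, threat_type: str) -> None:
    # Placeholders (?) keep user-controlled values out of the SQL text.
    with conn:
        conn.execute(
            "INSERT INTO threat_log (source, threat_type) VALUES (?, ?)",
            (source, threat_type),
        )


conn = sqlite3.connect(":memory:")
init_db(conn)
# The injection attempt is stored as plain data instead of being executed.
log_threat(conn, "x'); DROP TABLE threat_log; --", "phishing_url")
rows = conn.execute("SELECT source, threat_type FROM threat_log").fetchall()
```

In a FastAPI service, a check like `is_authorized` would typically run before any inference endpoint handles the request.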
Performance Evaluation:
All models were tested using:
80-20 train-test split
5-fold cross-validation
Metrics: Accuracy, Precision, Recall, F1-score
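The evaluation protocol can be sketched as index bookkeeping. The summary does not say whether cross-validation was run on the full dataset or only on the training partition; the example below assumes the latter. Function names and the fixed seed are illustrative.

```python
import random


def train_test_split_indices(n_samples: int, test_fraction: float = 0.2, seed: int = 0):
    """Shuffle sample indices and hold out a fraction for testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k: int = 5):
    """Yield (train, validation) index lists; each sample validates exactly once."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# 80-20 split of 100 samples, then 5-fold cross-validation on the training part.
train_idx, test_idx = train_test_split_indices(100)
folds = list(k_fold_indices(train_idx, k=5))
```

Each model would be fit on the `train` indices of a fold and scored on its `validation` indices, with the held-out test indices reserved for the final metrics.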
| Model | Accuracy | Precision | Recall | F1-Score |
|---|---|---|---|---|
| Cyber Threat Detection | 94% | 92% | 95% | 93% |
| Intrusion Detection (IDS) | 96% | 95% | 96% | 95% |
| Phishing URL Detection | 93% | 91% | 92% | 92% |
| Email Phishing Detection | 97% | 96% | 97% | 96% |
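Because F1 is the harmonic mean of precision and recall, the reported figures can be cross-checked against each other. The short sketch below recomputes F1 from the reported precision and recall. All values are rounded to whole percentages in the source, so differences of up to about one point are expected and do not indicate an error.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) as listed in the results, expressed as fractions.
reported = {
    "Cyber Threat Detection": (0.92, 0.95, 0.93),
    "Intrusion Detection (IDS)": (0.95, 0.96, 0.95),
    "Phishing URL Detection": (0.91, 0.92, 0.92),
    "Email Phishing Detection": (0.96, 0.97, 0.96),
}

for name, (p, r, f1_reported) in reported.items():
    print(f"{name}: computed F1 = {f1_score(p, r):.3f}, reported = {f1_reported:.2f}")
```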
Strengths and Innovations:
Real-time detection with minimal latency
Context-aware NLP (using BERT) for phishing email classification
Multi-label classification of text-based threats
Entropy-based feature selection for IDS
Automated threat response via a Threat Decision Engine (e.g., block IPs, quarantine emails)
Strong encryption and secure logging
Low false positives, high scalability, and adaptability
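The summary does not specify which entropy criterion the IDS uses for feature selection. One common reading is to rank features by information gain, that is, by how much knowing a feature reduces the Shannon entropy of the class label. The sketch below follows that assumption with toy, pre-binned data. The feature names and values are hypothetical, and continuous flow statistics would need to be discretized first.

```python
import math
from collections import Counter


def entropy(labels) -> float:
    """Shannon entropy in bits of a sequence of discrete labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels) -> float:
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns: dict, labels) -> list:
    """Return feature names sorted by information gain, highest first."""
    gains = {name: information_gain(values, labels) for name, values in columns.items()}
    return sorted(gains, key=gains.get, reverse=True)


# Toy flows: label 1 = DDoS, 0 = benign (hypothetical data).
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "packet_rate_bin": ["high", "high", "high", "low", "low", "low"],
    "protocol": ["udp", "tcp", "udp", "tcp", "udp", "tcp"],
}
ranking = rank_features(columns, labels)
```

Here `packet_rate_bin` separates the classes perfectly and ranks first, while `protocol` carries little information about the label. A selection step would keep the top-ranked features and pass them to XGBoost.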
Limitations in Prior Work (Addressed by This System):
| Previous Limitations | This System’s Improvements |
|---|---|
| High false positives, model retraining needs (Alatise et al.) | Improved model accuracy and reduced false positives |
| Lack of real-time detection and adaptability (Hadi et al.) | Real-time processing and automated threat responses |
| No deep learning use, poor scalability (Sunil Kumar et al.) | Integrates deep learning (BERT) and scalable architecture |
| Vulnerability to obfuscation in phishing URLs (Abutaha et al.) | |
Conclusion
The proposed AI-powered cybersecurity system effectively detects cyber threats, network intrusions, phishing URLs, and email phishing using machine learning and deep learning models. The system achieved high accuracy, precision, and recall, outperforming traditional rule-based security mechanisms. By integrating RandomForest, XGBoost, and BERT, it ensures real-time monitoring, automated threat mitigation, and secure logging through FastAPI and SQLite. The results demonstrate that AI-driven approaches enhance cybersecurity resilience by adapting to evolving attack patterns while minimizing false positives and false negatives. Future enhancements will focus on adversarial defense, cloud-based deployment, and real-time adaptive learning to improve the system’s scalability and efficiency in handling zero-day threats.
References
[1] Kumar, Sunil, Bhanu Pratap Singh, and Vinesh Kumar. "A semantic machine learning algorithm for cyber threat detection and monitoring security." In 2021 3rd International Conference on Advances in Computing, Communication Control and Networking (ICAC3N), pp. 1963-1967. IEEE, 2021.
[2] Alatise, Timothy I., and Olusegun E. Nottidge. "Threat detection and response with SIEM system." International Journal of Communication and Information Technology, 2024.
[3] Hadi, Hassan Jalil, Umer Hayat, Numan Musthaq, Faisal Bashir Hussain, and Yue Cao. "Developing realistic distributed denial of service (DDoS) dataset for machine learning-based intrusion detection system." In 2022 9th International Conference on Internet of Things: Systems, Management and Security (IOTSMS), pp. 1-6. IEEE, 2022.
[4] Abutaha, Mohammed, Mohammad Ababneh, Khaled Mahmoud, and Sherenaz Al-Haj Baddar. "URL phishing detection using machine learning techniques based on URLs lexical analysis." In 2021 12th International Conference on Information and Communication Systems (ICICS), pp. 147-152. IEEE, 2021.
[5] Yaseen, Asad. "Accelerating the SOC: Achieve greater efficiency with AI-driven automation." International Journal of Responsible Artificial Intelligence 12, no. 1 (2022): 1-19.
[6] Sarker, Iqbal H., Md Hasan Furhad, and Raza Nowrozy. "AI-driven cybersecurity: an overview, security intelligence modeling and research directions." SN Computer Science 2, no. 3 (2021): 173.
[7] Yaseen, Asad. "AI-driven threat detection and response: A paradigm shift in cybersecurity." International Journal of Information and Cybersecurity 7, no. 12 (2023): 25-43.
[8] Ye, X., J. Zhao, Y. Zhang, and F. Wen. "Quantitative vulnerability assessment of cyber security for distribution automation systems." Energies 8, no. 6 (2020): 5266-5286.
[9] Gu, Guofei, et al. "BotMiner: Clustering Analysis of Network Traffic for Protocol- and Structure-Independent Botnet Detection." USENIX Security Symposium. Vol. [5] No. 2. 2015.
[10] Anderson, H. S., and P. Roth. "EMBER: An Open Dataset for Training Static PE Malware Machine Learning Models." arXiv preprint arXiv:1804.04637, April 2018.
[11] Yang, Caihong, Fei Wang, and Benxiong Huang. "Internet traffic classification using DBSCAN." In Information Engineering, 2009. ICIE'09. WASE International Conference on. Vol. 2. IEEE, (2014).