The rapid rise in cyber-attacks calls for tools that detect and respond to threats quickly, before they escalate into major security incidents. This study introduces a Real-Time Cyber Threat Monitoring and Reporting System that continuously monitors for potential cyber threats. It draws on data gathered by web scraping from public cybersecurity websites on the surface web. The system analyzes text data, applies machine learning to detect anomalous patterns and behavior, and helps identify threats early. It consists of a Python backend built on Flask, which handles data processing and model execution, and a React.js frontend whose dashboard shows live alerts, summaries, and threat details. MongoDB stores the collected data and supports fast retrieval at scale. By combining web data collection, natural language processing, and machine learning models, the system provides timely alerts and detailed reports, helping cybersecurity experts act before incidents occur.
Introduction
Cybercrime is growing in severity, creating a need for a more advanced, automated system that detects and monitors cyber threats in real time using publicly available online data.
Traditional cybersecurity methods rely heavily on rule-based systems and structured data (like IP addresses and file hashes), but they struggle with large volumes of unstructured information from sources such as forums, blogs, and online discussions. This creates an “intelligence-to-action gap,” where early signs of cyber threats like zero-day attacks are often missed or detected too late.
To address this issue, the study proposes a system that uses Artificial Intelligence (AI), Machine Learning (ML), Deep Learning (DL), and Natural Language Processing (NLP) to automatically collect, analyze, and classify cyber threat information from public sources. The goal is to build an ethical, real-time cyber threat intelligence system that improves situational awareness and supports faster response to attacks.
The literature review highlights that existing systems already use NLP and ML for cyber threat detection, but they face limitations such as scalability issues, high computational cost, difficulty handling noisy or multilingual data, and reliance on restricted or dark web sources. Recent transformer-based models like BERT and DistilBERT have improved accuracy but still require better real-time and scalable solutions.
The proposed system addresses these gaps by developing an automated cyber threat intelligence platform that:
Collects data from online sources and APIs
Cleans and preprocesses unstructured information
Uses ML models and the Wazuh engine for threat detection
Classifies activities as normal or malicious
Provides dashboards and reports for security analysts
Overall, the system aims to improve real-time cyber threat detection, monitoring, and reporting in a scalable, automated, and ethically responsible way.
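The stages listed above can be illustrated with a minimal Python sketch. It is not the system's implementation: collection is replaced by a list of already-fetched documents, and a hypothetical keyword matcher (MALICIOUS_HINTS) stands in for the ML models and the Wazuh engine, which are not shown here. All names in the block are assumptions introduced for illustration.

```python
import html
import re
from dataclasses import dataclass

# Hypothetical keyword list standing in for a trained model. The source
# describes ML models and the Wazuh engine; neither is reproduced here.
MALICIOUS_HINTS = ("ransomware", "phishing", "exploit", "malware", "ddos", "breach")

TAG_RE = re.compile(r"<[^>]+>")
URL_RE = re.compile(r"https?://\S+")
SPACE_RE = re.compile(r"\s+")


@dataclass
class Report:
    text: str      # cleaned text that was classified
    label: str     # "malicious" or "normal"
    matched: list  # keywords that triggered the label


def clean(raw: str) -> str:
    """Strip HTML tags, decode entities, mask URLs, collapse whitespace, lowercase."""
    text = TAG_RE.sub(" ", raw)
    text = html.unescape(text)
    text = URL_RE.sub("<url>", text)
    return SPACE_RE.sub(" ", text).strip().lower()


def classify(text: str) -> Report:
    """Label cleaned text as malicious if any hint keyword occurs in it."""
    matched = [word for word in MALICIOUS_HINTS if word in text]
    label = "malicious" if matched else "normal"
    return Report(text=text, label=label, matched=matched)


def run_pipeline(raw_documents):
    """Clean and classify each raw document, returning one Report per input."""
    return [classify(clean(doc)) for doc in raw_documents]


if __name__ == "__main__":
    samples = [
        "<p>New  Ransomware &amp; phishing campaign at https://example.com/x</p>",
        "Weekly newsletter: conference schedule",
    ]
    for report in run_pipeline(samples):
        print(report.label, report.matched, "|", report.text)
```

In a fuller version, run_pipeline would be fed by the scraper and API clients, and each Report would be stored and surfaced on the analyst dashboard.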
Conclusion
The proposed system demonstrates high-accuracy threat classification, achieving 93.72% accuracy and a 0.90 macro F1-score across seven threat categories (Phishing, Malware, DDoS, Data Breach, Insider Threat, Ransomware, and Zero-Day) on 15,720 test samples despite class imbalance. It incorporates a DistilBERT-based transformer pipeline with a multi-layer architecture comprising preprocessing, real-time threat classification, confidence validation, in which 6.7% of cases are flagged for human verification, and structured reporting for operational cybersecurity [12]. The system meets real-time SOC performance requirements with an inference latency of 22 ms, which is 51% faster than the BERT-Base model, along with a reduced model size (66M parameters) and memory requirement (1.1 GB), enabling efficient GPU-free deployment. Furthermore, it supports production-level monitoring by automating decisions for 93.3% of cases while allowing human intervention in exceptional scenarios, thereby ensuring continuous and reliable cyber threat monitoring and reporting.
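The confidence validation step can be sketched as a simple threshold rule: a prediction is accepted automatically when the top class probability is high enough, and routed to a human otherwise. The sketch below is illustrative only. The threshold value (REVIEW_THRESHOLD) and the function names are assumptions, since the source reports the resulting 6.7% review rate but not the cutoff or the mechanism that produces it.

```python
from dataclasses import dataclass

# The seven threat categories named in the text.
CATEGORIES = [
    "Phishing", "Malware", "DDoS", "Data Breach",
    "Insider Threat", "Ransomware", "Zero-Day",
]

# Assumed cutoff for illustration; the source does not state the value used.
REVIEW_THRESHOLD = 0.80


@dataclass
class Decision:
    label: str
    confidence: float
    needs_review: bool


def route(probabilities, threshold=REVIEW_THRESHOLD):
    """Pick the most probable category and flag low-confidence cases for review."""
    if len(probabilities) != len(CATEGORIES):
        raise ValueError("expected one probability per category")
    best = max(range(len(CATEGORIES)), key=lambda i: probabilities[i])
    confidence = probabilities[best]
    return Decision(CATEGORIES[best], confidence, confidence < threshold)


def review_rate(decisions):
    """Fraction of decisions that were flagged for human verification."""
    if not decisions:
        return 0.0
    return sum(d.needs_review for d in decisions) / len(decisions)


if __name__ == "__main__":
    confident = route([0.90, 0.02, 0.02, 0.02, 0.02, 0.01, 0.01])
    uncertain = route([0.30, 0.30, 0.10, 0.10, 0.10, 0.05, 0.05])
    print(confident)
    print(uncertain)
    print(review_rate([confident, uncertain]))
```

Raising the threshold sends more cases to analysts and lowers the automation rate; lowering it does the reverse. Tracking review_rate on held-out data is one way to tune that trade-off.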
References
[1] A. S. Gautam, Y. Gahlot, and P. Kamat, “Hacker forum exploit and classification for proactive cyber threat intelligence,” in Proc. Inventive Computation Technol., vol. 98, S. Smys, R. Bestak, and Á. Rocha, Eds. Cham, Switzerland: Springer, 2020, pp. 279–285, doi: 10.1007/978-3-030-33846-6_32.
[2] W. S. Admass, Y. Y. Munaye, and A. A. Diro, “Cyber security: State of the art, challenges and future directions,” Cyber Secur. Appl., vol. 2, 2024, Art. no. 100031, doi: 10.1016/j.csa.2023.100031.
[3] M. A. Manjramkar and K. C. Jondhale, “Cyber security using machine learning techniques,” in Proc. Int. Conf. Appl. Mach. Intell. Data Analytics, Dordrecht, The Netherlands, 2023, pp. 680–701, doi: 10.2991/978-94-6463-136-4_59.
[4] N. Goel, A. Mansi, and N. Sethi, “Cyber threat intelligence: A survey on progressive techniques and challenges,” in Proc. Int. Conf. Big Data IoT Cyber Sect. Inf. Technol., Pune, India, 2022, pp. 37–41.
[5] S. Silvestri, S. Islam, S. Papastergiou, C. Tzagkarakis, and M. Ciampi, “A machine learning approach for the NLP-based analysis of cyber threats and vulnerabilities of the healthcare ecosystem,” Sensors, vol. 23, no. 2, Jan. 2023, Art. no. 651, doi: 10.3390/s23020651.
[6] D. M. Blei, A. Y. Ng, and M. I. Jordan, “Latent Dirichlet allocation,” J. Mach. Learn. Res., vol. 3, pp. 993–1022, Jan. 2003.
[7] I. Deliu, C. Leichter, and K. Franke, “Collecting cyber threat intelligence from hacker forums via a two-stage, hybrid process using support vector machines and latent dirichlet allocation,” in Proc. 2018 IEEE Big Data, Seattle, WA, USA, 2018, pp. 5008–5013, doi: 10.1109/BigData.2018.8622469.
[8] Y. Wang, M. A. Bashar, M. Chandramohan, and R. Nayak, “Exploring topic models to discern cyber threats on Twitter: A case study on Log4Shell,” Intell. Syst. Appl., vol. 20, Nov. 2023, Art. no. 200280, doi: 10.1016/j.iswa.2023.200280.
[9] E. Irshad and A. Basit Siddiqui, “Cyber threat attribution using unstructured reports in cyber threat intelligence,” Egyptian Inform. J., vol. 24, no. 1, pp. 43–59, Mar. 2023, doi: 10.1016/j.eij.2022.11.001.
[10] W. Yang and K.-Y. Lam, “Automated cyber threat intelligence reports classification for early warning of cyber attacks in next generation SOC,” in Proc. Inf. Commun. Secur., J. Zhou, X. Luo, Q. Shen, and Z. Xu, Eds. Cham, Switzerland: Springer, 2020, vol. 11999, pp. 145–164, doi: 10.1007/978-3-030-41579-2_9.
[11] V. Behzadan, C. Aguirre, A. Bose, and W. Hsu, “Corpus and deep learning classifier for collection of cyber threat indicators in Twitter stream,” in Proc. IEEE Big Data, Seattle, WA, USA, 2018, pp. 5002–5007, doi: 10.1109/BigData.2018.8622506.
[12] J. Liu et al., “TriCTI: An actionable cyber threat intelligence discovery system via trigger-enhanced neural network,” Cybersecurity, vol. 5, no. 1, Dec. 2022, Art. no. 8, doi: 10.1186/s42400-022-00110-3.
[13] R. Dingledine, N. Mathewson, and P. F. Syverson, “Tor: The second-generation onion router,” in Proc. USENIX Secur. Symp., vol. 4, 2004, pp. 303–320.
[14] B. Zantout, R. Haraty, et al., “I2P data communication system,” in Proc. ICN, 2011, pp. 401–409.
[15] S. Kaur and S. Randhawa, “Dark web: A web of crimes,” Wireless Pers. Commun., vol. 112, pp. 2131–2158, 2020.