Today’s enterprise networks carry enormous volumes of traffic around the clock, and monitoring all of it manually for threats is not realistic. This paper presents a real-time network traffic anomaly detection system built on AWS cloud services: Apache Kafka running on EC2 provides high-throughput log streaming, working together with AWS Lambda, CloudWatch, Amazon S3, AWS Glue, and Amazon Athena. As network logs flow in continuously, an Isolation Forest machine learning model analyses the traffic and flags anything that looks out of the ordinary. A Generative AI component then acts as a virtual network security analyst, reading the flagged records and writing a plain-English explanation of what went wrong and why it matters. In addition, an Automated Root Cause Analysis (ARCA) module sifts through thousands of error logs to tell administrators which cloud service configuration is most likely at fault. In our experiments the system achieved a detection accuracy of 96.3% with an end-to-end alert latency under two seconds, outperforming traditional rule-based intrusion detection approaches. The architecture satisfies the mandatory CPPE coverage requirements across the Cloud, Data, and GenAI pillars and delivers a solution that is scalable, transparent, and immediately actionable for security teams.
Introduction
This paper presents a cloud-native network security system designed to detect and explain anomalies in real time within complex cloud environments. As cloud infrastructures grow in scale and complexity, traditional rule-based intrusion detection systems lose effectiveness: attack patterns evolve and traffic behavior shifts, so static rules quickly fall out of date. To address this, the study combines unsupervised machine learning (Isolation Forest) with Generative AI (large language models, LLMs) to detect anomalies and explain them in human-understandable terms.
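To make the detect-then-explain pairing concrete, the sketch below shows one way a record flagged by the detector could be packaged into a prompt for the LLM analyst. The paper does not specify its prompt format, so the instruction text, the function name `build_analyst_prompt`, and the log field names (`src_ip`, `dst_port`, `bytes_out`) are hypothetical; the actual LLM call is omitted.

```python
import json
from typing import Any, Mapping

# Hypothetical instruction text for the LLM "security analyst" role;
# the paper's actual prompt is not specified.
SYSTEM_INSTRUCTIONS = (
    "You are a network security analyst. Explain in plain English what is "
    "unusual about the flagged record, why it may matter, and one next step."
)


def build_analyst_prompt(record: Mapping[str, Any], score: float, threshold: float) -> str:
    """Package one flagged log record and its anomaly score into a prompt string.

    `record` is assumed to be a flat dict of log fields (hypothetical schema).
    """
    # Sort keys so identical records always produce identical prompts.
    record_json = json.dumps(dict(record), sort_keys=True, indent=2)
    return (
        f"{SYSTEM_INSTRUCTIONS}\n\n"
        f"Anomaly score: {score:.3f} (alert threshold: {threshold:.3f})\n"
        f"Flagged record:\n{record_json}\n"
    )


if __name__ == "__main__":
    flagged = {"src_ip": "10.0.3.17", "dst_port": 4444, "bytes_out": 9_800_000}
    print(build_analyst_prompt(flagged, score=0.71, threshold=0.60))
```

Keeping the detector's score and threshold in the prompt gives the model the same evidence an analyst would see, rather than asking it to judge the raw record unaided.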
The system is built on a three-layer architecture:
Cloud Layer: Uses Apache Kafka, AWS Lambda, and CloudWatch for real-time log ingestion, processing, and monitoring.
Data Layer: Uses Amazon S3, AWS Glue, and Athena for storage, ETL processing, and forensic analysis.
GenAI Layer: Uses an LLM-based security analyst and an Automated Root Cause Analysis (ARCA) module for interpretation and debugging.
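As a sketch of where the Cloud and Data layers might meet, the handler below decodes a batch delivered by Lambda's Kafka event source (a `records` object keyed by topic-partition, each message carrying `topic`, `partition`, `offset`, a millisecond `timestamp`, and a base64-encoded `value`) and derives a Hive-style partitioned S3 key for each record. The key layout (`raw-logs/year=.../hour=...`), the assumption that every message is a JSON log line, and the function names are illustrative choices, not details taken from the paper. The S3 upload itself is left out so the sketch runs without AWS credentials.

```python
import base64
import json
from datetime import datetime, timezone
from typing import Any


def s3_key_for(msg: dict[str, Any], prefix: str = "raw-logs") -> str:
    """Build a Hive-style partitioned key (year=/month=/day=/hour=) from a Kafka message."""
    # Kafka timestamps in the Lambda event are epoch milliseconds; partition by UTC hour.
    ts = datetime.fromtimestamp(msg["timestamp"] / 1000, tz=timezone.utc)
    return (
        f"{prefix}/year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}/"
        f"{msg['topic']}-{msg['partition']}-{msg['offset']}.json"
    )


def handler(event: dict[str, Any], context: Any = None) -> list[tuple[str, dict[str, Any]]]:
    """Decode a Kafka-triggered Lambda batch into (s3_key, log_record) pairs.

    Assumes each message value is a UTF-8 JSON log line (hypothetical format).
    """
    out = []
    # `records` maps "topic-partition" strings to lists of messages.
    for messages in event.get("records", {}).values():
        for msg in messages:
            log_line = json.loads(base64.b64decode(msg["value"]).decode("utf-8"))
            out.append((s3_key_for(msg), log_line))
    return out
```

In a deployment each pair would then be written with boto3's S3 `put_object`. The `key=value` prefixes let a Glue crawler register partitions that Athena can prune when forensic queries filter by date.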
The Isolation Forest model identifies abnormal network behavior without needing labeled attack data and achieves 96.3% accuracy, outperforming both rule-based and supervised models. The LLM-based explanation system translates anomalies into clear, actionable security insights, helping engineers quickly understand threats. The ARCA module further improves operations by automatically identifying root causes of system failures across distributed AWS services with 91.4% accuracy.
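The reason Isolation Forest needs no labeled attack data is that anomalies are "few and different": random axis-aligned splits isolate them in fewer steps than ordinary points. The following is a minimal pure-Python sketch of the algorithm as published by Liu, Ting and Zhou (ICDM 2008), with the score $s(x) = 2^{-E[h(x)]/c(\psi)}$, where $h(x)$ is the path length and $c(\psi)$ normalises by subsample size $\psi$. It is not the authors' production model, whose hyperparameters are not stated here; `n_trees=100` and `sample_size=256` are the defaults suggested in the original Isolation Forest paper, and all class and function names are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence

EULER_GAMMA = 0.5772156649015329
Point = Sequence[float]


def c(n: int) -> float:
    """Average path length of an unsuccessful BST search over n points (small n: 0 and 1)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


class Node:
    """Internal node when `feature` is set, external (leaf) node otherwise."""

    def __init__(self, size: int, feature: Optional[int] = None, split: float = 0.0,
                 left: "Optional[Node]" = None, right: "Optional[Node]" = None):
        self.size, self.feature, self.split = size, feature, split
        self.left, self.right = left, right


def build_tree(points: List[Point], depth: int, max_depth: int, rng: random.Random) -> Node:
    # Stop at the height limit or when the node cannot be split further.
    if depth >= max_depth or len(points) <= 1:
        return Node(len(points))
    feature = rng.randrange(len(points[0]))  # random attribute
    lo = min(p[feature] for p in points)
    hi = max(p[feature] for p in points)
    if lo == hi:
        return Node(len(points))
    split = rng.uniform(lo, hi)  # random split within the observed range
    left = [p for p in points if p[feature] < split]
    right = [p for p in points if p[feature] >= split]
    return Node(len(points), feature, split,
                build_tree(left, depth + 1, max_depth, rng),
                build_tree(right, depth + 1, max_depth, rng))


def path_length(x: Point, node: Node, depth: int = 0) -> float:
    if node.feature is None:
        # Leaf: add the expected depth of the subtree that was never grown.
        return depth + c(node.size)
    child = node.left if x[node.feature] < node.split else node.right
    return path_length(x, child, depth + 1)


class IsolationForest:
    def __init__(self, n_trees: int = 100, sample_size: int = 256, seed: int = 0):
        self.n_trees, self.sample_size, self.seed = n_trees, sample_size, seed
        self.trees: List[Node] = []
        self.psi = 0

    def fit(self, data: List[Point]) -> "IsolationForest":
        if len(data) < 2:
            raise ValueError("need at least two training points")
        rng = random.Random(self.seed)
        self.psi = min(self.sample_size, len(data))
        max_depth = math.ceil(math.log2(self.psi))  # height limit ceil(log2 psi)
        self.trees = [build_tree(rng.sample(data, self.psi), 0, max_depth, rng)
                      for _ in range(self.n_trees)]
        return self

    def score(self, x: Point) -> float:
        """Anomaly score s(x) = 2 ** (-E[h(x)] / c(psi)); values near 1 are anomalous."""
        mean_h = sum(path_length(x, t) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_h / c(self.psi))
```

Fitting on unlabeled traffic features and scoring a far-away point yields a higher score than a point from the bulk of the data. In practice a maintained library implementation, such as scikit-learn's `sklearn.ensemble.IsolationForest` (whose `score_samples` returns the negated score), would normally be preferred over hand-written code.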
Experimental results show:
Low detection latency (~1.8 seconds end-to-end)
High scalability under increased load
Strong root-cause identification performance (91.4% accuracy)
Improved usability through natural-language explanations
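The latency figures above (about 1.8 s end-to-end, under two seconds at the median) are reported without a description of the measurement procedure in this section. One plausible approach, assumed purely for illustration, is to pair the time each record enters Kafka with the time its alert is emitted and summarise the differences. The function `latency_summary`, the (ingest, alert) pair format, and the 95th-percentile figure are hypothetical additions, not results from the paper.

```python
import statistics
from typing import Dict, Iterable, Tuple


def latency_summary(pairs: Iterable[Tuple[float, float]]) -> Dict[str, float]:
    """Summarise end-to-end alert latency from (ingest_ts, alert_ts) pairs in seconds."""
    latencies = [alert - ingest for ingest, alert in pairs]
    if len(latencies) < 2:
        raise ValueError("need at least two latency samples")
    # 19 cut points at 5% steps; index 18 is the 95th percentile.
    cuts = statistics.quantiles(latencies, n=20, method="inclusive")
    return {
        "median_s": statistics.median(latencies),
        "p95_s": cuts[18],
        "max_s": max(latencies),
    }
```

Reporting the median alongside a tail percentile distinguishes a typical alert delay from occasional slow paths, such as Lambda cold starts.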
Conclusion
This paper described a practical, end-to-end system for detecting and explaining network traffic anomalies in real time. By combining Apache Kafka on EC2 for resilient, high-throughput log streaming with an Isolation Forest model for unsupervised detection, a Generative AI analyst for human-readable explanations, and an ARCA module for automated root cause diagnosis, the system delivers on all three pillars of the CPPE framework in a single coherent architecture.
In experimental evaluation, the Isolation Forest model achieved 96.3% detection accuracy across four attack categories, with a median time of under two seconds for alerts to reach operators. Domain experts rated the GenAI explanations as highly actionable, and the ARCA module pinpointed the correct misconfigured component in more than 91% of test scenarios. Together, these results suggest that combining unsupervised machine learning with generative AI can meaningfully reduce both the time to detect and the time to understand a network security incident.
Future work will extend the Kafka consumer to ingest logs from multi-account AWS organisations, explore a streaming ARCA implementation on Apache Flink in place of batch Lambda invocations, and further fine-tune the GenAI model on organisation-specific runbooks so that its remediation suggestions align with internal procedures.