Email is one of the most widely used communication tools, yet it remains a primary vector for cyberattacks such as spam, phishing, and malicious links. Traditional spam filters often fail when organizations operate in isolation, while cross-organization data sharing raises privacy concerns. To address this, the CyberDART framework introduces a federated, privacy-preserving email threat detection system. It integrates rule-based filters such as SpamAssassin, phishing link and sender verification, and machine learning/NLP methods such as k-Nearest Neighbors (k-NN), hashing, Jaccard similarity, and the Lucene NLP pipeline, all orchestrated through the PATCH algorithm for anonymized clustering and similarity analysis. Experiments on the Enron and TREC datasets reported an improvement of nearly 58% in spam detection accuracy over standalone systems while keeping false positives low. However, CyberDART has several limitations: it is restricted mainly to spam and phishing detection and lacks support for malware attachments and advanced spear-phishing; it faces a privacy–accuracy trade-off due to heavy anonymization; its performance depends strongly on the dataset used; and its scalability may suffer under large-scale, real-time traffic. To address these gaps, the system can be enhanced with deep NLP models (e.g., BERT and other transformers) for semantic phishing detection, static and dynamic malware analysis for attachment inspection, federated learning to share model updates instead of signatures, and cryptographic techniques such as homomorphic encryption or secure multi-party computation to strengthen privacy. These improvements would transform CyberDART from a spam-centric filter into a comprehensive, privacy-preserving email security framework capable of mitigating spam, phishing, and malware attacks with higher accuracy and broader coverage.
Introduction
Email remains a major target for cyber threats such as spam, phishing, business email compromise, and malware. Traditional standalone email security systems are limited in detecting large-scale and coordinated attacks. To overcome this, the CyberDART framework was proposed as a federated, privacy-preserving solution that allows organizations to collaboratively share anonymized threat intelligence without exposing raw email data. While CyberDART improves detection compared to isolated systems, it faces limitations such as restricted threat coverage, reliance on federation participation, privacy–accuracy trade-offs, and limited real-world evaluation.
This project analyzes CyberDART and proposes targeted enhancements, including improved feature representation, stronger clustering and similarity analysis, and the integration of modern machine learning techniques. The enhanced system expands detection beyond basic spam and phishing while maintaining privacy-preserving principles. Experimental results show a 5–8% improvement in detection accuracy and a reduction in false positives, demonstrating improved robustness and suitability for enterprise environments.
The literature survey highlights extensive research in email threat detection using traditional machine learning, deep learning (LSTM, CNN, BERT-based models), and federated learning. While deep learning achieves high accuracy, challenges remain in scalability, privacy, computational cost, and real-world generalization. Federated learning addresses privacy concerns but suffers from communication overhead and reduced accuracy compared to centralized models. The survey identifies a research gap in scalable, privacy-preserving, and comprehensive detection frameworks capable of handling evolving threats.
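The federated learning idea mentioned in the survey can be illustrated with a minimal sketch of weighted model averaging (often called federated averaging). This is only an illustration under stated assumptions, not a method taken from CyberDART or from any surveyed paper: the function name `federated_average`, the flat list-of-floats weight format, and the sample counts are all hypothetical. Each organization sends its locally trained weights and the number of emails it trained on; raw emails never leave the organization.

```python
from typing import List, Tuple


def federated_average(updates: List[Tuple[List[float], int]]) -> List[float]:
    """Average weight vectors, weighting each by its local sample count.

    `updates` holds one (weights, num_samples) pair per participating
    organization. The format is an assumption made for this sketch.
    """
    if not updates:
        raise ValueError("at least one update is required")
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(updates[0][0])
    averaged = [0.0] * dim
    for weights, n in updates:
        if len(weights) != dim:
            raise ValueError("all weight vectors must have the same length")
        for i, w in enumerate(weights):
            # Organizations with more training data contribute more.
            averaged[i] += w * n / total
    return averaged


if __name__ == "__main__":
    # Two hypothetical organizations with 1 and 3 training samples.
    print(federated_average([([1.0, 2.0], 1), ([3.0, 4.0], 3)]))
```

The communication overhead noted above comes from repeating this exchange every training round, since each participant must upload a full weight vector each time.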
CyberDART’s methodology is built on a two-tier federated architecture with Local Nodes and a Central Node, supported by privacy-preserving data sharing, PATCH-based anonymization, rule-based filtering, feature extraction, similarity analysis, machine learning classification, federation-level correlation, and secure communication. Additional components such as log-based processing, visualization, continuous learning, and modular scalable design ensure adaptability, efficiency, and long-term viability. Overall, the enhanced CyberDART framework demonstrates that collaborative, privacy-aware, and intelligent approaches can significantly strengthen email threat detection in modern organizations.
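To make the combination of hashing and Jaccard similarity concrete, the following is a minimal sketch of how two nodes might compare emails without exchanging raw text. It is not the PATCH algorithm itself, whose details are not given here. The function names, the three-word shingle size, and the shared salt value are assumptions introduced only for illustration.

```python
import hashlib
import re
from typing import Set


def hashed_shingles(text: str, k: int = 3, salt: str = "federation-salt") -> Set[str]:
    """Return salted SHA-256 digests of the k-word shingles in `text`.

    Only the digests would be shared with other nodes, not the words.
    The salt and shingle size are hypothetical parameters.
    """
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return set()
    if len(words) < k:
        shingles = [" ".join(words)]
    else:
        shingles = [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]
    return {
        hashlib.sha256((salt + "|" + s).encode("utf-8")).hexdigest()
        for s in shingles
    }


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A ∩ B| / |A ∪ B|; defined as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    local = hashed_shingles("win a free prize now")
    remote = hashed_shingles("win a free prize today")
    # The two messages share two of four distinct shingles.
    print(jaccard(local, remote))
```

A node could flag an incoming message when its similarity to any digest set reported by the federation exceeds a chosen threshold. Note that salted hashing of short shingles is a weak form of anonymization, because common phrases can be guessed and hashed by anyone who knows the salt, which is one reason stronger cryptographic techniques are worth considering.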
Conclusion
This work studied the CyberDART framework, which introduces a federated and privacy-preserving approach for mitigating email-based threats through collaborative intelligence sharing among organizations. By combining PATCH-based anonymization, hash-based similarity matching, and rule-based spam filtering within a two-tier architecture, CyberDART addresses key limitations of standalone email security systems. The original study reports that federated analysis can improve spam detection effectiveness by nearly 58% compared to isolated deployments, highlighting the advantages of cross-organizational collaboration. However, the CyberDART framework focuses primarily on spam and phishing detection and does not explicitly address malware attachments delivered through email. To overcome this limitation, this work proposes an enhancement that integrates malware detection techniques into the existing CyberDART Application Core while preserving its privacy-aware and federated design. Based on simulated and literature-driven analysis using established malware detection approaches such as static feature extraction and machine learning models, the enhanced system is expected to achieve malware detection accuracy in the range of 90% to 94%, with an acceptable false positive rate. Overall, the proposed enhancement extends CyberDART into a more comprehensive email threat mitigation framework capable of addressing both social engineering and malware-based attacks. The results indicate that CyberDART provides a strong foundation for collaborative email security and that, with the inclusion of malware detection, it becomes more suitable for real-world enterprise environments.
This study confirms that federated, privacy-preserving techniques can significantly improve email threat detection while maintaining scalability and data confidentiality, and it opens avenues for future work involving advanced deep learning models and real-time malware analysis.