Phishing attacks are becoming increasingly sophisticated, and detecting them accurately and in real time requires systems that combine several complementary signals. To improve detection robustness, this research proposes a scalable hybrid system for real-time phishing detection. It takes a three-dimensional approach that combines URL-based attributes, content attributes, and DOM-structure attributes. The approach pairs an ensemble model (SVM plus deep learning) with explainable machine learning (SHAP), delivering high accuracy while supporting security analysts in their decisions. The system uses adaptive learning and dynamic feature selection to remain robust as phishing techniques evolve.
Experiments on multiple datasets show that the proposed approach outperforms traditional single-feature approaches, achieving 98.3% detection accuracy with a false-positive rate of only 0.7%. This research bridges high-performing artificial intelligence and practical cybersecurity needs, offering a scalable, real-time response to today's phishing attacks that can be integrated into web browsers and security gateways.
Introduction
I. Problem Overview
Phishing is among the costliest and most dangerous cyber threats, with losses to businesses exceeding $4.9 billion annually.
Modern phishing attacks are highly sophisticated, using dynamic URLs, cloaked content, and malicious DOM structures, making them hard to detect using traditional systems such as blacklists and static HTML analysis.
There is a critical need for detection systems that can analyze phishing from multiple angles and adapt to evolving tactics.
II. Proposed Solution: HybridPhishNet
HybridPhishNet is a real-time, hybrid phishing detection framework that combines:
DOM structure analysis
HTML/content inspection
URL-based feature extraction
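As a rough illustration of the URL-based dimension, the sketch below computes a few lexical features that are common in the phishing-detection literature: URL length, an `@` in the URL, an IP-address host, HTTPS use, and dot and hyphen counts. The exact feature set HybridPhishNet uses is not specified here, so the function name `url_features`, the feature names, and the `SUSPICIOUS_TOKENS` keyword list are hypothetical. Only Python's standard library (`urllib.parse`, `ipaddress`) is used.

```python
from ipaddress import ip_address
from urllib.parse import urlparse

# Hypothetical keyword list; a real system would learn or curate its own.
SUSPICIOUS_TOKENS = ("login", "verify", "account", "update", "secure", "bank")


def url_features(url: str) -> dict[str, float]:
    """Return a small, illustrative set of lexical URL features."""
    parsed = urlparse(url)
    host = parsed.hostname or ""  # lower-cased host, taken after any "user@" part

    # A raw IP address instead of a domain name is a classic phishing signal.
    try:
        ip_address(host)
        host_is_ip = 1.0
    except ValueError:
        host_is_ip = 0.0

    lowered = url.lower()
    path_segments = [seg for seg in parsed.path.split("/") if seg]
    query_params = [q for q in parsed.query.split("&") if q]

    return {
        "url_length": float(len(url)),
        "host_length": float(len(host)),
        "num_dots_in_host": float(host.count(".")),
        "num_hyphens_in_host": float(host.count("-")),
        "has_at_symbol": float("@" in url),
        "host_is_ip": host_is_ip,
        "uses_https": float(parsed.scheme == "https"),
        "num_path_segments": float(len(path_segments)),
        "num_query_params": float(len(query_params)),
        "suspicious_token_count": float(
            sum(token in lowered for token in SUSPICIOUS_TOKENS)
        ),
        "digit_ratio": sum(ch.isdigit() for ch in url) / max(len(url), 1),
    }
```

Note that `urlparse` takes the host from after any `@`, so a URL such as `http://secure-login.example.com@192.168.10.5/account/verify` resolves to an IP host even though it starts with a plausible-looking domain, which is exactly the trick the `has_at_symbol` and `host_is_ip` features are meant to expose.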
HybridPhishNet uses a hybrid machine learning model (CNN-LSTM + SVM) and integrates explainable AI (XAI) through SHAP to support analyst understanding and decision-making.
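The text does not specify how the CNN-LSTM and SVM outputs are combined or how SHAP is configured. The sketch below assumes one common option, weighted soft voting (a convex combination of the two branches' phishing probabilities), and then computes exact Shapley values for the fused score, which is the quantity SHAP estimates. The branch functions `svm_branch` and `deep_branch` are hypothetical stand-in logistic scorers with made-up weights, not trained models, and "absent" features are replaced by a single baseline vector, a simplification of SHAP's use of a background dataset. Exhaustive enumeration costs O(2^n) model evaluations, so it is only practical for a handful of features; the `shap` library's explainers approximate these values or exploit model structure instead.

```python
from itertools import combinations
from math import exp, factorial
from typing import Callable, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + exp(-z))


def svm_branch(x: Sequence[float]) -> float:
    # Hypothetical stand-in for a calibrated SVM probability (weights made up).
    return sigmoid(1.5 * x[0] + 0.5 * x[1] - 1.0)


def deep_branch(x: Sequence[float]) -> float:
    # Hypothetical stand-in for a CNN-LSTM probability (weights made up).
    return sigmoid(0.8 * x[1] + 1.2 * x[2] - 1.0)


def fused_score(x: Sequence[float], w_svm: float = 0.5) -> float:
    # Assumed fusion rule: weighted soft voting over the two branch probabilities.
    return w_svm * svm_branch(x) + (1.0 - w_svm) * deep_branch(x)


def exact_shapley(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition: set[int]) -> float:
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi
```

By the efficiency property, the attributions sum to `fused_score(x) - fused_score(baseline)`, and a feature whose value equals its baseline receives exactly zero attribution, which is what makes per-feature explanations of a single verdict readable for an analyst.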
The system achieves 98.1% detection accuracy, a 0.7% false-positive rate, and latency below 50 ms.
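For readers checking such figures, both metrics come from confusion-matrix counts: accuracy is (TP + TN) / (TP + FP + TN + FN), and the false-positive rate is FP / (FP + TN), the share of legitimate sites wrongly flagged. The sketch below assumes the label convention 1 = phishing, 0 = legitimate, and uses a nearest-rank 95th percentile as one way to check a per-URL latency budget; the source does not say whether the 50 ms figure is a mean or a percentile, and the helper names are illustrative. As an arithmetic illustration only (not a reported result), 175 false positives among 25,000 legitimate sites would correspond to a 0.7% false-positive rate.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Confusion:
    tp: int  # phishing sites correctly flagged
    fp: int  # legitimate sites wrongly flagged
    tn: int  # legitimate sites correctly passed
    fn: int  # phishing sites missed


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Confusion:
    """Count outcomes, assuming 1 = phishing and 0 = legitimate."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
    )


def accuracy(c: Confusion) -> float:
    # Share of all sites classified correctly.
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def false_positive_rate(c: Confusion) -> float:
    # Share of legitimate sites wrongly flagged as phishing.
    return c.fp / (c.fp + c.tn)


def latency_percentile(latencies_ms: Sequence[float], pct: int = 95) -> float:
    """Nearest-rank percentile, with pct given as an integer percentage (1-100)."""
    if not latencies_ms or not 1 <= pct <= 100:
        raise ValueError("need at least one latency and 1 <= pct <= 100")
    ordered = sorted(latencies_ms)
    rank = -(-pct * len(ordered) // 100)  # ceil(pct * n / 100) in integer math
    return ordered[rank - 1]
```

A latency budget is then checked as `latency_percentile(measured_ms) < 50`, where `measured_ms` holds per-URL timings collected from the deployed pipeline.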
III. Technical Approach
Data Collection
50,000 total websites (25,000 phishing, 25,000 legitimate).
Sources: APWG, PhishTank, OpenPhish, and Tranco Top 10K.
Used Selenium and Puppeteer for full page rendering and DOM/script extraction.
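Once a page has been rendered (for example, Selenium's WebDriver exposes the rendered HTML through its `page_source` property), DOM and content features can be extracted from the resulting markup. The following sketch uses Python's standard `html.parser` to count forms, password inputs, iframes, scripts, and links pointing to other hosts; these particular features, and the names `DomFeatureParser` and `dom_features`, are illustrative assumptions rather than the paper's documented feature set.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin, urlparse


class DomFeatureParser(HTMLParser):
    """Counts a few DOM/content signals while streaming through rendered HTML."""

    def __init__(self, page_url: str) -> None:
        super().__init__()
        self.page_url = page_url
        self.page_host = urlparse(page_url).hostname or ""
        self.counts = {
            "forms": 0,
            "password_inputs": 0,
            "external_form_actions": 0,
            "iframes": 0,
            "scripts": 0,
            "links": 0,
            "external_links": 0,
        }

    def _is_external(self, target: str) -> bool:
        # Resolve relative URLs against the page, then compare hosts.
        host = urlparse(urljoin(self.page_url, target)).hostname
        return host is not None and host != self.page_host

    def handle_starttag(self, tag: str, attrs: list[tuple[str, Optional[str]]]) -> None:
        a = dict(attrs)  # HTMLParser already lower-cases tag and attribute names
        if tag == "form":
            self.counts["forms"] += 1
            action = a.get("action") or ""
            if action and self._is_external(action):
                self.counts["external_form_actions"] += 1
        elif tag == "input" and (a.get("type") or "").lower() == "password":
            self.counts["password_inputs"] += 1
        elif tag == "iframe":
            self.counts["iframes"] += 1
        elif tag == "script":
            self.counts["scripts"] += 1
        elif tag == "a" and a.get("href"):
            self.counts["links"] += 1
            if self._is_external(a["href"]):
                self.counts["external_links"] += 1


def dom_features(html: str, page_url: str) -> dict[str, float]:
    """Parse rendered HTML and return illustrative DOM/content features."""
    parser = DomFeatureParser(page_url)
    parser.feed(html)
    parser.close()
    counts = parser.counts
    features = {name: float(value) for name, value in counts.items()}
    # Proportion of hyperlinks that leave the page's own host.
    features["external_link_ratio"] = (
        counts["external_links"] / counts["links"] if counts["links"] else 0.0
    )
    return features
```

A password field inside a form whose `action` points to a different host than the page itself is a typical credential-harvesting pattern that features like `external_form_actions` are meant to capture.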
IV. Conclusion
HybridPhishNet offers a scalable and effective defense against current phishing threats by combining high detection capability, real-time operation, and actionable interpretability. It builds on previous research, and its modular hybrid architecture can be readily applied across a range of enterprise contexts. Current limitations include handling non-English phishing content and adapting to new attack vectors; future work will focus on reinforcement learning, larger multilingual datasets, and deployment to edge devices. Overall, HybridPhishNet is a leap toward robust, advanced phishing defense.