Phishing Website Detection Using Artificial Intelligence

Authors: Nidhi Malik, Lakshay Sharma, Priyansh Sharma, Manav Jindal, Bharat

DOI Link: https://doi.org/10.22214/ijraset.2025.75389

Abstract

Phishing has become one of the most common and harmful cybercrimes: attackers spoof legitimate websites to deceive users into divulging sensitive credentials such as passwords, banking details, and personal information. Traditional defence mechanisms, such as blacklist-based and rule-based approaches, are reactive in nature and often fail to detect newly created or rapidly evolving phishing sites, because attackers can easily modify URLs or page elements to evade static filters. This paper therefore proposes an effective real-time phishing detection system powered by Artificial Intelligence, which uses Machine Learning and Deep Learning models to dynamically analyze and classify websites based on multiple features. The proposed system extracts and processes several indicators from a website, including URL structure, lexical and host-based features, webpage content, HTML and JavaScript behaviour, SSL certificate data, and domain registration information, to determine whether a site is legitimate or malicious. The architecture consists of three main components:
• Feature Extraction Layer: gathers real-time data from URLs and web pages.
• AI Classification Layer: uses pre-trained ML/DL models, such as Random Forest, Support Vector Machine, or Deep Neural Networks, to classify websites.
• Decision Layer: provides immediate feedback or alerts to users for proactive protection.
The system uses public phishing datasets such as PhishTank, the UCI Machine Learning Repository, and Alexa Top Sites for training and testing, and applies data pre-processing, feature selection, and model optimization to improve accuracy and minimize false alarms. Its performance against real-time attacks is evaluated using accuracy, precision, recall, F1 score, ROC-AUC, and detection latency.
The experimental results demonstrate that the AI-based system significantly outperforms traditional blacklist and heuristic methods in detection accuracy and adapts well to zero-day phishing attacks. These findings support the use of AI as an active, effective, real-time defence mechanism in contemporary cybersecurity frameworks.
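The evaluation metrics listed above can be made concrete with a small, dependency-free sketch. It assumes binary labels with 1 = phishing (the positive class) and 0 = legitimate; the helper names (`confusion_counts`, `classification_metrics`, `roc_auc`, `mean_latency_ms`) are hypothetical, and ROC-AUC is computed through its pairwise-ranking interpretation rather than by integrating the ROC curve. The labels and scores in the usage lines are made-up toy values, not results from this paper.

```python
import time


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn), treating label 1 (phishing) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1; a zero denominator yields 0.0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def roc_auc(y_true, scores):
    """Probability that a random phishing sample scores higher than a random
    legitimate one (ties count 0.5); this equals the area under the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_latency_ms(classify, inputs):
    """Average wall-clock milliseconds per call of `classify` over `inputs`."""
    start = time.perf_counter()
    for x in inputs:
        classify(x)
    return (time.perf_counter() - start) * 1000.0 / len(inputs)


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]  # toy P(phishing) values
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(classification_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
    print(mean_latency_ms(lambda s: s >= 0.5, scores))
```

Precision penalizes false alarms on legitimate sites and recall penalizes missed phishing pages, so reporting both alongside accuracy avoids a misleading picture when the two classes are imbalanced.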

Introduction

Phishing remains a major cyber threat, using fraudulent emails or websites to steal sensitive information like login credentials or financial data. Attackers often impersonate legitimate organizations to trick victims, leading to identity theft, financial fraud, or unauthorized access. Phishing accounts for over 36% of global data breaches, and unlike traditional malware, it exploits psychological manipulation rather than malicious code, making detection difficult. Conventional security tools like firewalls, antivirus, and URL blacklists are limited, especially against new or rapidly changing phishing tactics.

AI for Phishing Detection
Artificial Intelligence, particularly Machine Learning (ML) and Deep Learning (DL), has become a promising solution for real-time phishing detection. ML algorithms (e.g., Random Forest, SVM, Decision Trees, Gradient Boosting) classify websites based on features like URL structure, domain age, content, SSL certificates, and network behavior. DL models (CNNs, RNNs) can capture deeper semantic and temporal patterns for more robust detection. AI-powered systems can integrate with browsers, email gateways, and security platforms to provide proactive, real-time alerts, reducing reliance on static blacklists.
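As an illustration of the URL-structure features mentioned above, the following sketch derives a handful of lexical and host-based indicators using only Python's standard library. The feature names and the `SUSPICIOUS_WORDS` list are hypothetical choices, not the feature set used in this paper; features such as domain age or SSL certificate details would need WHOIS or TLS lookups and are omitted here.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical keyword list; a real system would curate or learn this.
SUSPICIOUS_WORDS = ("login", "verify", "account", "update", "secure", "bank")


def is_ip_host(host):
    """True if the hostname is a literal IPv4/IPv6 address rather than a domain."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url):
    """Return a dict of simple lexical/host-based features for one URL."""
    parsed = urlparse(url)
    host = parsed.hostname or ""  # lower-cased, without port or user info
    lowered = url.lower()
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        "has_ip_host": int(is_ip_host(host)),
        "uses_https": int(parsed.scheme == "https"),
        "num_path_segments": len([seg for seg in parsed.path.split("/") if seg]),
        "num_query_params": len([p for p in parsed.query.split("&") if p]),
        "suspicious_word_count": sum(lowered.count(w) for w in SUSPICIOUS_WORDS),
    }


if __name__ == "__main__":
    print(url_features("http://192.168.0.1/paypal-login/verify.php?id=1&x=2"))
```

Each dictionary can be converted into a fixed-order numeric vector before it is passed to a classifier such as Random Forest or an SVM.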

Literature Review

ML Approaches: Traditional ML models classify phishing URLs using lexical, host, and content-based features. Hybrid models combining rules with ML (e.g., Decision Trees, Random Forest, SVM) reduce false positives and improve detection accuracy.
DL Approaches: DL overcomes the limitations of manual feature extraction by learning hierarchical representations from raw data. CNNs analyze webpage structure, while RNNs such as LSTMs capture sequential dependencies in URLs and content. NLP-enhanced models further improve detection by analyzing textual context.
Hybrid/Ensemble Methods: Combining multiple AI techniques (ML ensembles or hybrid DL architectures) increases robustness and scalability, effectively detecting zero-day attacks. Overall, research has shifted from static, rule-based systems to adaptive, AI-driven solutions capable of real-time and context-aware phishing detection.
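One possible way to combine rules with an ensemble can be sketched as follows. Deterministic rules handle clear-cut cases (a hypothetical allowlist of trusted domains to suppress false positives, and IP-address hosts as a hard phishing signal), and all other URLs go to a weighted soft vote over several models. Each model is assumed to be any callable returning P(phishing) in [0, 1]; the allowlist, rules, toy models, and threshold are placeholders, not the configuration described in this paper. The `registered_domain` helper is deliberately naive; production code would consult the Public Suffix List.

```python
import ipaddress
from urllib.parse import urlparse

ALLOWLIST = {"example.com", "wikipedia.org"}  # hypothetical trusted domains


def registered_domain(host):
    """Naive eTLD+1: keeps the last two labels (wrong for e.g. .co.uk)."""
    parts = host.split(".")
    return ".".join(parts[-2:]) if len(parts) >= 2 else host


def is_ip_host(host):
    """True if the hostname is a literal IP address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def soft_vote(url, models, weights=None):
    """Weighted average of each model's P(phishing) for the URL."""
    weights = weights or [1.0] * len(models)
    return sum(w * m(url) for m, w in zip(models, weights)) / sum(weights)


def hybrid_predict(url, models, weights=None, threshold=0.5):
    """Return (label, score, reason); rules short-circuit the ensemble."""
    host = urlparse(url).hostname or ""
    if registered_domain(host) in ALLOWLIST:
        return 0, 0.0, "rule:allowlist"
    if is_ip_host(host):
        return 1, 1.0, "rule:ip_host"
    score = soft_vote(url, models, weights)
    return int(score >= threshold), score, "ensemble"


if __name__ == "__main__":
    # Toy stand-ins for trained models (e.g. Random Forest, SVM, DNN).
    models = [
        lambda u: 0.9 if "login" in u else 0.2,
        lambda u: 0.7 if u.startswith("http://") else 0.3,
    ]
    for url in ["https://www.example.com/login",
                "http://10.0.0.5/login",
                "http://secure-login.example-pay.net/login"]:
        print(url, hybrid_predict(url, models))
```

Returning the `reason` alongside the label keeps the decision auditable: an analyst can see whether a verdict came from a rule or from the ensemble score.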

Methodology
The proposed detection system involves four stages:

Data Collection: Gathering labeled phishing and legitimate URLs from sources like PhishTank, OpenPhish, Alexa Top Sites, Kaggle datasets, and internal enterprise logs.
Feature Extraction: Deriving URL, content, and behavioral features.
Model Training: Applying ML/DL algorithms to learn patterns distinguishing phishing from legitimate websites.
Real-time Detection: Integrating the model into browsers, email gateways, or security platforms for immediate alerting.
The methodology emphasizes reproducibility, scalability, and privacy-aware practices.
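Stages 3 and 4 can be sketched with scikit-learn's `RandomForestClassifier`. This is a minimal illustration under explicit assumptions: synthetic data from `make_classification` stands in for real extracted features and labels (it is not PhishTank or Alexa data), and the two-threshold allow/warn/block rule, with its 0.5 and 0.8 cut-offs, is a hypothetical Decision Layer policy rather than values from this paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

WARN_THRESHOLD = 0.5   # hypothetical: show a warning above this P(phishing)
BLOCK_THRESHOLD = 0.8  # hypothetical: block the page above this P(phishing)


def train_model(random_state=0):
    """Train a Random Forest on synthetic stand-in features (label 1 = phishing)."""
    X, y = make_classification(n_samples=2000, n_features=15, n_informative=8,
                               class_sep=2.0, random_state=random_state)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state)
    model = RandomForestClassifier(n_estimators=100, random_state=random_state)
    model.fit(X_train, y_train)
    return model, X_test, y_test


def decide(p_phish):
    """Decision Layer: map a phishing probability to an action."""
    if p_phish >= BLOCK_THRESHOLD:
        return "block"
    if p_phish >= WARN_THRESHOLD:
        return "warn"
    return "allow"


def classify_one(model, features):
    """Score one feature vector; column 1 of predict_proba is class 1 (phishing)."""
    p_phish = float(model.predict_proba([features])[0, 1])
    return decide(p_phish), p_phish


if __name__ == "__main__":
    model, X_test, y_test = train_model()
    print("held-out accuracy:", accuracy_score(y_test, model.predict(X_test)))
    print(classify_one(model, X_test[0]))
```

In a browser extension or email gateway, `classify_one` would be called on the feature vector of each visited or linked URL; the thresholds would need to be tuned on validation data to balance false alarms against missed attacks.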

Conclusion

Phishing remains among the most persistent and adaptive threats, largely because it exploits human trust and the dynamic nature of the internet. Traditional defense mechanisms, such as blacklists, heuristic filters, and rule-based systems, have proven ineffective against modern phishing attacks, which change rapidly through URL manipulation, obfuscation, and social engineering. This paper proposed an Artificial Intelligence-driven real-time phishing detection framework that integrates Machine Learning and Deep Learning techniques for dynamic, adaptive, and proactive protection against these emerging cyber threats. The proposed system provides a broad pipeline, from multi-dimensional feature extraction (lexical, host-based, content-based, and behavioral) to classification with ML/DL models such as Random Forest, SVM, and Deep Neural Networks. The system architecture, comprising a Feature Extraction Layer, an AI Classification Layer, and a Decision Layer, supports low-latency detection and real-time decision-making. Experimental analysis on benchmark datasets such as PhishTank and Alexa Top Sites shows that AI-based models significantly outperform traditional static approaches in accuracy, recall, and robustness to zero-day phishing attacks. This work further enhances generalization and resilience to evolving attack patterns by incorporating adversarial hardening, temporal validation, and ensemble learning. Although the approach has limitations, such as computational cost and potential privacy concerns around content analysis, this study points to AI as a transformative tool in cybersecurity. With continuous retraining, explainable AI components, and ethical safeguards, the proposed framework can form the basis for scalable, real-time phishing detection systems deployed in browsers, email gateways, and enterprise networks.
In all, the integration of Machine Learning and Deep Learning into phishing detection marks a paradigm shift from reactive to proactive cybersecurity. By providing intelligent, self-learning, and context-aware defense mechanisms, AI enables organizations and users to stay ahead of phishing threats, strengthening the overall resilience of the digital ecosystem.
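The temporal validation mentioned in the conclusion can be illustrated with a small standard-library sketch of a walk-forward split: a model is trained only on samples first seen before a cutoff date and evaluated on the following time window, and URLs already present in training are removed from that window. The record layout `(first_seen_date, url, label)` and the helper names are assumptions for illustration; the dates and `.example` URLs are placeholders, not data from this study.

```python
from datetime import date


def temporal_split(records, cutoff):
    """Train on records first seen before `cutoff`, test on the rest.

    Each record is (first_seen_date, url, label). Test URLs already seen in
    training are dropped so the test set contains only unseen sites.
    """
    train = [r for r in records if r[0] < cutoff]
    seen = {url for _, url, _ in train}
    test = [r for r in records if r[0] >= cutoff and r[1] not in seen]
    return train, test


def walk_forward(records, boundaries):
    """Yield (start, train, test) for consecutive boundary dates b_i < b_{i+1}:
    train on everything before b_i, test on the window [b_i, b_{i+1})."""
    for start, end in zip(boundaries, boundaries[1:]):
        train, test = temporal_split([r for r in records if r[0] < end], start)
        yield start, train, test


if __name__ == "__main__":
    records = [
        (date(2024, 1, 5), "http://a.example/login", 1),
        (date(2024, 1, 20), "https://b.example/", 0),
        (date(2024, 2, 3), "http://a.example/login", 1),
        (date(2024, 2, 10), "http://c.example/verify", 1),
        (date(2024, 3, 1), "https://d.example/", 0),
    ]
    bounds = [date(2024, 2, 1), date(2024, 3, 1), date(2024, 4, 1)]
    for start, train, test in walk_forward(records, bounds):
        print(start, len(train), [url for _, url, _ in test])
```

Because phishing campaigns tend to reuse templates and domains over short periods, a randomly shuffled split can place near-duplicates in both sets; a time-ordered split gives a more realistic estimate of performance on newly created sites and mirrors periodic retraining.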

References

[1] Almomani, A., Gupta, B. B., Atawneh, S., Meulenberg, A., & Al-Khateeb, A. (2013). Phishing detection and prevention techniques. International Journal of Computer Networks and Security, 2(8), 68–83.
[2] Mohammad, R. M., Thabtah, F., & McCluskey, L. (2014). Intelligent rule-based phishing websites classification. IET Information Security, 8(3), 153–160. https://doi.org/10.1049/iet-ifs.2013.0202
[3] Jain, A. K., & Gupta, B. B. (2018). Phishing detection: Analysis of visual similarity-based approaches. Security and Privacy, 1(1), e9. https://doi.org/10.1002/spy2.9
[4] Basit, A., Zafar, M., Liu, X., Javed, A. R., Jalil, Z., & Kifayat, K. (2021). A comprehensive survey of AI-enabled phishing attacks detection techniques. Computers & Security, 106, 102310. https://doi.org/10.1016/j.cose.2021.102310
[5] Marchal, S., Saari, K., Singh, N., & Asokan, N. (2017). Know your phish: Novel techniques for detecting phishing sites and their targets. In 2016 IEEE 36th International Conference on Distributed Computing Systems (ICDCS) (pp. 323–333). IEEE.
[6] Adebowale, M. A., Lwin, K. T., Sánchez, E., & Hossain, M. S. (2020). Intelligent phishing detection scheme using deep learning algorithms. Journal of Network and Computer Applications, 157, 102537. https://doi.org/10.1016/j.jnca.2020.102537
[7] Rao, R. S., & Pais, A. R. (2019). Detection of phishing websites using an efficient feature-based machine learning framework. Neural Computing and Applications, 31, 3851–3873. https://doi.org/10.1007/s00521-017-3305-0
[8] Abdelhamid, N., Ayesh, A., & Thabtah, F. (2017). Phishing detection based associative classification data mining. Expert Systems with Applications, 41(13), 5948–5959.
[9] Patil, S., & Patil, D. R. (2022). Phishing website detection using deep learning and natural language processing: A review. International Journal of Information Security Science, 11(1), 20–35.
[10] Sahingoz, O. K., Buber, E., Demir, O., & Diri, B. (2019). Machine learning-based phishing detection from URLs. Expert Systems with Applications, 117, 345–357. https://doi.org/10.1016/j.eswa.2018.09.029
[11] Zhang, Y., Hong, J. I., & Cranor, L. F. (2007). CANTINA: A content-based approach to detecting phishing web sites. In Proceedings of the 16th International Conference on World Wide Web (WWW) (pp. 639–648). ACM.
[12] Verma, R., & Das, A. (2017). What's in a URL: Fast feature extraction and malicious URL detection. In Proceedings of the 7th ACM Conference on Data and Application Security and Privacy (CODASPY) (pp. 111–122). ACM.
[13] UCI Machine Learning Repository. (2024). Phishing Websites Dataset. Retrieved from https://archive.ics.uci.edu/ml/datasets/Phishing+Websites
[14] PhishTank. (2024). Phishing Website Database. Retrieved from https://www.phishtank.com
[15] Alexa Internet, Inc. (2024). Top Sites List for Global. Retrieved from https://www.alexa.com/topsites

Copyright

Copyright © 2025 Nidhi Malik, Lakshay Sharma, Priyansh Sharma, Manav Jindal, Bharat. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET75389

Publish Date : 2025-11-12

ISSN : 2321-9653

Publisher Name : IJRASET