Phishing has emerged as one of the most common types of cybercrime, impacting millions of internet users every day. Cybercriminals create fake websites, often mimicking trusted services such as banking, payment, and e-commerce sites, to capture sensitive information from users. Existing phishing detection technologies, such as blacklists and rule-based filtering systems, are inadequate because they are static: they cannot recognize newly created phishing sites, so users are still easily tricked into entering sensitive information. This paper proposes ABHI-SHIELD (Artificial Brain for Harm Identification), a browser-integrated system that provides real-time phishing detection by using machine learning to classify a URL as legitimate or phishing. The ABHI-SHIELD model analyzes each URL using lexical, structural, and security-based features, the last of which draw on attributes obtained from SSL certificates and WHOIS data. The model is an XGBoost classifier trained on a combination of data from PhishTank and real-time user feedback. The system achieves a detection accuracy of 91.3% and classifies URLs in less than three seconds, which makes it suitable for real-time browser environments. The proposed system not only protects users but also contributes to cybersecurity education by explaining the reasoning behind each detection.
Introduction
Phishing scams have become a leading cause of identity theft and online fraud, with over a million phishing sites detected monthly. Traditional defense mechanisms, such as blacklisting, heuristic filtering, and static rule-based methods, are limited by their reactive nature and slow updates. To address these challenges, ABHI-SHIELD introduces a real-time, AI-driven phishing detection and education system, implemented as a Chrome extension. It uses machine learning (ML) for dynamic URL classification and an AI-based chatbot (“ABHI”) to educate users about phishing.
A review of existing research highlights that early blacklist and heuristic systems were static, while later ML, deep learning, and hybrid models (e.g., Random Forest, LSTM, CNN, and XGBoost ensembles) achieved high accuracy but were computationally intensive and unsuitable for real-time use. Few browser-based solutions support adaptive learning or user feedback. Hence, ABHI-SHIELD aims to fill this gap by combining lightweight, high-accuracy detection with user interaction and education.
The methodology involves the following main components:
Data Collection – Aggregates phishing and legitimate URLs from PhishTank, Alexa Top 1M, custom datasets, and user feedback, resulting in a balanced dataset of 40,000 URLs.
Feature Extraction – Extracts more than 25 lexical, host-based, and content-based features (e.g., URL length, SSL validity, domain age, login form presence).
Model Selection & Training – Several algorithms were evaluated, and XGBoost was chosen for its interpretability, resistance to overfitting, and strong performance. Its hyperparameters were tuned using GridSearchCV, and the model was evaluated using standard metrics (accuracy, precision, recall, F1-score, ROC-AUC).
Backend Integration – A Flask API links the trained model to the Chrome extension, enabling real-time URL evaluation. When a phishing URL is detected, users receive an alert and a confidence score, while the chatbot provides educational feedback.
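To make the Feature Extraction component more concrete, the following is a minimal sketch of how a few lexical features might be computed from a URL string. It is not the authors' implementation: the function name extract_lexical_features, the keyword list, and the specific features are assumptions chosen for illustration, and the host-based and content-based features mentioned above (SSL validity, domain age, login form presence) are omitted because they require network lookups.

```python
import re
from urllib.parse import urlparse

# Hypothetical keyword list for illustration only; the paper does not
# publish the exact feature set used by ABHI-SHIELD.
SUSPICIOUS_WORDS = ("login", "verify", "secure", "account", "update")

# Matches a dotted-quad host such as 192.168.0.1 (no range checking).
IPV4_PATTERN = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def extract_lexical_features(url: str) -> dict:
    """Return a few lexical features computed from the URL string alone."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    lowered = url.lower()
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),
        "num_hyphens": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parsed.scheme == "https"),
        "host_is_ip": int(bool(IPV4_PATTERN.match(host))),
        "num_suspicious_words": sum(word in lowered for word in SUSPICIOUS_WORDS),
    }


if __name__ == "__main__":
    for example in (
        "https://www.example.com/",
        "http://192.168.0.1/secure-login/verify.php",
    ):
        print(example, extract_lexical_features(example))
```

A dictionary like this could be converted into a numeric row and passed to the classifier. Keeping the lexical features free of network calls is one way to keep per-URL latency low, although that design choice is an assumption here and is not stated in the paper.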
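The evaluation metrics named under Model Selection & Training can be illustrated with a short worked computation. The sketch below derives accuracy, precision, recall, and F1-score from confusion-matrix counts, treating phishing as the positive class. The function name classification_metrics and the counts in the example are made up for illustration and are not results from the paper. ROC-AUC is left out because it needs the model's predicted scores, not only the counts.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard binary-classification metrics from confusion counts.

    tp: phishing URLs correctly flagged     fp: legitimate URLs wrongly flagged
    tn: legitimate URLs correctly passed    fn: phishing URLs missed
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


if __name__ == "__main__":
    # Illustrative counts only; these are not figures reported in the paper.
    print(classification_metrics(tp=90, fp=10, tn=90, fn=10))
```

For a phishing detector, recall reflects how many phishing sites are caught, while precision reflects how often an alert is justified, so both are worth reporting alongside accuracy.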
Conclusion
To summarize, this study shows that machine learning-based, browser-enabled phishing detection is a practical, convenient, and effective way to address phishing threats. ABHI-SHIELD bridges the gap between machine learning research in the academic literature and its applied use in cybersecurity.
ABHI-SHIELD combines an XGBoost model, real-time prediction served through Flask, and user education built on human-centered artificial intelligence. Together, these elements provide an engaging educational and security package. The system has demonstrated high accuracy, low latency, and low resource utilization in the context of regular users.
References
[1] Maneriker, P., Kumar, R., & Shah, S. (2021). URLTran: Improving Phishing URL Detection using Transformers. arXiv preprint arXiv:2106.05256. https://arxiv.org/abs/2106.05256
[2] Yerima, S. Y., & Alzaylaee, M. K. (2020). High Accuracy Phishing Detection using CNN. arXiv preprint arXiv:2004.03960. https://arxiv.org/abs/2004.03960
[3] Aslam, S., Ahmed, Z., & Hussain, F. K. (2024). AntiPhishStack: LSTM + XGBoost for Optimized Detection of Phishing URLs. arXiv preprint arXiv:2401.08947. https://arxiv.org/abs/2401.08947
[4] Zia, M. F., & Kalidass, S. H. (2025). Web Phishing Net: Real-Time ML-based Detection. arXiv preprint arXiv:2502.13171. https://arxiv.org/abs/2502.13171
[5] Müller, A., & Guido, S. (2016). Introduction to Machine Learning with Python. O’Reilly Media.
[6] Stallings, W., & Brown, L. (2017). Computer Security: Principles and Practice. Pearson Education.
[7] PhishTank Dataset. (2024). Verified Phishing Data Repository. Available: https://www.phishtank.com
[8] Chrome Developers. (2024). Chrome Extension Developer Documentation. Available: https://developer.chrome.com