Phishing attacks pose a significant cybersecurity threat. This work presents PhishNet, a client-side security solution that uses machine learning to detect phishing websites in real time. Delivered as a Google Chrome extension, PhishNet analyzes website attributes with Random Forest, Support Vector Machine (SVM), and XGBoost models, combined in a Stacking Classifier, and reports 99% accuracy in distinguishing legitimate from fraudulent pages. Unlike traditional blacklist-based methods, PhishNet adapts to evolving phishing and web spoofing tactics, reducing the risk of identity theft. Its lightweight architecture keeps browsing responsive, and an intuitive interface provides real-time alerts, making it a robust defense against increasingly sophisticated phishing threats.
1. Introduction
Phishing remains a serious cybersecurity threat, with attackers using deceptive tactics like fake websites, phishing emails, and session hijacking to steal user credentials. A notable attack in 2022 targeted INRIA through a fake login page.
PhishNet is proposed as a real-time, client-side phishing detection system, implemented as a Chrome extension. It uses machine learning algorithms (Random Forest, XGBoost) to classify websites with 98.5% accuracy. Unlike traditional blacklist approaches, it is adaptive and lightweight, offering real-time alerts without impacting system performance.
2. Objectives of PhishNet
Detect and block phishing URLs using Random Forest.
Prevent identity theft during web browsing.
Provide a seamless user interface via browser integration.
Overcome server-side limitations through client-side protection.
3. Problem Statement
Current anti-phishing tools are often slow and ineffective. Phishers exploit visual and structural similarities in websites, making detection harder. There is a need for smarter, adaptive solutions like PhishNet and PhishCatcher, which combine machine learning, content analysis, and behavioral detection.
PhishNet is self-learning and requires minimal updates, which makes it resilient against evolving phishing tactics.
4. Related Work
PhishNet draws on existing research:
SpoofCatch: Used for visual similarity detection.
2FA (Schneier): Adds a security layer against credential theft.
Logistic Regression (Garera et al.): Detects phishing based on URL structure.
CANTINA: Applies TF-IDF for content-based phishing detection.
Session Protection: Uses Secure and HttpOnly cookies (Bugliesi et al.).
Trusted UI Elements (Herzberg): Help users identify legitimate sites.
Multi-layered Security: Combines SSL/TLS, heuristics, and visual analysis.
BogusBiter (Yue and Wang): Uses decoy credentials to detect malicious behavior.
Session Fixation Defense: Prevents attacks where session IDs are pre-assigned by attackers.
These combined techniques make PhishNet robust against both traditional and modern phishing attacks.
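As an illustration of the TF-IDF weighting that the CANTINA entry refers to, the sketch below scores the terms of one page against a small background corpus and returns the highest-weighted ones. It is a plain-Python sketch, not CANTINA's implementation: the tokenizer, the smoothed IDF formula, and the names tokenize, tf_idf and top_terms are assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(doc, corpus):
    """Return {term: weight} for `doc`, using `corpus` for document frequencies."""
    tokens = tokenize(doc)
    if not tokens:
        return {}
    counts = Counter(tokens)
    corpus_tokens = [set(tokenize(d)) for d in corpus]
    n_docs = len(corpus_tokens)
    weights = {}
    for term, count in counts.items():
        tf = count / len(tokens)
        df = sum(1 for toks in corpus_tokens if term in toks)
        # Smoothed IDF (an assumption): rarer terms get larger weights.
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        weights[term] = tf * idf
    return weights


def top_terms(doc, corpus, k=5):
    """Return the k highest-weighted terms, ties broken alphabetically."""
    weights = tf_idf(doc, corpus)
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

Terms that are frequent on the page but rare in the background corpus rank highest, which is what makes them useful as a compact signature of the page's content.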
5. Existing Systems
Current phishing techniques include:
Fake jackpot claims.
QR code and mobile-based spoofing.
Cloned websites that mimic legitimate ones.
Existing defenses are often error-prone and lack adaptability, highlighting the need for smarter, real-time tools.
6. Proposed System: PhishCatcher
PhishCatcher enhances detection using:
Random Forest + XGBoost for URL classification.
PHISHTANK dataset to train models on real phishing URLs.
Client-side browser extension for real-time alerts.
Ensemble Learning: Combines Random Forest, Extra Trees, and XGBoost in a Stacking Classifier for higher accuracy.
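The stacking arrangement listed above can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the system's actual training code: the data is synthetic (make_classification) rather than PhishTank features, GradientBoostingClassifier stands in for XGBoost so that the example depends on scikit-learn only, and the logistic-regression meta-learner is an assumption because the text does not name one.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a labelled URL feature matrix (1 = phishing, 0 = legitimate).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Base learners; the gradient-boosting model is a stand-in for XGBoost.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
]

# The meta-learner is trained on cross-validated predictions of the base learners.
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(),
    cv=5,
)
stack.fit(X_train, y_train)

accuracy = stack.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

The accuracy printed here reflects the synthetic data only and says nothing about the figures reported for the real system.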
Non-functional Requirements:
Security: Defend user data.
Scalability & Reliability: Handle many users.
Usability: Easy to use with fast response times.
7. Methodology
Machine Learning Models: Includes Random Forest, SVM, XGBoost, and Stacking Classifiers.
Web Interface: Built with Flask and SQLite.
Training & Testing: Uses data preprocessing, feature extraction, and ensemble models for classification.
User Interaction: Real-time predictions classify websites as phishing or legitimate.
Activity Flow: Involves data import, model training, user sign-up/sign-in, input classification, and phishing alert generation.
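The sign-up/sign-in step of the activity flow could be backed by SQLite roughly as sketched below. The table layout, the function names, and the use of salted PBKDF2 hashes are all assumptions made for illustration; the text only states that Flask and SQLite are used, and the Flask routes themselves are omitted.

```python
import hashlib
import hmac
import secrets
import sqlite3


def init_db(conn):
    """Create the (hypothetical) users table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "username TEXT PRIMARY KEY, salt BLOB NOT NULL, pw_hash BLOB NOT NULL)"
    )


def _hash_password(password, salt):
    """Derive a salted hash so that plaintext passwords are never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def sign_up(conn, username, password):
    """Register a user; return False if the username is already taken."""
    salt = secrets.token_bytes(16)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO users (username, salt, pw_hash) VALUES (?, ?, ?)",
                (username, salt, _hash_password(password, salt)),
            )
    except sqlite3.IntegrityError:
        return False
    return True


def sign_in(conn, username, password):
    """Return True only if the user exists and the password matches."""
    row = conn.execute(
        "SELECT salt, pw_hash FROM users WHERE username = ?", (username,)
    ).fetchone()
    if row is None:
        return False
    salt, stored_hash = row
    return hmac.compare_digest(stored_hash, _hash_password(password, salt))
```

A caller would open a connection with sqlite3.connect, call init_db once, and then gate the classification page on a successful sign_in.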
8. Implementation
Datasets:
Collected from social media (e.g., Twitter, Facebook), with a focus on suspicious activity such as cyberbullying and phishing.
Data is labeled manually and prepared through preprocessing for model training.
Feature Extraction & Model Training:
Uses Naïve Bayes, Logistic Regression, Random Forest, AdaBoost, and Voting/Stacking Classifiers.
Optimization techniques include Fuzzy Genetic Algorithms and Particle Swarm Optimization (PSO).
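The feature-extraction step is not spelled out in detail, so the following is only a sketch of the kind of address-bar features a URL classifier might consume. The particular features and the name extract_url_features are assumptions chosen for illustration, drawn from heuristics common in phishing-detection work rather than from this system.

```python
import ipaddress
from urllib.parse import urlparse


def _is_ip(host):
    """Return True if the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def extract_url_features(url):
    """Map a URL to a small dictionary of address-bar features."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    host_is_ip = _is_ip(host)
    return {
        "url_length": len(url),
        "uses_https": parsed.scheme == "https",
        "has_ip_host": host_is_ip,
        "has_at_symbol": "@" in url,
        # Labels to the left of the registered name, e.g. "a" in a.example.com.
        "num_subdomains": 0 if host_is_ip else max(host.count(".") - 1, 0),
        "has_hyphen_in_host": "-" in host,
    }
```

Each dictionary can be turned into one row of the feature matrix that the classifiers are trained on; HTML- and JavaScript-based features would need the page content and are left out here.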
Key Algorithms Used:
Random Forest: Builds multiple trees to manage complex patterns.
SVM: Separates data classes using optimal hyperplanes.
XGBoost: Efficient gradient boosting for enhanced predictive accuracy.
PhishCatcher achieved up to 99% accuracy using ensemble methods and optimization techniques. The system provides real-time phishing detection, improving both reliability and user safety.
9. Conclusion
The project developed and integrated PhishNet, a client-side defense tool built on Random Forest (RF) and extended with a Support Vector Classifier (SVC), XGBoost, and a stacking classifier. The stacking classifier notably outperformed the other models. The tool detects and blocks malicious URLs efficiently, protecting users against phishing without requiring any modification to the targeted websites.
Feature extraction in PhishNet covers a diverse set of URL characteristics, including address bar attributes, domain-based features, and HTML/JavaScript properties. This breadth improves the model's ability to distinguish phishing from legitimate URLs and contributes to its accuracy and reliability. PhishNet is integrated into a Flask-based front end with SQLite-backed user authentication. The interface accepts user input, runs it through the trained models, and displays the final outcome clearly.
By exploring alternative machine learning models beyond the conventional approach, the project keeps PhishNet robust and adaptable to evolving phishing threats. Its client-side emphasis, combined with a diverse feature set, amounts to a holistic approach to online security, and the project is a significant step towards a comprehensive defense against web-based phishing.
References
[1] W. Khan, A. Ahmad, A. Qamar, M. Kamran, and M. Altaf, ‘‘SpoofCatch: A client-side protection tool against phishing attacks,’’ IT Prof., vol. 23, no. 2, pp. 65–74, Mar. 2021.
[2] B. Schneier, ‘‘Two-factor authentication: Too little, too late,’’ Commun. ACM, vol. 48, no. 4, p. 136, Apr. 2005.
[3] S. Garera, N. Provos, M. Chew, and A. D. Rubin, ‘‘A framework for detection and measurement of phishing attacks,’’ in Proc. ACM Workshop Recurring Malcode, Nov. 2007, pp. 1–8.
[4] R. Oppliger and S. Gajek, ‘‘Effective protection against phishing and web spoofing,’’ in Proc. IFIP Int. Conf. Commun. Multimedia Secur. Cham, Switzerland: Springer, 2005, pp. 32–41.
[5] T. Pietraszek and C. V. Berghe, ‘‘Defending against injection attacks through context-sensitive string evaluation,’’ in Proc. Int. Workshop Recent Adv. Intrusion Detection. Cham, Switzerland: Springer, 2005, pp. 124–145.
[6] M. Johns, B. Braun, M. Schrank, and J. Posegga, ‘‘Reliable protection against session fixation attacks,’’ in Proc. ACM Symp. Appl. Comput., 2011, pp. 1531–1537.
[7] M. Bugliesi, S. Calzavara, R. Focardi, and W. Khan, ‘‘Automatic and robust client-side protection for cookie-based sessions,’’ in Proc. Int. Symp. Eng. Secure Softw. Syst. Cham, Switzerland: Springer, 2014, pp. 161–178.
[8] A. Herzberg and A. Gbara, ‘‘Protecting (even naïve) web users from spoofing and phishing attacks,’’ Cryptol. ePrint Arch., Dept. Comput. Sci. Eng., Univ. Connecticut, Storrs, CT, USA, Tech. Rep. 2004/155, 2004.
[9] N. Chou, R. Ledesma, Y. Teraguchi, and J. Mitchell, ‘‘Client-side defense against web-based identity theft,’’ in Proc. NDSS, 2004, pp. 1–16.
[10] B. Hämmerli and R. Sommer, Detection of Intrusions and Malware, and Vulnerability Assessment: 4th International Conference, DIMVA 2007 Lucerne, Switzerland, July 12-13, 2007 Proceedings, vol. 4579. Cham, Switzerland: Springer, 2007.
[11] C. Yue and H. Wang, ‘‘BogusBiter: A transparent protection against phishing attacks,’’ ACM Trans. Internet Technol., vol. 10, no. 2, pp. 1–31, May 2010.
[12] W. Chu, B. B. Zhu, F. Xue, X. Guan, and Z. Cai, ‘‘Protect sensitive sites from phishing attacks using features extractable from inaccessible phishing URLs,’’ in Proc. IEEE Int. Conf. Commun. (ICC), Jun. 2013, pp. 1990–1994.
[13] Y. Zhang, J. I. Hong, and L. F. Cranor, ‘‘Cantina: A content-based approach to detecting phishing web sites,’’ in Proc. 16th Int. Conf. World Wide Web, May 2007, pp. 639–648.
[14] D. Miyamoto, H. Hazeyama, and Y. Kadobayashi, ‘‘An evaluation of machine learning-based methods for detection of phishing sites,’’ in Proc. Int. Conf. Neural Inf. Process. Cham, Switzerland: Springer, 2008, pp. 539–546.
[15] E. Medvet, E. Kirda, and C. Kruegel, ‘‘Visual-similarity-based phishing detection,’’ in Proc. 4th Int. Conf. Secur. Privacy Commun. Netw., Sep. 2008, pp. 1–6.
[16] W. Zhang, H. Lu, B. Xu, and H. Yang, ‘‘Web phishing detection based on page spatial layout similarity,’’ Informatica, vol. 37, no. 3, pp. 1–14, 2013.
[17] J. Ni, Y. Cai, G. Tang, and Y. Xie, ‘‘Collaborative filtering recommendation algorithm based on TF-IDF and user characteristics,’’ Appl. Sci., vol. 11, no. 20, p. 9554, Oct. 2021.
[18] W. Liu, X. Deng, G. Huang, and A. Y. Fu, ‘‘An antiphishing strategy based on visual similarity assessment,’’ IEEE Internet Comput., vol. 10, no. 2, pp. 58–65, Mar. 2006.
[19] A. Rusu and V. Govindaraju, ‘‘Visual CAPTCHA with handwritten image analysis,’’ in Proc. Int. Workshop Human Interact. Proofs. Berlin, Germany: Springer, 2005, pp. 42–52.