PhishSight AI: Explainable NLP-Based Phishing and Scam Detection with Visual Attention Assist

Authors: S. Jayapradha, A. Afiga Begum, S. Shahika, A. Noorul Hasna, R. Sowmiya

DOI Link: https://doi.org/10.22214/ijraset.2026.79410

Abstract

Online phishing has quietly become one of the hardest problems in everyday cybersecurity—not because it is technically complex, but because it exploits the one component no patch can fix: human attention. PhishSight AI is designed to assist users where it actually matters, right inside the browser at the exact second a user is about to make a mistake. The system runs as a Chrome extension and checks emails, SMS content, and live web pages against a classifier that combines a Random Forest model, TF-IDF text features, and a fine-tuned BERT network. Every prediction comes with an explanation—LIME and SHAP identify the specific words and URL parts that most affected the output, highlighted directly on screen. An optional gaze-tracking layer using WebGazer.js watches whether users actually look at the sender address and URL bar before clicking; if they do not, a soft prompt reminds them to check. On a held-out test set, the system reached 95.2% accuracy with end-to-end response time under two seconds, demonstrating that real-time detection and explanation can be achieved without noticeable latency.
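
The gaze check described in the abstract can be pictured as a dwell-time test over screen regions. The sketch below is a minimal illustration of that logic in Python, not the extension's actual code (which would run in the browser on WebGazer.js output). The `Region` class, the `(x, y, timestamp_ms)` sample format, the region coordinates, and the 200 ms threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Axis-aligned screen rectangle in pixels (hypothetical layout data)."""
    name: str
    left: float
    top: float
    width: float
    height: float

    def contains(self, x: float, y: float) -> bool:
        return (self.left <= x <= self.left + self.width
                and self.top <= y <= self.top + self.height)


def dwell_ms(samples, region):
    """Sum the time between consecutive samples that both fall inside region."""
    total = 0.0
    for (x0, y0, t0), (x1, y1, t1) in zip(samples, samples[1:]):
        if region.contains(x0, y0) and region.contains(x1, y1):
            total += t1 - t0
    return total


def unverified_regions(samples, regions, min_dwell_ms=200.0):
    """Return the names of regions the user has not looked at long enough."""
    return [r.name for r in regions if dwell_ms(samples, r) < min_dwell_ms]


# Example: the user looked at the URL bar but never at the sender address.
url_bar = Region("url_bar", left=100, top=40, width=800, height=30)
sender = Region("sender_address", left=50, top=150, width=400, height=20)
gaze = [(300, 50, 0), (320, 55, 120), (340, 52, 260), (600, 500, 400)]

missing = unverified_regions(gaze, [url_bar, sender])
if missing:
    # Stand-in for the soft prompt shown before the click goes through.
    print("Before you click, take a look at:", ", ".join(missing))
```

Webcam-based gaze estimates are noisy, so a real deployment would likely need padded regions and a tuned threshold; the values here are placeholders.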

Introduction

The paper explains that phishing attacks have become increasingly sophisticated, often using AI-generated messages that are difficult to distinguish from legitimate ones. While modern machine learning models achieve high detection accuracy, users often do not trust or understand “black-box” warnings, which limits their effectiveness. PhishSight AI addresses this by combining phishing detection with explainable AI and user interaction tools.

The system is built as a multi-layer architecture:

A browser extension interface for users
A backend (FastAPI) handling detection and processing
Machine learning models for email/SMS and URL analysis
A data layer for feedback and retraining

It uses a hybrid detection approach:

Email/SMS classification using a combination of TF-IDF + Random Forest and BERT
URL/webpage analysis using 24 lexical features and Random Forest
Explainability layer using SHAP and LIME to highlight why a message is flagged
Gaze-tracking module (WebGazer) to encourage users to visually verify URLs before clicking
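
To make the URL branch of the list above concrete, the following sketch computes a handful of lexical features of the kind a Random Forest could consume. The summary does not enumerate the paper's 24 features, so the twelve shown here, the `SUSPICIOUS_WORDS` list, and the function names are illustrative assumptions rather than the authors' feature set.

```python
import math
import re
from collections import Counter
from urllib.parse import urlsplit

# Illustrative keyword list; not taken from the paper.
SUSPICIOUS_WORDS = ("login", "verify", "secure", "update", "account")


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits; 0.0 for an empty string."""
    if not text:
        return 0.0
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def url_lexical_features(url: str) -> dict:
    """Extract simple string-based features without fetching the page."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    lowered = url.lower()
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": int("@" in url),
        "uses_https": int(parts.scheme == "https"),
        "host_is_ipv4": int(bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host))),
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
        "num_query_params": len([p for p in parts.query.split("&") if p]),
        "suspicious_word_count": sum(w in lowered for w in SUSPICIOUS_WORDS),
        "host_entropy": round(shannon_entropy(host), 3),
    }


features = url_lexical_features(
    "http://secure-login.example.com/account/verify?id=42"
)
for name, value in features.items():
    print(f"{name}: {value}")
```

Because every feature is derived from the URL string alone, extraction needs no network request, which is consistent with the low-latency goal stated in the abstract.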

The system also includes a feedback loop where user responses (“safe” or “scam”) are stored for future model improvement.
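
One way to picture this feedback loop is a small table that records each user verdict next to the model's prediction. The sketch below uses SQLite from the Python standard library; the storage engine, table schema, column names, and helper functions are assumptions for illustration, since the summary does not describe the actual data layer.

```python
import sqlite3
from datetime import datetime, timezone

ALLOWED_LABELS = {"safe", "scam"}  # the two user responses named in the text


def open_store(path=":memory:"):
    """Open (or create) the feedback table; in-memory by default for the demo."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS feedback (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               channel TEXT NOT NULL,
               content TEXT NOT NULL,
               predicted_label TEXT NOT NULL,
               user_label TEXT NOT NULL,
               created_at TEXT NOT NULL
           )"""
    )
    return conn


def record_feedback(conn, channel, content, predicted_label, user_label):
    """Store one user verdict alongside what the model predicted."""
    if user_label not in ALLOWED_LABELS:
        raise ValueError(f"user_label must be one of {sorted(ALLOWED_LABELS)}")
    conn.execute(
        "INSERT INTO feedback "
        "(channel, content, predicted_label, user_label, created_at) "
        "VALUES (?, ?, ?, ?, ?)",
        (channel, content, predicted_label, user_label,
         datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


def disagreements(conn):
    """Rows where the user overrode the model: candidates for retraining."""
    return conn.execute(
        "SELECT channel, content, user_label FROM feedback "
        "WHERE user_label != predicted_label ORDER BY id"
    ).fetchall()


conn = open_store()
record_feedback(conn, "sms", "Your parcel is held, pay the fee here", "safe", "scam")
record_feedback(conn, "email", "Team lunch is moved to Friday", "safe", "safe")
rows = disagreements(conn)
print(rows)
```

Keeping the model's prediction next to the user's label makes it straightforward to pull only the corrected examples when a retraining set is assembled.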

The literature review shows that while existing research achieves high accuracy, most systems fail to combine multi-channel detection, explainability, real-time interaction, and user behavior guidance in one platform.

Conclusion

PhishSight AI demonstrates that high-accuracy phishing detection and practical usability can be achieved together within a browser extension that adds no meaningful friction to the user's workflow. The hybrid Random Forest–BERT classifier, reinforced by lexical URL analysis, reaches 95% accuracy across email, SMS, and web-page threat vectors. The LIME and SHAP explainability layer turns each decision into a readable explanation, and the gaze-tracking module addresses the behavioral gap that purely algorithmic approaches leave open. The modular architecture is designed to accommodate future extensions without requiring a redesign of the core system.

Copyright

Copyright © 2026 S. Jayapradha, A. Afiga Begum, S. Shahika, A. Noorul Hasna, R. Sowmiya. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id: IJRASET79410

Publish Date: 2026-04-04

ISSN: 2321-9653

Publisher Name: IJRASET
