Automated Machine Learning Approaches for Phishing Detection and Prevention

Authors: Dr. M V D S Krishna Murty, Mirupathi Sathvik Reddy, Rohit Kumar Sharma, Mohammed Nayyar

DOI Link: https://doi.org/10.22214/ijraset.2026.79315

Abstract

Social engineering and phishing attacks are the most common type of financial cybercrime in India's highly digitalized economy. According to the Ministry of Home Affairs, cybercriminals defrauded citizens of ₹22,845.73 crore through online scams in the financial year 2023-24. Globally, the Anti-Phishing Working Group recorded 4.9 million phishing attacks in 2023, the highest number ever recorded. The tools used for detection are mostly unimodal and reactive, relying on 'black box' systems that are impossible to interpret. This project proposes PhishShield, a proactive multi-vector AI framework for detecting cyber threats in real time, delivered as a publicly available web dashboard and a lightweight Chrome browser extension. The framework uses two efficient machine learning classifiers: a Naive Bayes classifier with TF-IDF vectorization for semantic markers in unstructured text such as SMS and email, and a Decision Tree classifier for structural features of URLs. The accuracy of the URL classifier is enhanced by integrating real-time OSINT heuristics. The framework's output is analyzed and compared with the output of the Explainable AI module, which gives a definite two-class verdict, 'Verified Safe' or 'Threat Detected', together with a Logic Trace. In testing, the framework showed promising results: 92.6% accuracy for the Naive Bayes NLP classifier in detecting semantic threats, and 90.7% accuracy for the Decision Tree classifier with OSINT heuristics in detecting deceptive URLs.

Introduction

The paper discusses the rapid rise of digital adoption in India and the accompanying sharp increase in cybercrime and phishing attacks as internet usage grows, especially through platforms like UPI and Aadhaar. Financial losses from cybercrime have surged, and India has become one of the countries most affected by phishing, largely because of mobile-based fraud, SMS scams, and low digital awareness in rural areas.

Globally, phishing attacks are also increasing, with millions of malicious sites and a strong shift toward mobile-based threats. In India, digital illiteracy remains a major challenge and leaves users more vulnerable to social engineering attacks.

To address this, PhishShield is proposed as a multi-vector phishing detection framework. It analyzes both URL structure and text content using machine learning techniques such as Naive Bayes, TF-IDF, and Decision Trees, along with OSINT checks (e.g., WHOIS and SSL lookups). It also includes Explainable AI to make detection results understandable to users.
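The text-analysis path described above can be illustrated with a minimal sketch, assuming scikit-learn is available. The toy messages, the label strings, the smoothing value, and the helper names build_text_classifier and classify are all hypothetical; the paper does not publish its dataset, preprocessing, or hyperparameters, so this shows only the general TF-IDF plus Naive Bayes technique, not the authors' implementation.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy, hand-written messages for illustration only (not the paper's dataset).
MESSAGES = [
    "Your account is blocked, verify your password at this link now",
    "Urgent: update your KYC details to avoid account suspension",
    "You won a lottery prize, click the link to claim your reward",
    "Are we still meeting for lunch tomorrow?",
    "Please find the project report attached for review",
    "Happy birthday, see you at the party tonight",
]
LABELS = ["phishing", "phishing", "phishing", "safe", "safe", "safe"]


def build_text_classifier():
    """Fit a TF-IDF + Multinomial Naive Bayes pipeline on the toy data."""
    model = make_pipeline(
        TfidfVectorizer(lowercase=True, stop_words="english"),
        MultinomialNB(alpha=1.0),  # Laplace smoothing; the value is an assumption
    )
    model.fit(MESSAGES, LABELS)
    return model


def classify(model, text):
    """Return (label, probability of that label) for a single message."""
    probabilities = model.predict_proba([text])[0]
    best = probabilities.argmax()
    return str(model.classes_[best]), float(probabilities[best])


if __name__ == "__main__":
    clf = build_text_classifier()
    print(classify(clf, "Verify your account password now"))
```

Because Naive Bayes scores each class from per-word likelihoods, the words that pushed a message toward one class can be read directly from the fitted model, which is one reason this family of classifiers suits an explainable, low-latency setting.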

The literature review shows that rule-based and traditional systems are too slow and ineffective against modern phishing. While deep learning and transformer models are accurate, they are computationally heavy and lack transparency, making them unsuitable for lightweight browser-based tools. Therefore, PhishShield uses optimized classical machine learning with real-time analysis for faster and explainable detection.
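The transparency argument can be made concrete for the URL vector with a small sketch that uses only the Python standard library. The feature names, the keyword list, and the functions url_features and logic_trace are assumptions chosen for illustration; the paper does not list its actual feature set. The hand-written checks in logic_trace merely stand in for the split path a trained Decision Tree would expose, and the live OSINT lookups (WHOIS, SSL) are omitted because they require network access.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical keyword list; the paper does not publish its feature set.
SUSPICIOUS_WORDS = ("login", "verify", "update", "secure", "kyc")


def host_is_ip(host):
    """True when the host is a literal IP address rather than a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def url_features(url):
    """Map a URL to numeric structural features a tree classifier could consume."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    lowered = url.lower()
    return {
        "length": len(url),
        "uses_https": int(parts.scheme == "https"),
        "host_is_ip": int(host_is_ip(host)),
        "dots_in_host": host.count("."),
        "hyphens_in_host": host.count("-"),
        "has_at_symbol": int("@" in url),
        "suspicious_words": sum(word in lowered for word in SUSPICIOUS_WORDS),
    }


def logic_trace(features):
    """List human-readable reasons for the features that look risky."""
    reasons = []
    if not features["uses_https"]:
        reasons.append("connection is not HTTPS")
    if features["host_is_ip"]:
        reasons.append("host is a raw IP address")
    if features["has_at_symbol"]:
        reasons.append("URL contains '@'")
    if features["suspicious_words"]:
        reasons.append("URL contains credential-related keywords")
    return reasons


if __name__ == "__main__":
    feats = url_features("http://192.168.0.1/login")
    print(feats)
    print(logic_trace(feats))
```

Each feature is cheap to compute from the URL string alone, which is consistent with the goal of running inside a browser extension without a heavy model.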

Overall, the project aims to create a fast, transparent, and multi-layered phishing detection system to improve cybersecurity in India’s rapidly growing digital environment.

Conclusion

This paper proposed the PhishShield framework for detecting multi-vector social engineering and financial fraud attacks against internet users by integrating semantic textual analysis with structural URL analysis. The deployed framework addresses some of the gaps in current reactive defense mechanisms, particularly the black box models that are becoming more prevalent as the volume of zero-day malicious infrastructure rises. To meet the sub-second latency requirement of the framework, which is currently deployed as a Chrome extension, the optimization work focused on refining the Natural Language Processing module built on the Naive Bayes classifier, running the Decision Tree classifier in real time alongside the live OSINT heuristics, and keeping the system computationally efficient in standard browser environments. Future research would extend the Explainable AI (XAI) logic as well as the SQLite human-in-the-loop adversarial feedback feature. With these objectives set for the future, PhishShield remains an efficient security system for vulnerable users around the world.
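One way the SQLite human-in-the-loop feedback feature could be structured is sketched below using Python's built-in sqlite3 module. The table name feedback, its columns, and the functions open_feedback_store, record_feedback, and misclassified are hypothetical; the paper does not describe its schema, so this is only an assumption about how user corrections might be stored for later retraining.

```python
import sqlite3


def open_feedback_store(path=":memory:"):
    """Open (or create) a SQLite database holding user corrections."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS feedback (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               sample TEXT NOT NULL,       -- URL or message shown to the user
               predicted TEXT NOT NULL,    -- verdict the classifier gave
               corrected TEXT NOT NULL,    -- verdict the user says is right
               created_at TEXT DEFAULT CURRENT_TIMESTAMP
           )"""
    )
    return conn


def record_feedback(conn, sample, predicted, corrected):
    """Store one user correction; parameters are bound to avoid SQL injection."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO feedback (sample, predicted, corrected) VALUES (?, ?, ?)",
            (sample, predicted, corrected),
        )


def misclassified(conn):
    """Return (sample, corrected) pairs where the user disagreed with the model."""
    rows = conn.execute(
        "SELECT sample, corrected FROM feedback "
        "WHERE predicted != corrected ORDER BY id"
    )
    return rows.fetchall()


if __name__ == "__main__":
    store = open_feedback_store()
    record_feedback(store, "http://example.test/login", "Verified Safe", "Threat Detected")
    print(misclassified(store))
```

The rows returned by misclassified are the cases where the model and the user disagree, which are the natural candidates to review before adding them to a retraining set.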

Copyright

Copyright © 2026 Dr. M V D S Krishna Murty, Mirupathi Sathvik Reddy, Rohit Kumar Sharma, Mohammed Nayyar. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET79315

Publish Date : 2026-04-02

ISSN : 2321-9653

Publisher Name : IJRASET
