Phishing remains one of the most prevalent and evolving cybersecurity threats, exploiting human vulnerabilities through deceptive digital communication. This study proposes a dynamic, Windows-specific phishing detection model leveraging Random Forest machine learning techniques. By integrating Term Frequency–Inverse Document Frequency (TF-IDF) vectorization with structured email features, the model classifies phishing and legitimate emails with high accuracy. Using secondary data and publicly available datasets, the model achieved a classification accuracy of 98.31% and demonstrated balanced performance across precision, recall, and F1-score metrics. This research underscores the effectiveness of hybrid feature strategies and ensemble learning for phishing detection while outlining key limitations and future directions, including model generalization and real-world deployment readiness.
Introduction
The rapid advancement of technology has been accompanied by a sharp increase in cybercrime, notably phishing attacks, which exploit human psychology to steal sensitive information. An industry report for Q3 2023 recorded a 173% increase in phishing threats, a growing risk to individuals and organizations, especially on widely used Windows systems. Traditional phishing detection methods, which rely mostly on URL analysis, are increasingly ineffective against attackers' evolving tactics.
This study focuses on developing a dynamic, machine learning-based phishing detection model specifically for Windows platforms using secondary data. It employs a Random Forest classifier combined with TF-IDF vectorization of email text and structured metadata features. The model aims solely at detection and classification, not mitigation, enabling modular integration with broader cybersecurity frameworks.
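The hybrid feature strategy described above can be illustrated with a minimal sketch that uses only the Python standard library. It builds a smoothed TF-IDF vector for each email body and appends a few structured features to it. The specific metadata features (link count, sender/reply-to mismatch, exclamation marks), the field names, and the two toy emails are assumptions made for illustration; they are not the study's actual feature set or data. The Random Forest step is omitted here; in practice the resulting vectors would be passed to an ensemble classifier from a machine learning library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(documents):
    """Compute a smoothed inverse document frequency for every corpus term."""
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(tokenize(doc)))  # count each term once per document
    # Smoothed IDF: ln((1 + N) / (1 + df)) + 1, always strictly positive.
    return {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}


def tfidf_vector(text, idf, vocabulary):
    """Return relative term frequency times IDF, one value per vocabulary term."""
    counts = Counter(tokenize(text))
    total = sum(counts.values()) or 1
    return [counts[t] / total * idf[t] for t in vocabulary]


def structured_features(email):
    """Hypothetical metadata features (assumed for illustration only)."""
    body = email["body"]
    return [
        float(len(re.findall(r"https?://", body))),                 # number of links
        float(email["sender_domain"] != email["reply_to_domain"]),  # header mismatch
        float(body.count("!")),                                     # urgency marker
    ]


def hybrid_vector(email, idf, vocabulary):
    """Concatenate the text vector and the structured features."""
    return tfidf_vector(email["body"], idf, vocabulary) + structured_features(email)


# Toy emails, invented for this sketch.
emails = [
    {"body": "Urgent! Verify your account at http://example.test/login",
     "sender_domain": "example.test", "reply_to_domain": "mail.example.net"},
    {"body": "Meeting notes attached, see you Monday",
     "sender_domain": "example.org", "reply_to_domain": "example.org"},
]

idf = fit_idf([e["body"] for e in emails])
vocabulary = sorted(idf)
vectors = [hybrid_vector(e, idf, vocabulary) for e in emails]
print(len(vocabulary), [v[-3:] for v in vectors])
```

Keeping the text and metadata parts in one flat vector lets a single tree-based classifier split on either kind of signal, which is the motivation usually given for hybrid feature sets.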
A comprehensive literature review highlights recent advances in phishing detection, including hybrid approaches (e.g., SMOTETomek-XGBoost), deep learning models (e.g., BERT-CNN), explainable AI techniques (SHAP and LIME), and adversarial training to improve robustness. The Random Forest algorithm consistently shows strong performance for phishing email detection.
Methodologically, the study uses publicly available datasets, applying preprocessing and feature engineering before splitting the data into training and test sets. The proposed model achieved a detection accuracy of approximately 98%, with balanced precision and recall across phishing and legitimate emails, indicating robustness and practical viability.
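For readers unfamiliar with the reported metrics, the following sketch shows how accuracy, precision, recall, and F1 are derived from a classifier's predictions, computed separately for each class. The labels are toy values invented for illustration; they are not the study's data and do not reproduce its results. The function names are likewise hypothetical.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count true/false positives and negatives for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive):
    """Return accuracy, precision, recall, and F1 for the given positive class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy ground truth and predictions, invented for this sketch.
y_true = ["phishing", "phishing", "phishing", "legitimate",
          "legitimate", "legitimate", "legitimate", "phishing"]
y_pred = ["phishing", "phishing", "legitimate", "legitimate",
          "legitimate", "phishing", "legitimate", "phishing"]

for label in ("phishing", "legitimate"):
    print(label, classification_metrics(y_true, y_pred, positive=label))
```

Reporting the metrics per class, rather than accuracy alone, is what makes it possible to say that performance is balanced across phishing and legitimate emails.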
Conclusion
The proposed phishing detection model achieves 98.31% accuracy, with balanced precision, recall, and F1 scores across the legitimate and phishing classes. These results underscore the effectiveness of combining TF-IDF text representations with ensemble learning. The findings support the feasibility of deploying such detection systems in real-world cybersecurity applications, where they can serve as one component of proactive threat mitigation. Future refinements will focus on scalability, resilience against adversarial attacks, and forensic tracing of phishing sources to strengthen practical implementation.
References
[1] Mustapha, A., & Sinha, A. (2024). Cyberfraud in the Nigerian banking sector: The techniques and preventive measures. International Journal of Innovative Science and Research Technology, 9(8), 171–179. https://doi.org/10.38124/ijisrt/IJISRT24AUG395
[2] Abdelhamid, N., Ayesh, A., & Thabtah, F. (2014). Phishing detection based on associative classification data mining. Expert Systems with Applications, 41(13), 5948–5959.
[3] Aljofey, A., Bello, S. A., Lu, J., & Xu, C. (2025). Comprehensive phishing detection: A multi-channel approach with variants TCN fusion leveraging URL and HTML features. Journal of Network and Computer Applications, 238, 104170. https://doi.org/10.1016/j.jnca.2025.104170
[4] Basnet, R., Sung, A. H., & Liu, Q. (2012). Learning to detect phishing URLs. International Journal of Research in Engineering and Technology (IJRET), 1(2), 1–12.
[5] Burita, L., Matoulek, P., Halouzka, K., & Kozak, P. (2021). Analysis of phishing emails. AIMS Electronics and Electrical Engineering, 5(1), 93–116. https://doi.org/10.3934/electreng.2021006
[6] Goh, Y. T. (2021). Phishing Email Detection Using Machine Learning. Nanyang Technological University, Singapore. https://dr.ntu.edu.sg/handle/10356/148664
[7] Gupta, B. B., Gaurav, A., Arya, V., Attar, R. W., Bansal, S., Alhomoud, A., & Chui, K. T. (2024). An advanced BERT and CNN-based computational model for phishing detection in enterprise systems. Computational Methods for Engineering Science, 141(3), 1–15. https://doi.org/10.58510/cmes.v141n3.2024
[8] Kyaw, P. H., Gutierrez, J., & Ghobakhlou, A. (2024). A systematic review of deep learning techniques for phishing email detection. Electronics, 13(19), 3823. https://doi.org/10.3390/electronics13193823
[9] Li, Z., Yang, J., Wang, J., Shi, L., Feng, J., & Stein, S. (2024). LBKT: A LSTM BERT-Based Knowledge Tracing Model for Long-Sequence Data. Proceedings of the 20th International Conference on Intelligent Tutoring Systems, 174–184. https://doi.org/10.1007/978-3-031-63031-6_15
[10] Mohammad, R. M., Thabtah, F., & McCluskey, L. (2015). Predicting phishing websites based on a self-structured neural network. Neural Computing and Applications, 25(2), 443–458.
[11] Odeh, A., Abu Al-Haija, Q., Aref, A., & Abu Taleb, A. (2023). Comparative Study of CatBoost, XGBoost, and LightGBM for Enhanced URL Phishing Detection: A Performance Assessment. Journal of Internet Services and Information Security (JISIS), 13(4), 1–11. https://doi.org/10.58346/JISIS.2023.I4.001
[12] Omari, K., & Oukhatar, A. (2025). Advanced phishing website detection with SMOTETomekXGB: Addressing class imbalance for optimal results. Procedia Computer Science, 252, 289–295.
[13] Prajapati, P. et al. (2024). Phishing E-mail Detection Using Machine Learning. Smart Systems: Innovations in Computing. Springer. https://doi.org/10.1007/978-981-97-3690-4_32
[14] Rathee, D., & Mann, S. (2022). Detection of E-Mail Phishing Attacks – using Machine Learning and Deep Learning. International Journal of Computer Applications, 183(47), 1–7. https://doi.org/10.5120/ijca2022918687
[15] Sahingoz, O. K., Buber, E., Demir, O., & Diri, B. (2019). Machine learning based phishing detection from URLs. Expert Systems with Applications, 117, 345–357.
[16] Salloum, S. A., Gaber, T., Vadera, S., & Shaalan, K. (2022). A Systematic Literature Review on Phishing Email Detection Using Natural Language Processing Techniques. IEEE Access, 10, 65703–65730. https://doi.org/10.1109/ACCESS.2022.3183083
[17] Salloum, S., Gaber, T. M. A., Vadera, S., & Shaalan, K. (2021). Phishing email detection using natural language processing techniques: A literature survey. Procedia Computer Science, 189, 19–28. https://doi.org/10.1016/j.procs.2021.05.077
[18] Shafin, S. S. (2024). An explainable feature selection framework for web phishing detection with machine learning. Data Science and Management, 116, 1–15. https://doi.org/10.1016/j.dsm.2024.08.004
[19] Somesha, M., & Pais, A. R. (2022). Classification of Phishing Email Using Word Embedding and Machine Learning Techniques. Journal of Cyber Security and Mobility, 11, 279–320. DOI: 10.1305/JCSM.2022.11.3.279
[20] Sudar, K. M., Rohan, M., & Vignesh, K. (2024). Detection of Adversarial Phishing Attack Using Machine Learning Techniques. Sādhanā, 49(232). Springer. https://link.springer.com/article/10.1007/s12046-024-02582-0
[21] Vade Secure. (2023, October 17). Q3 2023 phishing and malware report: Phishing and malware threats increase by 173% and 110%. Retrieved from
[22] Verma, R., & Das, A. (2017). What's in a URL: Fast Feature Extraction and Malicious URL Detection. In Proceedings of the 2017 European Symposium on Research in Computer Security (ESORICS).