Phishing remains one of the most critical cybersecurity threats: attackers use fraudulent websites to trick users into disclosing sensitive information such as credentials and financial data. Traditional rule-based detection systems adapt poorly to evolving attack strategies. This study proposes an AI-driven cybersecurity framework for phishing website detection based on supervised machine learning. Experiments used the UCI Phishing Website Dataset, which consists of 1353 instances and 10 security-related attributes, split into 70% training and 30% testing subsets. Three classifiers were implemented and compared: Logistic Regression, Support Vector Machine (SVM), and Random Forest. Hyperparameters were tuned using GridSearchCV with 5-fold cross-validation. Performance was evaluated using accuracy, precision, recall, F1-score, and confusion matrix analysis. The optimized Random Forest model achieved approximately 91% accuracy, outperforming both Logistic Regression and SVM. Feature importance analysis shows that attributes such as SFH and SSLfinal_State strongly influence classification outcomes. The findings indicate that ensemble-based techniques can strengthen phishing detection and offer a scalable basis for cybersecurity defenses.
Introduction
Phishing is a cyberattack method that steals sensitive user information through fake websites that imitate legitimate ones. As internet usage grows, phishing attacks have become more sophisticated and harder to detect with traditional blacklist or rule-based systems, which only recognize known patterns. To address this issue, this study proposes an AI-driven cybersecurity framework that uses machine learning to automatically classify websites as legitimate or phishing.
The research uses the UCI Phishing Website Dataset, which contains 1353 instances and 10 features, including attributes such as SFH, SSLfinal_State, URL_Length, web_traffic, age_of_domain, and having_IP_Address. The dataset was divided into 70% training data and 30% testing data for model evaluation.
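The 70/30 split described above can be sketched as follows. This is a minimal illustration, not the study's actual code: it assumes scikit-learn's `train_test_split`, uses randomly generated stand-in data with the same shape as the dataset (1353 rows, 10 features), and assumes binary labels, stratified sampling, and a fixed random seed, none of which are specified in the text.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the dataset (assumption): 1353 rows, 10 features
# with values in {-1, 0, 1}, and hypothetical binary labels.
rng = np.random.default_rng(42)
X = rng.integers(-1, 2, size=(1353, 10))
y = rng.integers(0, 2, size=1353)

# 70% training / 30% testing; stratify keeps class proportions similar
# in both subsets (stratification is an assumption, not stated in the text).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
```

With 1353 instances, this split yields 947 training rows and 406 testing rows, because scikit-learn rounds the test fraction up.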
Three machine learning algorithms were implemented: Logistic Regression, Support Vector Machine (SVM), and Random Forest. Logistic Regression served as a baseline model, SVM was included as a margin-based classifier suited to high-dimensional data, and Random Forest was included as an ensemble model to improve prediction accuracy. To enhance performance, GridSearchCV with 5-fold cross-validation was used to optimize the Random Forest hyperparameters.
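A tuning step of this kind might look like the sketch below. It assumes scikit-learn and uses synthetic data from `make_classification` in place of the real dataset. The parameter grid (`n_estimators`, `max_depth`) and the accuracy scoring are hypothetical choices for illustration; the text does not state which hyperparameters or values were searched.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption): same size as the dataset in the text.
X, y = make_classification(
    n_samples=1353, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Hypothetical search grid; the study's actual grid is not given.
param_grid = {"n_estimators": [50, 100], "max_depth": [None, 5, 10]}

# cv=5 runs 5-fold cross-validation on the training set for each combination.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# The best estimator is refit on the full training set by default.
best_rf = search.best_estimator_
test_accuracy = best_rf.score(X_test, y_test)
importances = best_rf.feature_importances_

print(search.best_params_, round(test_accuracy, 3))
```

Tuning on the training portion only keeps the test set untouched until the final evaluation. The `feature_importances_` array holds one impurity-based score per feature, which is one common way to rank attributes such as SFH or SSLfinal_State.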
Model performance was evaluated using accuracy, precision, recall, F1-score, and confusion matrix analysis. Experimental results showed that the optimized Random Forest model achieved the highest accuracy, about 91%, outperforming SVM (≈88%) and Logistic Regression (≈86%). Feature importance analysis indicated that SFH and SSLfinal_State are the most influential features for detecting phishing websites.
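To make the relationship between these metrics concrete, the sketch below derives accuracy, precision, recall, and F1-score from the four confusion matrix counts. The helper names `confusion_counts` and `binary_metrics` are hypothetical, the labels are made-up toy values rather than results from the study, and phishing is assumed to be the positive class (label 1).

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1  # phishing site correctly flagged
            else:
                fp += 1  # legitimate site wrongly flagged
        else:
            if truth == positive:
                fn += 1  # phishing site missed
            else:
                tn += 1  # legitimate site correctly passed
    return tp, fp, fn, tn


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels for illustration only (1 = phishing, 0 = legitimate).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy case there are 3 true positives, 1 false positive, 1 false negative, and 5 true negatives, giving an accuracy of 0.8 and precision, recall, and F1 of 0.75 each. In phishing detection, recall matters because a false negative is a phishing site that reaches the user.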
Overall, the study demonstrates that ensemble-based machine learning models, particularly Random Forest, detect phishing websites more accurately than the Logistic Regression and SVM models evaluated here, and can therefore strengthen defenses against evolving phishing attacks.
Conclusion
This research presented an AI-driven cybersecurity framework for phishing website detection using optimized machine learning models. Comparative analysis showed that the Random Forest classifier achieved the highest accuracy of the three models evaluated, at approximately 91%. Feature importance evaluation improved interpretability by identifying the attributes that act as the most critical phishing indicators.
The proposed framework offers scalability and adaptability for real-world cybersecurity systems. Future work may involve integrating deep learning techniques, deploying the model in real time in browser-based environments, and extending the evaluation to larger phishing datasets to improve generalization.