An Advanced Meta-Learning Ensemble Framework for Interpretable Phishing Detection via RSTHFS Optimization

Authors: Dr. Rahul M. Dhokane, Mr. Wakchaure Sanchit Sanjay, Miss. Kale Jayshree Sandip, Miss. Dange Shreya Rajesh, Mr. Wakchaure Ganesh Shivaji

DOI Link: https://doi.org/10.22214/ijraset.2026.78706


Abstract

Phishing remains a dominant and highly sophisticated form of cybercrime, in which attackers deploy deceptive websites to trick users into revealing sensitive information, such as passwords and financial credentials [1], [5]. Despite significant advancements in cybersecurity, accurately detecting these malicious domains remains a critical challenge due to the lack of universally accepted identification parameters and the rapid emergence of "zero-day" phishing sites [1], [6]. This paper introduces an advanced detection framework that integrates Rough Set Theory-based Hybrid Feature Selection (RSTHFS) with an innovative meta-learning-based ensemble approach [3], [6]. The proposed methodology uses a multi-layer stacking architecture to capture both global non-linear and local patterns, leveraging base learners such as Residual Multi-Layer Perceptrons (ResMLP) and XGBoost, whose outputs are aggregated by a meta-classifier to enhance predictive stability [1], [3]. To keep the system lightweight enough for real-time browser deployment, the RSTHFS method is used to identify a "minimal reduct" of features, reducing the computational feature space by over 60% while maintaining high reliability [5], [6]. Furthermore, the framework incorporates Explainable AI (XAI) through SHAP values to provide granular transparency into the model's decision-making process [6]. Experimental evaluations on benchmark datasets demonstrate a peak accuracy of 98.4%, providing a scalable, efficient, and interpretable solution for modern web security [3], [5].

Introduction

Phishing is a major cybersecurity threat that tricks users into revealing sensitive information through deceptive websites and messages. Traditional defenses like blacklist-based systems are reactive and ineffective against new (zero-hour) phishing attacks, prompting a shift toward Machine Learning (ML) and Deep Learning (DL) techniques.

Key Challenges in Existing Systems

High computational cost of deep learning models limits real-time use (e.g., browser extensions)
Lack of interpretability (“black-box” models reduce user trust)
Poor adaptability to evolving phishing patterns

Proposed Solution

The research introduces an advanced phishing detection system combining:

RSTHFS (Rough Set Theory-based Hybrid Feature Selection): Removes redundant features for faster processing
Meta-Learning Ensemble: Combines the predictions of several base models through a meta-learner for better accuracy
Explainable AI (XAI – SHAP): Provides clear reasons for detection decisions
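The rough-set idea of a feature "reduct" can be illustrated with the classical QuickReduct heuristic: greedily add the attribute that most increases the dependency degree γ (the fraction of rows whose indiscernibility class carries a single label) until the subset separates the labels as well as the full attribute set does. This is a minimal sketch, not the RSTHFS procedure of [5], which is a hybrid method; the function names and the four-feature toy decision table are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Row = Tuple[int, ...]


def partition(rows: Sequence[Row], attrs: Sequence[int]) -> List[List[int]]:
    """Indiscernibility classes: indices of rows that agree on every attribute in attrs."""
    classes: Dict[Row, List[int]] = {}
    for i, row in enumerate(rows):
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, []).append(i)
    return list(classes.values())


def dependency(rows: Sequence[Row], labels: Sequence[int], attrs: Sequence[int]) -> float:
    """Dependency degree gamma(attrs -> label): share of rows in the positive region,
    i.e. rows whose indiscernibility class carries a single decision label."""
    positive = sum(
        len(cls) for cls in partition(rows, attrs)
        if len({labels[i] for i in cls}) == 1
    )
    return positive / len(rows)


def quick_reduct(rows: Sequence[Row], labels: Sequence[int]) -> List[int]:
    """Greedy QuickReduct: keep adding the attribute with the highest gamma until the
    subset reaches the gamma of the full attribute set."""
    n_attrs = len(rows[0])
    target = dependency(rows, labels, range(n_attrs))
    reduct: List[int] = []
    current = dependency(rows, labels, reduct)
    while current < target:
        # Pick the attribute with the highest gamma (ties go to the lowest index).
        best_attr, best_gamma = -1, -1.0
        for a in range(n_attrs):
            if a in reduct:
                continue
            g = dependency(rows, labels, reduct + [a])
            if g > best_gamma:
                best_attr, best_gamma = a, g
        reduct.append(best_attr)
        current = best_gamma
    return sorted(reduct)


# Hypothetical discretized URL features: has_ip, long_url, has_at_symbol, uses_https
ROWS = [
    (1, 0, 0, 1), (0, 0, 1, 1), (0, 1, 0, 1), (0, 0, 0, 0),
    (1, 1, 0, 0), (0, 1, 1, 0), (0, 0, 0, 1), (0, 1, 0, 0),
]
LABELS = [1, 1, 0, 0, 1, 1, 0, 0]  # 1 = phishing, 0 = legitimate

print(quick_reduct(ROWS, LABELS))  # → [0, 2] (has_ip, has_at_symbol)
```

In this toy table a URL is labeled phishing exactly when it has an IP host or an "@" symbol, so the greedy search keeps attributes 0 and 2 and discards the two uninformative columns.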

This system achieves high accuracy (98.4%), fast detection (<100 ms), and transparent outputs for users.

System Architecture

The system works in three main stages:

Feature Extraction:
Extracts URL-based features (length, symbols, SSL status, domain info)
Feature Optimization (RSTHFS):
Reduces features by ~69%, improving speed and efficiency
Meta-Learning Classification:
- Base models: ResMLP, XGBoost, LightGBM
- Meta-learner: Logistic Regression combines predictions
- SHAP explains why a URL is flagged
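The meta-learning layer can be sketched as a logistic regression fitted on the base models' predicted phishing probabilities. This is a minimal, dependency-free sketch under stated assumptions: the values in META_X are hypothetical and stand in for out-of-fold predictions of ResMLP, XGBoost, and LightGBM (out-of-fold outputs keep the meta-learner from training on predictions the base models made for their own training data), and plain batch gradient descent replaces a library solver.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w: Sequence[float], b: float, row: Sequence[float]) -> float:
    """Meta-learner's phishing probability for one row of base-model outputs."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, row)))


def fit_meta_learner(
    meta_x: Sequence[Sequence[float]], y: Sequence[int],
    lr: float = 1.0, epochs: int = 5000,
) -> Tuple[List[float], float]:
    """Fit a logistic-regression meta-learner by batch gradient descent on mean log loss.
    Each row of meta_x holds the base models' phishing probabilities for one URL."""
    n, k = len(meta_x), len(meta_x[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for row, target in zip(meta_x, y):
            err = predict_proba(w, b, row) - target  # gradient of log loss w.r.t. the logit
            for j in range(k):
                grad_w[j] += err * row[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical out-of-fold probabilities from (ResMLP, XGBoost, LightGBM).
META_X = [
    [0.90, 0.80, 0.85], [0.70, 0.90, 0.60], [0.60, 0.95, 0.90], [0.40, 0.80, 0.70],
    [0.10, 0.20, 0.15], [0.30, 0.10, 0.20], [0.60, 0.20, 0.30], [0.20, 0.40, 0.10],
]
Y = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = phishing

w, b = fit_meta_learner(META_X, Y)
print("weights:", [round(v, 2) for v in w], "bias:", round(b, 2))
print([round(predict_proba(w, b, row), 2) for row in META_X])
```

The learned weights show how much the meta-learner trusts each base model. In practice a library implementation, such as scikit-learn's StackingClassifier with a LogisticRegression final estimator, would usually be preferred over hand-written gradient descent.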

Implementation

Frontend: Chrome extension for URL interception
Backend: Python-based ML server
Processing: Real-time URL analysis with caching and API communication
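A backend request path that combines URL feature extraction with caching might look like the following sketch. It is hypothetical throughout: the feature names are an illustrative subset of the URL features listed under System Architecture, uses_https only checks the scheme rather than validating the SSL certificate, domain (WHOIS) lookups are omitted because they require network access, and functools.lru_cache stands in for whatever cache the deployed server actually uses.

```python
import ipaddress
from functools import lru_cache
from typing import NamedTuple
from urllib.parse import urlsplit


class UrlFeatures(NamedTuple):
    """Illustrative lexical features; names are hypothetical, not the paper's schema."""
    url_length: int
    num_dots: int
    num_hyphens: int
    has_at_symbol: bool
    uses_https: bool      # scheme only; does not validate the certificate
    host_is_ip: bool
    num_subdomains: int   # naive: ignores multi-label suffixes such as .co.uk


def _host_is_ip(host: str) -> bool:
    """True when the hostname is a literal IPv4/IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


@lru_cache(maxsize=10_000)  # in-process cache so repeated URLs skip re-extraction
def extract_url_features(url: str) -> UrlFeatures:
    """Extract cheap, offline lexical features from a raw URL string."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    is_ip = _host_is_ip(host)
    labels = host.split(".") if host and not is_ip else []
    return UrlFeatures(
        url_length=len(url),
        num_dots=url.count("."),
        num_hyphens=url.count("-"),
        has_at_symbol="@" in url,
        uses_https=parts.scheme.lower() == "https",
        host_is_ip=is_ip,
        num_subdomains=max(len(labels) - 2, 0),
    )


print(extract_url_features("http://192.168.10.5/secure-login@verify"))
print(extract_url_features("https://login.paypal.com.account-verify.example/"))
```

Returning an immutable NamedTuple matters here: lru_cache hands the same object to every caller, so a mutable result such as a dict could be altered by one request and corrupt later ones.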

Results

Accuracy: 98.4% (higher than traditional ML and DL models)
Speed: Latency reduced from 240 ms → 92 ms (61.7% lower)
Feature Reduction: 32 → 11 features
User Trust: Explainable alerts improved user response by 35%
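The latency figures reported above can be checked directly. The relative reduction in latency and the speed-up factor are different quantities: cutting latency by about 62% corresponds to processing roughly 2.6 times as fast, and the 92 ms result is consistent with the sub-100 ms target.

```latex
\begin{aligned}
\text{latency reduction} &= \frac{240\,\text{ms} - 92\,\text{ms}}{240\,\text{ms}} = \frac{148}{240} \approx 0.617 \approx 61.7\% \\
\text{speed-up} &= \frac{240\,\text{ms}}{92\,\text{ms}} \approx 2.61\times
\end{aligned}
```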

Conclusion

This research has successfully developed and validated an advanced framework for phishing detection that addresses the critical balance between predictive accuracy and computational efficiency [1], [5]. By building upon the foundational concepts of hybrid machine learning [6], we have introduced a multi-layer meta-learning ensemble that leverages the unique strengths of ResMLP, XGBoost, and CatBoost [1], [3]. The integration of a Logistic Regression meta-classifier has proven effective in reducing the variance inherent in single-model architectures, resulting in a peak detection accuracy of 98.4% [3].

A significant contribution of this work is the application of Rough Set Theory-based Hybrid Feature Selection (RSTHFS), which identified a minimal feature reduct, reducing the input dimensionality by 69.11% [5]. This optimization was the key enabler for transitioning the model from a high-resource server environment into a lightweight, real-time Chrome extension with sub-100 ms latency [5], [6]. Furthermore, the inclusion of Explainable AI (XAI) via SHAP values has transformed the system from a "black-box" classifier into a transparent security tool, providing users with the justifications they need to trust and act upon security warnings [6].

In conclusion, this project provides a scalable and proactive defense mechanism against the evolving threat of zero-hour phishing attacks [2], [4]. The results confirm that the synergy of feature optimization and meta-learning not only enhances the security of the web browsing experience but also sets a new standard for interpretable and efficient cybersecurity solutions in the browser environment [3], [5], [6].

References

[1] L. R. Kalabarige, R. S. Rao, A. R. Pais, and L. A. Gabralla, "A Boosting-Based Hybrid Feature Selection and Multi-Layer Stacked Ensemble Learning Model to Detect Phishing Websites," IEEE Access, vol. 11, pp. 71180-71193, 2023.
[2] U. Zara, K. Ayyub, H. U. Khan, A. Daud, T. Alsahfi, and S. G. Ahmad, "Phishing Website Detection Using Deep Learning Models," IEEE Access, vol. 12, pp. 167072-167087, 2024.
[3] S. Naseeb, S. Ramzan, A. Raza, M. S. A. Hashmi, Y. Gu, M. Syafrudin, and N. L. Fitriyani, "Website Phishing Attack Detection Using Innovative Meta Learning-Based Ensemble Approach," IEEE Access, vol. 13, pp. 164249-164264, 2025.
[4] A. Karim, M. Shahroz, K. Mustofa, S. B. Belhaouari, and S. R. K. Joga, "Phishing Detection System Through Hybrid Machine Learning Based on URL," IEEE Access, vol. 11, pp. 36805-36822, 2023.
[5] J. H. Setu, N. Halder, A. Islam, and M. A. Amin, "RSTHFS: A Rough Set Theory-Based Hybrid Feature Selection Method for Phishing Website Classification," IEEE Access, vol. 13, pp. 68820-68840, 2025.
[6] R. M. Dhokane, S. S. Wakchaure, J. S. Kale, S. R. Dange, and G. S. Wakchaure, "Phishing Website Detection Using Hybrid Machine Learning and Feature Optimization Techniques," SVIT Nashik Research Publication, 2024.
[7] S. Remya et al., "An Effective Detection Approach for Phishing URL Using ResMLP," IEEE Access, vol. 12, pp. 79367-79380, 2024.
[8] S. Asiri et al., "A Survey of Intelligent Detection Designs of HTML URL Phishing Attacks," IEEE Access, vol. 11, pp. 6421-6438, 2023.
[9] R. Zieni et al., "Phishing or Not Phishing? A Survey on the Detection of Phishing Websites," IEEE Access, vol. 11, pp. 18499-18515, 2023.

Copyright

Copyright © 2026 Dr. Rahul M. Dhokane, Mr. Wakchaure Sanchit Sanjay, Miss. Kale Jayshree Sandip, Miss. Dange Shreya Rajesh, Mr. Wakchaure Ganesh Shivaji. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Paper Id : IJRASET78706

Publish Date : 2026-03-24

ISSN : 2321-9653

Publisher Name : IJRASET
