A Survey on Phishing Email Detection Techniques: Using LSTM and Deep Learning

Authors: Annie Jaison J S, Halima Sadiya, Himashree S, M Jomi Maria Sijo, Dr. Anitha T G

DOI Link: https://doi.org/10.22214/ijraset.2025.73836

Abstract

Phishing attacks, often delivered through deceptive emails, remain one of the most dangerous cyber threats, aiming to steal sensitive information such as passwords and financial data. Traditional detection methods like blacklists and rule-based filters struggle to keep up with evolving tactics. This paper surveys recent deep learning approaches to phishing email detection, focusing on models such as Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), Bidirectional Long Short-Term Memory (BiLSTM), Transformer encoders, and hybrid architectures. These methods analyze email components such as subject lines, content, and metadata, as well as user behavior. The study also reviews commonly used datasets, feature extraction techniques, and evaluation metrics, providing insights into current trends, strengths, and challenges in developing effective phishing detection systems.

Introduction

Phishing is a major cybersecurity threat that tricks users into giving away sensitive information via deceptive emails and websites. Traditional detection methods like blacklists and rule-based systems are increasingly ineffective against evolving tactics. As phishing attacks become more sophisticated, particularly with the use of AI-generated content, there is a growing need for intelligent, adaptive detection systems.

Key Focus of the Study:

The paper explores the use of Long Short-Term Memory (LSTM) networks for detecting phishing emails by analyzing email content. LSTM, a type of deep learning model suitable for sequential data like text, is shown to outperform traditional machine learning methods (e.g., Naive Bayes) by capturing complex linguistic patterns.
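
To make the idea of sequence modeling concrete, the following is a minimal sketch of a single LSTM cell stepping through a sequence of token vectors, written in plain Python. It is an illustration only, not the models evaluated in the surveyed papers: the weights are random and untrained, the helper names (lstm_step, init_params, encode_sequence) are hypothetical, and a real classifier would add an embedding layer, a trained output layer, and a training loop.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """Run one LSTM time step on plain Python lists of floats."""
    z = x + h_prev  # concatenation [x_t ; h_{t-1}]

    def affine(W, b):
        # Matrix-vector product W @ z plus bias b.
        return [sum(w * v for w, v in zip(row, z)) + bi for row, bi in zip(W, b)]

    f = [sigmoid(a) for a in affine(params["Wf"], params["bf"])]    # forget gate
    i = [sigmoid(a) for a in affine(params["Wi"], params["bi"])]    # input gate
    o = [sigmoid(a) for a in affine(params["Wo"], params["bo"])]    # output gate
    g = [math.tanh(a) for a in affine(params["Wg"], params["bg"])]  # candidate cell values

    # New cell state: keep part of the old memory, add part of the candidate.
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    # New hidden state: gated view of the squashed cell state.
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c

def init_params(input_size, hidden_size, seed=0):
    """Random, untrained weights (assumption: for illustration only)."""
    rng = random.Random(seed)
    params = {}
    for gate in "fiog":
        params["W" + gate] = [
            [rng.uniform(-0.1, 0.1) for _ in range(input_size + hidden_size)]
            for _ in range(hidden_size)
        ]
        params["b" + gate] = [0.0] * hidden_size
    return params

def encode_sequence(token_vectors, params, hidden_size):
    """Feed token vectors through the cell in order; return the final hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in token_vectors:
        h, c = lstm_step(x, h, c, params)
    return h
```

In a phishing classifier, the final hidden state returned by encode_sequence would typically be passed to a trained output layer that produces a phishing probability. Because the state is updated token by token, the same code handles emails of any length.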

Main Contributions:

Proposes LSTM-based email classification to detect phishing attempts.
Compares LSTM with other models such as CNN-LSTM hybrids, BiLSTM, Transformers, and traditional ML algorithms.
Evaluates practical deployment, real-world examples, and challenges (like privacy, adversarial attacks, and mobile deployment).
Highlights the advantages of deep learning in adaptability, accuracy, and handling language-based features.

Detailed Insights from Literature Survey:

LSTM-Based Models:
- Capture sequential patterns in email content.
- Achieve high accuracy (~96–97%) on datasets such as Enron and PhishTank.
- Struggle with very short or ambiguous messages.
Hybrid Models (e.g., CNN-LSTM):
- Combine local feature extraction and sequence modeling.
- Good for real-time applications but may increase model complexity.
Multimodal Detection:
- Integrates text and visual analysis (like email screenshots).
- More robust but computationally expensive.
Edge-Based Lightweight Models:
- Adapted for mobile/low-resource environments.
- Offer privacy but face limitations in training and data availability.
Transformer vs. LSTM:
- Transformers excel with large, complex data.
- LSTM and BiLSTM are better in low-resource settings.
Metadata + Content Models:
- Using headers, sender history, and domain reputation alongside body text boosts detection performance.
Behavioral Detection Models:
- Analyze user activity patterns (e.g., browsing habits) using LSTMs.
- Effective but raise privacy concerns.
Attention Mechanisms in LSTMs:
- Improve focus on phishing-relevant parts of emails.
- Enhance interpretability and performance but increase complexity.
Comparison with Traditional ML:
- LSTM models outperform Naive Bayes, Random Forests, etc., especially on variable-length text.
Robustness to Adversarial Attacks:
- LSTM models trained on adversarial examples can resist phishing attempts with manipulated content.
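
The attention mechanisms mentioned in the list above can be illustrated with a small sketch. It assumes one simple form, dot-product attention with a single query vector, applied to the per-token hidden states of an LSTM; the surveyed papers may use other variants (for example, additive attention with a learned scoring layer). The function names and the query vector are hypothetical, and in a real model the query would be learned during training.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(hidden_states, query):
    """Pool per-token hidden states into one context vector.

    hidden_states: list of equal-length float lists, one per token.
    query: float list of the same length (assumed to be a learned parameter).
    Returns (context, weights), where the weights sum to 1.
    """
    # Score each token by the dot product of its hidden state with the query.
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Context vector: weighted average of the hidden states.
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return context, weights
```

The returned weights are what gives attention its interpretability: tokens with larger weights contributed more to the context vector, so they can be highlighted as the parts of the email the model focused on. The extra scoring step and parameters are also the source of the added complexity noted above.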

Methodology Summary:

Literature Selection: Peer-reviewed studies using deep learning for phishing detection.
Data Sources: IEEE, Springer, ScienceDirect, Google Scholar, arXiv.
Model Categories: LSTM-only, hybrid models, transformer-based, multimodal, behavioral, edge-optimized.
Evaluation: Accuracy, Precision, Recall, F1-score, and real-world deployment viability.
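
The evaluation metrics listed above follow their standard definitions and can be computed from binary predictions as sketched below. The function name and the label convention (1 for phishing, 0 for legitimate) are assumptions made for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Assumes `positive` marks the phishing class. Metrics with a zero
    denominator are reported as 0.0.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

Because phishing emails are usually a small minority of real traffic, accuracy alone can look high even when many phishing emails are missed; precision, recall, and F1 on the phishing class give a clearer picture.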

Conclusion

Phishing remains one of the most widespread and dangerous cyber threats, often exploiting human trust to steal sensitive information. This survey explored how deep learning, especially LSTM and BiLSTM models and attention mechanisms, is transforming phishing detection by enabling systems to learn patterns, adapt to new tactics, and improve accuracy.

As a future direction, we propose a hybrid phishing detection method that combines TF-IDF-based subject analysis with a BiLSTM and attention model for the email body. Although this model has not yet been implemented, it draws on strengths observed across existing approaches and shows strong potential for detecting phishing attempts more effectively by capturing both shallow textual features and deeper contextual patterns.

Looking ahead, integrating advanced phishing detection into real-time email systems can greatly improve accuracy and resilience. As phishing strategies grow more complex, the need for adaptive, intelligent models becomes increasingly critical. This survey serves as a foundational guide for researchers and practitioners by consolidating existing deep learning-based approaches, highlighting their capabilities and limitations, and pointing towards promising directions. Future efforts may focus on scalable deployment, adversarial robustness, and privacy-preserving mechanisms to develop more effective, context-aware, and trustworthy email security solutions.
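
The TF-IDF subject analysis proposed as a future direction has not been implemented by the authors, so the following is only a sketch of what the subject-line half might look like. It assumes a simple lowercase alphanumeric tokenizer, term frequency normalized by subject length, and a smoothed IDF; all function names are hypothetical, and the BiLSTM and attention half for the email body is not shown.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def fit_idf(documents):
    """Compute a smoothed IDF for every term seen in the training subjects.

    idf(t) = ln((1 + N) / (1 + df(t))) + 1, where N is the number of
    subjects and df(t) is the number of subjects containing term t.
    """
    n = len(documents)
    df = Counter()
    for doc in documents:
        df.update(set(tokenize(doc)))  # count each term once per subject
    return {term: math.log((1 + n) / (1 + count)) + 1.0
            for term, count in df.items()}

def tfidf_vector(text, idf):
    """Return a sparse {term: weight} TF-IDF vector for one subject line.

    Terms not seen when fitting the IDF table are ignored.
    """
    counts = Counter(t for t in tokenize(text) if t in idf)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: (c / total) * idf[term] for term, c in counts.items()}
```

For example, after fitting on the invented subjects "Verify your account now", "Team meeting agenda", and "Your account statement", the term "verify" receives a higher IDF than "account" because it appears in fewer subjects. In the proposed hybrid, such a vector would presumably be combined with the body representation before the final classification layer; how the two are fused is left open in the source.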

Copyright

Copyright © 2025 Annie Jaison J S, Halima Sadiya, Himashree S, M Jomi Maria Sijo, Dr. Anitha T G. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Paper Id : IJRASET73836

Publish Date : 2025-08-25

ISSN : 2321-9653

Publisher Name : IJRASET
