The growing sophistication of cyber threats, particularly malicious URLs and Android malware, calls for more advanced detection mechanisms to protect users and their data. This review paper presents an integrated analysis of two key domains, malicious URL detection and Android malware detection, with a focus on the feature extraction techniques and machine learning models used in each. Work on malicious URL detection relies on URL-based features drawn from lexical, host-based, and content-based analysis, whereas for Android malware detection our work draws mainly on API-based feature extraction through API sequences and clustering. In addition, more comprehensive datasets improve the results of Support Vector Machine (SVM) algorithms combined with heuristic methods for problem identification and analysis, increasing efficiency by 85% or more. This paper proposes a unified framework that highlights shared challenges and techniques for improving detection across platforms. We identify gaps in the literature and outline future directions for advancing cross-platform detection systems.
Keywords: Malicious URL Detection, Android Malware, Feature Extraction, SVM, Heuristic Methods, Cybersecurity.
Introduction
With the increasing reliance on digital and mobile platforms, cybersecurity threats, particularly malicious URLs and Android malware, have grown more sophisticated and widespread. Traditional signature-based detection methods struggle to keep up with evolving attacks, especially zero-day threats, because they are static and cannot adapt in real time.
Machine learning (ML), especially Support Vector Machines (SVM), offers promising solutions for detecting such threats by classifying malicious activity using features extracted from URLs and apps. However, traditional SVM models must be fully retrained when new data arrives and handle streaming data inefficiently. Incremental SVM, which updates the model continuously without full retraining, can improve detection accuracy and adaptability when used alongside heuristic rules.
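The idea of updating an SVM without full retraining can be illustrated with a minimal sketch. It assumes a linear SVM trained by stochastic sub-gradient descent on the hinge loss, which is one common way to obtain incremental updates and is not necessarily the formulation used in the surveyed work. The class name, the two toy features, and the hyperparameter values are all hypothetical.

```python
class IncrementalLinearSVM:
    """Linear SVM updated one sample at a time (hinge loss, L2 penalty)."""

    def __init__(self, n_features, learning_rate=0.1, reg=0.01):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.learning_rate = learning_rate  # assumed constant step size
        self.reg = reg                      # assumed L2 regularization strength
        self.samples_seen = 0

    def decision_function(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x)) + self.b

    def partial_fit(self, X, y):
        """Update the model with a new batch; labels are +1 (malicious) or -1 (benign)."""
        for x, label in zip(X, y):
            self.samples_seen += 1
            margin = label * self.decision_function(x)
            # Regularization step: shrink the weights slightly on every sample.
            shrink = 1.0 - self.learning_rate * self.reg
            self.w = [shrink * wi for wi in self.w]
            # Hinge-loss step: only samples inside the margin move the boundary.
            if margin < 1.0:
                step = self.learning_rate * label
                self.w = [wi + step * xi for wi, xi in zip(self.w, x)]
                self.b += step
        return self

    def predict(self, x):
        return 1 if self.decision_function(x) >= 0.0 else -1


# Toy stream with two made-up features per URL, e.g. scaled length and digit ratio.
stream = [
    ([[0.9, 0.8], [0.1, 0.2]], [1, -1]),
    ([[0.8, 0.9], [0.2, 0.1]], [1, -1]),
]
model = IncrementalLinearSVM(n_features=2)
for _ in range(200):  # batches keep arriving; the model is never rebuilt from scratch
    for X_batch, y_batch in stream:
        model.partial_fit(X_batch, y_batch)
```

Each call to `partial_fit` touches only the new batch, so the cost of an update does not grow with the amount of data already seen. In practice a library implementation, such as a hinge-loss SGD classifier that supports partial fitting, would normally replace this hand-written loop.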
This study integrates malicious URL and Android malware detection into a unified framework, bridging a gap in existing research, which often treats these threats separately. It reviews prior methods, feature extraction techniques, datasets, and machine learning models, emphasizing the effectiveness and challenges of SVM-based approaches.
Key findings highlight the importance of robust feature extraction (lexical, host-based, and behavioral) and the benefits of SVM’s interpretability and efficiency. Yet, challenges persist around real-time detection, feature limitations, cross-platform generalization, adversarial evasion, and dataset imbalance.
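As an illustration of the lexical feature category, the following sketch derives a few features from the URL string alone, without any network access. The specific feature set and the function names are assumptions chosen for the example; the surveyed studies each use their own feature sets.

```python
import ipaddress
import math
from collections import Counter
from urllib.parse import urlparse


def _is_ip_address(host):
    """True if the host is a literal IPv4 or IPv6 address."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def lexical_features(url):
    """Extract simple lexical features from a URL string (illustrative set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    length = len(url)
    counts = Counter(url)
    # Shannon entropy of the characters; random-looking URLs tend to score higher.
    entropy = 0.0
    if length:
        entropy = -sum((c / length) * math.log2(c / length) for c in counts.values())
    return {
        "url_length": length,
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_hyphens": url.count("-"),
        "has_at_symbol": "@" in url,
        "has_ip_host": _is_ip_address(host),
        "path_depth": len([seg for seg in parsed.path.split("/") if seg]),
        "char_entropy": entropy,
    }


features = lexical_features("http://192.168.0.1/login/verify?id=42")
```

The resulting dictionary can be turned into a numeric vector and passed to a classifier. Host-based and behavioral features would need additional data sources (for example DNS or runtime traces) and are outside the scope of this sketch.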
To address these challenges, we propose a scalable, real-time detection platform with user and admin modules. It uses SVM classifiers and efficient feature extraction, and offers voice-based interaction and an analytics dashboard. The design emphasizes security, performance, adaptability, and accessibility, including error recovery and multimodal feedback.
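One way such a platform could combine heuristic rules with an SVM classifier is sketched below: hard rules are checked first, and the classifier score is consulted only when no rule fires. The rules, the threshold, and the `svm_score` callable are hypothetical placeholders and do not describe the actual design of the proposed system.

```python
from urllib.parse import urlparse


def heuristic_verdict(url):
    """Return 'malicious' if a hard rule fires, otherwise None (no opinion)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    # Rule 1 (assumed): credentials embedded before the host, as in user@host tricks.
    if parsed.username is not None:
        return "malicious"
    # Rule 2 (assumed): an unusually deep subdomain chain.
    if host.count(".") >= 5:
        return "malicious"
    return None


def classify(url, svm_score, threshold=0.0):
    """Combine heuristics with a classifier score.

    svm_score is any callable mapping a URL to a signed decision value,
    where higher values mean more likely malicious.
    """
    verdict = heuristic_verdict(url)
    if verdict is not None:
        return verdict, "heuristic"
    label = "malicious" if svm_score(url) >= threshold else "benign"
    return label, "svm"


# Usage with stand-in scoring functions instead of a trained model.
result_rule = classify("http://paypal.com@evil.example/login", lambda u: -1.0)
result_svm = classify("https://example.com/", lambda u: -0.7)
```

Returning the source of the decision alongside the label makes it possible for an analytics dashboard to report how often each component decides, which is useful when tuning the rules.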
Overall, the work advocates hybrid, adaptive, and continuous learning-based cybersecurity systems capable of keeping pace with the dynamic and complex threat landscape of digital environments.
Conclusion
The detection of malicious URLs and applications remains a critical challenge in online security, especially with the growing sophistication of cyber threats. This survey reviewed key studies across malicious URL detection, machine learning methods, and SVM-based classification techniques, mapping the current state of research in this area. By highlighting the strengths of SVM classifiers, feature extraction methods, and performance benchmarks, we have outlined a framework for developing robust, scalable, and adaptive malicious URL and application detection systems. Our review identified significant gaps, including the need for real-time detection, improved feature extraction, cross-platform generalization, adversarial robustness, and better handling of dataset imbalance. Closing these gaps is essential for making detection systems more reliable and ensuring they can effectively combat evolving threats in real-world settings.
In conclusion, this survey highlights the importance of combining machine learning, real-time data processing, and strong backend architectures in the development of effective malicious URL and application detection systems. The proposed system design offers a concrete response to the identified gaps, helping to pave the way for more adaptive, scalable, and secure online environments. We hope this work serves as a valuable resource for researchers and practitioners working on next-generation detection systems.