Legacy systems, characterized by their heterogeneity and outdated coding practices, present significant security challenges in modern software infrastructure. Recent advances in Large Language Models (LLMs) and Retrieval Augmented Generation (RAG) offer promising solutions for vulnerability detection, as demonstrated by successful implementations of knowledge-level retrieval frameworks [1]. This research proposes LegacyGuard, a hybrid framework that integrates state-of-the-art code-specific LLMs with traditional static analysis and RAG-enhanced knowledge retrieval to detect vulnerabilities in multi-lingual legacy codebases. The framework leverages LLM-based semantic analysis for deep code understanding, while incorporating external vulnerability intelligence through RAG to enhance detection accuracy. Through systematic evaluation using precision, recall, and F1-score metrics, this work aims to demonstrate improved vulnerability detection rates and provide actionable insights through chain-of-thought reasoning. The modular architecture ensures extensibility and adaptability for future security analysis applications, contributing to both theoretical foundations and practical implementations of AI-driven vulnerability detection in legacy systems.
Introduction
Background:
Legacy systems remain critical yet pose significant security risks due to outdated, heterogeneous codebases with poor documentation and inconsistent coding practices. Traditional static analyzers struggle with these complexities. Recent advances in Large Language Models (LLMs) and Retrieval Augmented Generation (RAG) show promise in improving code vulnerability detection by combining semantic analysis and external knowledge integration.
Problem Statement:
Legacy codebases face three major security challenges:
Inconsistent and outdated coding standards make automated detection difficult.
Limited contextual understanding restricts effective use of external vulnerability data.
Fragmented data sources hamper comprehensive vulnerability analysis.
Research Objectives:
Develop a hybrid framework (“LegacyGuard”) that integrates LLM-based semantic analysis, static analysis, and RAG for enhanced vulnerability detection.
Improve detection accuracy and explainability through multi-modal analysis and chain-of-thought reasoning.
Create an extensible system adaptable to various legacy architectures.
Significance:
This study offers both theoretical innovations (hybrid LLM-RAG-static analysis, explainability improvements) and practical benefits (better legacy system security, reduced false positives, actionable reports).
Literature Review:
Traditional static tools are insufficient for heterogeneous legacy systems. LLMs (e.g., CodeBERT, GPT models) have shown strong vulnerability detection capabilities, but mostly for modern codebases. RAG enhances LLM performance by integrating external data but is underexplored for legacy systems. Challenges remain in multi-language support, explainability, and curated datasets.
Research Methodology:
Data Collection: Diverse legacy codebases (COBOL, C/C++, Java, FORTRAN, Visual Basic) plus vulnerability knowledge bases (NVD, CVE, OWASP).
Sampling: Stratified and vulnerability-focused sampling ensures diverse, balanced, and context-rich datasets.
Data Analysis: A three-phase approach: static analysis, analysis with fine-tuned LLMs, and a RAG-enhanced integration phase that combines the results via confidence scoring.
Tools: Language-specific static analyzers (CodeSonar, FindBugs, Veracode), LLMs (CodeBERT, CodeT5) fine-tuned on vulnerability data, and a RAG pipeline built on a vector database (e.g., ChromaDB) with strong security controls.
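The confidence-scoring step in the data analysis phase can be illustrated with a minimal sketch. The proposal does not specify how the three phases are weighted or thresholded, so the weights, the threshold, the `Finding` record, and the `fuse_findings` function below are all hypothetical and serve only to show one plausible way of combining per-phase results.

```python
from dataclasses import dataclass

# Hypothetical phase weights (sum to 1.0); the proposal does not specify them.
PHASE_WEIGHTS = {"static": 0.3, "llm": 0.4, "rag": 0.3}


@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    cwe: str           # weakness class, e.g. "CWE-120"
    phase: str         # "static", "llm", or "rag"
    confidence: float  # phase-local confidence in [0, 1]


def fuse_findings(findings, weights=PHASE_WEIGHTS, threshold=0.5):
    """Return {(file, line, cwe): fused_score} for locations at or above threshold."""
    per_location = {}
    for f in findings:
        phases = per_location.setdefault((f.file, f.line, f.cwe), {})
        # Keep the strongest report from each phase for this location.
        phases[f.phase] = max(phases.get(f.phase, 0.0), f.confidence)

    fused = {}
    for location, phases in per_location.items():
        # Weighted sum; a phase that did not report contributes zero.
        score = sum(weights[p] * c for p, c in phases.items())
        if score >= threshold:
            fused[location] = round(score, 3)
    return fused


if __name__ == "__main__":
    # Illustrative reports only; file names and scores are made up.
    reports = [
        Finding("pay.c", 42, "CWE-120", "static", 0.8),
        Finding("pay.c", 42, "CWE-120", "llm", 0.9),
        Finding("ui.c", 7, "CWE-79", "llm", 0.7),
    ]
    for location, score in fuse_findings(reports).items():
        print(location, score)
```

Under these assumed weights, a location flagged by a single phase cannot reach the threshold on its own, which is one way a fusion step might reduce false positives; a real system would need to calibrate both the weights and the threshold on labelled data.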
Conclusion
Legacy systems still underpin mission-critical operations in most organizations, despite growing maintenance and security problems. Typically written in multiple programming languages and with non-standard coding practices, they pose a significant security risk to modern software infrastructure. Research on Large Language Models (LLMs) and Retrieval Augmented Generation (RAG) has produced promising results in code analysis and vulnerability detection [2], opening new ways to address these challenges.
References
[1] X. Du, G. Zheng, K. Wang, J. Feng, W. Deng, M. Liu, B. Chen, X. Peng, T. Ma, and Y. Lou, “Vul-RAG: Enhancing LLM-based vulnerability detection via knowledge-level RAG,” 2024. [Online]. Available: https://arxiv.org/abs/2406.11147
[2] J. Jiang, F. Wang, J. Shen, S. Kim, and S. Kim, “A survey on large language models for code generation,” 2024. [Online]. Available: https://arxiv.org/abs/2406.00515
[3] E. Shereen, D. Ristea, S. Vyas, S. McFadden, M. Dwyer, C. Hicks, and V. Mavroudis, “SoK: On closing the applicability gap in automated vulnerability detection,” 2024. [Online]. Available: https://arxiv.org/abs/2412.11194
[4] X. Zhou, T. Zhang, and D. Lo, “Large language model for vulnerability detection: Emerging results and future directions,” in Proceedings of the 2024 ACM/IEEE 44th International Conference on Software Engineering: New Ideas and Emerging Results, ser. ICSE-NIER ’24. New York, NY, USA: Association for Computing Machinery, 2024, pp. 47–51. [Online]. Available: https://doi.org/10.1145/3639476.3639762
[5] B. Zhang, T. H. M. Le, and M. A. Babar, “MVD: A multi-lingual software vulnerability detection framework,” 2024. [Online]. Available: https://arxiv.org/abs/2412.06166
[6] Polymer, “Solving the security challenges of retrieval-augmented generation (RAG),” Online, January 2025. [Online]. Available: https://www.polymerhq.io/blog/ai/solving-the-security-challenges-of-retrieval-augmented-generation-rag/
[7] J. Brokman, O. Hofman, O. Rachmil, I. Singh, V. Pahuja, R. S. A. Priya, A. Giloni, R. Vainshtein, and H. Kojima, “Insights and current gaps in open-source LLM vulnerability scanners: A comparative analysis,” 2024. [Online]. Available: https://arxiv.org/abs/2410.16527
[8] W. Klieber and L. Flynn, “Evaluating static analysis alerts with LLMs,” Carnegie Mellon University, Software Engineering Institute’s Insights (blog), Oct 2024, accessed: 2025-Mar-3. [Online]. Available: https://doi.org/10.58012/dr7w-bs81
[9] T. L. Team, “RAG security: Risks and mitigation strategies,” Online, October 2024. [Online]. Available: https://www.lasso.security/blog/rag-security
[10] Z. A. Khan, A. Garg, Y. Guo, and Q. Tang, “Evaluating pre-trained models for multi-language vulnerability patching,” 2025. [Online]. Available: https://arxiv.org/abs/2501.07339
[11] C. o. D. Ken Huang and V. of Research at CSA GCR, “Mitigating security risks in retrieval augmented generation (RAG) LLM applications,” Online, November 2023. [Online]. Available: https://cloudsecurityalliance.org/blog/2023/11/22/mitigating-security-risks-in-retrieval-augmented-generation-rag-llm-applications
[12] J. Bae, S. Kwon, and S. Myeong, “Enhancing software code vulnerability detection using GPT-4o and Claude-3.5 Sonnet: A study on prompt engineering techniques,” Electronics, 2024. [Online]. Available: https://api.semanticscholar.org/CorpusID:271079058
[13] A. Bahaa, A. E.-R. Kamal, H. Fahmy, and A. S. Ghoneim, “DB-CBIL: A DistilBERT-based transformer hybrid model using CNN and BiLSTM for software vulnerability detection,” IEEE Access, vol. 12, pp. 64446–64460, 2024. [Online]. Available: https://api.semanticscholar.org/CorpusID:269559461
[14] F. He, F. Li, and P. Liang, “Enhancing smart contract security: Leveraging pre-trained language models for advanced vulnerability detection,” IET Blockchain, 2024. [Online]. Available: https://api.semanticscholar.org/CorpusID:268851002
[15] Dazz, “AI is now exploiting known vulnerabilities - and what you can do about it,” Online, June 2024. [Online]. Available: https://cloudsecurityalliance.org/blog/2024/06/26/ai-is-now-exploiting-known-vulnerabilities-and-what-you-can-do-about-it
[16] C. Scherb, L. B. Heitz, and H. Grieder, “Divide and conquer based symbolic vulnerability detection,” 2024. [Online]. Available: https://arxiv.org/abs/2409.13478
[17] J. Groppe, S. Groppe, D. Senf, and R. Möller, “There are infinite ways to formulate code: How to mitigate the resulting problems for better software vulnerability detection,” Inf., vol. 15, p. 216, 2024. [Online]. Available: https://api.semanticscholar.org/CorpusID:269106349