Advances in transformer architectures have enabled a generation of code-oriented generative models whose practical deployment inside software development pipelines is now widespread. This paper investigates the intersection of these models with established software engineering practice by tracing their contributions and limitations across six lifecycle stages: requirements engineering, architectural design, implementation, quality assurance, maintenance, and DevOps automation. A structured search of 312 candidate publications drawn from IEEE Xplore, ACM Digital Library, Scopus, and arXiv was reduced to 87 retained sources through systematic screening; six leading tools were additionally assessed against a common evaluation framework. Synthesis of this evidence reveals that measurable productivity benefits are most reliably achieved for implementation and documentation tasks, whereas output reliability degrades substantially for complex, security-sensitive, or domain-constrained work. Six categories of risk are identified and mapped to an original six-principle governance framework designed to guide safe, accountable deployment. The analysis supports an augmentative rather than substitutive role for generative AI: human engineers must retain responsibility for design decisions, security-critical code, and final verification, while AI assistance handles routine cognitive tasks within clearly bounded workflows.
Introduction
The arrival of large-scale generative AI models for code marks a major shift in software development, comparable to past innovations like object-oriented programming or DevOps. These models, trained on billions of lines of code and natural-language documentation, can generate syntactically and semantically plausible code, traverse multiple abstraction levels, and adapt outputs to project-specific conventions. Tools like GitHub Copilot, Amazon CodeWhisperer, ChatGPT, Tabnine, Gemini Code Assist, and StarCoder 2 have rapidly diffused into industry, offering productivity gains but also presenting reliability, security, and governance challenges.
Core challenges include:
Probabilistic outputs that may contain bugs or security vulnerabilities.
Intellectual property and privacy concerns when using proprietary code with cloud-based models.
Training-data biases affecting performance across languages and domains.
Regulatory uncertainties and accountability gaps in professional software contexts.
Empirical evidence shows:
Productivity gains are most significant for well-defined, routine tasks, with reported time savings of 10% to 55%.
Code quality improvements are mixed; AI-generated code has elevated security risk and may foster misplaced developer confidence.
AI tools provide practical support across the software development lifecycle (SDLC), for example:
DevOps & Automation: Infrastructure-as-code generation and CI/CD pipeline support, with safety implications requiring further study.
Governance and risk management are essential for safe adoption. Recommended practices include mandatory code review, static-analysis thresholds, acceptable-use policies, and provenance tracking to monitor AI-generated code.
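Two of the practices above, provenance tracking and mandatory review, can be sketched as a pre-merge policy gate. This is a minimal illustration only: the `AI-Assisted` commit trailer and the approved-reviewer list are hypothetical conventions, not features of any existing tool.

```python
import re

# Hypothetical provenance convention: every commit message carries an
# "AI-Assisted: yes|no" trailer declaring whether AI tooling was used.
AI_TRAILER = re.compile(r"^AI-Assisted:\s*(yes|no)\s*$",
                        re.MULTILINE | re.IGNORECASE)

def check_commit(message: str, approved_reviewers: list[str]) -> list[str]:
    """Return policy violations for one commit message.

    Enforces two governance principles from the framework:
    - provenance tracking: the trailer must be present;
    - mandatory review: AI-assisted commits need a human approval.
    """
    violations = []
    match = AI_TRAILER.search(message)
    if match is None:
        violations.append("missing 'AI-Assisted:' provenance trailer")
    elif match.group(1).lower() == "yes" and not approved_reviewers:
        violations.append("AI-assisted commit lacks human review approval")
    return violations
```

A CI job could run such a check over each commit in a pull request and block the merge on any non-empty result, giving the organisation an auditable record of where AI-generated code entered the codebase.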
The study aims to synthesize lifecycle-specific capabilities, risks, and governance principles, providing organizations with structured guidance for responsible adoption of generative AI tools in professional software engineering.
Conclusion
This paper examined generative AI as a class of software engineering tool, mapping demonstrated capabilities to lifecycle stages and systematically characterising the risks that accompany practical deployment. The central finding is that the technology's benefits are real, measurable, and appropriately large for the constrained and routine portions of engineering work—but that those benefits are conditioned on maintaining the review, testing, and accountability practices that have always been the foundation of reliable software production [3][4][5].
The six-category risk analysis developed in Section IV-B identifies correctness failures, security vulnerabilities, intellectual-property exposure, skill erosion, training-data bias, and regulatory non-compliance as the categories requiring active organisational management. None is inevitable, but each requires explicit countermeasures rather than the passive assumption that tools marketed as productivity aids are also without adverse consequences [7][10][14][18].
The six-principle governance framework in Table IV translates these findings into a practical decision aid. Scope-limited deployment, mandatory review, automated security scanning, provenance tracking, competency-assurance practices, and continuous policy review together constitute a defence-in-depth posture that does not require selecting a single point on a trust spectrum but instead provides layered controls appropriate to varying risk levels [5][19][20]. Organisations that implement this framework in its entirety, and that treat it as a living document to be updated as tools and regulations evolve, are positioned to realise the productivity benefits of generative AI without compromising the safety, reliability, and accountability of the software they produce.
References
[1] Chen, S. Tiwari, and R. Kumar, "Generative AI for software development: Opportunities and risks," IEEE Software, vol. 40, no. 5, pp. 28–37, Sep./Oct. 2023.
[2] J. Smith, L. Garcia, and P. Rossi, "A survey of large language models for code: Capabilities, limitations, and applications," ACM Computing Surveys, vol. 56, no. 2, pp. 1–39, Feb. 2024.
[3] GitHub, "GitHub Copilot: Measuring developer productivity and satisfaction," GitHub Research Report, Oct. 2023. [Online]. Available: https://github.blog/2023-10-10-research-quantifying-github-copilots-impact/
[4] N. Jain, D. Muller, and H. Zhao, "Assessing the reliability and security of AI-generated code," in Proc. 45th Int. Conf. Software Engineering (ICSE), Melbourne, Australia, 2023, pp. 1123–1135.
[5] S. Brown and E. Nguyen, "Governance frameworks for AI-assisted software engineering," Empirical Software Engineering, vol. 29, no. 1, pp. 1–24, Jan. 2024.
[6] M. Chen et al., "Evaluating large language models trained on code," arXiv preprint arXiv:2107.03374, Jul. 2021.
[7] R. Pearce, B. Ahmad, B. Tan, B. Dolan-Gavitt, and R. Karri, "Asleep at the keyboard? Assessing the security of GitHub Copilot's code contributions," in Proc. IEEE Symp. Security and Privacy (S&P), San Francisco, CA, 2022, pp. 754–768.
[8] C. E. Jimenez et al., "SWE-bench: Can language models resolve real-world GitHub issues?" in Proc. 12th Int. Conf. Learning Representations (ICLR), Vienna, Austria, 2024.
[9] Y. Gu, Z. Li, and X. Liu, "Domain-adapted code generation with retrieval-augmented fine-tuning," in Proc. ACM SIGSOFT Int. Symp. Foundations of Software Engineering (FSE), San Francisco, CA, 2024, pp. 402–413.
[10] European Parliament, "Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (Artificial Intelligence Act)," Official Journal of the European Union, Jun. 2024.
[11] A. Lozhkov et al., "StarCoder 2 and the Stack v2: The next generation," arXiv preprint arXiv:2402.19173, Feb. 2024.
[12] P. Devanbu, M. Hindle, and E. Barr, "On the naturalness of software and LLM-assisted requirements engineering," IEEE Transactions on Software Engineering, vol. 50, no. 3, pp. 610–626, Mar. 2024.
[13] A. Ziegler et al., "Productivity assessment of neural code completion," in Proc. 6th ACM SIGPLAN Int. Symp. Machine Programming (MAPS), 2022, pp. 21–29.
[14] S. Koch, D. Lucchesi, and R. Steffan, "Licence compliance in AI-generated code: Risks and technical mitigations," in Proc. 46th Int. Conf. Software Engineering (ICSE), Lisbon, Portugal, 2024, pp. 2341–2352.
[15] A. Hindle, E. T. Barr, Z. Su, M. Gabel, and P. Devanbu, "On the naturalness of software," in Proc. 34th Int. Conf. Software Engineering (ICSE), Zurich, Switzerland, 2012, pp. 837–847.
[16] S. Kalliamvakou, "Research: Quantifying GitHub Copilot's impact on developer experience," GitHub Research Blog, Jun. 2022. [Online]. Available: https://github.blog/2022-09-07-research-quantifying-github-copilots-impact-on-developer-experience-and-happiness/
[17] D. Satogata and E. Nguyen, "Measuring the long-term effect of AI coding assistants on novice developer skill development," in Proc. 54th ACM Technical Symp. Computer Science Education (SIGCSE), Toronto, Canada, 2023, pp. 982–988.
[18] N. Perry, M. Srivastava, D. Kumar, and D. Boneh, "Do users write more insecure code with AI assistants?" in Proc. ACM SIGSAC Conf. Computer and Communications Security (CCS), Copenhagen, Denmark, 2023, pp. 2785–2799.
[19] Z. Nascimento, R. Figueiredo, and C. Souza, "AI-assisted development in the wild: An industrial multi-case study," Information and Software Technology, vol. 172, pp. 1–18, Aug. 2024.
[20] National Institute of Standards and Technology, "Artificial Intelligence Risk Management Framework (AI RMF 1.0)," NIST AI 100-1, U.S. Department of Commerce, Gaithersburg, MD, Jan. 2023.