Software requirement specifications (SRS) often contain repetitive, ambiguous, or inconsistent requirements, which drive up costs and delay project timelines. Traditional redundancy detection methods such as TF-IDF and Word2Vec rely mainly on syntactic similarity and therefore struggle to detect semantic overlaps. This study proposes a semantic pruning framework that uses advanced NLP techniques, focusing on transformer-based models such as BERT, to find and eliminate superfluous requirements in SRS documents. Several methods, including CountVectorizer, TF-IDF, Word2Vec, and BERT, were compared using precision, recall, F1-score, and runtime as evaluation criteria. The findings show that deep learning models outperform conventional methods, which yield high precision but poor recall. Despite its longer runtime, BERT outperformed Word2Vec, achieving F1 = 0.87 and recall = 0.77.
The outcomes demonstrate the effectiveness of transformer-based embeddings for redundancy detection and provide a scalable approach to improving SRS quality while reducing manual review effort.
Introduction
Software Requirement Specifications (SRS) define both functional and non-functional system requirements, providing a shared understanding among stakeholders. Redundancy in SRS—semantically similar requirements expressed differently—leads to miscommunication, higher development costs, and project delays. Traditional redundancy detection methods, such as manual reviews, TF-IDF, and Word2Vec, focus on surface-level similarity and often fail to capture semantic equivalence in complex statements.
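This limitation can be illustrated with a minimal sketch (the example requirements and the scikit-learn setup are illustrative assumptions, not the paper's implementation): two requirements that paraphrase each other share almost no tokens, so their TF-IDF cosine similarity stays low despite equivalent meaning.

```python
# Minimal sketch: TF-IDF similarity misses paraphrased requirements.
# The two example requirements are illustrative, not from the paper's dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

req_a = "The system shall authenticate users before granting access."
req_b = "Users must be verified by the application prior to login."

# Fit TF-IDF on both requirements and compare their vectors.
tfidf = TfidfVectorizer().fit_transform([req_a, req_b])
score = cosine_similarity(tfidf[0], tfidf[1])[0, 0]

print(f"TF-IDF cosine similarity: {score:.2f}")  # low, despite equivalent meaning
```

Because the score depends only on overlapping surface tokens (here essentially "the" and "users"), a purely lexical detector would not flag these two statements as redundant.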
Recent advances in NLP, particularly transformer-based models like BERT and Sentence-BERT, enable contextual understanding, improving semantic redundancy detection. Hybrid approaches combining deep learning with traditional techniques balance accuracy and computational efficiency. Despite progress, gaps remain in large-scale, context-aware, and efficient solutions for real-world SRS analysis.
The paper proposes a semantic pruning framework that uses BERT embeddings to automatically detect and remove redundant requirements, improving SRS quality, maintainability, and the efficiency of software development by reducing manual review effort.
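A minimal sketch of such a pruning step is shown below, with the embedding function left pluggable; in the proposed framework the embeddings would come from BERT, and the 0.85 similarity threshold is an illustrative assumption, not a value reported in the paper.

```python
import numpy as np

def prune_redundant(requirements, embed, threshold=0.85):
    """Keep the first of each group of semantically similar requirements.

    `embed` maps a list of strings to a 2-D array of vectors; in the
    proposed framework this would be a BERT/Sentence-BERT encoder.
    The threshold is illustrative, not taken from the paper.
    """
    vecs = np.asarray(embed(requirements), dtype=float)
    vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # unit-normalize
    kept, kept_vecs = [], []
    for req, v in zip(requirements, vecs):
        # Cosine similarity against every requirement already kept;
        # drop this one if it is too close to any of them.
        if all(float(v @ u) < threshold for u in kept_vecs):
            kept.append(req)
            kept_vecs.append(v)
    return kept
```

With a real encoder plugged in for `embed` (for example, the `encode` method of a Sentence-BERT model), this keeps one representative from each cluster of near-duplicate requirements and discards the rest.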
Conclusion
This study demonstrates how semantic pruning can improve the quality of Software Requirement Specifications (SRS). While classical approaches such as TF-IDF and Word2Vec produce fast and interpretable results, they are confined to surface-level similarity and frequently fail to uncover semantically equivalent requirements. In contrast, transformer-based models such as BERT improve redundancy detection by capturing richer contextual information.
With an F1-score of 0.87 and recall of 0.77, the experimental evaluation showed that BERT outperformed both conventional and embedding-based methods. Despite its longer runtime, its higher accuracy makes it a feasible option for real-world SRS analysis, especially in large and complex projects. Word2Vec strikes a reasonable balance between efficiency and semantic detection, whereas purely statistical techniques remain more appropriate for lightweight or small-scale use cases.
The findings emphasize the value of incorporating contextual embeddings into requirements engineering to save manual review time, eliminate ambiguity, and improve stakeholder communication. Future work will explore domain adaptation of transformer models, large-scale industrial validation, and integration with hybrid pipelines to improve accuracy-runtime trade-offs. This study advances redundancy identification, which helps to develop more dependable, maintainable, and efficient software systems.