Manual identification of Non-Functional Requirements (NFRs) from software documents is often laborious, error-prone, and inconsistent. This paper presents a system that leverages Natural Language Processing (NLP) and Machine Learning (ML) to automate the extraction and classification of NFRs from unstructured documents. The approach supports formats such as .txt, .pdf, and .docx, and assigns extracted requirements to standard NFR categories such as performance, security, usability, and maintainability. Our model demonstrates promising accuracy on a labeled dataset, and a user-friendly web interface built with Flask handles document input and visualizes the output. Results confirm the effectiveness of automated NFR detection for enhancing early-stage software requirement engineering.
Introduction
This paper presents a machine-assisted system for automatically extracting and classifying Non-Functional Requirements (NFRs) from software requirement documents using Natural Language Processing (NLP) and Machine Learning (ML), particularly transformer-based models like BERT.
Key Highlights:
Importance of NFRs: NFRs define system quality attributes (e.g., performance, security, usability). They are often poorly documented and mixed with functional requirements, which makes them hard to identify manually.
Solution Approach:
Dataset: Annotated dataset from Kaggle with over 500 NFR samples across categories.
Preprocessing: Includes tokenization, stopword removal, POS tagging, and lemmatization.
Feature Extraction: Utilized TF-IDF, Word2Vec, and BERT embeddings.
Models Tested: Logistic Regression, SVM, Random Forest, and fine-tuned BERT (which outperformed others).
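The preprocessing and TF-IDF steps listed above can be sketched in plain Python. This is a minimal illustration, not the pipeline used in the paper: the stopword list, the function names, and the two sample requirements are assumptions, and POS tagging and lemmatization are left out because they need a library such as NLTK.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "shall", "must", "be", "to", "of",
             "and", "in", "is", "all", "within"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, with weight = tf * log(N / df)."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        vectors.append({t: (c / total) * math.log(n_docs / df[t])
                        for t, c in counts.items()})
    return vectors


# Hypothetical sample requirements, not taken from the dataset.
requirements = [
    "The system shall respond to user queries within 2 seconds.",
    "All user passwords must be encrypted in the database.",
]
vectors = tfidf_vectors(requirements)
```

Terms that occur in every document, such as "user" here, receive a weight of zero, so the vectors emphasize the words that distinguish one requirement from another. These sparse vectors are the kind of input a classical classifier such as Logistic Regression or an SVM would consume.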
System Architecture:
Built with Flask, with a web interface for uploading .txt, .csv, .pdf, and .docx files.
Backend performs NLP preprocessing and classification.
Outputs categorized NFRs grouped by type in a user-friendly format.
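The backend flow described above (accept a supported file, classify each sentence, group the results by type) might look roughly like the following sketch. It deliberately omits the Flask routing and file parsing, and the keyword rules are a hypothetical stand-in for the trained classifier; all names are assumptions for illustration.

```python
from collections import defaultdict
from pathlib import Path

# File formats the interface is described as accepting.
ALLOWED_EXTENSIONS = {".txt", ".csv", ".pdf", ".docx"}


def is_supported(filename):
    """Check an uploaded file's extension against the supported formats."""
    return Path(filename).suffix.lower() in ALLOWED_EXTENSIONS


# Hypothetical keyword rules standing in for the trained model.
KEYWORDS = {
    "performance": ("seconds", "latency", "throughput"),
    "security": ("encrypt", "password", "authenticat"),
    "usability": ("easy", "intuitive", "accessible"),
    "maintainability": ("modular", "documented", "maintain"),
}


def classify(sentence):
    """Placeholder classifier: first category with a keyword hit, else None."""
    lowered = sentence.lower()
    for category, words in KEYWORDS.items():
        if any(w in lowered for w in words):
            return category
    return None


def group_nfrs(sentences):
    """Group sentences classified as NFRs by category, skipping the rest."""
    grouped = defaultdict(list)
    for sentence in sentences:
        category = classify(sentence)
        if category is not None:
            grouped[category].append(sentence)
    return dict(grouped)


# Hypothetical sentences extracted from an uploaded document.
sentences = [
    "The system shall respond within 2 seconds.",
    "Passwords must be encrypted.",
    "Users can create an account.",
]
grouped = group_nfrs(sentences)
```

In the real system, `classify` would be replaced by a call to the fine-tuned model, and the grouped dictionary would be passed to the template that renders the results page.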
Performance:
Fine-tuned BERT model achieved:
Precision: 87.2%
Recall: 85.6%
F1-score: 86.4%
Effective at detecting implicit NFRs.
UI displays results clearly, improving requirement analysis and early SDLC decision-making.
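For reference, the three metrics above follow the standard definitions, and the reported F1-score can be checked against the reported precision and recall. The sketch below assumes the usual formulas; the source does not state how the scores were averaged across categories, so this is only a consistency check.

```python
def precision(tp, fp):
    """Fraction of predicted NFRs that are correct: TP / (TP + FP)."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Fraction of actual NFRs that were found: TP / (TP + FN)."""
    return tp / (tp + fn)


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Recompute F1 from the reported precision (87.2%) and recall (85.6%).
reported_f1 = f1_score(0.872, 0.856)
print(f"{reported_f1:.1%}")  # 86.4%
```

The recomputed value matches the reported 86.4%, so the three figures are mutually consistent.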
Conclusion
This project demonstrates the feasibility and effectiveness of automating Non-Functional Requirements (NFR) identification from unstructured software documents using a combination of Natural Language Processing (NLP) and Machine Learning (ML) techniques. By transforming unstructured text into semantically enriched vectors and applying state-of-the-art classifiers like BERT, the system accurately identifies and classifies key NFR categories such as performance, security, usability, and maintainability.
The automation of NFR extraction significantly reduces manual effort, improves consistency, and enables faster turnaround times in early-stage software requirement engineering. The system also provides a user-friendly interface for practitioners, making it accessible even to non-technical stakeholders. The model's ability to handle multiple document formats and detect implicit requirements adds to its practical utility.
Beyond improving the quality and efficiency of requirement analysis, this approach lays the groundwork for integrating intelligent NFR detection into modern DevOps pipelines. In future iterations, the system can be enhanced by:
• Expanding the dataset with industry-specific and multilingual NFR samples to improve generalization
• Leveraging more advanced large language models such as GPT-based architectures or fine-tuned domain-specific transformers
• Adding explainability and transparency layers for better traceability of classifications
• Incorporating real-time feedback loops for interactive model improvement
Additionally, deployment on cloud platforms with scalable APIs can enable continuous learning, collaborative annotations, and large-scale enterprise adoption. Ultimately, this project provides a strong foundation for building intelligent software tools that streamline requirements engineering in real-world scenarios.