Dependencies among requirements significantly impact the design, development, and testing of evolving software products. Requirements Dependencies Extraction (RDE) is a cognitively complex task because of the rich semantics of natural language requirements, which makes it challenging to automate the extraction and analysis of dependencies. The challenge intensifies when dependency types are considered. RDE is part of a broader decision support system for making effective software release planning, development, and testing decisions.
Recently, Machine Learning (ML) and Natural Language Processing (NLP) techniques have successfully automated tasks in Requirements Engineering to a large extent. Despite this success, automating RDE faces several challenges: 1) Due to the nature of the problem, it is cognitively difficult to identify all the dependencies among requirements; hence, generating or procuring high-quality annotations for ML-based automation is an arduous task. 2) In the real world, unlabelled data is abundant, but supervised ML techniques need a labelled training set; this lack of training data is a challenge when using ML for RDE. 3) Textual requirements lack structure because they are written in natural language, so the success of ML techniques depends on NLP feature extraction, i.e., the transformation of raw text into a suitable internal numerical representation (a feature vector). However, identifying and applying a feature extraction method is cost- and effort-intensive. 4) While there is a broad spectrum of ML techniques to choose from for RDE automation, not all of them are economically viable in every scenario, given data size and effort investment. Hence, ML techniques need to be evaluated beyond performance measures alone to support effective decision making.
This thesis addresses these challenges and provides solutions. The results described in this thesis are derived from a series of empirical studies on industry and open-source software (OSS) datasets. The main contributions are as follows: