Classification of Non-Functional Requirements From IoT Oriented Healthcare Requirement Document

The Internet of Things (IoT) involves a set of devices that help achieve a smart environment. IoT-oriented healthcare systems monitor patients' data and help take immediate steps in an emergency. Currently, machine learning-based techniques are adopted to ensure security and other non-functional requirements in smart healthcare systems. However, no attention is given to classifying the non-functional requirements from requirement documents. The manual process of classifying the non-functional requirements from documents is error-prone and laborious. Missing non-functional requirements in the Requirement Engineering (RE) phase results in an IoT-oriented healthcare system with compromised security and performance. In this research, an experiment is performed in which non-functional requirements are classified from the requirement document of an IoT-oriented healthcare system. The machine learning algorithms considered for classification are Logistic Regression (LR), Support Vector Machine (SVM), Multinomial Naive Bayes (MNB), K-Nearest Neighbors (KNN), ensemble, Random Forest (RF), and hybrid KNN rule-based machine learning (ML) algorithms. The results show that our novel hybrid KNN rule-based machine learning algorithm outperforms the others, with an average classification accuracy of 75.9% in classifying non-functional requirements from IoT-oriented healthcare requirement documents. This research is novel not only in its use of a machine learning approach to classify non-functional requirements from IoT-oriented healthcare system requirement documents, but also in proposing a novel hybrid KNN rule-based machine learning algorithm that classifies with better accuracy. A new dataset comprising requirements related to IoT-oriented healthcare systems is also created for classification purposes. However, since this dataset is small and consists of only 104 requirements, the generalizability of the results of this research might be affected.


INTRODUCTION
One of the most important tasks in developing high-quality software is gathering the right requirements and ensuring that none are missing. Often during the process of Requirement Engineering (RE), more attention is given to eliciting the functional requirements than the non-functional requirements. This results in a poor-quality end product, wasted cost and effort, and even failure of the project. Non-functional requirements describe the important quality attributes (1,2) and constraints (3) that must be implemented in the system. If these requirements are ignored, the architecture is not designed properly (4). Thus, classifying the non-functional requirements becomes a mandatory task in the RE process for designing the architecture and performing other related activities accordingly (4,5). Traditionally, non-functional requirements are identified and extracted manually, which is an error-prone (5,6) and laborious (7) process. The non-functional requirements are intertwined with the functional requirements in RE documents and are not properly structured. Therefore, proper identification and categorization are required (8). Since the architect needs to know the types of non-functional requirements, they must be classified properly (9). Currently, the trend of using ML algorithms for classifying the non-functional requirements is gaining attention (8). However, there is still a need to classify the non-functional requirements accurately. By using automated techniques, the chance of missing the non-functional requirements is reduced, classification accuracy can be improved, and time and effort are also saved. IoT plays a vital role in the healthcare systems of the present era. It includes a set of devices that help store, process, and transfer data to achieve smart services (10)(11)(12)(13).
IoT-oriented healthcare systems monitor patients' data and help take immediate steps in an emergency (14)(15)(16). Thus, it is critical to ensure that these smart healthcare systems are extremely secure and perform reliably so that patients' sensitive data is not only protected from manipulation and attacks but also transferred completely, safely, and in a timely fashion (17)(18)(19). To ensure that the IoT-oriented healthcare system satisfies all these important non-functional requirements, it is necessary to extract all of them from the requirement document of the IoT-oriented healthcare system in the RE phase. Machine learning-based techniques are adopted to implement non-functional requirements in smart healthcare systems. However, attention first needs to be given to classifying the non-functional requirements from the requirement document, to ensure that they are not missed in the RE phase. By performing this classification task automatically, the chances of missing non-functional requirements will be reduced, and the probability of developing a high-performing and secure IoT-oriented healthcare system will increase (20).
In this research, the main focus is on finding the machine learning algorithm and the relevant features that help classify non-functional requirements with higher accuracy. For feature extraction, Bag of Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF) are adopted. The algorithms considered by this study are Logistic Regression (LR), Support Vector Machine (SVM), Multinomial Naïve Bayes (MNB), K-Nearest Neighbors (KNN), ensemble, Random Forest (RF), and hybrid KNN rule-based ML algorithms. The two main distinct additions here are the ensemble and hybrid classifiers.
Existing studies have limitations. Some report low performance (1,6,21). Some consider very few types of non-functional requirements (4,22,23). The datasets adopted by several studies contain a limited number of non-functional requirements (2,9,21). Some studies adopted a manual validation process, which is prone to error (21).
This article makes the following contributions.
• Creation of a dataset that includes requirements related to IoT-oriented healthcare systems. The requirements in this dataset belong to 8 categories: Accuracy, Reliability, Security and Privacy, Performance, Compatibility, Usability, Functional, and Maintainability. The dataset contains 104 requirements and is created from IoT-oriented healthcare system requirement documents (24,25). Since no dataset of labeled requirements related to IoT-oriented healthcare systems is currently available, it will serve as an aid for future research on automating the classification of non-functional requirements from IoT-oriented healthcare system requirement documents.
• Development of a novel hybrid KNN rule-based machine learning algorithm, which provides better classification accuracy than traditional machine learning algorithms like SVM, KNN, RF, MNB, LR, and ensemble. This novel hybrid machine learning algorithm helps classify non-functional requirements from IoT-oriented healthcare system requirement documents with an average classification accuracy of 75.9%.
• Provision of features relevant to the non-functional requirements that an IoT-oriented healthcare system must possess. These features will help researchers create better classifiers by including them in machine learning-based techniques.
This research article is structured in the following manner. Section 2 describes the relevant work conducted earlier in the literature related to the classification of the non-functional requirements. Section 3 describes the research methodology which is adopted in this research. Section 4 shows the experiment portion. Section 5 presents the results and discussion. Section 6 provides the conclusion and highlights possible future directions.

LITERATURE REVIEW
In the literature, many studies have highlighted the problem of ignored and missed non-functional requirements in the RE process (8,21). Non-functional requirements are sometimes hidden and not clearly stated, which makes them difficult to identify and increases the chance of missing them. Initially, more importance is given to eliciting the functional requirements, whereas non-functional requirements are discovered at late stages, which results in issues related to architectural design (1-3, 5, 6), cost management (9,22,23), time management (4,5,7,23), and risk (26) and quality management (5,7,27,28). The problem is aggravated by manual identification and classification of non-functional requirements (5). Manual classification is a time-consuming and laborious process (22,29). The search function provided by tools is used by typing keywords, which is a tedious process and may lead to missed non-functional requirements (30). For this reason, the concept of automating or semi-automating this task has gained popularity (7,27). However, this trend of automating non-functional requirement classification is still young and requires further research on preprocessing, selecting optimum feature sets, acquiring relevant datasets, and using appropriate ML algorithms (8). Moreover, different studies have worked on different categories of non-functional requirements (8) and have considered different types of documents from which the non-functional requirements are extracted, such as the Certification Commission for Healthcare Information Technology (CCHIT) Ambulatory requirement document (9), the Emergency Department Information Systems Functional Document (31), the Mercedes-Benz Specification Document (26), and European Union e-Procurement documents (27). Because of these differing factors, the classification performance reported by different studies varies, and a comparison free of bias is not possible. Few studies have managed to report the full process of classifying the non-functional requirements from the requirement document (8). Table 1 shows the various techniques adopted by different studies for the classification of non-functional requirements.

Non-Functional Requirements Classification Using Naive Bayes
One of the techniques used to classify the non-functional requirements from the requirement document is Naive Bayes (5). In this technique, probabilities are calculated to classify non-functional requirements. The study focuses on nine non-functional requirements and uses the PROMISE dataset. For evaluation, leave-one-out cross-validation is performed with 15 iterations. The classifier's precision is very poor, at 12.4%, which indicates that it needs improvement. Similarly, in another study, the Naive Bayes algorithm is combined with Expectation-Maximization to classify the non-functional requirements (4). The data is classified into 9 non-functional requirement categories. The classifier's accuracy is reported to be about 75%.

Non-Functional Requirements Classification Using Decision Trees
Decision trees are also used to classify the non-functional requirements (22). The PROMISE dataset is used, and 10-fold cross-validation is performed for evaluation. The accuracy of classification is 98.56%. The study does not mention the types of non-functional requirements considered.

Non-Functional Requirements Classification Using Bagging
One study classifies the non-functional requirements from the requirement document by using the machine learning ensemble meta-algorithm Bagging (29). The dataset consists of 21,969 user reviews. The evaluation is performed using 10-fold cross-validation. The classifier achieves a precision of 71.4% and a recall of 72.3%. The study considers only four types of non-functional requirements, and the performance of the classifier is relatively low.

Non-Functional Requirements Classification Using SVM
In one study, an SVM is used to extract the non-functional requirements from the requirement document (23). In this technique, the documents are first preprocessed, and then SVM is applied. Cross-validation is used for validation: the dataset is divided into subsets, and in each iteration one subset is used for validation while the remaining subsets are used for training. The performance of the classifier is evaluated using 6-fold cross-validation. The precision on the PROMISE corpus is 77%. The study considers 5 types of non-functional requirements. In terms of performance, work is still needed to increase the precision. In another study, the Sequential Minimal Optimization (SMO) algorithm is used for classifying the non-functional requirements from the requirement document (9). First, the text is preprocessed, and then SMO is applied. The evaluation is performed using 10-fold cross-validation. The precision is found to be 72.8%, which needs improvement.
Another study reported using the SVM algorithm to classify the non-functional requirements from a Mercedes-Benz specification (26). This approach is semi-supervised, and the evaluation is performed manually. The precision of the classifier is >60%. Because the research uses an automotive industry document, there is no certainty about how the classifier performs on specifications from other industries. SVM is used by one more study to extract the non-functional requirements (33). In this technique, the documents are first preprocessed, and then SVM is applied. The performance of the classifier is evaluated using 10-fold cross-validation on the PROMISE dataset. The study considers only 4 non-functional requirements. The precision is found to be 78.25%.

Non-Functional Requirements Classification Using KNNs
One of the studies focused on classifying security-related requirements (31,34,35). In the study, preprocessing is done first, and then KNN is applied. The sentences are classified into 7 types of security requirements. The evaluation is performed using 10-fold cross-validation. The classifier's precision is found to be 82%. The study focused only on security-related requirements, not on other non-functional requirements.
In the literature, another study used KNN to classify the non-functional requirements (27). In that study, the text is first preprocessed, and then KNN is applied. The classifier's precision is found to be 97.73%. The datasets used for evaluation contain requirement documents with 57 and 88 non-functional requirement sentences, so the data used for validation contains only a small number of non-functional requirements.
Another study used the KNN algorithm to classify the non-functional requirements (32). The dataset used for classification is the PROMISE dataset. The classifier's performance is evaluated by iterating 14 times using different threshold values. The classifier's precision is found to be only 50.65%.

Non-Functional Requirements Classification Using LR
Logistic regression is used in one study to classify non-functional requirements (7). This algorithm works on the basis of a probability function. The PROMISE_exp dataset is used. The study considers 11 types of non-functional requirements for classification. The classifier's performance, calculated using 10-fold cross-validation, is only 75% in terms of precision.

Non-Functional Requirements Classification Using Convolutional Neural Network (CNN)
One study proposed a CNN for classifying the non-functional requirements from the requirement document (30). The study focuses on a total of 5 non-functional requirements. The text is first preprocessed, and then the CNN is applied. The PURE dataset, which consists of 1,247 requirement sentences, is used. The classifier's performance is calculated using 2-fold cross-validation and is found to be 92.2% in terms of precision.
Similarly, various techniques have been proposed to ensure the security of IoT-oriented healthcare systems. One proposed technique is based on a machine learning algorithm that uses a biometric framework to ensure the security of sensitive data (36). In this technique, features extracted from electrocardiogram (ECG) signals are used by the machine learning algorithm. The proposed system is based on a multilayer perceptron model, and a secure communication channel is developed to secure the signal and protect it from possible interference. The authentication process is carried out in the testing phase by considering unique EIs generated from the ECG and coefficients from polynomial approximation.
One study highlights the use of a biometric cryptosystem (BCS) for ensuring security (37). The proposed approach uses Artificial Neural Networks (ANN) to analyze the signal energy variations of implanted devices. Inertial measurement units (IMU) are fixed inside the implanted devices, which helps detect changes in signal energy. In this technique, the ANN is trained with the first sensor placed on the chest, and the signals are then processed by the algorithm. The ANN in the proposed technique comprises three layers: an input layer, an output layer, and a hidden layer comprising 10 hidden nodes. Security is achieved by a 128-bit key generated from gait cycles. This key is difficult for attackers to obtain, since it cannot be recovered by a modern attack mechanism such as a dictionary attack.
In one research study, a novel machine learning-based security framework is proposed to detect malicious activities in healthcare systems (38). The proposed framework, named Health Guard, is developed by considering four machine learning algorithms: KNN, RF, Decision Trees, and ANN. The malicious activities are detected by observing the vital signals of the implanted devices and then correlating the vitals to identify the variations in the patient's body. The classifiers are trained using nine databases of eight smart implantable devices and by considering activities, among which seven are normal user activities and five are disease-related. The validation process tests only three threats: Denial of Service (DoS) attacks, tampered medical devices, and false data injection.
The literature review reveals the limitations of existing studies. The performance of some techniques is only average (1,6,26). Some studies consider very few classes of non-functional requirements (4,23,31). The datasets contain a limited number of non-functional requirement sentences for training and testing (2,3,9). Most studies consider classifiers developed using a single machine learning algorithm (8). Different combinations of hybrid and ensemble classifiers are barely considered.

RESEARCH METHODOLOGY
In order to find the relevant features and determine the ML algorithm that achieves better accuracy for the classification of non-functional requirements, the research answers the following research questions:
• RQ.1. Which features help to increase the accuracy of non-functional requirement classification?
• RQ.2. Which machine learning algorithm classifies non-functional requirements with better accuracy?
The research method used for the validation of the ML algorithm is experimentation (39). Experimentation is a systematic method that involves observation, manipulation, and control to generate accurate and reliable results. It is used to study cause-effect relationships and favors standardized tools for maintaining control and achieving precision. It helps develop automatic techniques and evaluate the accuracy of classifying the non-functional requirements while controlling variables such as the data used for classification, the number of non-functional requirements to be classified, and the validation method adopted. Simulation, which is similar to experimentation, is not adopted because the extensive resources needed to simulate the technique correctly are unavailable and because its abstract nature increases the uncertainty of the results (40). The PROMISE_exp dataset is used for classification (7). This dataset contains requirements from real projects and is an expanded form of the existing PROMISE dataset. The use of a large dataset also helps in generalizing the results (7). Eleven non-functional requirements are chosen for classification: Availability, Legal and Licensing, Look and Feel, Maintainability, Operability, Performance, Scalability, Security, Usability, Fault Tolerance, and Portability. These are common non-functional requirements in many projects and are often misclassified or ignored because of their ambiguous nature (41). The ambiguity is reduced in this research by using features relevant to non-functional requirements. In addition, the PROMISE_exp dataset supports learning on these types of non-functional requirements (7).
Bag of Words (BoW) and TF-IDF are used separately to answer the first research question and find the relevant features. These methods are chosen because they work well for classifying non-functional requirements (7,8). Moreover, the two methods weight features differently, and experimenting with both helps find more relevant features. BoW ignores the sequence of words; however, in the case of non-functional requirement classification, some algorithms work well irrespective of word order. The BoW method weights features according to their number of occurrences in all documents (29), while the TF-IDF method gives more weight to features that have a low overall frequency but a high frequency in a specific document (4, 7). Some highly relevant features are also added to support rule-based learning, which is part of the hybrid KNN rule-based ML algorithm.
To answer the second research question and find the ML algorithm that classifies non-functional requirements with better accuracy, LR, SVM, MNB, KNN, the ensemble made from these algorithms, RF, and the hybrid KNN rule-based ML algorithm are applied. These ML algorithms are chosen because supervised ML algorithms perform better than unsupervised or semi-supervised ML algorithms (8). The algorithms are also selected with the dataset used for classification in mind. To evaluate the accuracy of these classifiers, 10-fold cross-validation is performed. This evaluation method is chosen because it helps produce more accurate and unbiased results (4,23). Furthermore, many studies adopt the 10-fold cross-validation method, which helps in comparing results without bias (4,5,7,23). The steps in the classification of non-functional requirements are given in Figure 1.
The Internet of Things plays a crucial role in the healthcare systems of the modern era. It makes it possible to monitor, control, and prevent diseases by collecting and processing health-related data through sensors (42). Since these healthcare systems are critical, it is important to ensure that they work efficiently and satisfy all the required non-functional requirements. Healthcare systems with compromised performance or security can lose critical health data and are exposed to threats and attacks by hackers, which may affect the lives of patients (17). An IoT-oriented healthcare system generally has three layers: the perception layer, the fog layer, and the cloud layer (43)(44)(45). The layered architecture is shown in Figure 2. The perception layer includes actuators or devices that help collect sensory data. The fog layer processes the data to produce the required response quickly. The cloud layer deals with data storage and big data analytics (44).

Dataset Description
In this research, the PROMISE_exp dataset (46) is used for the classification of non-functional requirements. This dataset is the expanded form of the original PROMISE dataset, which comprised a total of only 625 labeled requirement sentences (21). The expanded form contains more labeled requirement sentences: 444 functional requirements and 525 non-functional requirements (7). The 11 types of non-functional requirements are distributed among the sentences in an unbalanced fashion. Table 2 shows the composition of this dataset.

Text Preprocessing
Text preprocessing is the first step in which the data is cleaned to remove redundant and less relevant features (8).
In text preprocessing, the data is first normalized. During normalization, the natural language text is converted to lower case, and then non-alphabetical characters, symbols, and punctuation are removed. Tokenization is performed after normalization, converting the text into tokens. In the next step, the stop words, which are also less relevant features, are removed. Finally, the words are lemmatized, that is, converted to their root form, in order to remove extra features (7).
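The four preprocessing steps can be sketched in plain Python as follows. This is a minimal illustration, not the implementation used in the experiment: the stop-word set `STOP_WORDS` and the lemma table `LEMMAS` are tiny hypothetical stand-ins for the full resources that an NLP library would normally provide, and the sample sentence is invented.

```python
import re

# Hypothetical, deliberately tiny resources (assumptions for illustration);
# a real pipeline would load a full stop-word list and a dictionary-backed
# lemmatizer from an NLP library.
STOP_WORDS = {"the", "a", "an", "of", "to", "be", "shall", "and", "in", "is", "all"}
LEMMAS = {"encrypts": "encrypt", "encrypted": "encrypt", "records": "record",
          "patients": "patient", "seconds": "second"}


def normalize(text):
    """Lower-case the text and replace every non-alphabetic character with a space."""
    return re.sub(r"[^a-z\s]", " ", text.lower())


def tokenize(text):
    """Split normalized text into whitespace-delimited tokens."""
    return text.split()


def remove_stop_words(tokens):
    """Drop tokens that appear in the stop-word set."""
    return [t for t in tokens if t not in STOP_WORDS]


def lemmatize(tokens):
    """Map each token to its lemma when the lookup table knows it."""
    return [LEMMAS.get(t, t) for t in tokens]


def preprocess(sentence):
    """Run normalization, tokenization, stop-word removal, and lemmatization in order."""
    return lemmatize(remove_stop_words(tokenize(normalize(sentence))))


sample = "The system shall encrypt all patients' records within 2 seconds."
print(preprocess(sample))
```

Keeping each step in its own function makes it easy to swap the lookup tables for library-backed components without changing the order of the pipeline.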

Feature Extraction
Feature extraction is the second step in which the preprocessed data is converted into vectors. To extract the features, BoW and TF-IDF are applied. The description of both techniques is given below.

Bag of Words
Bag of Words is a simple technique in which the words are converted into a numerical format based on their number of occurrences. Each feature gets a value equal to the number of times it appears in the requirement sentence. The vector of requirement sentence $j$ is represented in Equation 1.
$$r_j = (x_{1,j}, x_{2,j}, \dots, x_{n,j}) \tag{1}$$
In Equation 1, $x_{i,j}$ represents the weight of feature $i$, calculated from the number of occurrences of $i$ in requirement $j$, whereas $n$ represents the total number of words (7). Once the vectors are created, they are given as input to the ML algorithms in the next step.
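As a small illustration of this count-based vectorization, the sketch below builds a vocabulary from already preprocessed token lists and turns each requirement into a vector of occurrence counts. The function names and the two sample requirements are hypothetical and chosen only for this example.

```python
from collections import Counter


def build_vocabulary(documents):
    """Return the sorted list of distinct tokens across all requirements."""
    return sorted({tok for doc in documents for tok in doc})


def bow_vector(tokens, vocabulary):
    """Return one occurrence count per vocabulary term for a single requirement."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


# Two invented, already preprocessed requirement sentences.
docs = [
    ["system", "encrypt", "patient", "record"],
    ["system", "respond", "within", "second", "second"],
]
vocab = build_vocabulary(docs)
vectors = [bow_vector(d, vocab) for d in docs]
print(vocab)
print(vectors)
```

Every vector has the same length as the vocabulary, so the requirements can be stacked into a matrix and passed to a classifier.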

Term Frequency-Inverse Document Frequency
In this vectorization technique, two metrics play a role. The first is the Term Frequency (TF), which represents the number of times a particular word occurs in a requirement. The second is the Inverse Document Frequency (IDF), which is obtained by dividing the total number of requirement sentences by the number of requirement sentences in which the word occurs and then applying a logarithmic function to the result (47). The mathematical representation of TF-IDF is given in Equation 2.
$$x_{i,j} = tf_{i,j} \times idf_i \tag{2}$$
In Equation 2, $tf_{i,j}$ is the frequency of term $i$ in requirement $j$, and $idf_i$ is the Inverse Document Frequency of $i$, which is represented mathematically in Equation 3:
$$idf_i = \log\left(\frac{N}{df_i}\right) \tag{3}$$
where $N$ is the total number of requirement sentences and $df_i$ is the number of requirement sentences in which term $i$ occurs.
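The weighting described in this subsection can be sketched in plain Python as follows, assuming the unsmoothed form in which the IDF is the logarithm of the total number of requirements divided by the number of requirements containing the term. The function name and sample data are hypothetical. Library implementations may differ; scikit-learn's `TfidfVectorizer`, for example, applies a smoothed IDF and L2 normalization by default, so its values will not match this sketch.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Weight each term by tf * log(N / df) for every requirement."""
    n_docs = len(documents)
    vocabulary = sorted({tok for doc in documents for tok in doc})
    # Document frequency: number of requirements that contain the term.
    df = Counter(tok for doc in documents for tok in set(doc))
    idf = {term: math.log(n_docs / df[term]) for term in vocabulary}
    vectors = []
    for doc in documents:
        tf = Counter(doc)  # raw term counts within this requirement
        vectors.append([tf[term] * idf[term] for term in vocabulary])
    return vocabulary, vectors


# Two invented, already preprocessed requirement sentences.
docs = [
    ["system", "encrypt", "patient", "record"],
    ["system", "respond", "within", "second", "second"],
]
vocab, vectors = tfidf_vectors(docs)
for row in vectors:
    print([round(w, 3) for w in row])
```

In this toy input, "system" occurs in every requirement and therefore receives a weight of zero, while "second" occurs twice in a single requirement and receives the largest weight, which is the behavior the text attributes to TF-IDF.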

Machine Learning Algorithms
The machine learning algorithms LR, SVM, MNB, KNN, ensemble, RF, and hybrid KNN rule-based ML are applied to the vectorized data. The classifiers are constructed not only as monolithic models; ensemble and hybrid classifiers are also considered, since they yield better results according to the literature (8).
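The text does not spell out how the rule-based part and KNN are combined, so the following is only one possible reading, offered as a sketch under stated assumptions: a keyword rule decides the label when it fires, and a cosine-similarity KNN vote is used otherwise. The `RULES` table, the labels, the training vectors, and all function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical keyword rules (assumption): keyword -> requirement class.
RULES = {"encrypt": "Security", "password": "Security", "second": "Performance"}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_predict(vector, train_vectors, train_labels, k=3):
    """Return the majority label among the k most similar training vectors."""
    ranked = sorted(zip(train_vectors, train_labels),
                    key=lambda pair: cosine(vector, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def hybrid_predict(tokens, vector, train_vectors, train_labels, k=3):
    """Apply a keyword rule if one fires; otherwise fall back to KNN (assumed design)."""
    for tok in tokens:
        if tok in RULES:
            return RULES[tok]
    return knn_predict(vector, train_vectors, train_labels, k)


# Invented training data over a three-term vocabulary.
train_vectors = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1]]
train_labels = ["Usability", "Usability", "Reliability", "Reliability"]

print(hybrid_predict(["system", "encrypt"], [0, 0, 1], train_vectors, train_labels))
print(hybrid_predict(["easy", "use"], [1, 0, 0], train_vectors, train_labels))
```

In this reading the rules take precedence, so the quality of the keyword list directly bounds the accuracy on sentences where a rule fires; other combinations, such as using rule hits as extra features for KNN, are equally plausible.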

Evaluation
The performance of the classifiers in terms of accuracy is evaluated using 10-fold cross-validation (7). In 10-fold cross-validation, the dataset is divided into ten subsets of almost equal size, and testing takes place in 10 iterations; in each iteration, one fold, comprising 10% of the dataset, is held out for testing, and the other nine folds, comprising 90% of the dataset, are used for training. In this way, each sample is used for testing exactly once and for training in the remaining nine iterations (30). The purpose of performing 10-fold cross-validation is to reduce the chance of bias in the results. In this research, the metric 'Accuracy' is considered to get an overall idea of the correct classifications made by the classifier. The accuracy of the classifier measures how many correct predictions the classifier makes relative to the total number of predictions, as shown in Equation 4.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{4}$$
where TP represents the predictions correctly identified as positive, TN represents those correctly identified as negative, and FP and FN represent those incorrectly identified as positive and negative, respectively.
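The evaluation procedure can be sketched in plain Python as follows. This is an illustration under assumptions rather than the code used in the experiment: the folds are contiguous and neither shuffled nor stratified, and `fit_predict` is a hypothetical callback that trains on the training folds and returns predicted labels for the test fold. For multi-class labels, accuracy is computed as the share of correct predictions.

```python
from collections import Counter


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def cross_validate(X, y, fit_predict, k=10):
    """Average accuracy over k folds; fit_predict(train_X, train_y, test_X) -> labels."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k):
        preds = fit_predict([X[i] for i in train_idx],
                            [y[i] for i in train_idx],
                            [X[i] for i in test_idx])
        scores.append(accuracy([y[i] for i in test_idx], preds))
    return sum(scores) / len(scores)


def majority_baseline(train_X, train_y, test_X):
    """Hypothetical baseline: always predict the most frequent training label."""
    majority = Counter(train_y).most_common(1)[0][0]
    return [majority] * len(test_X)


# Invented toy data: 20 samples that all share one label.
X = list(range(20))
y = ["F"] * 20
print(cross_validate(X, y, majority_baseline, k=10))
```

For an unbalanced dataset, shuffling and stratifying the folds before splitting would usually be preferable to the contiguous split shown here.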

EXPERIMENT
To implement the technique and perform the experiment, the Anaconda tool is used. The PROMISE_exp dataset is obtained and converted into CSV format so that it can be processed by the tool. The data is first preprocessed to remove irrelevant features.
In the first step of preprocessing, the text is normalized. Next, the sentences are converted into tokens, and the stop words are removed. In the final step, words are lemmatized, that is, converted to their root form, to remove extra features. After preprocessing, the data is vectorized using BoW and TF-IDF. For the classification of non-functional requirements, 7 ML algorithms are considered: LR, SVM, MNB, KNN, ensemble, RF, and the hybrid KNN rule-based ML algorithm. The hyperparameters of the classifiers are tested and set manually by checking their performance. The performance is evaluated in terms of accuracy by using 10-fold cross-validation.
The work in this research differs from the recent study (7) in that it considers more ML algorithms, such as RF, ensemble, and the hybrid KNN rule-based ML algorithm. Moreover, relevant features are added to increase the accuracy.
The non-functional requirements of an IoT-oriented healthcare system generally include security (17,42,44), privacy, interoperability, scalability, reliability (17,44), accuracy, usability (17), performance (17,44,48), and maintainability (48). The system must possess these non-functional requirements, since missing any critical non-functional requirement results in severe loss of healthcare data and other threats to patients' lives. Thus, it is important to extract and classify all the non-functional requirements from the requirement document so that they can be implemented accordingly in the IoT-oriented healthcare system. The manual process of extracting the non-functional requirements is error-prone and laborious. To avoid missing non-functional requirements, this research adopts a machine learning-based approach to automatically classify the non-functional requirements from the IoT-oriented healthcare system's requirement document.
To implement the automatic technique, experimentation is performed. First, a dataset is created, comprising functional and non-functional requirements of IoT-oriented healthcare systems (24,25). This dataset contains requirements belonging to 8 classes: Accuracy, Reliability, Security and Privacy, Performance, Compatibility, Usability, Functional, and Maintainability. The dataset is then preprocessed. In the first phase of preprocessing, the requirement sentences are normalized: the text is converted to lower case, and punctuation marks and non-alphabetical characters are removed. In the second phase, the text is converted into tokens. In the third phase, stop words are removed. In the fourth phase, lemmatization converts the words into their lemma, or dictionary form. After text preprocessing, vectorization is performed using TF-IDF. The features extracted by TF-IDF are then used to create classifiers by applying machine learning algorithms. The machine learning algorithms adopted in this experiment are SVM, LR, KNN, MNB, RF, ensemble, and the hybrid KNN rule-based machine learning algorithm. To evaluate the classifiers' performance, 10-fold cross-validation is performed.

RESULTS AND DISCUSSION
This section presents the results of the experiments, which are conducted to find the relevant features that increase the accuracy of classifying non-functional requirements (RQ1) and to identify the ML algorithm that performs better in terms of accuracy for this classification (RQ2).

Relevant Features for Non-Functional Requirements Classification
In the experiment stage, two feature extraction techniques are adopted: Bag of Words (BoW) and TF-IDF. After the text is preprocessed and the features are extracted with these vectorization techniques, the top relevant features obtained from both techniques are shown in Table 3. Furthermore, some coding is also performed to add more features.
In addition to the features obtained by the vectorization techniques, the features added according to their relevance to each type of non-functional requirement are shown in Table 4. These added features helped the hybrid KNN rule-based ML algorithm achieve an accuracy as high as 85% on average.
The performance of the ML algorithms when fed with these features is shown in Figure 4, which highlights the effect of BoW and TF-IDF on the classification accuracy of the ML algorithms. With the hybrid KNN rule-based ML algorithm, TF-IDF outperforms BoW, extracting more relevant features and providing more accurate results. In the case of the ensemble, BoW provides more relevant features. BoW also outperforms TF-IDF in extracting relevant features when applied with LR and MNB. In the case of SVM and KNN, the features extracted by TF-IDF prove to be more relevant. BoW and TF-IDF provide similar accuracy with RF.
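To make the difference between the two vectorization techniques concrete, the following sketch computes BoW and TF-IDF vectors for three invented, already-preprocessed requirement sentences. It uses the textbook definitions (raw counts for BoW; term frequency times ln(N/df) for TF-IDF), which is an assumption: library implementations typically apply smoothing and normalization, so their exact weights differ.

```python
import math
from collections import Counter

# Invented, already-preprocessed requirement sentences (assumption).
docs = [
    ["system", "encrypt", "patient", "data"],
    ["system", "respond", "within", "second"],
    ["system", "encrypt", "sensor", "data"],
]

# Shared vocabulary, sorted so every vector uses the same column order.
vocabulary = sorted({term for doc in docs for term in doc})


def bow_vector(tokens, vocabulary):
    """Bag of Words: raw count of each vocabulary term in the document."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def idf(term, documents):
    """Inverse document frequency: ln(N / number of documents containing term)."""
    df = sum(1 for doc in documents if term in doc)
    return math.log(len(documents) / df)


def tfidf_vector(tokens, vocabulary, documents):
    """TF-IDF: relative term frequency weighted by inverse document frequency."""
    counts = Counter(tokens)
    return [(counts[term] / len(tokens)) * idf(term, documents)
            for term in vocabulary]


bow = dict(zip(vocabulary, bow_vector(docs[0], vocabulary)))
tfidf = dict(zip(vocabulary, tfidf_vector(docs[0], vocabulary, docs)))

# "system" occurs in every sentence, so BoW counts it but TF-IDF weights it 0.
print(bow["system"], tfidf["system"])
# "patient" occurs in one sentence only, so it outweighs "encrypt" (two sentences).
print(tfidf["patient"] > tfidf["encrypt"])
```

Under BoW, a term shared by every requirement counts as much as a term that distinguishes one class. TF-IDF down-weights such shared terms, which is one plausible reason why the two techniques rank differently across classifiers.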

Machine Learning Algorithm With Higher Accuracy
The second research question (RQ2) is about finding the ML algorithm that provides more accurate results than the others in classifying non-functional requirements. In Figure 5, the accuracy provided by all 7 ML algorithms when BoW is used is illustrated as trend lines across a total of 10 iterations. The accuracy provided by the hybrid KNN rule-based ML algorithm is the highest overall. The accuracy of the ensemble drops at iteration 3, and overall it provides only average classification accuracy compared to the hybrid classifier. LR, on the other hand, provides better accuracy than the ensemble overall but is still lower than the hybrid classifier. SVM does not show good classification accuracy and performs poorly overall compared to the other ML algorithms except KNN. The classification accuracy provided by MNB is low compared to the hybrid classifier and LR overall but still better than that of the other ML algorithms. KNN provides the lowest accuracy overall. RF performs better overall than SVM and KNN.
In Figure 6, the classification accuracy of all ML algorithms is shown when TF-IDF is adopted. The hybrid KNN rule-based ML algorithm again outperforms all other ML algorithms. The ensemble performs better overall only than MNB. LR provides better accuracy than MNB, SVM, and the ensemble overall, while SVM provides better classification accuracy than only the ensemble and MNB. The classification accuracy provided by MNB is the lowest overall. KNN, in this case, provides better accuracy than all other ML algorithms except the hybrid classifier. RF is less accurate than the hybrid classifier and KNN but provides better classification accuracy overall than the remaining ML algorithms.
The classification results in terms of accuracy are shown in Tables 5, 6. Table 5 shows the average classification accuracy of the ML algorithms over 10 iterations when BoW is applied. It clearly shows that the hybrid KNN-rule based ML algorithm outperforms the other ML algorithms, achieving the highest average accuracy of 0.778 (77.8%). By its nature, the hybrid classifier uses rules constructed from relevant features, which helps it classify non-functional requirements more accurately. Similarly, Table 6 shows that the highest classification accuracy among all the ML algorithms is achieved by the hybrid KNN-rule based ML algorithm, with an average accuracy of 0.857. The results show that by using TF-IDF for feature extraction and the hybrid KNN-rule based ML algorithm for classification, an average accuracy of 85.7% can be achieved, which is quite promising.
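One way such a hybrid of rules and KNN could be combined is sketched below. The text does not specify how the two parts interact, so the design here is an assumption: keyword rules are tried first, and a plain KNN vote over cosine similarity is used as the fallback. The rule keywords, the training sentences, and the function names are all hypothetical; they are not the features of Table 4 or the classifier evaluated in the experiment.

```python
import math
from collections import Counter

# Hypothetical keyword rules, one keyword set per class (assumption).
RULES = {
    "Security and Privacy": {"encrypt", "password", "authenticate"},
    "Performance": {"second", "latency", "throughput"},
}


def cosine(a, b):
    """Cosine similarity between two term-count vectors stored as Counters."""
    dot = sum(a[term] * b[term] for term in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def knn_predict(tokens, training, k=3):
    """Majority label among the k training sentences most similar to tokens."""
    query = Counter(tokens)
    ranked = sorted(training,
                    key=lambda example: cosine(query, Counter(example[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def hybrid_predict(tokens, training, k=3):
    """Apply the keyword rules first; fall back to KNN when no rule fires."""
    token_set = set(tokens)
    for label, keywords in RULES.items():
        if keywords & token_set:
            return label
    return knn_predict(tokens, training, k)


# Invented, already-preprocessed training sentences (assumption).
training = [
    (["interface", "easy", "use", "nurse"], "Usability"),
    (["screen", "easy", "read", "patient"], "Usability"),
    (["system", "available", "failure", "recover"], "Reliability"),
    (["device", "recover", "power", "failure"], "Reliability"),
]

print(hybrid_predict(["data", "encrypt", "transmission"], training))  # rule fires
print(hybrid_predict(["dashboard", "easy", "use"], training))         # KNN fallback
```

In this sketch the rules take priority over the neighbours, so a requirement containing a class-specific keyword is labelled even when no similar training sentence exists. Other combinations, such as using rule matches as extra features for KNN, are equally conceivable.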
Table 6 shows the average classification accuracy of the ML algorithms over 10 iterations when TF-IDF is applied. The hybrid KNN-rule based ML algorithm achieves the highest average classification accuracy, 85.7%. The results of the experiment using IoT-oriented healthcare requirement documents are shown in Table 7.
For the IoT-oriented healthcare requirement documents, the results show that the hybrid KNN-rule based machine learning algorithm outperforms the others, with an average classification accuracy of 75.9%. In existing studies, many machine learning-based techniques are adopted to classify the non-functional requirements from requirement documents. However, to our knowledge, this is the first research that focuses solely on IoT-oriented healthcare system requirement documents to classify the non-functional requirements automatically. The automatic technique proposed in this research classifies the non-functional requirements with reasonable accuracy, which supports the development of IoT-oriented healthcare systems in which critical non-functional requirements are not missed.
The approach presented in this research has certain limitations. First, the dataset is small and contains only 104 requirements, which may affect the generalizability of the results. Second, only 7 types of non-functional requirements related to IoT-oriented healthcare systems are covered. Third, only 7 machine learning algorithms are considered for classifying non-functional requirements from the IoT-oriented healthcare system requirement document: one ensemble, one hybrid algorithm, and five supervised machine learning algorithms. This research does not cover neural networks or semi-supervised or unsupervised machine learning algorithms.

THREATS TO VALIDITY
This section presents the possible threats to the validity of this research, along with the steps taken to mitigate them, some of which mitigate the threats only partially.

Construct Validity
Construct validity refers to the degree to which the variables accurately measure what they are supposed to measure (41). In this research, the standard measure "Accuracy" is adopted to measure the classifiers' performance. Many studies use this measure to assess classification performance, and it is believed to capture the required aspect of performance.
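For reference, accuracy in its standard sense is the share of requirements whose predicted class matches the true class. The sketch below illustrates this definition; the function name and the example labels are illustrative assumptions, not taken from the experiment.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true class labels."""
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Invented example: three of the four requirements are classified correctly.
true_labels = ["Security and Privacy", "Usability", "Performance", "Reliability"]
predicted = ["Security and Privacy", "Usability", "Reliability", "Reliability"]
print(accuracy(true_labels, predicted))  # 0.75
```

Because accuracy treats every class equally per requirement, it can look favourable on imbalanced data when the frequent classes are predicted well.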

Internal Validity
Internal validity concerns the extent to which the experimental results are derived from the data and not from any unconsidered variables (29). In this research, the threat considered is over-fitting of the machine learning algorithms to the data. The effect of this threat is mitigated by using 10-fold cross-validation.
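The idea behind k-fold cross-validation can be sketched as follows: the data are split into k folds, each fold serves once as the test set while the remaining folds are used for training, and the k accuracies are averaged. This is a simplified illustration under stated assumptions: the folds are contiguous with no shuffling or stratification, and the majority-class baseline and its labels are invented stand-ins for a real classifier and dataset.

```python
from collections import Counter


def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k contiguous, near-equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def cross_validate(samples, labels, fit, predict, k=10):
    """Average test accuracy over k folds; every model is tested on unseen data."""
    scores = []
    for train, test in k_fold_indices(len(samples), k):
        model = fit([samples[i] for i in train], [labels[i] for i in train])
        correct = sum(predict(model, samples[i]) == labels[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


# Hypothetical baseline classifier: always predict the most frequent training label.
def fit_majority(train_samples, train_labels):
    return Counter(train_labels).most_common(1)[0][0]


def predict_majority(model, sample):
    return model


samples = list(range(10))
labels = ["Security"] * 8 + ["Usability"] * 2
print(cross_validate(samples, labels, fit_majority, predict_majority, k=5))

# With 104 requirements and k=10, four folds hold 11 items and six hold 10.
print([len(test) for _, test in k_fold_indices(104, 10)])
```

Since each requirement is used for testing exactly once and never by a model that was trained on it, the averaged accuracy is less sensitive to one lucky or unlucky split than a single train/test partition.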

External Validity
External validity refers to the extent to which the results of this research apply to other settings (29). Here, the domain and size of the dataset affect external validity. This threat is partially mitigated by selecting the PROMISE_exp dataset (46), which contains requirement sentences from the software domain and more requirements than the existing PROMISE dataset.

CONCLUSION AND FUTURE DIRECTIONS
Non-functional requirements are essential in the RE process since they play a key role in driving the software architecture and determining the product quality. Since these requirements are written in natural language and often become intertwined with the functional requirements, they are often missed or ignored. To overcome the difficulties caused by manual processes and inadequate tools, this research has focused on using ML algorithms for the automatic classification of non-functional requirements. Since this task has to be performed accurately, this study has focused on finding relevant features and ML algorithms that can perform it with high accuracy. In this research, BoW and TF-IDF are used for feature extraction, and some further features are added to increase the accuracy. The ML algorithms adopted are LR, SVM, MNB, KNN, ensemble, RF, and the hybrid KNN-rule based ML algorithm. Eleven types of non-functional requirements are considered. The dataset used for classification is the PROMISE_exp dataset. To compare the accuracy of the classifiers created with different ML algorithms and features, an experiment is conducted. The experiment results show that by using TF-IDF for extracting the features and the hybrid KNN-rule based ML algorithm for classification, an average accuracy of 85.7% can be achieved, which is a promising result. It is noteworthy that the accuracy achieved by this study is higher than the accuracy achieved by a recent study (7) on the PROMISE_exp dataset. Automatic classification of non-functional requirements using high-performing ML algorithms and relevant features helps Requirement Engineers classify the non-functional requirements accurately. It also saves the critical time they would otherwise spend on manual processes and less advanced tools.
Research on using machine learning for the automatic classification of non-functional requirements with better performance is still in progress. The literature shows that studies rarely consider hybrids and ensembles for classifying non-functional requirements. In the future, further study can be done to increase the size of the dataset and incorporate more types of non-functional requirements. Further attention can also be given to creating more custom classifiers by using different combinations of ML algorithms.

DATA AVAILABILITY STATEMENT
The original contributions presented in the study are included in the article/supplementary material; further inquiries can be directed to the corresponding author/s.