Hypertension-Related Drug Activity Identification Based on Novel Ensemble Method

Hypertension is a chronic disease and major risk factor for cardiovascular and cerebrovascular diseases that often leads to damage to target organs. The prevention and treatment of hypertension is crucially important for human health. In this paper, a novel ensemble method based on a flexible neural tree (FNT) is proposed to identify hypertension-related active compounds. In the ensemble method, the base classifiers are Multi-Grained Cascade Forest (gcForest), support vector machines (SVM), random forest (RF), AdaBoost, decision tree (DT), Gradient Boosting Decision Tree (GBDT), KNN, logical regression, and naïve Bayes (NB). The classification results of nine classifiers are utilized as the input vector of FNT, which is utilized as a nonlinear ensemble method to identify hypertension-related drug compounds. The experiment data are extracted from hypertension-unrelated and hypertension-related compounds collected from the up-to-date literature. The results reveal that our proposed ensemble method performs better than other single classifiers in terms of ROC curve, AUC, TPR, FRP, Precision, Specificity, and F1. Our proposed method is also compared with the averaged and voting ensemble methods. The results reveal that our method could identify hypertension-related compounds more accurately than two classical ensemble methods.


INTRODUCTION
Hypertensive disease is a frequent cardiovascular disease characterized by elevated arterial blood pressure and accompanied by the target organ injury or clinical diseases (Essiarab et al., 2011;Owlia and Bangalore, 2016). It is a risk factor leading to many serious complications such as stroke, hypertensive heart disease, renal failure, atherosclerosis, and so on (Sakai and Sigmund, 2005;Brinks and Eckhart, 2010). Due to the increasing pressure of work and life, many people do not develop good eating and living habits, and often stay up late. The age of hypertensive patients tends to be younger. Therefore, the prevention and treatment of hypertension has become very important for human health.
Network pharmacology (NP) could construct a multi-dimensional network based on "traditional Chinese medicine prescription-chemical component-targets-disease targets" to analyze the relationships between traditional Chinese medicine multi-components and activity, which could provide a theoretical basis for further experimental research on a pharmacodynamic material basis and action mechanism Xu et al., 2018). In recent years, network pharmacology has revealed therapeutic targets for hypertension and become a research hotspot, as it has been clinically verified to be an effective method of drug screening . Chen et al. screened out the key compounds and targets of JiaWeiSiWu granule to reveal the mechanism of JiaWeiSiWu granule in treating hypertension by NP method (Chen et al., 2021a). By NP and molecular docking (MD) methods, Zhai et al. investigated the mechanism of Pinellia ternate in treating hypertension (Zhai et al., 2021). Chen et al. analyzed the network based on Guizhi decoction, active compounds, and targets, and found hypertension-related targets and key pathways (Chen et al., 2021b). Chen et al. utilized NP and MD to analyzed the genistein for treating pulmonary hypertension (PH) and provided new guidance for further PH-related research (Chen et al., 2019). Liu et al. explained the pharmacological mechanism of TaohongSiwu decoction in the treatment of essential hypertension (EH) by the NP method . Wang et al. utilized NP to analyze the mechanism of Yeju Jiangya decoction against hypertension .
In recent decades, many data mining methods have been applied to reveal the disease mechanism and medication law of many complex diseases, especially hypertension (Ji and Wang, 2014;Ji et al., 2015;Hwang et al., 2016;Liang et al., 2018;Amaratunga et al., 2020;Liu et al., 2021;Zhao et al., 2021). Zhang et al. utilized SPSS21.0 and Apriori algorithm to analyze the symptom/sign information of EH patients collected and gave their distribution law and correlation (Zhang et al., 2019a). Yuan and Chen proposed niche technology and an artificial bee colony algorithm to mine association rules from Traditional Chinese Medicine (TCM) cases for treating hypertension (Yuan and Chen, 2011). Ma et al. collected the new literature about hypertension and constructed the gene network by analysis (Ma et al., 2018). Ramezankhani et al. utilized a decision tree to predict the risk factors of hypertension incidence in data collected from Iranian adults (Ramezankhani et al., 2016). Aljumah et al. utilized a data mining method to predict the treatment of hypertension patients with different age groups (Aljumah et al., 2011). Fang et al. proposed a new modelbased KNN and LightGBM to predict the risk of hypertension (Fang et al., 2021).
Few studies have involved the use of data mining methods to improve network pharmacology. In this paper, a novel ensemble method based on a flexible neural tree (FNT) is proposed to identify hypertension-related active compounds. In the ensemble method, the used base classifiers are Multi-Grained Cascade Forest, support vector machines, random forest, AdaBoost, decision tree, Gradient Boosting Decision Tree, KNN, logical regression, and naïve Bayes. The classification results of nine classifiers are input to the FNT model, which is trained to predict hypertension-related compounds. The data used in the experiment are from up-to-date literature collected about hypertension and network pharmacology. By analysis of the literature, hypertension-related compounds were collected as positive samples and the generated decoys were utilized as negative samples. The molecular descriptor of each compound is extracted as the feature vector.
. . x m i } contains m features and category label y i c 1 , c 2 } { contains two cases. The nine classifiers used are introduced in the following sections of the article.

Multi-Grained Cascade Forest
Multi-Grained Cascade Forest (gcForest) is a novel ensemble machine learning method, which utilizes the cascade forest (ensemble of decision trees) to learn and generate models (Zhou and Feng, 2017). The core of gcForest mainly includes two modules: multi-grained scanning and cascade forest. The flowchart of gcForest is depicted in Figure 1.

1) Multi-grained scanning
Multi granularity scanning is a technical means to enhance cascade forest and do more processing on features. Firstly, a complete m-dimensional sample is input, and then sliding sampling is carried out through the k 1 -dimensional and k 2 -dimensional sampling windows in order to obtain s 1 (m − k 1 ) + 1 and s 2 (m − k 2 ) + 1 feature subsample vectors, respectively. Each sub-sample is used for the training of completely random forest (A) and random forest (B). A probability vector with 2-dimension is obtained in each forest, so that two kinds of forests can produce 2s 1 and 2s 2 representation vectors, respectively. Finally, the results of all forests are spliced together to obtain the sample output.

2) Cascade forest
Cascade forest includes several layers, each layer is composed of many forests, and each forest is composed of many decision trees. Completely random forest (A) and random forest (B) in each layer ensure the diversity of the model. For a completely random forest, each tree in the forest randomly selects a feature as the splitting node of the splitting tree, which grows until each leaf node is subdivided into only one class. For random forest, each tree randomly selects m √ candidate features, and the splitting nodes are filtered through the Gini coefficient. Each forest could generate a two-dimensional class vector. The two-dimensional class vectors of all forests are averaged to obtain the final twodimensional class vector. Finally, the category with the maximum value in the final two-dimensional class vector is taken as the final classification result.

Support Vector Machines
Support vector machines (SVM) is a supervised learning algorithm based on statistical learning theory (Suykens and Vandewalle, 1999;Furey et al., 2000). With the sample set containing positive and negative samples, SVM could search a hyperplane that could segment the samples according to positive and negative classes. The classification hyperplane can be given as follows.
Where x is the data point on the classification hyperplane, w is a vector perpendicular to the classification hyperplane, and b is the displacement. Linear separated data can be distinguished by the optimal classification hyperplane. For non-linear separated data, SVM can be transformed into solving the following optimization problem by the soft interval optimization and kernel techniques.
Where Cis the penalty factor, ς i is the relaxation variable, and x i is mapped to a high-dimensional space by ϕ. SVM could find a hyperplane with the largest interval in this high-dimensional space to classify the data.

Random Forest
Random forest (RF) is a machine learning method based on an ensemble of decision trees for classification and regression (Breiman, 2001;Díaz-Uriarte and Alvarez de Andrés, 2006). Random forest is a combined classification model composed of many decision tree classification models. Each decision tree has the right to vote to determine the best classification result. In random forest, firstly, K sample sets are extracted from the original training set by bootstrap sampling method, and the size of each extracted sample set is the same as that of the original training set. Then, K decision tree models are established from K sample sets, respectively. And K trees will create K classification results. The random forest integrates all the classified results by voting method, and the category with the most votes is designated as the final classification result.

AdaBoost
AdaBoost is a dynamic ensemble classification algorithm, which is to reasonably combine multiple weak classifiers (single-layer decision tree) to make it a strong classifier (Morra et al., 2009;Cao et al., 2013). The detailed algorithm is given as follows.
1) Initialize the weight of each sample. Assuming that the dataset contains n samples, each training sample point is given the same weight ( 1 n ) at the beginning. 2) Train weak classifiers. According to the samples, the weak classifiers are trained. If a sample has been accurately classified, its weight will be reduced in constructing the next training set. On the contrary, if a sample point is not accurately classified, its weight is increased. At the same time, according to the classification error of the weak classifier, its weight is calculated. Then, the sample set with updated weights is used to train the next classifier, and the whole training process goes on iteratively. T weak classifiers are obtained after T iterations.
3) The trained weak classifiers are combined into strong classifiers. Each weak classifier connects its respective weights through the classification function to form a strong classifier. After the training process of each weak classifier, the weight of the weak classifier with a smaller classification error rate is larger, which plays a greater decisive role in the final classification function, while the weight of the weak classifier with a larger classification error rate is smaller, which plays a smaller decisive role in the final classification function. Frontiers in Genetics | www.frontiersin.org October 2021 | Volume 12 | Article 768747

Decision Tree
A Decision Tree (DT) learning algorithm is usually a process of recursively selecting the optimal features and segmenting the training data according to the features so that each sub dataset has the best classification. The CART algorithm is one of the most common decision tree algorithms, which is mainly used for classification and regression (Breiman et al., 1984;Temkin et al., 1995). CART introduces the knowledge of probability theory and statistics into the research of decision tree. Different from the C4.5 algorithm, the CART algorithm could make a binary partition of the feature space and can split scalar attributes and continuous attributes. The specific algorithm is as follows: 1) Calculate the Gini index of the existing features. The feature with the smallest Gini index is selected as the splitting attribute of the root node. According to the optimal feature and cut point, two sub-nodes are generated from the current node, and the training dataset is allocated to the two subnodes according to the feature. According to an attribute value, a node is segmented to make the data in each descendant subset more "pure" than the data in its parent subset. Gini coefficient measures the impurity of sample division, and the smaller the impurity is, the higher the "purity" of the samples is.
For 2-class problems, the training set S is divided into two subsets S 1 and S 2 according to an attribute A. The Gini coefficient of the given division S is calculated as follows.
Where |S| is the number of samples in set S, and Gini(S i ) is the Gini coefficient of sample set S i , which is calculated as follows: Where |C k | denotes the number of samples belonging to class k in the set S i .

2)
Step (1) is called recursively for two child nodes, and the iteration continues until the samples in all child nodes belong to the same category or no attributes can be selected as splitting attributes. 4) Prune the CART decision tree generated.

Gradient Boosting Decision Tree
Gradient Boosting Decision Tree (GBDT) is an integrated learning algorithm (Hu and Min, 2018;Zhang et al., 2019b). By boosting method, N weak learners are created, which are combined into a strong learner after many iterations. The performance of the strong learner is higher than any weak learner. In GBDT, the used weak learner is the CART regression tree. During each iteration of GBDT, the residual of the previous model is reduced, and a new model is trained and established in the gradient direction of residual reduction, to improve the performance of the classifier. The specific algorithm is shown as follows: 1) Initialize the weak learner.
Where L is the loss function.
Where f t−1 (x) is the classifier during the t − 1 − th iteration.
Where κ tj is the value of the leaf node in the regression tree.
b) The calculated residues are used as new sample data, (x i , r ti ) is utilized to fit a new CART regression tree and the probability of each category is calculated. The leaf node region of the CART regression tree R tj (j 1, 2, . . . , J) is obtained. J is the number of leaf nodes of the regression tree. c) Calculate the optimal coefficient for the leaf area, which is given as follows. d) The strong learner is updated with Eq. 8.
Whenx ∈ R tj is true, I is equal to 1; otherwise, it is equal to 0.
3) The final strong learner f(x) is obtained with Eq. 9.
K-Nearest Neighbor K-Nearest Neighbor (KNN) is a classification algorithm based on supervised learning, which is to classify the data points according to the sample set with the known categories (Liao and Vemuri, 2002). Select the K neighbors with the smallest distance from the input data in the training set, and take the category with the most times among the K neighbors as the category of the classified data point. In the KNN algorithm, the selected neighbors are objects that have been correctly classified.
In the KNN method, the most commonly used measurement of distance is the Euclidean distance. The Euclidean distance of two variables (x i and x j ) is defined as follows.

Logistic Regression
Logistic regression (LR) is utilized to deal with the regression problem, which obtains the minimum result of cost function by gradient descent method to obtain the better classification boundary (Maalouf, 2011;Munshi et al., 2014). LR maps the values of linear regression to the interval [0, 1] by Sigmoid function, which is defined as follows.
In order to solve the logistic regression model, the gradient descent algorithm is generally used to iteratively calculate the optimal parameters of the model.

Naïve Bayes
Naïve Bayes (NB) is one of the most widely utilized models in Bayesian classifiers, which is based on the assumption that the influence of an attribute value on the given class is independent of the values of other attributes (class conditional independence) (Rish, 2001;Li and Guo, 2005). The specific algorithm idea is as follows.
According to the joint probability and the prediction data x, the prediction category of x is defined as follows.
arg max p y c k x .
According to the Bayesian theorem, p(y c k |x) is calculated as follows.
p y c k x p x y c k p y c k p(x) .
Since the denominator is constant for all categories, just maximize the numerator, and Eq. 12 could be defined as follows.
arg max p x y c k p y c k .
Because each feature attribute is conditionally independent, p(x|y c k ) could be calculated as follows.
According to Eq. 15, Eq. 14 can be calculated as follows.
arg max p y c k m i 1 p x i y c k Select the category with the largest posteriori probability as the prediction category.

Ensemble Methods
To improve the classification performance of a single classifier, a novel ensemble method based on a flexible neural tree (FNT) is proposed. An example of our proposed ensemble method is depicted in Figure 2. From Figure 2, it could be seen that the used base classifiers are gcForest, SVM, RF, AdaBoost, decision tree, GBDT, KNN, logical regression, and naïve Baye, which are introduced in detail in Classifiers. Firstly according to the training data, these nine classifiers can output their corresponding confidence level set (c (c 1 , c 2 , . . . , c 9 )), which is utilized as the input layer of the FNT model. The other hidden layers of the FNT model can be created randomly from operator set (F (+ 2 , + 3 , . . . , + n )) and variable set (T (c 1 , c 2 , . . . , c 9 )) (Chen et al., 2006). + i denotes a flexible neuron operator, which can be calculated as follows: Where f(·) is an activation function, a i and b i are the parameters of function, x j is the input variable and w j is the corresponding weight of the input variable.
FNT is a kind of cross-layer neural network, so each hidden layer can contain both operator and variable nodes. Because the structure of the FNT model is not fixed and this model contains many parameters such as a i , b i , and w j , many swarm algorithms have been proposed to search the optimal FNT model by iterations. In this paper, a hybrid evolutionary method based on genetic programming like structure optimization algorithm and simulated annealing was utilized for the training dataset. The detailed algorithms were introduced in another study (Yang et al., 2013).

Hypertension-Related Activity Drug Identification
In order to identify hypertension-related active compounds accurately, an ensemble method based on nine classifiers and a flexible neural tree is proposed. The process of hypertensionrelated active compounds identification is depicted in Figure 3. A total of 44 important studies were collected by querying the literature database according to two keywords: hypertension and network pharmacology. Through analyzing this literature, many important medicines such as Banxia Baizhu Tianma Tang, Chaihu Longgu Muli Decoction, compound reserpine and triamterene tablets, and Huanglian Jiedu Decoction, were collected and 88 hypertension-related compounds were searched. These important compounds were verified by biology experiments or molecular docking, which were used as positive samples in this paper. To obtain the negative samples, 20% of these compounds were randomly selected and input into the DUD•E website to generate decoys (Mysinger et al., 2012). In total, 264 decoys are selected randomly as negative samples.  The molecular descriptions of positive and negative compounds were extracted to constitute the hypertensionrelated dataset. With the collected dataset, our proposed ensemble method was fitted to predict other hypertensionrelated compounds.

EXPERIMENT RESULTS
In this part, the hypertension-related dataset collected is utilized, which contains 88 related compounds and 264 unrelated compounds. AUC, ROC curve, TPR, FRP, Precision, FIGURE 4 | Hypertension-related compound identification performances of ten methods with 2-cross validation methods. Specificity, and F1 were used to test the performance of our proposed method. In our method, the parameters of nine classifiers were set by default. In FNT, the variable set is defined as T (c 1 , c 2 , . . . , c 9 ) and the operator set is defined as F (+ 2 , + 3 , + 4 , + 5 ).
Six cross-validation methods were utilized to validate our proposed method. Nine classifiers were also utilized to identify hypertension-related compounds with the same dataset. The ROC curves and AUC performances with the different crossvalidation methods are depicted in Figures 4-9, respectively.  From these results, it can be seen that gcForest has the best ROC curves and AUC values among the nine single classifiers. Our proposed ensemble method could perform better than gcForest in terms of ROC and AUC. With 2-cross, 4-cross, 6-cross, 8-cross, 10-cross, and 15-cross validation methods, in terms of AUC, our method is 0.1, 0.3, 0.3, 0.7, 0.3, and 0.4% higher than gcForest, FIGURE 8 | Hypertension-related compound identification performances of ten methods with 10-cross validation methods. which reveals that our proposed method performs better than nine single classifiers for hypertension-related compound identification.
The TPR, FRP, Precision, Specificity, and F1 performances of the ten methods with the different cross-validation methods are listed in Tables 1-6, respectively. With 2-cross validation and 4cross validation methods, LR could obtain the highest TPR performances, which shows that LR could identify more true hypertension-related compounds. For Table 1, RF and SVM have the best FPR performance, which shows that these two methods could identify less non-related compounds as related ones. SVM also has the highest Precision and Specificity performances among the ten methods. For Table 2, RF has the best FPR, Precision, and Specificity performances. Our method performed best in terms of F1, which reveals that it could identify hypertension-related compounds more accurately overall. With 6-cross validation, 8-cross validation, 10-cross validation, and 15-cross validation methods, our methods perform best among ten methods in terms of TPR, FRP, Precision, Specificity, and F1, except that RF has the lowest performance with 4-cross validation methods. The results show that our proposed ensemble method could identify more true hypertension-related and hypertension-unrelated compounds than the other nine single classifiers.

DISCUSSION
To investigate the performance of our proposed ensemble further, two classical ensemble methods (averaged ensemble and voting ensemble) were also utilized to infer hypertension-related compounds. The F1 and AUC performances of the hypertension-related compounds by three ensemble methods are depicted in Figure 10 and Figure 11, respectively. From Figures 10, 11, it can be seen that our proposed ensemble method obtained better F1 and AUC performances than averaged and voting ensemble methods, which also shows that our method could identify hypertension-related compounds more accurately than the other two classical ensemble methods.

CONCLUSION
To identify hypertension-related closely active compounds, this paper proposed a novel ensemble method based on a flexible neural tree and nine classifiers. In our method, the classification results of nine single classifiers was utilized as the input vector of the flexible neural tree. An FNT model was utilized as a nonlinear ensemble method to identify hypertension-related drug activity. A hybrid evolutionary method based on genetic programming like structure optimization algorithm and simulated annealing is proposed to evolve the FNT model. In order to test the performance of our proposed ensemble method, data were extracted from hypertension-unrelated and hypertension-related compounds collected from up-to-date literature. By the different cross-validation methods, our proposed method obtained better ROC curves and AUC values than nine other single classifiers. Our proposed method also performs better than other single classifiers in terms of TPR, FRP, Precision, Specificity, and F1 in most cases. We also compare our proposed ensemble method with the averaged and voting ensemble methods. The results reveal that our method could identify hypertension-related compounds more accurately than the two classical ensemble methods.  Frontiers in Genetics | www.frontiersin.org October 2021 | Volume 12 | Article 768747