Prediction of Masked Hypertension and Masked Uncontrolled Hypertension Using Machine Learning

Objective: This study aimed to develop machine learning-based prediction models to predict masked hypertension and masked uncontrolled hypertension using the clinical characteristics of patients at a single outpatient visit. Methods: Data were derived from two cohorts in Taiwan. The first cohort included 970 hypertensive patients recruited from six medical centers between 2004 and 2005, which were split into a training set (n = 679), a validation set (n = 146), and a test set (n = 145) for model development and internal validation. The second cohort included 416 hypertensive patients recruited from a single medical center between 2012 and 2020, which was used for external validation. We used 33 clinical characteristics as candidate variables to develop models based on logistic regression (LR), random forest (RF), eXtreme Gradient Boosting (XGboost), and artificial neural network (ANN). Results: The four models featured high sensitivity and high negative predictive value (NPV) in internal validation (sensitivity = 0.914–1.000; NPV = 0.853–1.000) and external validation (sensitivity = 0.950–1.000; NPV = 0.875–1.000). The RF, XGboost, and ANN models showed much higher area under the receiver operating characteristic curve (AUC) (0.799–0.851 in internal validation, 0.672–0.837 in external validation) than the LR model. Among the models, the RF model, composed of 6 predictor variables, had the best overall performance in both internal and external validation (AUC = 0.851 and 0.837; sensitivity = 1.000 and 1.000; specificity = 0.609 and 0.580; NPV = 1.000 and 1.000; accuracy = 0.766 and 0.721, respectively). Conclusion: An effective machine learning-based predictive model that requires data from a single clinic visit may help to identify masked hypertension and masked uncontrolled hypertension.


INTRODUCTION
Hypertension is a major global health risk, affecting 1.13 billion people worldwide (1). However, almost half of people with hypertension are unaware of their condition (2). It is therefore important to improve the diagnosis and monitoring of hypertension for better management of blood pressure (BP) and to reduce the risk of developing future cardiovascular diseases (CVD) (3).
Masked hypertension (MH) or masked uncontrolled hypertension (MUCH) is defined as normotensive office BP and hypertensive out-of-office BP (4-7). MH refers to treatment-naïve patients and MUCH to patients with prior hypertension treatments. International registries show that MH/MUCH is a highly prevalent condition, present in up to one in three office-controlled patients (8). Patients with MH/MUCH have an increased risk of mortality and cardiovascular events (6, 9, 10).
Currently, the diagnosis of MH/MUCH depends on out-of-office BP measurement, including ambulatory BP monitoring (ABPM) and home BP monitoring (HBPM) (4-7), which take at least 24 h or 7 days, respectively. Whether MH/MUCH patients can be diagnosed early based on the clinical features of a single outpatient visit is still an open question.
Artificial intelligence (AI) approaches have revolutionized the way data can be processed and analyzed. Several studies have shown the potential benefits of AI in the prediction of cardiac arrhythmias, coronary artery disease, heart failure, and stroke (11, 12). However, the application of AI in hypertension diagnosis or classification is still limited (13).
The current study aimed to develop machine learning-based prediction models using accessible clinical characteristics as input features to identify patients with MH/MUCH in actual clinical settings. The models we developed may facilitate the diagnosis of MH/MUCH.

Data Sources and Patient Selection
Data for this study were derived from two cohorts. In the first cohort (cohort 1), patients with hypertension were recruited from six medical centers in Taiwan between 2004 and 2005. The inclusion criteria were as follows: age 20-50 years; patients with essential hypertension; body mass index (BMI) ≤35 kg/m²; fasting glucose level <126 mg/dL without diabetes mellitus; no medical history of severe diseases, including malignancy or failure of the heart, lungs, kidneys, or liver; and no acute disease within 2 weeks prior to the visit. Patients with secondary hypertension were excluded from the study. The inclusion and exclusion criteria were described in detail in a previous unrelated study (14). The study protocol was approved by the ethics committees of Academia Sinica and the six medical centers.
In the second cohort (cohort 2, the external validation set), patients with hypertension who visited the outpatient clinic of Taipei Veterans General Hospital between 2012 and 2020 were included. The inclusion criteria were as follows: age ≥20 years; patients with essential hypertension; no medical history of severe diseases, including malignancy or failure of the heart, lungs, kidneys, or liver; and no acute disease within 2 weeks prior to the visit. Patients with secondary hypertension were excluded from the study. The inclusion and exclusion criteria were described in detail in a previous unrelated study (15).
All patients in the two cohorts agreed to participate and signed the informed consent document for the study. Data collection from both cohorts was conducted in accordance with the principles of the Declaration of Helsinki.

Study Design
Data from cohort 1 were used to develop prediction models to identify patients with MH/MUCH and for internal validation. Data from cohort 2 were used for external validation. The study flowchart is shown in Figure 1.

Definition of MH/MUCH
MH and MUCH were defined as office BP < 140/90 mmHg and 24-h average BP ≥ 130/80 mmHg and/or awake (daytime) BP ≥ 130/80 mmHg and/or asleep (nighttime) BP ≥ 120/70 mmHg in untreated and treated patients, respectively (5-7). MH and MUCH were labeled as events based on the office BP and 24-h ambulatory BP measured in each participant in both cohorts.
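The definition above can be written as a small labeling rule. The following Python sketch is illustrative only and is not the authors' code: the function names `is_elevated` and `classify` are hypothetical, and it assumes the usual guideline reading that an ambulatory reading is elevated when either the systolic or the diastolic value meets its cutoff, while office BP is normal only when both values are below 140/90 mmHg.

```python
def is_elevated(bp, sbp_cutoff, dbp_cutoff):
    """bp is an (SBP, DBP) tuple in mmHg; elevated if either value meets its cutoff."""
    sbp, dbp = bp
    return sbp >= sbp_cutoff or dbp >= dbp_cutoff


def classify(office_sbp, office_dbp, mean_24h, awake, asleep, treated):
    """Return "MH", "MUCH", or None (hypothetical helper following the stated cutoffs)."""
    # Assumption: office BP counts as normal only if BOTH values are below 140/90.
    office_normal = office_sbp < 140 and office_dbp < 90
    ambulatory_high = (
        is_elevated(mean_24h, 130, 80)   # 24-h average BP >= 130/80 mmHg
        or is_elevated(awake, 130, 80)   # awake (daytime) BP >= 130/80 mmHg
        or is_elevated(asleep, 120, 70)  # asleep (nighttime) BP >= 120/70 mmHg
    )
    if office_normal and ambulatory_high:
        return "MUCH" if treated else "MH"
    return None
```

For example, a treated patient with office BP 130/80 mmHg and asleep BP 122/68 mmHg would be labeled "MUCH" under these assumptions, because the nighttime systolic value alone meets its cutoff.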

Prediction Models
Logistic regression (LR), random forest (RF), eXtreme Gradient Boosting (XGboost), and artificial neural network (ANN) were used as the classifiers to obtain a comprehensive spectrum of prediction models. All the models were developed using RStudio (version 1.3.1056, RStudio, PBC, Boston, MA, USA). The algorithms and packages used are listed in Supplementary Table 2. All models returned discriminative outputs of 1 to indicate events or 0 to indicate non-events.

FIGURE 1 | Study flowchart. The data set used in each step is indicated as colored columns on the right side (training set for steps 2 and 3, validation set for steps 4 to 6, test set for step 7, cohort 2 for step 8). Mean imputation was applied to all data sets using the mean of the training set in the LR, RF, and ANN models after splitting. Standardization was applied to all data sets in the LR and ANN models after mean imputation. In step 5, 21 of the 33 candidate variables were selected as predictor variables in the LR model, 6 in RF, 27 in XGboost, and 24 in ANN. The predictor variables are listed in descending order of importance.
ACEI/ARB, angiotensin-converting enzyme inhibitor/angiotensin receptor blocker; ACEI/ARB_0, dummy variable of not taking ACEI/ARB; A+B+C+D, combination of ACEI/ARB and beta-blocker and CCB and thiazide; A+C, combination of ACEI/ARB and CCB; A+C+D, combination of ACEI/ARB and CCB and thiazide; A+D, combination of ACEI/ARB and thiazide; ALT, alanine aminotransferase; ANN, artificial neural network; Beta-blocker_1, dummy variable of taking beta-blocker; Beta-blocker_0, dummy variable of not taking beta-blocker; BMI, body mass index; CCB, calcium channel blocker; CCB_0, dummy variable of not taking CCB; Current smoker_0, dummy variable of not current smoker; DBP, diastolic blood pressure; eGFR, estimated glomerular filtration rate; HDL-C, high-density lipoprotein-cholesterol; LDL-C, low-density lipoprotein-cholesterol; LR, logistic regression; MAP, mean arterial pressure; PP, pulse pressure; SMOTE-NC, synthetic minority oversampling technique-nominal continuous; Thiazide_0, dummy variable of not taking thiazide; RF, random forest; SBP, systolic blood pressure; TC, total cholesterol; TG, triglyceride; UA, uric acid; WHR, waist-to-hip ratio; XGboost, eXtreme Gradient Boosting.

Development of Prediction Models
As shown in Figure 1, the pipeline to develop the prediction models consisted of the following steps: 1. data collection and preprocessing; 2. oversampling; 3. training models; 4. tuning hyperparameters; 5. importance ranking of 33 candidate variables and feature selection for predictor variables; 6. tuning probability threshold; 7. performance evaluation (internal validation); and 8. external validation.
Participants in cohort 1 (n = 970) were randomly split into training set (n = 679), validation set (n = 146), and test set (n = 145) in a 0.7/0.15/0.15 ratio with balanced levels. Missing values in the training set, validation set, test set, and external validation set were replaced by the mean of all available values for the same variable in the training set in the LR, RF, and ANN models (only 3 missing data points out of 26,190 data points in cohort 1 and 63 missing data points out of 11,232 data points in cohort 2). In the LR and ANN models, all variables were scaled (normalized) by subtracting the mean and then dividing by the standard deviation (SD) of the training set.
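The splitting, imputation, and standardization steps can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' R implementation: the function names are hypothetical, missing values are represented as `None`, "balanced levels" is read as stratification by outcome class, and the SD is computed on the training set after imputation.

```python
import random
from statistics import mean, stdev


def stratified_split(labels, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split row indices into train/validation/test, class by class."""
    rng = random.Random(seed)
    train, val, test = [], [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(fractions[0] * len(idx))
        n_val = round(fractions[1] * len(idx))
        train.extend(idx[:n_train])
        val.extend(idx[n_train:n_train + n_val])
        test.extend(idx[n_train + n_val:])
    return train, val, test


def fit_preprocessor(train_rows):
    """Learn per-variable mean and SD from the training set only."""
    means, sds = [], []
    for j in range(len(train_rows[0])):
        observed = [row[j] for row in train_rows if row[j] is not None]
        m = mean(observed)
        filled = [m if row[j] is None else row[j] for row in train_rows]
        means.append(m)
        sds.append(stdev(filled))  # SD after mean imputation (assumption)
    return means, sds


def transform(rows, means, sds):
    """Impute with the training mean, then standardize with training mean/SD."""
    return [
        [((m if x is None else x) - m) / s for x, m, s in zip(row, means, sds)]
        for row in rows
    ]
```

Because `fit_preprocessor` sees only the training rows, applying `transform` to the validation set, test set, or cohort 2 does not leak information from those sets into model development.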
Because events and non-events were imbalanced, and to avoid the accuracy paradox, we applied the synthetic minority oversampling technique-nominal continuous (SMOTE-NC) to equalize the number of events and non-events in the training set (random oversampling and random undersampling were also tested but performed worse) (28).
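To make the oversampling step concrete, the sketch below shows a simplified version of the SMOTE-NC idea in Python. It is an illustration under assumptions rather than the package the authors used (which is listed in their Supplementary Table 2): the function name `smote_nc` and its parameters are hypothetical. Continuous variables are interpolated between a minority sample and one of its nearest minority neighbors, nominal variables take the most frequent value among those neighbors, and a nominal mismatch adds the squared median SD of the continuous variables to the squared distance.

```python
import random
from collections import Counter
from statistics import median, pstdev


def smote_nc(minority, cont_idx, cat_idx, n_new, k=5, seed=0):
    """Generate n_new synthetic minority rows (simplified SMOTE-NC sketch)."""
    rng = random.Random(seed)
    # Penalty for a nominal mismatch: median SD of the continuous variables.
    med_sd = median(pstdev([row[j] for row in minority]) for j in cont_idx)

    def squared_distance(a, b):
        d = sum((a[j] - b[j]) ** 2 for j in cont_idx)
        d += sum(med_sd ** 2 for j in cat_idx if a[j] != b[j])
        return d

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [row for row in minority if row is not base]
        neighbours = sorted(others, key=lambda r: squared_distance(base, r))[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to mate
        new_row = list(base)
        for j in cont_idx:
            new_row[j] = base[j] + gap * (mate[j] - base[j])
        for j in cat_idx:
            new_row[j] = Counter(r[j] for r in neighbours).most_common(1)[0][0]
        synthetic.append(new_row)
    return synthetic
```

To equalize the classes, `n_new` would be set to the difference between the number of non-events and events in the training set; the validation and test sets are left untouched.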
To obtain the maximum area under the receiver operating characteristic curve (AUC) in the validation set during model training, we tuned the hyperparameters using a random search technique (29). Feature importance ranking and supervised feature selection were performed to prevent overfitting and to achieve the maximum AUC in the validation set (30). The details of feature selection are presented in Supplementary Material 2. We established the confusion matrix and calculated the F1 score ($F_1 = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$) while changing the decision threshold of the classifier from 0 to 1 (threshold-moving) in the validation set. We then selected the optimal probability threshold yielding the largest F1 score and published the final models (Supplementary Material 3). The test set and the external validation set were always independent of the training and tuning processes during the development of the models.
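Threshold-moving can be illustrated with a short Python sketch. The grid of 101 evenly spaced thresholds and the function names are assumptions made for illustration; the source states only that the threshold was varied from 0 to 1 and that the value with the largest F1 score in the validation set was kept.

```python
def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall); 0 if no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(y_true, probs, steps=100):
    """Scan thresholds 0, 1/steps, ..., 1 and keep the first one with the largest F1."""
    best_t, best_f1 = 0.0, -1.0
    for i in range(steps + 1):
        t = i / steps
        preds = [1 if p >= t else 0 for p in probs]
        score = f1_score(y_true, preds)
        if score > best_f1:
            best_t, best_f1 = t, score
    return best_t, best_f1
```

The threshold is chosen on validation-set probabilities only and is then applied unchanged to the test set and the external validation set.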

Performance Metrics of Internal Validation
To evaluate the performance of various models in the test set, we computed the sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, and F1 score. Receiver operating characteristic (ROC) curves were plotted along with the AUC. For AUC calculation, all predicted results were converted to probabilities.
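The threshold-dependent metrics listed above all follow from the four cells of the confusion matrix. A minimal Python sketch (the function name `confusion_metrics` is hypothetical, and the toy labels in the usage note are not study data):

```python
def confusion_metrics(y_true, y_pred):
    """Return sensitivity, specificity, PPV, NPV, and accuracy for 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }
```

With toy labels `y_true = [1, 1, 1, 0, 0, 0, 0, 0]` and predictions `y_pred = [1, 1, 1, 1, 1, 0, 0, 0]`, there are no false negatives, so sensitivity and NPV both equal 1 while specificity and PPV are 0.6. This shows why a sensitivity of 1.000 necessarily comes with an NPV of 1.000 whenever at least one true negative exists.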

External Validation
The external validity of the model was then evaluated with the external validation set. Model discrimination was assessed by plotting ROC curves and calculating the AUC. The sensitivity, specificity, PPV, NPV, accuracy, and F1 score were also computed.
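Discrimination as measured by the AUC can be computed directly from predicted probabilities, without choosing a threshold. The Python sketch below uses the rank-based interpretation of the AUC (the probability that a randomly chosen event receives a higher score than a randomly chosen non-event, with ties counted as one half). The source does not state how the AUC was computed, so this is an illustrative equivalent and the function name `roc_auc` is hypothetical.

```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (event, non-event) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))
```

The pairwise loop is quadratic in sample size, which is acceptable for cohorts of a few hundred patients such as the external validation set.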

Statistical Analysis
Quantitative variables are expressed as mean ± SD, and categorical variables are expressed as percentages. Continuous parametric data between cohorts 1 and 2 were compared using an unpaired Student's t-test. Continuous parametric data between the training set, validation set, and test set were compared by one-way analysis of variance. Non-parametric data were compared using the Mann-Whitney test. Categorical variables were analyzed using the chi-square test or Fisher's exact test. Spearman's rank correlation coefficients were calculated between candidate variables. Statistical significance was inferred at a two-sided P-value < 0.05. Statistical analysis was performed using the SPSS software (version 21.0, SPSS Inc., Chicago, IL, USA).

Hyperparameters and Importance Rank of Candidate Variables
The tuned hyperparameters in the four models are presented in Supplementary Table 3 (Figures 1, 2).

Performance of Prediction Models in External Validation
The ROC curves and the AUC are shown in Figure 3B. Similar to the results in internal validation, the RF model exhibited the largest AUC (0.837, 95% CI 0.800-0.874), whereas the LR model exhibited the smallest AUC (0.571, 95% CI 0.515-0.627).

DISCUSSION
In the present study, we developed four models for MH/MUCH prediction using patient features obtained in a single outpatient visit and tested them. All models had high sensitivity and NPV. The RF, XGboost, and ANN models had AUC and F1 scores that surpassed those of the LR model. Among them, the RF model, composed of 6 predictor variables, exhibited the best overall performance. In addition, age, male sex, current smoker, office SBP, office DBP, office MAP, office PP, eGFR, creatinine, TG, HDL-C, ALT, beta-blocker, and thiazide were selected as predictor variables in at least three models, indicating their close association with MH/MUCH.
Patients with MH/MUCH had a significantly higher risk of cardiac/cerebrovascular events than those with controlled hypertension but a similar risk to those with sustained hypertension (9, 10). Identifying these patients and initiating appropriate treatment is a priority. Currently, out-of-office BP monitoring, either ABPM or HBPM, is the gold standard to diagnose these patients (4-7). However, the use of out-of-office BP monitoring is usually limited for many reasons, such as the shortage of resources, great consumption of time, poor compliance, and poor adherence of patients (31,32). It is important to find a more efficient way to identify this particular patient group.
To the best of our knowledge, the present study is the first to report the development and evaluation of prediction models for MH/MUCH. The strength of the present study is the reasonable discrimination of the RF model in the external validation set, despite the high dissimilarity between cohorts 1 and 2. The temporal, geographical, and domain validation of our model (33) supports its transportability and applicability to actual outpatient settings. It has been suggested that a high NPV, as in the present study, is desirable when a condition is serious, largely asymptomatic, or best treated early in its course (34), which matches the features of MH/MUCH (24).
The superior performance of the RF model may be attributable to its ability to cope with the multicollinearity in our data (35). The RF algorithm was previously used to define SBP variability features for cardiovascular outcome prediction in the Systolic Blood Pressure Intervention Trial (SPRINT) (36). Although interpreting the importance of multicollinear variables remains difficult in the RF algorithm, accuracy is much less affected (37), making it a favorable algorithm. Some of the variables in our dataset are highly correlated (Supplementary Figure 1), which creates a significant hindrance to linear algorithms such as LR (38).
It is interesting to point out that eGFR and creatinine were included as predictor variables in three models. Several studies have shown that MH/MUCH is associated with the development of chronic kidney disease (CKD) and the progression of kidney disease (16,17). MUCH/MH is also common in patients with CKD and associated with lower eGFR (18), which is consistent with our finding that eGFR and creatinine were important variables for the prediction of MH/MUCH.
In the present study, HDL-C and TG were selected as predictor variables in all four models and in three models, respectively. Previous studies have found a correlation between metabolic syndrome and MH/MUCH (6, 19, 20). Although one study reported that MH patients had greater waist circumference and lower HDL-C than normotensives (19), another study showed that only office BP contributed significantly (20). Our results suggest that, apart from office BP, HDL-C and TG carry greater weight than the remaining criteria for metabolic syndrome. These findings underscore the complexity of MH/MUCH pathophysiology and imply that different components of metabolic syndrome have varying degrees of association with MH/MUCH, providing further insight into the underlying mechanisms.
Previous studies have suggested that patients with MH/MUCH tend to have a more active sympathetic tone out of the office due to neurogenic abnormalities (21-23). In the present study, beta-blocker and alpha-blocker were selected in all four models and in two models, respectively; both drug classes are sympathetic antagonists commonly used to treat CVD and hypertension. However, these associations are based on cross-sectional comparisons, and direct causal inferences cannot be drawn.
As for demographic variables, previous studies showed that smoking was associated with MH/MUCH (5, 6, 24-27). The prevalence of MH/MUCH has also been found to be greater in men (5, 6, 24, 27). This is consistent with our finding that current smoker and male sex were important variables for the prediction of MH/MUCH.
In the present study, MH/MUCH was defined according to daytime as well as nighttime ambulatory BP. Patients with MH/MUCH increased from 276 (28.5%) to 386 (39.8%) in cohort 1 and increased from 70 (16.8%) to 140 (33.7%) in cohort 2 when we included nighttime BP as one of our criteria to define MH/MUCH aside from office BP, 24-h average BP, and daytime ambulatory BP. High prevalence of nighttime MUCH (or masked uncontrolled nocturnal hypertension) was also noted in the study by Coccina F et al. in which 357 (48.5%) patients among 738 treated hypertensive patients were reported to have nighttime MUCH (39). In their study, patients with nighttime MUCH had an increased risk of cardiovascular events compared to those with controlled hypertension. With our models, physicians could identify patients with not only daytime but also nighttime MH/MUCH.

Study Limitations
The current study has several limitations that must be considered. First, external validation was only performed in patients with hypertension in Taiwan. Data with more representation of diverse populations, a larger sample size, and untreated patients must be obtained to demonstrate better transportability. Second, there were some differences between the inclusion criteria of the two cohorts. Compared to cohort 2, cohort 1 had additional inclusion criteria of "age ≤ 50 years old," "BMI ≤ 35 kg/m²," and "fasting glucose level < 126 mg/dL with no diabetes mellitus." Despite the differences in baseline characteristics between the two cohorts, the performance of our models in external validation was acceptable. Third, HBPM was not included in the diagnostic criteria for MH/MUCH. Even though previous studies showed a greater importance of ABPM to MH (24, 40), the present study may be limited by not identifying all MH/MUCH patients. Fourth, some variables found to be related to MH/MUCH were not available in our cohorts, such as echocardiographic variables (41). Finally, our models were developed to predict MH and MUCH together. However, MH and MUCH may differ in pathophysiology and etiology (6, 24). Although some previous studies also did not differentiate MUCH from MH (41-44), further studies should consider developing separate models for MH and MUCH in order to increase the accuracy of prediction.

CONCLUSION
Patients with MH/MUCH are at an increased risk of CVD compared to patients with controlled hypertension. Due to their "masking nature," they are, however, largely underdiagnosed and often left untreated. Our machine learning-based prediction models, especially RF, could help physicians detect MH/MUCH patients using clinical data obtained in a single outpatient visit. With timely and appropriate use of these models, patients with MH/MUCH could receive early diagnosis and appropriate treatment to prevent future cardiovascular events.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by Taipei Veterans General Hospital. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
M-HH, L-CS, and Y-CW contributed to conception and design, analysis and interpretation of data, and drafted the manuscript. H-BL, P-HH, T-CW, S-JL, W-HP, and J-WC contributed to data acquisition and drafted the manuscript. C-CH contributed to conception, data acquisition, analysis and interpretation of data, and drafted and critically revised the manuscript. All authors gave final approval and agreed to be accountable for all aspects of work ensuring integrity and accuracy.