Machine learning models of clinically relevant biomarkers for the prediction of stable obstructive coronary artery disease

Background In patients with suspected obstructive coronary artery disease (CAD), evaluation using a pre-test probability model is the key element for diagnosis; however, its accuracy is controversial. This study aimed to develop machine learning (ML) models using clinically relevant biomarkers to predict the presence of stable obstructive CAD and to compare the ML models with established pre-test probability of CAD models. Methods Eight machine learning models for prediction of obstructive CAD were trained on a cohort of 1,312 patients [randomly split into the training (80%) and internal validation sets (20%)]. Twelve clinical and blood biomarker features assessed on admission were used to inform the models. We compared the best-performing ML model with established pre-test probability of CAD models (updated Diamond-Forrester and CAD consortium). Results The CatBoost algorithm model showed the best performance (area under the receiver operating characteristic curve, AUROC, 0.796; 95% confidence interval, CI, 0.740–0.853; Matthews correlation coefficient, MCC, 0.448) compared to the seven other algorithms. The CatBoost algorithm model improved risk prediction compared with the CAD consortium clinical model (AUROC 0.727; 95% CI 0.664–0.789; MCC 0.313). The accuracy of the ML model was 74.6%. Age, sex, hypertension, high-sensitivity cardiac troponin T, hemoglobin A1c, triglyceride, and high-density lipoprotein cholesterol levels contributed most to obstructive CAD prediction. Conclusion The ML models using clinically relevant biomarkers provided high accuracy for stable obstructive CAD prediction. In real-world practice, employing such an approach could improve discrimination of patients with suspected obstructive CAD and help select appropriate non-invasive testing for ischemia.


Introduction
Estimating the probability of coronary artery disease (CAD) in patients with stable angina or anginal equivalent symptoms is a frequent challenge. The current guidelines recommend estimating the pre-test probability of CAD to guide decisions on whether diagnostic testing should be performed or deferred, and whether the initial test should be non-invasive or invasive (1). However, recent studies have shown that the performance of the traditional pre-test probability of CAD models is limited in the estimation of obstructive CAD (2,3). Moreover, the pre-test probability of CAD models do not reflect how well risk factors such as hypertension, diabetes mellitus (DM), and dyslipidemia are currently controlled.
Machine learning (ML) is an application of artificial intelligence (AI) that uses computer algorithms to identify patterns in large datasets with a multitude of variables and to capture high-dimensional, non-linear relationships among clinical features. Data-driven techniques based on ML can improve the performance of risk predictions by exploiting large data repositories to agnostically identify novel risk predictors and more complex interactions between them. However, only a few studies have been conducted on stable obstructive CAD using ML of clinical risk factors and blood biomarkers commonly used in clinical practice. Therefore, we aimed to develop ML models using these features to predict stable obstructive CAD and determine the ranking of the features' predictive contribution. We also compared the ML models with the established pre-test probability of CAD models to evaluate whether there were significant improvements in discrimination.

Methods
Study population
We included a cohort of 4,906 patients who visited the outpatient department for angina or anginal equivalent symptoms and underwent invasive coronary angiography at Dankook University Hospital between August 2014 and January 2016. Obstructive CAD was defined as any stenosis 70% or greater in the epicardial coronary artery, 50% or greater in the left main coronary artery, or both. Non-obstructive CAD was defined as a stenosis 20% or greater but less than 70% in any other epicardial coronary artery, or a coronary artery stenosis 20% or greater but less than 50% in the left main coronary artery, as recorded by physicians in the catheterization report. No apparent CAD was defined as all coronary stenoses less than 20% or luminal irregularities. The case group was defined as having obstructive CAD, and the control group was defined as having no apparent CAD. When creating ML models, we included patients who were diagnosed with chronic stable coronary syndrome after visiting the outpatient department with angina or anginal equivalent symptoms; we excluded patients who were diagnosed with acute myocardial infarction (AMI) based on the fifth universal definition of myocardial infarction, had non-obstructive moderate CAD (20-70% stenosis), or had previously undergone percutaneous coronary intervention (PCI).
Finally, 1,312 patients (case group = 861, control group = 451) were selected for the analysis. A subset of the dataset was randomly selected to train the risk-prediction algorithms, and the remaining dataset was used for validation (Figure 1). This study was approved by the Institutional Review Board of the Dankook University Hospital (2018-09-014).
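The angiographic definitions above can be expressed as a small decision rule. The following is a minimal sketch, assuming stenosis is recorded as a percentage for the left main artery and for each other epicardial vessel; the function names `classify_cad` and `study_group` are hypothetical and are not taken from the study's code.

```python
from typing import Optional, Sequence


def classify_cad(left_main_pct: float, other_epicardial_pct: Sequence[float]) -> str:
    """Classify a patient from percent stenosis values using the stated cut-offs."""
    worst_other = max(other_epicardial_pct, default=0.0)
    # Obstructive: >= 50% in the left main or >= 70% in any other epicardial artery.
    if left_main_pct >= 50 or worst_other >= 70:
        return "obstructive"
    # Non-obstructive: >= 20% anywhere, below the obstructive thresholds.
    if left_main_pct >= 20 or worst_other >= 20:
        return "non-obstructive"
    # Everything else: all stenoses below 20% (or luminal irregularities only).
    return "no apparent CAD"


def study_group(category: str) -> Optional[str]:
    """Map a category to the study group; non-obstructive CAD was excluded."""
    return {"obstructive": "case", "no apparent CAD": "control"}.get(category)


print(classify_cad(30, [10, 15]), study_group(classify_cad(30, [10, 15])))
```

Under this sketch, a patient with a 30% left main stenosis falls into the non-obstructive category and therefore into neither study group.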

Data collection
Baseline information was collected from patients with suspected CAD admitted for invasive coronary angiography, including demographics, cardiovascular risk factors [hypertension, DM, dyslipidemia, chronic kidney disease (CKD), and smoking status], and biomarkers [hemoglobin A1c (HbA1c), creatinine clearance, high-sensitivity cardiac troponin T (troponin T), and lipid profile]. These parameters were also used in the established pre-test probability scores for analysis.
[Figure caption: Flowchart of the study population and process. CAD, coronary artery disease.]

Machine learning algorithms and feature importance
Eight supervised ML algorithms were selected: CatBoost (4), Extreme gradient (XG) boost (5), gradient boost (6), Light Gradient Boosting Machine (lightGBM) (7), MultiLayer Perceptron (MLP) (8), support vector machine with a linear kernel (SVM) (9), Random forest (10), and K-nearest neighbor (11). Each ML model was implemented using Python 3.8.2, with the following packages: xgboost for extreme gradient boost, catboost for CatBoost, lightgbm for lightGBM, pytorch for MultiLayer Perceptron, and scikit-learn for the other ML algorithms. For the MLP and SVM algorithms, categorical features were represented by one-hot encoding. Hyperparameters were tuned using the Bayesian hyperparameter tuning library optuna with fivefold cross-validation on the training population (Supplementary Table 1). To interpret the ML prediction models, we used SHapley Additive exPlanations (SHAP). The SHAP value assesses the impact of each variable by representing the change in log odds when a variable is hidden from the model (12). The MissForest algorithm was used for imputation of missing values in the ML models, except for boosting algorithms (13).
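To illustrate what fivefold cross-validation on the training population involves, the sketch below partitions sample indices into five folds and yields each fold once as the held-out set. It uses only the Python standard library and does not reproduce the optuna or scikit-learn calls used in the study; the function name `k_fold_indices`, the seed, and the plain (non-stratified) shuffling are assumptions made for illustration.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k nearly equal folds.
    folds = [indices[i::k] for i in range(k)]
    for held_out in range(k):
        val_idx = folds[held_out]
        train_idx = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train_idx, val_idx


# 1,050 patients correspond to the 80% training population (677 cases + 373 controls).
splits = list(k_fold_indices(1050, k=5))
for fold_number, (train_idx, val_idx) in enumerate(splits, start=1):
    print(fold_number, len(train_idx), len(val_idx))
```

In a tuning loop, each candidate hyperparameter set would be scored by averaging a metric over the five held-out folds, so that every training patient is used for validation exactly once.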
The study population was randomly split into the training (80%; case group = 677, control group = 373) and validation (20%; case group = 184, control group = 78) sets. To control the overfitting caused by an imbalanced dataset, the bootstrap resampling method was applied to obtain equal numbers in each group of the training population (10 bootstrap samples: case group = 373, control group = 373). To evaluate feature importance, we estimated the SHAP values of 48 available variables in the CatBoost model (Supplementary Figure 1). Twelve variables for obstructive CAD were selected in the final prediction models based on recursive feature elimination and visual inspection of a SHAP-dependence plot.
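The class-balancing step can be sketched as follows. This is a minimal illustration, assuming that each bootstrap sample draws 373 patients with replacement from each group; the text does not state whether sampling was with replacement in both groups, and the function name `balanced_bootstrap`, the integer patient identifiers, and the seed are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def balanced_bootstrap(
    case_ids: Sequence[int],
    control_ids: Sequence[int],
    n_samples: int = 10,
    seed: int = 0,
) -> List[Tuple[List[int], List[int]]]:
    """Draw n_samples bootstrap samples with equal numbers of cases and controls."""
    rng = random.Random(seed)
    # Size each group to the smaller class (373 controls in the training set).
    size = min(len(case_ids), len(control_ids))
    samples = []
    for _ in range(n_samples):
        cases = rng.choices(case_ids, k=size)        # sampling with replacement
        controls = rng.choices(control_ids, k=size)  # sampling with replacement
        samples.append((cases, controls))
    return samples


# Placeholder identifiers: 677 training cases and 373 training controls.
case_ids = list(range(677))
control_ids = list(range(677, 1050))
samples = balanced_bootstrap(case_ids, control_ids)
print(len(samples), len(samples[0][0]), len(samples[0][1]))
```

Each of the 10 samples then contains 373 cases and 373 controls, so a model fitted on any one of them sees the two classes in equal proportion.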

Statistical analysis
The Revised Diamond-Forrester score (2), CAD consortium basic, and CAD consortium clinical (14) scores were calculated to compare model performance. These models were compared with the ML-based models by the area under the receiver operating characteristic curve (AUROC) using the DeLong method (15) and by the Matthews correlation coefficient (MCC) (16). The MCC is a useful metric for evaluating binary classification, especially for imbalanced datasets. Continuous variables were expressed as mean ± standard deviation (SD) or median (interquartile range) and were compared by Student's t-tests or Wilcoxon rank-sum tests. Categorical variables were expressed as proportions and compared by the χ² test. A two-sided p-value < 0.05 was considered significant for all analyses.

Results
Patient characteristics
Table 1 presents the baseline characteristics of the development and validation datasets. The mean age of the 1,312 patients was 63 ± 11.8 years, and 59.4% were men. The CAD group was significantly older, had higher systolic blood pressure, and more frequently had hypertension, DM, and dyslipidemia than the no CAD group. Moreover, the CAD group had higher levels of HbA1c, troponin T, and triglycerides than the no CAD group. In contrast, creatinine clearance and high-density lipoprotein (HDL) cholesterol levels were significantly lower in the CAD group.

Model performance and comparison to the established model
(Figure 2B). The AUROC, MCC, accuracy, sensitivity, specificity, positive predictive value, negative predictive value, and F1 of all the risk prediction models are presented in Table 2.
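For readers less familiar with these metrics, the sketch below shows how they follow from a 2 × 2 confusion matrix, and how the AUROC can be computed as the probability that a randomly chosen case receives a higher score than a randomly chosen control. The counts and scores are invented for illustration and are not the study's results; the function names `binary_metrics` and `auroc` are hypothetical.

```python
import math
from typing import Dict, Sequence


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Compute threshold-based metrics from confusion-matrix counts."""

    def ratio(num: float, den: float) -> float:
        return num / den if den else float("nan")

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


def auroc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUROC as the fraction of case/control pairs ranked correctly (ties count half)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))


# Invented counts and scores, for illustration only.
example = binary_metrics(tp=40, fp=10, tn=30, fn=20)
example_auc = auroc([0.9, 0.8, 0.4], [0.5, 0.3])
print(example, example_auc)
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, which is why it is less easily inflated when one class dominates.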

Feature importance
The 12 potential variables for stable obstructive CAD prediction were ranked using SHAP values. Age, sex, hypertension, troponin T, HbA1c, triglycerides, and HDL cholesterol were important features in our study (Figure 3A). To identify features that influenced the prediction model, we constructed a SHAP summary plot of CatBoost. The plot shows how the variable values are related to the SHAP values in the training dataset. Higher SHAP values were associated with higher CAD probability (Figure 3B). The SHAP-dependence plot (Figure 4) can also be used to understand how a single feature affects the output of the CatBoost prediction model. In this plot, the x-axis shows the value of a feature and the y-axis shows the corresponding SHAP value, which visualizes how the influence of the feature changes as its value varies. SHAP values exceeding zero for specific features represent increased risk of CAD.
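The idea behind a SHAP value can be illustrated with a brute-force Shapley computation on a toy model. The sketch below is not the tree-specific algorithm used by the SHAP library for CatBoost; it assumes that "hiding" a feature means replacing it with a reference value, and the toy log-odds model, its coefficients, and the names `shapley_values`, `toy_log_odds`, `patient`, and `reference` are all hypothetical.

```python
from itertools import permutations
from math import factorial
from typing import Callable, Dict


def shapley_values(
    model: Callable[[Dict[str, float]], float],
    x: Dict[str, float],
    baseline: Dict[str, float],
) -> Dict[str, float]:
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    names = list(x)
    contrib = {name: 0.0 for name in names}
    for order in permutations(names):
        current = dict(baseline)  # start with every feature hidden
        previous = model(current)
        for name in order:
            current[name] = x[name]  # reveal one feature at a time
            updated = model(current)
            contrib[name] += updated - previous
            previous = updated
    n_orders = factorial(len(names))
    return {name: total / n_orders for name, total in contrib.items()}


def toy_log_odds(f: Dict[str, float]) -> float:
    """Hypothetical log-odds model with an age-by-hypertension interaction."""
    interaction = 0.3 * f["hypertension"] * (f["age"] > 65)
    return -3.0 + 0.04 * f["age"] + 0.5 * f["hypertension"] + interaction


patient = {"age": 70, "hypertension": 1}
reference = {"age": 60, "hypertension": 0}
phi = shapley_values(toy_log_odds, patient, reference)
print(phi)
```

The contributions sum to the difference in log odds between the patient and the reference, and a positive value for a feature means that, for this patient, the feature pushes the prediction toward CAD.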

Discussion
The main findings of our analysis were as follows: (1) the ML-based model (CatBoost), using clinically relevant biomarkers, predicted stable obstructive CAD more accurately than the established pre-test probability models; and (2) using a novel ML-based model, we identified important features for the diagnosis of obstructive CAD.
Accurate prediction of obstructive stable CAD still represents an unmet need. Current guidelines recommend assessing the probability of obstructive CAD from clinical risk factors and, according to this pre-test probability, referring patients to non-invasive testing, invasive coronary angiography, or no further assessment (1). However, the diagnostic performance of established pre-test probability models is limited in the estimation of obstructive CAD in contemporary cohorts. Previous data have shown that the current model overestimates the probability of obstructive CAD in unselected patients (17). Another study demonstrated that the updated 2019 ESC guideline pre-test probability recommendations tended to slightly underestimate the disease in the SCOT-Heart trial cohort (18).
ML algorithms have recently been used for the diagnosis and prognosis of coronary artery disease, and their predictive ability has improved significantly compared with established pre-test probability and prediction models. In the CREATION cohort study, the ML model provided better accuracy and discrimination than the existing traditional model. Using the ML method instead of established pre-test probability models (modified Diamond-Forrester and CAD consortium score) would imply a correct change in diagnostic strategy in 22.2% of the patients (19). Data from the CONFIRM registry have shown that an ML model combining clinical features and coronary artery calcium score can accurately estimate the pre-test probability of CAD (20). Recent studies have also attempted to diagnose stable CAD using multiple biomarkers, but these approaches are difficult to apply directly in clinical practice (21).
[Table footnote: Values are n (%), mean ± SD (standard deviation), or median (Q1, Q3). BMI, body mass index; CRP, C-reactive protein; HbA1c, hemoglobin A1c; HDL, high-density lipoprotein; LDH, lactate dehydrogenase; LDL, low-density lipoprotein; NT-proBNP, N-terminal pro-brain natriuretic peptide.]
Only a few studies have been conducted on stable obstructive CAD prediction by incorporating multiple biomarkers into the ML algorithm. The ML-based model could be more accurate and account for subtleties in data that are overlooked under linear assumptions. In this study, the biomarkers affected obstructive CAD prediction, as ranked by SHAP value, in the following order: troponin T, HbA1c, triglyceride, creatinine clearance, and HDL cholesterol.
This means that the SHAP values of HbA1c, HDL cholesterol, triglyceride, and creatinine clearance, which reflect the current state of the disease, were higher than the SHAP values of the corresponding diagnoses (DM, dyslipidemia, and CKD). Therefore, these biomarkers may be more helpful than the diagnostic labels in predicting the disease. In our study, troponin T contributed to the prediction of obstructive CAD even at very low concentrations within the normal range. Previous studies have reported that elevated levels of troponin T are associated with increased coronary artery plaque volume, structural heart disease, and cardiovascular events (22,23). Therefore, an ML-based model that incorporates these variables could be more accurate in predicting the disease. Moreover, laboratory data and multiple biomarkers can be directly sampled in an outpatient clinic, and results can be easily obtained; therefore, it is expected that ML algorithms developed based on these data can serve as a pre-test probability model in real-world practice. The application of the new pre-test probabilities has important consequences in selecting appropriate diagnostic testing. ML-based models may be helpful in clinical decisions when non-invasive diagnostic tests are not available. Furthermore, AI-based integrated analysis of all data, including non-invasive diagnostic tests, is expected to contribute significantly to the precise diagnosis of patients.
This study had several limitations. First, this was a retrospective single-center analysis and thus susceptible to data selection and measurement biases. Second, our ML-based models were not externally validated. Our data were divided into independent training and validation sets to limit overfitting to some extent. In the future, we should conduct a performance test using completely separate test data that are not used for model development. Third, some values were missing from the data. Missing values could be handled in the boosting algorithm model as the "not available" category. Still, our results were consistent whether or not missing data were imputed (Supplementary Table 2). In the future, detailed and complete hospital-level patient data with minimal missing values will be needed. Fourth, our study did not compare the ML-based model with other non-invasive diagnostic tests. Further randomized controlled trials comparing the AI-based prediction model and the existing non-invasive stress test are needed to clarify their relative performance.

Conclusion
In conclusion, we developed and validated a new prediction model for stable obstructive CAD using ML algorithms.
Our ML-based model predicted the probability of obstructive CAD more accurately than the existing pre-test probability of CAD scores. Such a model could be useful for predicting the risk of CAD and for selecting appropriate non-invasive testing for ischemia.

Data availability statement
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement
The studies involving human participants were reviewed and approved by the Institutional Review Board of Dankook University Hospital. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

Author contributions
JK, SL, and SC designed the study. JR and YC assisted in data acquisition and interpretation. BC and WL performed the