Comparison of Machine Learning Models for Prediction of Initial Intravenous Immunoglobulin Resistance in Children With Kawasaki Disease

We aimed to construct an optimal machine learning (ML) method for predicting intravenous immunoglobulin (IVIG) resistance in children with Kawasaki disease (KD) using commonly available clinical and laboratory variables. We retrospectively collected 98 clinical records of hospitalized children with KD (2–109 months of age). Of these, 20 (20%) children were resistant to initial IVIG therapy. We trained three ML techniques (logistic regression, linear support vector machine, and eXtreme gradient boosting) with 10 variables to predict IVIG resistance, and estimated their predictive performance using nested 5-fold cross-validation (CV). We also selected variables using the recursive feature elimination method and performed the nested 5-fold CV with the selected variables in a similar manner. We compared the predictive performance of the ML models with that of the existing scoring systems. The area under the receiver operator characteristic curve ranged from 0.58 to 0.60 in the all-variable models and from 0.60 to 0.75 in the select-variable models. The specificities were more than 0.90 and higher than those of the existing scoring systems, but the sensitivities were lower. The three ML models based on demographics and routine laboratory variables did not provide reliable performance. This is possibly the first study to compare ML methods in an attempt to establish a better predictive model. Additional biomarkers are probably needed to generate an effective prediction model.


INTRODUCTION
In developed countries, Kawasaki disease (KD) is the major cause of acquired heart disease in children (1). The main complication of KD is coronary artery abnormality (CAA) due to systemic vasculitis (1). The effectiveness of high-dose intravenous immunoglobulin (IVIG) therapy has been established as an initial KD treatment (2). However, approximately 10-20% of children with KD are refractory to this treatment and develop persistent or recurrent fever after initial IVIG therapy (3,4). IVIG resistance is a risk factor for the occurrence of CAA (5). Moreover, the development of more effective treatment options has been challenging. The American Heart Association has reported that patients who are predicted to be at high risk of developing CAA may benefit from primary adjunctive therapy such as IVIG and corticosteroids (2). Therefore, developing a reliable tool for predicting IVIG resistance is important to reduce the occurrence of CAA.
Several scoring systems (6)(7)(8)(9)(10)(11)(12) have been proposed. However, the predictive capacity of the existing scoring systems may not be sufficient, and some scoring systems have poor predictive performance for external datasets (13)(14)(15). Machine learning (ML) techniques have been applied to clinical diagnosis and prognosis prediction in many fields of medicine (16). To the best of our knowledge, few studies have applied ML methods for predicting resistance to initial IVIG therapy in patients with KD (17). We aimed to construct an optimal ML method for predicting IVIG resistance in children with KD using commonly available clinical and laboratory variables.

Patients and Data Collection
We retrospectively collected clinical records of patients with KD who were diagnosed based on the Japanese diagnostic guidelines for KD (18) and hospitalized at Tsugaruhoken Medical COOP Kensei Hospital between January 2010 and October 2019. Patients diagnosed with KD presented with at least five of the six major symptoms, including fever. Patients with four or fewer major symptoms and those with CAA were not included. We excluded children who received initial IVIG treatment ≥10 days after the onset and children administered initial doses of <2 g/kg/day. We defined the first illness day as the first day on which a patient had fever. We defined a responder as a patient whose temperature had decreased to <37.5°C within 36 h after initial IVIG treatment (9,15).
We defined coronary arteries as abnormal when the luminal diameters were more than 3.0 mm in children younger than 5 years or more than 4.0 mm in those 5 years and older, when the internal diameter of a segment was 1.5 times or greater than that of an adjacent segment, or when the luminal contour was evidently irregular (19). We recorded the maximum coronary artery diameter within 1 month after the onset of the disease.
Abbreviations: Alb, albumin; ALT, alanine aminotransferase; AST, aspartate aminotransferase; AUC, area under the receiver operator characteristic curve; CAA, coronary artery abnormality; CRP, C-reactive protein; CV, cross-validation; Ht, hematocrit; IVIG, intravenous immunoglobulin; KD, Kawasaki disease; LR, logistic regression; ML, machine learning; Na, sodium; PLT, platelet count; SVM, support vector machine; TBil, total bilirubin; WBC, white blood cell count; XGB, eXtreme gradient boosting.

Statistical Analysis
We performed statistical analyses using Python version 3.6 (Python Software Foundation). We applied Mann-Whitney U-tests for continuous variables and Chi-square tests for categorical variables.
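The two tests named above can be run with SciPy. The following is a minimal sketch, assuming SciPy was the library used (the text only states Python); the values and counts are synthetic placeholders, not the study data.

```python
import numpy as np
from scipy.stats import chi2_contingency, mannwhitneyu

rng = np.random.default_rng(0)

# Synthetic placeholder values for one continuous variable (NOT study data):
# 78 responders and 20 IVIG-resistant children, matching the cohort sizes only.
values_responders = rng.normal(7.0, 3.0, size=78)
values_resistant = rng.normal(8.0, 3.0, size=20)

# Two-sided Mann-Whitney U-test for a continuous variable.
u_stat, p_continuous = mannwhitneyu(
    values_responders, values_resistant, alternative="two-sided"
)

# Chi-square test for a categorical variable on a 2x2 contingency table.
# Rows: responders / resistant; columns: two categories (hypothetical counts).
table = np.array([[45, 33],
                  [12, 8]])
chi2, p_categorical, dof, expected = chi2_contingency(table)

print(f"Mann-Whitney U p = {p_continuous:.3f}")
print(f"Chi-square p = {p_categorical:.3f} (dof = {dof})")
```

Note that `chi2_contingency` applies Yates' continuity correction to 2x2 tables by default; whether the original analysis did so is not stated.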
We evaluated the predictive performance of three supervised ML classifiers and the existing scoring systems. We trained logistic regression (LR) with L2 regularization, linear support vector machine (SVM), and eXtreme gradient boosting (XGB) models to predict IVIG resistance, using the scikit-learn and XGBoost packages. We evaluated the predictive performance based on sensitivity, specificity, and area under the receiver operator characteristic curve (AUC). We produced the three ML models with 10 variables that did not contain missing values (months of age, gender, illness days at IVIG administration, WBC, Ht, PLT, AST, Na, Alb, and CRP).
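As an illustration of the three metrics, the sketch below computes sensitivity, specificity, and AUC with scikit-learn. The helper name `evaluate`, the 0.5 threshold, and the toy labels and scores are assumptions made for this example; IVIG resistance is treated as the positive class (1).

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score


def evaluate(y_true, y_pred, y_score):
    """Return sensitivity, specificity, and AUC (positive class = 1)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "sensitivity": tp / (tp + fn),  # resistant cases correctly flagged
        "specificity": tn / (tn + fp),  # responders correctly cleared
        "auc": roc_auc_score(y_true, y_score),
    }


# Toy example (hypothetical values): 4 responders (0) and 2 resistant (1).
y_true = np.array([0, 0, 0, 0, 1, 1])
y_score = np.array([0.1, 0.2, 0.6, 0.3, 0.7, 0.4])
y_pred = (y_score >= 0.5).astype(int)  # assumed decision threshold

metrics = evaluate(y_true, y_pred, y_score)
print(metrics)  # sensitivity 0.5, specificity 0.75, AUC 0.875
```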
To evaluate the predictive performance of the three ML algorithms, we used the nested 5-fold cross-validation (CV) approach (20) with GridSearchCV for hyper-parameter optimization. We applied a nested CV procedure to obtain an unbiased estimate of the generalization performance of the ML algorithms (21). Nested CV consists of two loops: an inner loop for tuning hyper-parameters and an outer loop for estimating performance (Figure 1). First, the original dataset was divided into five data folds with approximately equal numbers of responder and non-responder cases. One data fold was reserved as the test fold. The remaining four data folds (training folds) were passed to the inner loop. The inner loop performed 5-fold CV to identify the best hyper-parameter combination. We selected the hyper-parameter combination that maximized each performance metric over all steps of the inner loop. In the LR and linear SVM models, the penalty parameter C was explored in [0.01, 0.1, 1, 10, 100]. In the XGB model, the maximum depth of a tree (max_depth), the minimum sum of instance weight needed in a child (min_child_weight), and gamma were explored in [3, 5], [1, 2, 3], and [0, 3, 10], respectively. These were tuned by testing all possible hyper-parameter combinations in the inner CV. For the other hyper-parameters, we used default values of the scikit-learn method. We then trained our model on the training folds using the best hyper-parameter combination; thereafter, we evaluated model performance on the test fold. This process was repeated five times, once for each iteration of the outer loop. Finally, we calculated the average performance over the five folds. We also repeated the nested CV 10 times with different splits and averaged the results to reduce sampling bias and overfitting.
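The nested procedure described above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic, the helper `nested_cv_auc` and the seeds are hypothetical, AUC is the only metric shown, and XGB is omitted so that the sketch depends on scikit-learn alone (an XGBoost classifier with the grid given above could be added to `models` in the same way).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import LinearSVC

# Synthetic stand-in for the cohort: 98 samples, 10 variables, ~20% positives.
X, y = make_classification(
    n_samples=98, n_features=10, n_informative=3,
    weights=[0.8, 0.2], random_state=0,
)

C_GRID = {"C": [0.01, 0.1, 1, 10, 100]}  # grid stated in the text
models = {
    # L2 regularization is the scikit-learn default for LogisticRegression.
    "LR": LogisticRegression(max_iter=1000),
    "linear SVM": LinearSVC(max_iter=10000),
}


def nested_cv_auc(estimator, param_grid, X, y, n_repeats=10):
    """Average outer-loop AUC over repeated nested 5-fold CV."""
    repeat_means = []
    for seed in range(n_repeats):
        # Stratified folds keep the responder/resistant ratio roughly equal.
        inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
        outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed + 1000)
        # Inner loop: exhaustive grid search on the training folds only.
        search = GridSearchCV(estimator, param_grid, scoring="roc_auc", cv=inner)
        # Outer loop: refit the tuned model and score it on each held-out fold.
        scores = cross_val_score(search, X, y, scoring="roc_auc", cv=outer)
        repeat_means.append(scores.mean())
    return float(np.mean(repeat_means))


# n_repeats=2 keeps this demo quick; the text describes 10 repetitions.
results = {name: nested_cv_auc(m, C_GRID, X, y, n_repeats=2)
           for name, m in models.items()}
print(results)
```

Passing the `GridSearchCV` object itself to `cross_val_score` is what makes the procedure nested: hyper-parameters are re-tuned inside every outer training split, so the outer test fold never influences the tuning.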
Additionally, we selected variables using the recursive feature elimination method. Then, we performed a nested 5-fold CV with selected variables in a similar manner. In all, we have constructed and then evaluated two types of models: all-variable model and select-variable model.
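Recursive feature elimination is available in scikit-learn as `RFE`. The sketch below is illustrative only: the base estimator (logistic regression), the number of variables retained (5), the synthetic data, and the column labels are all assumptions, since the text does not specify them.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data with 10 variables (NOT the study data).
X, y = make_classification(
    n_samples=98, n_features=10, n_informative=3,
    weights=[0.8, 0.2], random_state=0,
)
# Hypothetical column labels mirroring the 10 variables listed in the text.
feature_names = ["age_months", "gender", "illness_day_at_ivig", "WBC", "Ht",
                 "PLT", "AST", "Na", "Alb", "CRP"]

# Repeatedly fit the estimator and drop the weakest variable (step=1)
# until the assumed target of 5 variables remains.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5, step=1)
selector.fit(X, y)

selected = [name for name, keep in zip(feature_names, selector.support_) if keep]
print(selected)
```

One design consideration: if selection is run on the full dataset before CV, the performance estimate can become optimistic. Wrapping the selector and classifier in a scikit-learn `Pipeline` confines selection to the training folds; whether the original analysis did this is not stated.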

Characteristics of Patients
We collected data from 109 children with KD treated at our hospital. We excluded data from 11 children because 9 children had received initial IVIG at <2 g/kg/day and 4 had received initial IVIG treatment ≥10 days after the onset of the disease. Consequently, we statistically analyzed data from 98 children aged 2-109 months. Table 1 summarizes the demographic and laboratory data of patients. Among them, 20 (20%) children were resistant to the initial IVIG therapy. Only the AST and ALT levels were significantly higher in the IVIG-responsive group than in the IVIG-resistant group. The proportion of CAA in the IVIG-resistant group was higher than that in the IVIG-responsive group.

Predictive Performance of the ML Model
As shown in Table 2, the AUCs of the all-variable models were 0.58-0.60, and those of the select-variable models were 0.60-0.75. Specificity and accuracy were 0.94-0.99 and 0.78-0.79, respectively, in the all-variable models, and 0.96-1.00 and 0.78-0.80 in the select-variable models. Thus, specificity and accuracy were high, but sensitivity was low in all models.
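To put the accuracy figures in context, the short calculation below shows what a trivial classifier that labels every child as a responder would achieve in a cohort of this composition (98 children, 20 resistant). It is offered as a reader's aid under that single assumption and is not part of the original analysis.

```python
# Cohort composition reported in the text.
n_resistant = 20
n_responder = 98 - n_resistant  # 78

# Hypothetical trivial classifier: predict "responder" for everyone.
tp, fn = 0, n_resistant   # no resistant child is flagged
tn, fp = n_responder, 0   # every responder is correctly cleared

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
accuracy = (tp + tn) / (n_resistant + n_responder)

print(f"sensitivity={sensitivity:.2f}, "
      f"specificity={specificity:.2f}, accuracy={accuracy:.3f}")
# sensitivity=0.00, specificity=1.00, accuracy=0.796
```

With roughly 80% responders, an accuracy near 0.80 is attainable without identifying any resistant case, which is why sensitivity and AUC are the more informative metrics here.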

DISCUSSION
We retrospectively evaluated the performance of three ML models for predicting resistance to initial IVIG therapy in a single-center pediatric population with KD. Our results revealed that the three ML models based on demographics and routine laboratory variables did not perform reliably. Some existing scoring systems have been reported to show poor predictive performance for external datasets (13)(14)(15). As shown in Table 2, the existing scoring systems also did not achieve good prediction on our dataset. Because the clinical and laboratory characteristics of IVIG-responsive and IVIG-resistant patients were similar in the current dataset, neither our models nor the existing scoring systems may have been able to perform reliably. Additional clinical information may be needed to improve the prediction model. There may be a need to construct and evaluate new models that also incorporate major clinical symptoms (10) and/or other laboratory data such as erythrocyte sedimentation rate (10) or N-terminal pro-brain natriuretic peptide (22).
Our prediction models using three ML techniques were as unreliable as the existing scoring systems; in particular, the sensitivity was low in all ML algorithms. Our results serve as a first step toward establishing a good prediction tool. Feature engineering or ensemble learning, which combines several ML techniques into one predictive model, may help improve performance. Nevertheless, ML models have advantages over the existing prediction scoring systems. The predictive performance of scoring systems could differ depending on countries or ethnicities (11,13,23). ML is flexible and can be suitable for many tasks. Therefore, with the ML approach, a model can easily be retrained and updated using the newest data.
To the best of our knowledge, this is the first study to compare the performance of ML methods for predicting IVIG resistance. One previous study developed a prediction model using random forest (17). However, although the reported performance was excellent, validation procedures were not conducted.
This study has certain limitations. First, the dataset was relatively small. However, we used nested CV to obtain unbiased estimates of the true error. We also repeated the nested CV 10 times and averaged the validation error to reduce sampling bias. Nested CV allows a classification model to be chosen while obtaining reliable estimates of classification performance and avoiding overfitting (24). Second, the present study was based on a dataset derived from a single center. Accordingly, our results may not apply to other populations. However, we consider it meaningful for each center to rebuild the model in a similar manner using its own data. Third, this is a retrospective study. We need to perform a combined analysis of the three ML models on a prospective basis.
In conclusion, we evaluated the performance of ML models for predicting resistance to IVIG therapy in children with KD. Our three ML models, based only on demographics and routine laboratory variables, did not provide reliable performance. Further studies are needed to improve predictive models. Additional biomarkers are likely to be needed to generate an effective prediction model.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, Yasutaka Kuniyoshi, upon reasonable request.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Committee of Tsugaruhoken Medical COOP Kensei Hospital. Written informed consent from the participants' legal guardian/next of kin was not required to participate in this study in accordance with the national legislation and the institutional requirements.

AUTHOR CONTRIBUTIONS
YK designed the study, drafted the manuscript, performed the statistical analysis, and interpreted the results. All authors have read and approved the final manuscript and contributed to data collection.