A Comparison of XGBoost, Random Forest, and Nomograph for the Prediction of Disease Severity in Patients With COVID-19 Pneumonia: Implications of Cytokine and Immune Cell Profile

Background and Aims: The aim of this study was to apply machine learning models and a nomogram to differentiate critically ill from non-critically ill COVID-19 pneumonia patients. Methods: Clinical symptoms and signs, laboratory parameters, cytokine profiles, and immune cell data of 63 patients with COVID-19 pneumonia were retrospectively reviewed. Patients were followed up until March 12, 2020. A logistic regression (LR) model, a Random Forest (RF) model, and an XGBoost model were developed, and their performance was measured by the area under the receiver operating characteristic curve (AUC). Results: Univariate analysis revealed differences between critically and non-critically ill patients in the levels of interleukin-6, interleukin-10, T cells, CD4+ T cells, and CD8+ T cells. Interleukin-10, with an AUC of 0.86, was the most useful single predictor of critical illness in patients with COVID-19 pneumonia. Ten variables (respiratory rate, neutrophil count, aspartate transaminase, albumin, serum procalcitonin, D-dimer, B-type natriuretic peptide, CD4+ T cells, interleukin-6, and interleukin-10) were used as candidate predictors for the LR, RF, and XGBoost models. The coefficients from the LR model were used to build a nomogram. The RF and XGBoost methods suggested that interleukin-10 and interleukin-6 were the most important variables for predicting the severity of illness. In two-fold cross-validation, the mean AUCs of the LR, RF, and XGBoost models were 0.91, 0.89, and 0.93, respectively. Individualized predictions by the XGBoost model were explained with local interpretable model-agnostic explanations (LIME) plots. Conclusions: XGBoost exhibited the highest discriminatory performance for predicting critical illness in patients with COVID-19 pneumonia. The nomogram and the visualized interpretation with LIME plots may be useful in the clinical setting.
Additionally, interleukin-10 could serve as a useful predictor of critically ill patients with COVID-19 pneumonia.


INTRODUCTION
Coronavirus disease 2019 (COVID-19) is a newly recognized illness caused by the highly contagious severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), which has spread rapidly around the world over the last two years (Hong et al., 2021). As of February 28, 2022, WHO statistics recorded over 430 million confirmed cases and over 5.7 million deaths (World Health Organization, 2022). COVID-19 presents as a spectrum ranging from asymptomatic infection to mild illness and severe pneumonia. Our previous study indicated that 34.9% of patients with COVID-19 pneumonia developed critical illness requiring ICU admission; these patients either required mechanical ventilation or had a fraction of inspired oxygen (FiO2) of at least 60% during hospitalization (Hong et al., 2021). Delayed presentation of symptoms increases the risk of mortality and the need for high-intensity healthcare (Suliman et al., 2021). In Wuhan, China, the 28-day mortality of critically ill patients was reported to be 61.5%, with an average interval of 7 days from ICU admission to death. Early identification of critical illness provides an opportunity for timely intervention and may thus prevent more complicated, protracted, and less successful hospital admissions (Suliman et al., 2021). Anurag et al. validated the Pneumonia Severity Index (PSI/PORT), the CURB-65 score (Confusion, Urea, Respiratory rate, Blood pressure, and age of 65 years or older), and the Severe Community-Acquired Pneumonia (SCAP) score in COVID-19 pneumonia for predicting disease severity and 14-day mortality (Anurag and Preetam, 2021). However, that study defined severe COVID-19 pneumonia as a PSI/PORT score >130, a CURB-65 score ≥3, or a SCAP score ≥10 (Anurag and Preetam, 2021). San et al. classified disease severity according to the interim guidance of the World Health Organization (San et al., 2021).
They suggested that identifying high-risk patients with the Brescia-COVID Respiratory Severity Scale (BCRSS) and the quick Sequential Organ Failure Assessment (qSOFA) may improve clinical outcomes in COVID-19 patients (San et al., 2021). Bats et al. defined severity as an arterial oxygen saturation (SaO2) of less than 90% on room air or the need for ≥4 L/min of oxygen therapy to obtain an SaO2 ≥94%, and developed a COVID-19 severity risk score for use at hospital admission (Bats et al., 2021). Enrolling patients both with and without pneumonia and using the definition of COVID-19 severity recommended by the National Health Commission of China, Liang et al. developed a clinical risk score to predict the occurrence of critical illness in hospitalized patients with COVID-19 (Liang et al., 2020a). Using the same definition of severity as Liang et al. (2020a), Zhang et al. (2020) developed a score consisting of age, white blood cell count, neutrophil count, glomerular filtration rate, and myoglobin for predicting disease severity in COVID-19. A nomogram is a graphical representation of a mathematical model that allows individualized, evidence-based risk estimation and thereby facilitates management decisions. Feng et al. divided patients into three types (moderate, severe, and critical) and reported that a nomogram based on chest CT and clinical characteristics could predict disease progression in COVID-19 pneumonia patients at an early stage (Feng et al., 2020). Using the definition of severity recommended by the National Health Commission of China, Li reported that a nomogram incorporating a CT-based radiomics signature could be used to predict severe COVID-19 pneumonia. Nomograms have also been applied in COVID-19 to predict mortality (Ji et al., 2020) and assess survival (Dong et al., 2021). In summary, different studies have used different definitions of disease severity and different inclusion criteria.
Few studies have developed clinical and laboratory prediction scores specifically to identify critical illness in patients with CT-confirmed COVID-19 pneumonia.
Machine learning (ML) methods, such as deep learning, extreme gradient boosting (XGBoost), and random forest (RF), learn patterns from data that can then be applied to new cases in real-world settings. ML methods are useful for developing robust risk models and redefining patient classes (Deo, 2015), and they have been applied widely to the clinical diagnosis, prediction, and classification of patients with COVID-19 (Mottaqi et al., 2021).
Liang et al. previously developed a deep learning-based survival Cox model for early triage of critically ill COVID-19 patients, with and without X-ray abnormalities (Liang et al., 2020b). Deep learning has also been used to build a predictive model for identifying natural molecules as potential inhibitors of the SARS-CoV-2 main protease (Joshi et al., 2021). The XGBoost algorithm has been shown to outperform other techniques across various feature sets and in a variety of settings. Yan et al. used the XGBoost algorithm to identify lactic dehydrogenase (LDH), lymphocytes, and C-reactive protein (CRP) as predictors of mortality in individual patients (Yan et al., 2020). Wang et al. applied XGBoost to build an in-hospital mortality prediction model for patients with COVID-19 from clinical and laboratory parameters (Wang et al., 2020b). Liu et al. developed an XGBoost-based clinical model consisting of lymphocyte percentage, lactic dehydrogenase, neutrophil count, and D-dimer on admission for predicting the risk of critical illness in hospitalized patients with COVID-19 pneumonia. Ryan et al. reported that an XGBoost-based algorithm is a useful predictive tool for anticipating mortality in patients with COVID-19, patients with pneumonia, and mechanically ventilated patients (Ryan et al., 2020).
In addition, most existing scores are based only on clinical and laboratory features. Because the severity of COVID-19 pneumonia reflects multifactorial responses, relying only on clinical and laboratory features may miss important information from other risk factors. The cytokine storm plays an important role in severe COVID-19 pneumonia (Hu et al., 2021); in the most severe cases, the prognosis can be markedly worsened by hyperproduction of proinflammatory cytokines such as interleukin-6 (IL-6) and TNF-α, which preferentially target lung tissue (Costela-Ruiz et al., 2020). However, immune cells (such as B lymphocytes, T cells, CD4+ T cells, and CD8+ T cells) and cytokines (such as IL-10) are rarely included in these scores. Further studies are therefore required to develop scoring systems that incorporate both cytokine profiles and immune cell data for predicting critical illness in patients with COVID-19 pneumonia.
The first aim of this study was to develop and compare an XGBoost model, an RF model, and a conventional logistic regression (LR) model (presented as a nomogram) based on clinical and laboratory data, immune cells, and cytokine profiles for predicting critical illness in patients with COVID-19 pneumonia. The second aim was to evaluate immune cells and cytokine profiles as potential predictors of the severity of COVID-19 pneumonia.

Study Design, Subject Selection and Ethics
We conducted a post-hoc analysis of a previously reported retrospective cohort study at the First Affiliated Hospital of Wenzhou Medical University in mainland China (Hong et al., 2021). All patients with confirmed COVID-19 pneumonia between January 2020 and March 2020 were eligible for inclusion. A confirmed case of COVID-19 was defined as a positive result on a real-time reverse transcriptase polymerase chain reaction (RT-PCR) assay of nasal and pharyngeal swab specimens (Hong et al., 2021). Patients without pneumonia and patients without available chest computed tomography (CT) scans were excluded.

Definition of Severity
Patients with COVID-19 pneumonia were defined as critically ill if they were admitted to the intensive care unit (ICU) and required either mechanical ventilation or a fraction of inspired oxygen (FiO2) of at least 60% (Kumar et al., 2009; Hong et al., 2021).
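The definition combines an ICU condition with two alternatives. A minimal Python sketch of how we read it, assuming ICU admission is required in both cases and FiO2 is recorded as a fraction (0.60 = 60%); the function and argument names are hypothetical and not taken from the study's code:

```python
def is_critically_ill(admitted_to_icu: bool,
                      mechanical_ventilation: bool,
                      max_fio2: float) -> bool:
    """Return True if a patient meets the critical-illness definition.

    Assumed reading: ICU admission AND (mechanical ventilation OR FiO2 >= 60%).
    max_fio2 is the highest FiO2 during hospitalization, as a fraction (0.0-1.0).
    """
    return admitted_to_icu and (mechanical_ventilation or max_fio2 >= 0.60)


# Hypothetical patients, for illustration only
print(is_critically_ill(True, False, 0.65))   # True: ICU + FiO2 >= 60%
print(is_critically_ill(True, True, 0.40))    # True: ICU + mechanical ventilation
print(is_critically_ill(False, False, 0.80))  # False: high FiO2 but no ICU admission
```

Writing the precedence out explicitly makes the "and ... or" structure of the definition unambiguous when it is applied to a data extract.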

Data Collection and Follow Up
Epidemiological data, clinical symptoms and signs, laboratory parameters, cytokine profiles, and immune cell data on admission were extracted from electronic medical records using data collection forms. These data included blood chemistry, liver and renal function tests, glucose and coagulation tests, creatine kinase, B-type natriuretic peptide, C-reactive protein, procalcitonin, IL-2, IL-4, IL-6, IL-10, tumor necrosis factor alpha (TNF-α), and B lymphocyte, T cell, CD4+ T cell, and CD8+ T cell counts. All patients were followed up until March 12, 2020 (Hong et al., 2021). We used LR and machine learning models to differentiate critically ill from non-critically ill patients with COVID-19 pneumonia.

Statistical Analysis
Values were missing for D-dimer, B-type natriuretic peptide, cytokine profiles, and immune cell counts. To handle this, missing values were imputed using Multiple Imputation by Chained Equations (MICE) before the LR and ML analyses (Royston, 2005). MICE has emerged as one of the principal statistical approaches for dealing with missing data: missing values are replaced by plausible values estimated from the observed data to create a "complete" dataset (Royston, 2005).
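The study ran MICE in R/Stata and does not report its settings. As a rough illustration of the chained-equations idea only, the standard-library Python sketch below (hypothetical function names) starts from column means and then repeatedly regresses each incomplete variable on all the others by ordinary least squares, replacing the missing entries with fitted values. Unlike full MICE, it adds no random draws and yields a single imputed dataset, so it does not propagate imputation uncertainty.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(xs, ys):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    design = [[1.0] + row for row in xs]
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)


def chained_equations_impute(data, n_iter=10):
    """Deterministic chained-equations imputation; None marks a missing value."""
    n_rows, n_cols = len(data), len(data[0])
    missing = [[v is None for v in row] for row in data]
    filled = [list(row) for row in data]
    # Step 0: initialise each missing entry with its column's observed mean.
    for j in range(n_cols):
        observed = [row[j] for row in data if row[j] is not None]
        mean = sum(observed) / len(observed)
        for row in filled:
            if row[j] is None:
                row[j] = mean
    # Cycle through incomplete columns, re-predicting their missing entries.
    for _ in range(n_iter):
        for j in range(n_cols):
            rows_missing = [i for i in range(n_rows) if missing[i][j]]
            if not rows_missing:
                continue
            others = [c for c in range(n_cols) if c != j]
            train = [i for i in range(n_rows) if not missing[i][j]]
            beta = ols([[filled[i][c] for c in others] for i in train],
                       [filled[i][j] for i in train])
            for i in rows_missing:
                x = [1.0] + [filled[i][c] for c in others]
                filled[i][j] = sum(b * v for b, v in zip(beta, x))
    return filled


# Hypothetical toy data in which column 2 = 2 * column 0 + column 1
data = [[1.0, 2.0, 4.0], [2.0, 1.0, 5.0], [3.0, 5.0, 11.0],
        [4.0, 3.0, 11.0], [5.0, 0.0, 10.0], [2.0, 2.0, None]]
out = chained_equations_impute(data)
print(round(out[5][2], 6))  # 6.0
```

In practice a dedicated implementation (for example the R `mice` package, or scikit-learn's `IterativeImputer` with posterior sampling) would be used, and analyses would be pooled across several imputed datasets.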
Categorical variables were described as counts and proportions and compared using the χ2 test or Fisher's exact test (Hong et al., 2020). Depending on normality as assessed by the Shapiro-Wilk test, continuous variables were expressed as mean ± SD or median and interquartile range (IQR) and compared using Student's t-test or the Wilcoxon rank-sum test, respectively. All variables that differed between critically ill and non-critically ill patients on univariate analysis underwent receiver operating characteristic (ROC) curve analysis to identify useful single predictors of critical illness in patients with COVID-19 pneumonia. Only variables with an area under the ROC curve (AUC) >0.7 were then retained as potential predictors of critical illness (Hong et al., 2017). In addition, an exploratory variable importance analysis was performed using both the XGBoost and RF methods to evaluate the role of different variables in predicting critical illness. For XGBoost, a SHapley Additive exPlanations (SHAP) summary plot was used to quantify the importance of each variable, and a SHAP force plot was used to explain individual predictions (Deshmukh and Merchant, 2020). For RF, the importance of each variable was measured by the mean decrease in accuracy and the mean decrease in Gini impurity attributable to that variable (Gong et al., 2020).
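The single-variable ROC screen can be sketched with the Mann-Whitney form of the AUC: the probability that a randomly chosen critically ill patient has a higher value than a randomly chosen non-critically ill patient, with ties counting one half. The standard-library Python sketch below uses hypothetical names and toy data. Taking max(AUC, 1 − AUC) is our assumption, so that markers that are lower in critically ill patients (such as CD4+ T cells) can also pass the 0.7 cut-off.

```python
def auc(pos_scores, neg_scores):
    """Mann-Whitney estimate of the AUC: P(case value > control value), ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def screen_by_auc(features, labels, threshold=0.7):
    """Keep features whose direction-free single-variable AUC exceeds the threshold."""
    selected = {}
    for name, values in features.items():
        pos = [v for v, y in zip(values, labels) if y == 1]
        neg = [v for v, y in zip(values, labels) if y == 0]
        a = auc(pos, neg)
        a = max(a, 1.0 - a)  # assumption: a marker that is lower in cases still counts
        if a > threshold:
            selected[name] = a
    return selected


# Hypothetical toy data: 1 = critically ill, 0 = non-critically ill
labels = [1, 1, 1, 0, 0, 0]
features = {
    "marker_up": [9, 8, 7, 3, 2, 1],
    "marker_down": [1, 2, 3, 7, 8, 9],
    "noise": [1, 9, 5, 5, 9, 1],
}
print(screen_by_auc(features, labels))  # {'marker_up': 1.0, 'marker_down': 1.0}
```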
Risk models were developed using a conventional statistical method (forward conditional stepwise LR), a traditional machine learning algorithm (RF), and a state-of-the-art implementation of gradient-boosted decision trees (XGBoost). An RF model is an ensemble of many decision trees, each grown on a bootstrap sample of the data with a random subset of candidate predictors considered at each split; the trees' predictions are aggregated by voting (Gong et al., 2020). XGBoost, which builds decision trees sequentially so that each new tree corrects the errors of the current ensemble, was employed for binary classification of patients with COVID-19 pneumonia as critically ill or non-critically ill (Al'aref et al., 2020).
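XGBoost adds second-order (gradient and Hessian) leaf weights, regularization, subsampling, and other refinements on top of gradient boosting. To show only the underlying boosting idea, here is a plain, standard-library Python sketch that fits depth-1 trees (stumps) to the negative gradient of the log-loss. It is a simplification, not XGBoost and not the study's model; all names and settings (50 rounds, learning rate 0.3) are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(xs, targets):
    """Best single split (feature j, threshold t) by squared error to the targets.

    Returns (j, t, left_value, right_value); rows with x[j] <= t go left.
    """
    best = None
    for j in range(len(xs[0])):
        cut_points = sorted(set(row[j] for row in xs))[:-1]
        for t in cut_points:
            left = [g for row, g in zip(xs, targets) if row[j] <= t]
            right = [g for row, g in zip(xs, targets) if row[j] > t]
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((g - lv) ** 2 for g in left) + sum((g - rv) ** 2 for g in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    if best is None:  # every feature is constant: fall back to a constant update
        mean = sum(targets) / len(targets)
        return 0, float("inf"), mean, mean
    return best[1:]


def fit_boosted_stumps(xs, ys, n_rounds=50, learning_rate=0.3):
    """Gradient boosting with stumps on the logistic (log-loss) objective."""
    p0 = sum(ys) / len(ys)              # assumes both classes are present
    f0 = math.log(p0 / (1.0 - p0))      # initial log-odds
    scores = [f0] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of the log-loss with respect to the log-odds: y - p
        residuals = [y - sigmoid(s) for y, s in zip(ys, scores)]
        j, t, lv, rv = fit_stump(xs, residuals)
        stumps.append((j, t, lv, rv))
        scores = [s + learning_rate * (lv if row[j] <= t else rv)
                  for s, row in zip(scores, xs)]
    return {"f0": f0, "stumps": stumps, "learning_rate": learning_rate}


def predict_proba(model, row):
    """Predicted probability of the positive class (critically ill) for one row."""
    z = model["f0"]
    for j, t, lv, rv in model["stumps"]:
        z += model["learning_rate"] * (lv if row[j] <= t else rv)
    return sigmoid(z)


# Hypothetical one-feature toy data
xs = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
ys = [0, 0, 0, 1, 1, 1]
model = fit_boosted_stumps(xs, ys)
print(predict_proba(model, [1.0]) < 0.5 < predict_proba(model, [6.0]))  # True
```

The sequential structure is the key contrast with RF: an RF averages independently grown trees, whereas each boosting round fits the residual errors left by the trees before it.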
We randomly held out two patients for individualized prediction; the remaining 61 patients were used to develop the prediction models. When building and tuning the models, we used two-fold cross-validation as the resampling strategy to reduce overfitting. The development set was divided into two approximately equal-sized subsamples; each subsample was used once for training and once for testing, so the analysis was run twice (once per fold). The AUC was calculated for each of the two analyses using only the respective test data. Finally, the mean AUC with 95% CI, the area under the precision-recall curve, and the area under the precision-recall-gain curve were calculated and compared (Saito and Rehmsmeier, 2015). Because the incidence of critical illness in patients with COVID-19 pneumonia was high (34.9%), we selected as the best cut-off point the one at which the number of true positives was highest while sensitivity remained >90%. This was done by selecting the threshold at the point where the slope of the increase in specificity declines. Sensitivity, specificity, accuracy, and the F-score (the harmonic mean of precision and recall) were also calculated and compared (Saito and Rehmsmeier, 2015). To mitigate the black-box nature of XGBoost and improve its interpretability, a LIME plot was used to explain individualized predictions.
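A minimal standard-library Python sketch of two-fold cross-validation as described: train on one fold, compute the AUC on the other, swap, and average. Stratifying the split by outcome, the seed, and the toy "mean difference" scorer standing in for the real models are our assumptions; any fitting function that returns a scoring function can be plugged in as `fit`.

```python
import random


def auc(scores, labels):
    """Pairwise (Mann-Whitney) AUC; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def stratified_two_folds(labels, seed=0):
    """Split indices into two folds, dealing each class out alternately (assumed stratification)."""
    rng = random.Random(seed)
    folds = ([], [])
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for k, i in enumerate(idx):
            folds[k % 2].append(i)
    return folds


def two_fold_cv_auc(xs, ys, fit, seed=0):
    """Train on one fold, score the other, swap; return (mean AUC, per-fold AUCs)."""
    folds = stratified_two_folds(ys, seed)
    aucs = []
    for test in (0, 1):
        train_idx, test_idx = folds[1 - test], folds[test]
        predict = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores = [predict(xs[i]) for i in test_idx]
        aucs.append(auc(scores, [ys[i] for i in test_idx]))
    return sum(aucs) / len(aucs), aucs


def fit_mean_difference(xs, ys):
    """Toy stand-in model: score = sum of features weighted by (case mean - control mean)."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    w = [sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg)
         for j in range(len(xs[0]))]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))


# Hypothetical, perfectly separable toy data
xs = [[float(i)] for i in range(10)]
ys = [0] * 5 + [1] * 5
print(two_fold_cv_auc(xs, ys, fit_mean_difference))  # (1.0, [1.0, 1.0])
```

With only two folds, each model is trained on roughly half of the 61 development patients, so the per-fold AUCs can vary considerably; repeating the split with several seeds is one common way to gauge that variability.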
For the LR analysis, the significance levels for stepwise entry and removal of a variable were 0.05 and 0.06, respectively (Hong et al., 2019). Based on the LR results, a regression equation and a nomogram were developed to predict critical illness associated with COVID-19 pneumonia. Model calibration was assessed with the Hosmer-Lemeshow goodness-of-fit test. Odds ratios (ORs) were calculated with 95% CIs. Multicollinearity was considered significant if the largest variance inflation factor exceeded 10 (Hong et al., 2020).
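The paper does not say how the Hosmer-Lemeshow groups were formed. A common variant sorts patients by predicted risk, splits them into g groups (often 10), and computes HL = Σ_g (O_g − E_g)² / (n_g π̄_g (1 − π̄_g)), where O_g and E_g are the observed and expected numbers of events and π̄_g is the mean predicted risk in group g; HL is referred to a χ2 distribution with g − 2 degrees of freedom. The standard-library Python sketch below (hypothetical function names) computes that statistic, plus an exact upper-tail χ2 probability that is valid only for even degrees of freedom, which covers the usual g = 10 (df = 8).

```python
import math


def hosmer_lemeshow(probs, outcomes, groups=10):
    """Hosmer-Lemeshow statistic with (roughly) equal-sized groups of sorted predicted risk.

    Returns (statistic, degrees_of_freedom) with df = groups - 2.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i])
    n = len(order)
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        if not idx:
            continue
        n_g = len(idx)
        observed = sum(outcomes[i] for i in idx)
        expected = sum(probs[i] for i in idx)
        mean_p = expected / n_g
        if 0.0 < mean_p < 1.0:
            stat += (observed - expected) ** 2 / (n_g * mean_p * (1.0 - mean_p))
    return stat, groups - 2


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; exact closed form valid only for even df."""
    if df % 2:
        raise ValueError("closed form requires an even number of degrees of freedom")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k          # (x/2)^k / k!
        total += term
    return math.exp(-half) * total


# Hypothetical, perfectly calibrated toy predictions
probs = [0.5] * 20
outcomes = [0, 1] * 10
stat, df = hosmer_lemeshow(probs, outcomes)
print(stat, df, chi2_sf_even_df(stat, df))  # 0.0 8 1.0
```

A large p-value indicates no evidence of miscalibration; with only 61 patients, the test has limited power, so a non-significant result is weak evidence of good calibration.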
A two-tailed P-value of less than 0.05 was considered statistically significant. All statistical analyses were performed using R and Stata. The data flow diagram of our study is shown in Supplementary Figure 1.

Clinical Characteristics
A total of 63 hospitalized patients with confirmed COVID-19 pneumonia were enrolled in this study. Baseline clinical and laboratory findings on admission have been described previously (Hong et al., 2021). In summary, of the 63 patients, 22 (34.9%) required high-flow nasal cannula or higher-level oxygen support to correct hypoxemia during their hospital stay and were classified as critically ill. The remaining 41 patients were classified as non-critically ill. The mean age was 55.9 ± 15.3 years, and 41 (65.1%) patients were men. The mean time from symptom onset to hospital admission was 6.9 ± 3.7 days. The most frequent symptoms at onset were fever and cough (98.4% and 61.9%, respectively). Among the clinical characteristics and laboratory findings, respiratory rate, leukocyte and neutrophil counts, and levels of aspartate transaminase, albumin, serum procalcitonin, D-dimer, and B-type natriuretic peptide were useful predictors of critical illness in patients with COVID-19 pneumonia, each with an AUC of more than 0.7 (Hong et al., 2021). Most patients had elevated IL-6 and IL-10 levels and decreased CD4+ T cell counts. The median values of these variables in all patients are shown in Table 1.

Cytokine and Immune Cells
For cytokine profiles and immune cells, univariate analysis revealed that, compared with non-critically ill patients, critically ill patients had higher levels of IL-6 and IL-10 and lower levels of T cells, CD4+ T cells, and CD8+ T cells (Figure 1) (Hong et al., 2021). No significant differences between the groups were observed for IL-2, IL-4, tumor necrosis factor alpha (TNF-α), or B lymphocytes. Among these variables, T cells (AUC: 0.72 ± 0.09), CD4+ T cells (AUC: 0.72 ± 0.08), IL-6 (AUC: 0.85 ± 0.06), and IL-10 (AUC: 0.86 ± 0.06) were useful predictors of critical illness in patients with COVID-19 pneumonia.

Exploratory Variable Importance Analysis
Leukocyte and T cell counts were excluded from further analysis because of strong multicollinearity. Therefore, ten variables (respiratory rate, neutrophil count, aspartate transaminase, albumin, serum procalcitonin, D-dimer, B-type natriuretic peptide, CD4+ T cells, IL-6, and IL-10) were used in the machine learning models. In the RF analysis, IL-10 was the most important predictor of critical illness in patients with COVID-19 pneumonia, followed by IL-6 and serum procalcitonin (Figure 3). The SHAP summary plot showed the relative importance of each feature in the XGBoost analysis: IL-10, IL-6, and CD4+ T cells were the three most important features (Figure 4).
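The Methods flag multicollinearity when the largest variance inflation factor (VIF) exceeds 10. A standard-library Python sketch (hypothetical names, toy data) uses the identity that VIF_j equals the j-th diagonal element of the inverse of the predictors' correlation matrix, i.e. 1 / (1 − R_j²), where R_j² comes from regressing predictor j on all the other predictors:

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    def corr(a, b):
        ma, mb = sum(a) / len(a), sum(b) / len(b)
        cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        va = sum((x - ma) ** 2 for x in a)
        vb = sum((y - mb) ** 2 for y in b)
        return cov / math.sqrt(va * vb)
    return [[corr(a, b) for b in columns] for a in columns]


def invert(matrix):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(columns):
    """VIF_j = j-th diagonal element of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


# Hypothetical toy predictors: c is almost a copy of a, so both get a large VIF
a = [1.0, 2.0, 3.0, 4.0, 5.0]
b = [2.0, 1.0, 4.0, 3.0, 6.0]
c = [x + d for x, d in zip(a, [0.01, -0.02, 0.0, 0.02, -0.01])]
print([round(v, 1) for v in variance_inflation_factors([a, b, c])])
```

Dropping one member of a near-duplicate pair (as was done here with leukocyte and T cell counts) is the usual remedy when a VIF is this large.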

Development and Comparison of Prediction Models
The same ten variables (respiratory rate, neutrophil count, aspartate transaminase, albumin, serum procalcitonin, D-dimer, B-type natriuretic peptide, CD4+ T cells, IL-6, and IL-10) were used in the multivariable logistic regression analysis.
In two-fold cross-validation, the mean AUCs of the LR, RF, and XGBoost models for predicting critical illness were 0.91, 0.89, and 0.93, respectively (Figure 6). The XGBoost model also achieved a higher mean area under the precision-recall curve (0.82) than the LR (0.81) and RF (0.75) models (Figure 7). The areas under the precision-recall-gain curve for the XGBoost, LR, and RF models were 0.53, 0.49, and 0.43, respectively (Figure 8).
The XGBoost model achieved a sensitivity of 90.5%, specificity of 87.5%, diagnostic accuracy of 88.5%, and F-score of 84.4%. At similar sensitivities (90.1% for RF and 90.5% for LR), the RF and LR models achieved lower specificity, diagnostic accuracy, and F-score (Table 2).
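The confusion matrices behind Table 2 are not reported. As an arithmetic check only, the Python sketch below uses illustrative counts (TP = 19, FN = 2, TN = 35, FP = 5) that we chose because they reproduce the reported XGBoost percentages, assuming (our inference, not stated in the paper) that the 61-patient development set contained 21 critically ill and 40 non-critically ill patients. `pick_threshold` is a hypothetical, simplified reading of the cut-off rule (highest specificity among thresholds with sensitivity >90%), not the authors' code.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy, precision and F-score from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)          # recall
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    precision = tp / (tp + fp)
    f_score = 2 * precision * sensitivity / (precision + sensitivity)  # harmonic mean
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "precision": precision, "f_score": f_score}


def pick_threshold(scores, labels, min_sensitivity=0.9):
    """Among thresholds with sensitivity > min_sensitivity, return the one with the highest specificity.

    A patient is called critically ill when score >= threshold; returns None if none qualifies.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_t, best_spec = None, -1.0
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < t)
        if tp / n_pos > min_sensitivity and tn / n_neg > best_spec:
            best_t, best_spec = t, tn / n_neg
    return best_t


m = confusion_metrics(tp=19, fn=2, tn=35, fp=5)
print({k: round(100 * v, 1) for k, v in m.items()})
# {'sensitivity': 90.5, 'specificity': 87.5, 'accuracy': 88.5, 'precision': 79.2, 'f_score': 84.4}
```

Because the F-score ignores true negatives, it is more sensitive than accuracy to false positives when, as here, non-critically ill patients outnumber critically ill ones.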

Explanation of XGBoost Model Results: Individualized Prediction
To clarify model predictions for individual patients, the LIME plot shows two typical predictions made by the XGBoost model, one for a critically ill patient and one for a non-critically ill patient with COVID-19 pneumonia (Figure 9). The length of the bar for each feature indicates the importance (weight) of that feature in making the prediction; a longer bar therefore indicates a feature that contributes more strongly towards or against the prediction.
FIGURE 3 | Variable importance plot using RF model for the critically ill COVID-19 pneumonia patients. IL-10 and IL-6 were the most important variables in determining critical illness by either mean decrease accuracy or by mean decrease Gini.
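The LIME package samples perturbations in an interpretable representation (by default discretizing tabular features) and can restrict the explanation to a subset of features; those details are omitted here. A simplified, standard-library Python sketch of the core idea, with hypothetical names and settings: perturb one patient's feature vector, weight each perturbed sample by an exponential kernel on its distance to that patient, and fit a weighted linear model whose slopes act as local feature weights, roughly what the bar lengths in a LIME plot display.

```python
import math
import random


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def lime_weights(predict, x, n_samples=500, noise=1.0, kernel_width=1.0, seed=0):
    """Local weighted-linear surrogate around instance x (simplified LIME-style explanation).

    Returns one slope (local weight) per feature of x.
    """
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [xi + rng.gauss(0.0, noise) for xi in x]          # perturbed copy of x
        d2 = sum((zi - xi) ** 2 for zi, xi in zip(z, x))
        rows.append([1.0] + z)                                 # intercept column
        targets.append(predict(z))                             # black-box output
        weights.append(math.exp(-d2 / kernel_width ** 2))      # closer samples count more
    k = len(rows[0])
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(k)]
            for i in range(k)]
    xtwy = [sum(w * r[i] * t for r, w, t in zip(rows, weights, targets)) for i in range(k)]
    return solve(xtwx, xtwy)[1:]                               # drop the intercept


# Hypothetical black box; for a linear function the local weights equal its slopes
black_box = lambda z: 2.0 * z[0] - 3.0 * z[1] + 1.0
print([round(w, 6) for w in lime_weights(black_box, [1.0, 2.0])])
```

For a nonlinear model such as XGBoost, the slopes describe its behavior only in the neighborhood of the explained patient, which is why two patients can receive quite different explanations.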

DISCUSSION
IL-10 is produced by many different myeloid and lymphoid cells and, during COVID-19 infection, in particularly large quantities by T helper 2 (Th2) cells (Huang et al., 2020). It acts as an anti-inflammatory cytokine by suppressing macrophages and dendritic cells (DCs), thereby limiting T helper 1 (Th1) and Th2 effector responses (Couper et al., 2008).
Premature production of IL-10 during a virulent infection can allow the infection to become overwhelming; conversely, IL-10 produced too late during an avirulent infection may lead to severe tissue damage (Couper et al., 2008). A recent study proposed that a dramatic early elevation of IL-10 may play a pathological, proinflammatory role in COVID-19 severity, because its proinflammatory or anti-inflammatory effects differ depending on the course of disease (Lu et al., 2021). Increasing evidence supports a correlation between elevated IL-10 and the severity of COVID-19 (Huang et al., 2020; Wang et al., 2020a; Zhao et al., 2020; Lu et al., 2021). In our study, IL-10 and IL-6 were the most important variables both in the RF analysis (Figure 3) and in the SHAP summary plot of the XGBoost model (Figure 4). These results indicate that IL-10 is the most important variable for predicting critical illness in patients with COVID-19 pneumonia. In addition, in ROC analysis, IL-10 (AUC = 0.86) was a useful single predictor of critical illness in patients with COVID-19 pneumonia (Figure 2). Critically ill patients with COVID-19 pneumonia are those who need high-flow nasal cannula or higher-level oxygen support to correct hypoxemia. They commonly show pulmonary fibrotic changes on CT scans, ranging from fibrosis associated with pneumonia to severe lung injury, which results in hypoxemia (Shi et al., 2020). Several in vivo and in vitro studies have demonstrated anti-fibrotic effects of IL-10 in pancreatic fibrosis, liver fibrosis, and bleomycin-induced lung fibrosis (Thompson et al., 1998; Demols et al., 2002; Shamskhou et al., 2019). We therefore speculate that IL-10 may play an anti-inflammatory and anti-fibrotic role in critically ill patients with COVID-19 pneumonia.
IL-6 is a pleiotropic cytokine secreted by myeloid cells following immune challenge or tissue injury (Yousif et al., 2021). It has proinflammatory functions but also anti-inflammatory, pro-resolution, and regenerative properties (Mcelvaney et al., 2021). IL-6 production helps promote resistance to pathogens and maintain tissue homeostasis, but overproduction causes chronic inflammatory disorders and severe hyperinflammation (Jones and Hunter, 2021). Several studies have reported that serum IL-6 is significantly elevated in severe COVID-19 (Coomes and Haghbayan, 2020; Cummings et al., 2020; Huang et al., 2020; Leisman et al., 2020). Moreover, tocilizumab, an IL-6 receptor (IL-6R) blocker, has been recommended for severe cases of COVID-19 (Huang et al., 2020; Ruan et al., 2020; Wu et al., 2020a; Angriman et al., 2021; Galván-Román et al., 2021; Mcelvaney et al., 2021). IL-6 has also been reported to be a good predictor of progression and severity in patients with COVID-19 (Guirao et al., 2020; Liu et al., 2020; Broman et al., 2021; Ren et al., 2021), and an elevated IL-6 level has been suggested as an important predictor of the need for ventilator support in severe COVID-19 (Galván-Román et al., 2021). IL-6 may therefore be a useful marker both of disease severity and for clinical decision-making. As expected, our study suggests that IL-6 (OR = 1.04, 95% CI 1.02, 1.06) is independently associated with critical illness in patients with COVID-19 pneumonia (Figure 5).
Aspartate aminotransferase (AST) is an aminotransferase found mainly in the liver that catalyzes the transfer of an amino group from aspartate to α-ketoglutarate (Kwo et al., 2017). AST is normally present in the cytoplasm but is released into the serum after cell damage (Abd Rashid et al., 2021); it is therefore used to assess liver injury. Recent studies have reported elevated AST levels in critically ill patients with COVID-19 pneumonia (Zahedi et al., 2021). Among indicators of liver injury, elevated AST has shown the strongest association with mortality. Padmaprakash et al. demonstrated that AST is a significant predictor of COVID-19 mortality and that an elevated AST level is a valid indicator of COVID-19 pneumonia severity (Padmaprakash et al., 2022). Elevated AST levels have been independently associated with adverse clinical outcomes in COVID-19 patients, including ICU admission, invasive mechanical ventilation, and death (Yip et al., 2021). AST at admission has been shown to be an independent predictor of COVID-19 mortality, and AST should be monitored in hospitalized patients (Ding et al., 2021). As expected, our LR model suggested that AST (OR = 1.03, 95% CI 1.01, 1.05) could be a predictive marker of critical illness in patients with COVID-19 pneumonia (Figure 5).
B-type (brain) natriuretic peptide (BNP) is a 32-amino-acid cardiac natriuretic peptide hormone that is strongly upregulated in cardiac failure and locally in the area surrounding a myocardial infarction (Hall, 2004). Previous studies have highlighted that COVID-19 is a complex, multi-organ disease and an independent risk factor for acute myocardial infarction, which promotes the release of BNP (Katsoularis et al., 2021).
Emerging data suggest that cardiac injury, manifested by elevated cardiac biomarkers, is detected in a sizeable proportion of COVID-19 patients and is associated with adverse outcomes and increased mortality. Stefanini et al. suggested that concomitant elevation of both BNP and troponin I is a strong independent predictor of all-cause mortality (OR 3.24) (Stefanini et al., 2020). Our study suggested that BNP (OR = 1.02, 95% CI 1.01, 1.03, P = 0.011) was independently associated with the development of critical illness in patients with COVID-19 pneumonia (Figure 5).
CD4+ T cells are instrumental as activators of both the innate and adaptive arms of the immune system (Ruterbusch et al., 2020). As critical protectors against infectious diseases, they can assist humoral responses, indirectly activate macrophages, and directly suppress inflammation (Miller and Mitchell, 1969; Parish and Liew, 1972; Jandinski et al., 1976). Rydyznski Moderbacher et al. suggested that SARS-CoV-2-specific CD4+ T cells are strongly associated with COVID-19 disease severity (Rydyznski Moderbacher et al., 2020). Oja et al. reported that CD4+ T cell responses were qualitatively impaired in critically ill COVID-19 patients (Oja et al., 2020). In our study, critically ill patients had lower CD4+ T cell levels than non-critically ill patients (Figure 1), and the SHAP summary plot of the XGBoost model suggested that CD4+ T cells play an important role in predicting critical illness (Figure 4).
A nomogram is a two-dimensional graphical tool, consisting of several scaled lines arranged in proportion, that is used to predict the probability of an outcome (Rahman et al., 2021). It is well suited to quantifying the risk of clinical events simply and intuitively (Iasonos et al., 2008; Jin et al., 2017). As a quantitative and practical prediction tool, it could provide clinicians with an easy-to-use method to predict severe pneumonia in COVID-19 patients (Feng et al., 2020). Wu et al. established a nomogram consisting of seven variables (age, lymphocytes, CRP, LDH, creatine kinase, urea, and calcium) for predicting the severity risk of COVID-19 pneumonia and classified COVID-19 patients into low-, medium-, and high-risk groups (Wu et al., 2020b). Nomograms that incorporate different factors may have different clinical value. Ding et al. suggested that the prognosis of COVID-19 patients can be accurately predicted by a nomogram incorporating abnormal AST and D-bilirubin levels along with other individual signs at admission (Ding et al., 2021). In our study, a nomogram based on the LR model, consisting of IL-6, AST, and BNP, achieved an excellent AUC of 0.91 for predicting critical illness in patients with COVID-19 pneumonia in two-fold cross-validation (Figure 6). Compared with other nomograms (Wu et al., 2020b; Ding et al., 2021), ours is simpler to calculate because only three variables are needed (Figure 5). The points for each variable are determined by reading vertically to the dotted line at the bottom. The points for all variables are summed to obtain the total score, and the probability of critical illness is read from the line corresponding to that total score.
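Behind the drawing, a logistic-regression nomogram is a rescaling of the linear predictor. A minimal Python sketch of one common construction: each variable's contribution β_j × (distance from the end of its axis that scores 0 points) is scaled so that the variable with the widest effect spans 0 to 100 points, and the total points map back to the linear predictor and then to a probability. The slopes are ln(OR) for the ORs reported in Figure 5, assumed here to be per unit; the intercept and axis ranges are hypothetical placeholders because they are not given in this section, so the output is not a clinical risk estimate.

```python
import math

# Hypothetical model for illustration only (not a clinical calculator).
INTERCEPT = -5.0                                   # placeholder: not reported here
COEFS = {"il6": math.log(1.04), "ast": math.log(1.03), "bnp": math.log(1.02)}
RANGES = {"il6": (0.0, 100.0), "ast": (0.0, 150.0), "bnp": (0.0, 300.0)}  # placeholders


def _contribution(name, value):
    """|beta| * distance from the axis end that scores 0 points."""
    beta = COEFS[name]
    lo, hi = RANGES[name]
    return beta * (value - lo) if beta >= 0 else -beta * (hi - value)


# The variable with the widest effect over its range defines the 0-100 points axis.
SCALE = max(abs(COEFS[n]) * (RANGES[n][1] - RANGES[n][0]) for n in COEFS)


def points(name, value):
    return 100.0 * _contribution(name, value) / SCALE


def total_points(patient):
    return sum(points(n, v) for n, v in patient.items())


def probability_from_points(total):
    """Map total points back to the linear predictor, then to a probability."""
    base = INTERCEPT + sum(COEFS[n] * (RANGES[n][0] if COEFS[n] >= 0 else RANGES[n][1])
                           for n in COEFS)
    lp = base + total * SCALE / 100.0
    return 1.0 / (1.0 + math.exp(-lp))


patient = {"il6": 40.0, "ast": 60.0, "bnp": 120.0}   # hypothetical values
print(round(total_points(patient), 1), round(probability_from_points(total_points(patient)), 3))
```

Reading points off each axis, adding them, and reading the probability axis is therefore equivalent to evaluating the logistic equation directly; the nomogram simply does the arithmetic graphically.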
Compared with other ML methods, XGBoost is relatively resistant to overfitting in datasets with imbalanced feature-to-outcome ratios, and its hyperparameters allow tuning for imbalanced datasets (Vaid et al., 2021). With the SHAP summary plot, the importance of each variable can be quantified and explained. SHAP values are a game-theoretic approach to model interpretability that explain the global model structure by combining local explanations of each prediction (Vaid et al., 2021). XGBoost has been used to predict respiratory failure within 48 h, as well as morbidity and mortality, in patients hospitalized with COVID-19 (Pan et al., 2020; Bolourani et al., 2021; Wang et al., 2021). AlJame et al. (2021) used RF and XGBoost to screen for COVID-19 among other patients, while Montomoli et al. (2021) used XGBoost to predict the change in SOFA score over a five-day span in COVID-19 patients admitted to the ICU. Feng et al. (2021) compared RF, XGBoost, and several other methods for predicting mortality in COVID-19 patients and found XGBoost to be the best-performing ML method. Iwendi et al. (2020) used RF to model COVID-19-related deaths with respect to gender, age, and geography, and reported more deaths among males, in the Wuhan population, and among people aged between 20 and 70 years.
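The SHAP library computes Shapley values for tree ensembles with dedicated algorithms (such as TreeSHAP) and averages over a background dataset. To show only the game-theoretic definition, here is an exact, brute-force Python sketch in which a single baseline vector stands in for "absent" features; this simplification and the function name are our assumptions. The cost grows as 2^M, so it is practical only for a handful of features.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, filling 'absent' features from a single baseline.

    phi_j = sum over subsets S not containing j of
            |S|! (M - |S| - 1)! / M! * [v(S + {j}) - v(S)],
    where v(S) evaluates f with features in S taken from x and the rest from baseline.
    """
    m = len(x)

    def v(subset):
        return f([x[i] if i in subset else baseline[i] for i in range(m)])

    phi = []
    for j in range(m):
        others = [i for i in range(m) if i != j]
        total = 0.0
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for s in combinations(others, size):
                s = set(s)
                total += weight * (v(s | {j}) - v(s))
        phi.append(total)
    return phi


# Toy model with an interaction: the z0*z1 term is split equally between features 0 and 1
f = lambda z: z[0] * z[1] + z[2]
phi = shapley_values(f, [2.0, 3.0, 4.0], [0.0, 0.0, 0.0])
print([round(p, 6) for p in phi])  # [3.0, 3.0, 4.0]
```

The values add up to f(x) − f(baseline) (here 10), which is the "additive" property that lets a SHAP summary plot aggregate per-patient explanations into global feature importance.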
Our study suggested that, when comparing the XGBoost model with the RF and LR models, XGBoost (AUC = 0.93) exhibited the highest discriminatory performance, followed by the LR (AUC = 0.91) and RF (AUC = 0.89) models (Figure 6). The area under the precision-recall curve and the area under the precision-recall-gain curve showed similar results (Figures 7, 8). The XGBoost model achieved a sensitivity of 90.5%, specificity of 87.5%, diagnostic accuracy of 88.5%, and F-score of 84.4%, with higher specificity, diagnostic accuracy, and F-score than the nomogram and RF models at a similar sensitivity (Table 2). ML models are complex and can be hard for clinicians to interpret, so they are less often used in clinical practice. We have provided visual illustrations of the implemented models to help clinicians understand the models and the importance of different features. The results of XGBoost were explained with a LIME plot, which makes individualized predictions easy to understand (Figure 9).
To the best of our knowledge, this is the first study to implement and compare XGBoost, RF, and LR (presented as a nomogram) models based on clinical and laboratory data, immune cells, and cytokine profiles for differentiating critically ill from non-critically ill patients with COVID-19 pneumonia. While global measures such as accuracy are useful, they cannot explain why a model made a specific prediction; we therefore used LIME plots to explain the outputs of the XGBoost model. In addition, cytokine profiles and immune cell data were evaluated as potential predictors of the severity of COVID-19 pneumonia. Our study has several limitations, and there is room for further improvement. First, it was a retrospective, single-center study. Second, the small sample size carries an intrinsic risk of overfitting, although we used two-fold cross-validation as the resampling method to mitigate it. Third, only patients with pneumonia were enrolled, so our results may not be applicable to patients without pneumonia. Fourth, because the proposed ML method is purely data-driven, our model may vary when applied to different datasets (Yan et al., 2020). Because patients were enrolled from a single tertiary referral center, our XGBoost approach needs further training, validation, and optimization before clinical application. However, the findings are interesting and warrant further research. In the future, it would be interesting to apply deep learning models to our data, to combine more methods, and to compare deep learning with unsupervised algorithms. Beyond classifying patients with COVID-19, our protocol could be applied to subtyping various cancers and extended to other viral diseases. The findings could help clinicians prioritize treatment and could be incorporated into decision support systems that use these predictors to improve clinical outcomes.
In conclusion, our comparison showed that XGBoost had the highest discriminatory performance for predicting critical illness in patients with COVID-19 pneumonia. The nomogram and the visualized interpretation with LIME plots could also be useful in the clinical setting. Additionally, we identified IL-10 as a useful predictor of critical illness in patients with COVID-19 pneumonia, a finding consistent with previous literature.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

ETHICS STATEMENT
This study protocol was approved by the Ethics Committee of the First Affiliated Hospital of Wenzhou Medical University. The committee decided to waive the need for written informed consent from the participants studied in this analysis as the data were analyzed retrospectively and anonymously.

AUTHOR CONTRIBUTIONS
WH conceived the study and carried out the majority of the work. WH, GC, and JYeP participated in data collection. WH, ZB, XZ, SJ, YL, JYiP, QL, and SY conducted data analysis and drafted the manuscript. TX, ZB, MZ, SF, VT, SS and AG helped to finalize the manuscript. All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.