Development and validation of a predictive nomogram for mortality of COVID-19: a multicenter retrospective cohort study of 4,711 cases in multiethnic patients

Background: Coronavirus disease 2019 (COVID-19) is an infectious disease that has spread rapidly worldwide. Because it spreads quickly and can cause severe disease, early detection and treatment may reduce mortality. This study therefore aimed to construct a risk model and a nomogram for predicting mortality from COVID-19.

Methods: The original data were obtained from the article "Neurologic Syndromes Predict Higher In-Hospital Mortality in COVID-19." The database contained 4,711 multiethnic patients. In this secondary analysis, clinical demographics, clinical characteristics, and laboratory indexes were compared between groups with tests of statistical difference. Least absolute shrinkage and selection operator (LASSO) regression and multivariate logistic regression analysis were applied to identify independent predictors of mortality. A nomogram was constructed from these independent predictors and validated. The area under the curve (AUC), calibration curves, and decision curve analysis (DCA) were used to evaluate the nomogram.

Results: The mortality rate was 24.4%. LASSO and multivariate logistic regression analysis identified age, PCT, glucose, D-dimer, CRP, troponin, BUN, LOS, MAP, AST, temperature, O2Sats, platelets, Asian race, and stroke as independent predictors of mortality. The nomogram built from these predictors showed good discrimination, with a C-index of 0.860 in the derivation cohort and 0.8479 in the internal validation cohort. The calibration curves and DCA indicated a high degree of reliability and precision for this clinical prediction model.

Conclusion: An early warning model based on variables accessible from routine clinical tests was developed to predict mortality from COVID-19. This nomogram can be conveniently used to help identify, at an early stage, patients who might develop severe disease. Further studies are warranted to validate the prognostic ability of the nomogram.


Introduction
Coronavirus disease 2019 (COVID-19), an infectious disease caused by the novel coronavirus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), broke out and spread rapidly worldwide (1). The numbers of affected countries and deaths have risen dramatically, posing significant challenges to clinical management and placing an unprecedented economic burden on global public health systems (2).
COVID-19 can affect multiple organ systems, and its main clinical presentation is pneumonia (3). Although most patients with COVID-19 have mild to moderate illness, with common respiratory symptoms and a good prognosis, some severe and critical patients deteriorate rapidly and develop acute respiratory distress syndrome (ARDS), septic shock, multiple organ dysfunction, and even death. This is especially true of elderly patients with comorbidities such as congestive heart failure (CHF), chronic obstructive pulmonary disease (COPD), central nervous system (CNS) disease, chronic renal failure, and cancer (4). Furthermore, as previously described, deaths after COVID-19 infection were more common in older patients with abnormal laboratory indexes, such as inflammatory markers and hepatorenal function tests (5). It is therefore crucial and urgent to identify prognostic indicators of fatal outcomes rapidly through efficient predictive methods, so that preventive measures and interventions can be implemented early to prevent disease progression and death in critically ill patients. Accurate and timely treatment decisions may reduce the risk of death.
Because radiological abnormalities are not observed at initial presentation in approximately 20% of cases, clinical characteristics and routine laboratory tests may provide prognostic factors more quickly (6). A nomogram is a two-dimensional graphical representation of a scoring model, composed of multiple scaled axes and designed as a user-friendly interface for accurately calculating the probability of an outcome (7). A nomogram built on variables such as routine laboratory tests might be a more effective and affordable way to predict the risk of mortality (8). The purpose of this study was therefore to describe the clinical features of COVID-19 and to establish a nomogram, based on a large cohort of COVID-19 patients and incorporating common clinical demographics, characteristics, and laboratory parameters, that provides early warning of the risk of fatal outcomes in patients with COVID-19.

Data source
The original database for this research came from the study "Neurologic Syndromes Predict Higher In-Hospital Mortality in COVID-19" (9). Because Eskandar et al. relinquished ownership of the original database to Neurology,¹ we were able to use this database to conduct a secondary analysis.

Study population and covariates
A total of 4,711 patients with confirmed COVID-19 were consecutively enrolled between March 1st and April 16th, 2020. Participants were randomly divided into a derivation cohort and a validation cohort at a ratio of approximately 3:1. The derivation cohort comprised 3,534 subjects (2,661 survivors and 873 deceased patients), and the validation cohort comprised 1,177 subjects (902 survivors and 275 deceased patients).
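A reproducible random split of the cohort can be sketched in Python as follows. This is an illustration, not the authors' procedure: the function name `split_cohort` and the seed are hypothetical, and the source does not say whether the split was stratified by outcome. The derivation fraction is passed in as a parameter; the reported cohort sizes (3,534 and 1,177 of 4,711 patients) correspond to a fraction of about 0.75.

```python
import random

# Illustrative sketch: `split_cohort` and its default seed are hypothetical.


def split_cohort(patient_ids, derivation_fraction, seed=2020):
    """Randomly assign patients to a derivation and a validation cohort.

    `derivation_fraction` is the share of patients placed in the derivation
    cohort. The default seed is a hypothetical value used only to make the
    split reproducible.
    """
    rng = random.Random(seed)  # private, seeded generator
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    n_derivation = round(len(shuffled) * derivation_fraction)
    return shuffled[:n_derivation], shuffled[n_derivation:]


# Derivation fraction implied by the reported cohort sizes (3,534 of 4,711).
implied_fraction = 3534 / 4711
derivation, validation = split_cohort(range(4711), implied_fraction)
print(len(derivation), len(validation))  # 3534 1177
```

Fixing the seed means the same patients land in each cohort on every run, which is what allows the derivation and validation analyses to be repeated.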
Information on clinical demographics, characteristics, laboratory indexes, comorbidities, and mortality was collected using a health care surveillance software package (Clinical Looking Glass; Streamline Health, Atlanta, GA) and by review of the primary medical records (11).

Regression analysis
Least absolute shrinkage and selection operator (LASSO) regression analysis was applied to identify factors related to mortality in patients with COVID-19. The penalty parameter (lambda) was selected by 10-fold cross-validation. The variables selected by LASSO regression were then entered into a multivariate logistic regression analysis, with p < 0.05 as the criterion for inclusion.
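The LASSO step can be sketched in pure Python as below. This is a minimal illustration, not the authors' implementation (they report using R; dedicated packages such as R's glmnet use faster coordinate-descent solvers). The sketch fits an L1-penalized logistic regression by proximal gradient descent, assumes the predictors are already standardized, and picks lambda by the minimum cross-validated binomial deviance. The function names, step size, and iteration count are hypothetical, and the 1-SE rule mentioned in the Figure 1 caption is not implemented.

```python
import math
import random

# Illustrative sketch: all function names and tuning values are hypothetical.


def _sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso_logistic(X, y, lam, step=0.1, n_iter=1000):
    """L1-penalized logistic regression by proximal gradient descent (ISTA).

    Minimizes mean log-loss + lam * sum(|beta_j|); the intercept is not
    penalized. X is a list of rows of standardized predictors and y a list of
    0/1 outcomes that contains both classes. The fixed step is assumed small
    enough for standardized predictors; a line search would be more robust.
    """
    n, p = len(X), len(X[0])
    ybar = sum(y) / n
    intercept = math.log(ybar / (1 - ybar))  # start from the intercept-only model
    beta = [0.0] * p
    for _ in range(n_iter):
        # Residuals (predicted probability minus outcome) at the current estimate.
        resid = [_sigmoid(intercept + sum(b * x for b, x in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        intercept -= step * sum(resid) / n
        beta = [_soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return intercept, beta


def cross_validated_lambda(X, y, lambdas, k=10, seed=0):
    """Return the lambda with the lowest total held-out binomial deviance."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    best_lam, best_dev = None, float("inf")
    for lam in lambdas:
        dev = 0.0
        for fold in folds:
            held_out = set(fold)
            train = [i for i in idx if i not in held_out]
            b0, beta = fit_lasso_logistic([X[i] for i in train],
                                          [y[i] for i in train], lam)
            for i in fold:
                prob = _sigmoid(b0 + sum(b * x for b, x in zip(beta, X[i])))
                prob = min(max(prob, 1e-12), 1 - 1e-12)  # guard against log(0)
                dev -= 2 * (y[i] * math.log(prob) + (1 - y[i]) * math.log(1 - prob))
        if dev < best_dev:
            best_lam, best_dev = lam, dev
    return best_lam
```

Predictors whose coefficients remain exactly zero at the chosen lambda are dropped; the survivors are the candidates passed on to the multivariate logistic regression.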

Model development
Predictive models for COVID-19 mortality were developed in the derivation cohort using the variables selected by the multivariate logistic regression analysis. The final model was chosen on the basis of the Akaike information criterion (AIC), receiver operating characteristic (ROC) curves, and Harrell's concordance index (C-index). The nomogram was derived from the final model.
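For reference, the model underlying the nomogram and the AIC used for model choice can be written as below. This is a standard formulation, not taken from the source: the fitted coefficients and the exact point scaling used for Figure 3 are not reported in the text, and the 100-point scaling shown (the predictor whose term spans the widest range receives 100 points) is a common convention assumed here. Here $m$ is the number of predictors, $k$ the number of estimated coefficients ($m + 1$ if every predictor enters linearly), and the minimum and maximum over $x$ are taken across the observed range of predictor $j$.

```latex
\operatorname{logit}(\hat p_i) = \ln\frac{\hat p_i}{1-\hat p_i}
  = \hat\beta_0 + \sum_{j=1}^{m} \hat\beta_j x_{ij}, \qquad m = 15

\ln\hat L = \sum_{i=1}^{n}\left[y_i \ln\hat p_i + (1-y_i)\ln(1-\hat p_i)\right],
  \qquad \mathrm{AIC} = 2k - 2\ln\hat L

a_j = \min_{x}\,\hat\beta_j x, \quad b_j = \max_{x}\,\hat\beta_j x, \quad R = \max_{j}\,(b_j - a_j)

\mathrm{points}_j(x_{ij}) = 100\,\frac{\hat\beta_j x_{ij} - a_j}{R},
  \qquad T_i = \sum_{j=1}^{m}\mathrm{points}_j(x_{ij})

\hat p_i = \left\{1 + \exp\!\left[-\left(\hat\beta_0 + \sum_{j=1}^{m} a_j + \frac{R\,T_i}{100}\right)\right]\right\}^{-1}
```

The last line follows because $R\,T_i/100 = \sum_j(\hat\beta_j x_{ij} - a_j)$, so the exponent reduces to the linear predictor. Reading a probability off the total-points axis is therefore equivalent to evaluating the fitted logistic model.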

Performance of the nomogram
The model was internally validated using data from the validation cohort. Discriminatory performance was measured by the C-index. Calibration was assessed with a calibration plot based on 1,000 bootstrap resamples, which described the agreement between observed and nomogram-predicted mortality.
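Discrimination and grouped calibration can be sketched in Python as follows. This is a minimal illustration, not the authors' R code: `c_index` and `calibration_table` are hypothetical helper names. The C-index is computed for a binary outcome, where it equals the ROC AUC. The bootstrap bias correction behind the published calibration curves is not reproduced; the table simply compares mean predicted with observed mortality in groups of similar predicted risk.

```python
# Illustrative sketch: `c_index` and `calibration_table` are hypothetical names.


def c_index(risks, outcomes):
    """Harrell's C for a binary outcome (equal to the area under the ROC curve).

    Over every (death, survivor) pair, score 1 if the patient who died had the
    higher predicted risk and 0.5 if the two predictions are tied.
    """
    deaths = [r for r, y in zip(risks, outcomes) if y == 1]
    survivors = [r for r, y in zip(risks, outcomes) if y == 0]
    score = 0.0
    for rd in deaths:
        for rs in survivors:
            if rd > rs:
                score += 1.0
            elif rd == rs:
                score += 0.5
    return score / (len(deaths) * len(survivors))


def calibration_table(risks, outcomes, n_groups=10):
    """Mean predicted vs observed mortality within groups of similar predicted risk."""
    order = sorted(range(len(risks)), key=lambda i: risks[i])
    size = len(order) / n_groups
    table = []
    for g in range(n_groups):
        members = order[round(g * size):round((g + 1) * size)]
        if not members:
            continue
        mean_predicted = sum(risks[i] for i in members) / len(members)
        observed = sum(outcomes[i] for i in members) / len(members)
        table.append((mean_predicted, observed))
    return table
```

Plotting the observed rate against the mean predicted rate for each group gives a calibration curve; a well-calibrated model lies close to the diagonal.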

Clinical usage
To assess its clinical usefulness, decision curve analysis (DCA) was performed to quantify the clinical net benefit of the nomogram. Detailed descriptions of DCA have been reported previously (12). Results were considered statistically significant at p < 0.05.
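A decision curve plots net benefit across threshold probabilities and compares the model with intervening in all patients or in none. The sketch below uses the standard net-benefit formula of DCA; the function names are hypothetical, and it is not the authors' implementation, for which the source refers to (12).

```python
# Illustrative sketch: function names are hypothetical.


def net_benefit(risks, outcomes, threshold):
    """Net benefit of intervening in patients whose predicted risk >= threshold.

    NB = TP/n - FP/n * threshold / (1 - threshold): the odds of the threshold
    probability weight each false positive against each true positive.
    """
    n = len(outcomes)
    tp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 1)
    fp = sum(1 for r, y in zip(risks, outcomes) if r >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def treat_all_net_benefit(outcomes, threshold):
    """Reference strategy: intervene in every patient."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def decision_curve(risks, outcomes, thresholds):
    """(threshold, model NB, treat-all NB) rows; treat-none has NB = 0 throughout."""
    return [(t, net_benefit(risks, outcomes, t), treat_all_net_benefit(outcomes, t))
            for t in thresholds]
```

A model is clinically useful over the range of thresholds where its net benefit exceeds both the treat-all curve and zero.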

Statistical analysis
Continuous variables were expressed as medians or as mean ± standard deviation, and categorical variables as numbers (percentages). Differences in baseline characteristics between groups were assessed with the Mann-Whitney U test for continuous variables and with the chi-square test or Fisher's exact test, depending on sample size, for categorical variables.
A two-sided p < 0.05 was considered statistically significant. Statistical analyses were performed with IBM SPSS v23.0 (SPSS Inc., Chicago, IL, United States). The nomogram was constructed and calibration curve analysis was carried out with R software v4.2.0 (http://www.R-project.org, R Foundation for Statistical Computing, Vienna, Austria).

Clinical characteristics of the derivation and validation sets
Table 1 summarizes the clinical characteristics of the derivation set (n = 3,534) and the validation set (n = 1,177). Mortality, race, comorbidities, and laboratory indexes did not differ significantly between the two cohorts (all p > 0.05).

Comparison of the baseline characteristics of surviving and deceased patients in the derivation set
The baseline clinical characteristics of the derivation set are shown in Table 2. Compared with surviving patients, deceased patients were more likely to be White and older and to have comorbidities including CHF, COPD, renal disease, CNS disease, and stroke (all p < 0.05). They also more frequently had decreased mean arterial pressure (MAP) and oxygen saturation (O2Sats) and a longer length of stay (LOS) (all p < 0.01). Among laboratory parameters, deceased patients had significantly higher white blood cell count (WBC), ferritin, glucose, sodium, procalcitonin (PCT), C-reactive protein (CRP), D-dimer, aspartate aminotransferase (AST), alanine aminotransferase (ALT), troponin, international normalized ratio (INR), blood urea nitrogen (BUN), and creatinine, and lower platelet counts (all p < 0.01).

LASSO regression analysis
A total of 33 candidate variables, including clinical demographics, laboratory indexes, and comorbidities, were entered into the LASSO regression with 10-fold cross-validation to identify indicators of COVID-19 mortality. As shown in Figure 1A, 16 potential indicators with non-zero coefficients were selected: age, PCT, glucose, D-dimer, CRP, troponin, BUN, LOS, MAP, AST, temperature, O2Sats, platelets, Asian race, CNS disease, and stroke. Figure 1B depicts the changes in the LASSO coefficients.

Multivariate logistic regression analysis
As Figure 2 shows, the 16 predictors selected by LASSO regression were entered into a multivariate logistic regression analysis to determine the independent predictors of COVID-19 mortality. Fifteen parameters were retained in the final model: age, PCT, glucose, D-dimer, CRP, troponin, BUN, LOS, MAP, AST, temperature, O2Sats, platelets, Asian race, and stroke.

Construction of a novel nomogram scoring system
Based on the results of the multivariate logistic regression analysis, the 15 variables above were included as predictors to build a nomogram (Figure 3). Each predictor corresponds to a point score, and the total score is mapped onto the probability axis to give the predicted risk of death from COVID-19. For example, consider a patient aged 70 years (30 points) with PCT of 10 ng/mL (2 points), glucose of 300 mg/dL (3 points), D-dimer of 6 mg/L (2 points), CRP of 20 mg/L (7 points), troponin of 0.2 ng/mL (2 points), BUN of 50 mg/dL (4 points), LOS of 10 days (2 points), MAP of 70 mm Hg (32 points), AST of 1,000 U/L (7 points), temperature of 38°C (14 points), O2Sats of 90% (1 point), platelets of 300 k/mm³ (19 points), Asian race (7 points), and stroke (14 points). The total is 146 points, corresponding to an estimated probability of death of more than 90%.
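The arithmetic of the worked example can be checked by summing the per-predictor points listed for the example patient. This sketch only totals the points: converting the total to a probability requires the nomogram's probability axis (Figure 3), whose scaling is not given in the text. The dictionary name is hypothetical.

```python
# Illustrative sketch: `example_points` is a hypothetical name.
# Points read off the nomogram (Figure 3) for the example patient in the text.
example_points = {
    "age 70 years": 30,
    "PCT 10 ng/mL": 2,
    "glucose 300 mg/dL": 3,
    "D-dimer 6 mg/L": 2,
    "CRP 20 mg/L": 7,
    "troponin 0.2 ng/mL": 2,
    "BUN 50 mg/dL": 4,
    "LOS 10 days": 2,
    "MAP 70 mm Hg": 32,
    "AST 1,000 U/L": 7,
    "temperature 38 C": 14,
    "O2Sats 90%": 1,
    "platelets 300 k/mm3": 19,
    "Asian race": 7,
    "stroke": 14,
}

total_points = sum(example_points.values())
print(total_points)  # 146
```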

Evaluation and validation of the nomogram
The validation cohort comprised 1,177 patients. Calibration curves were drawn to assess the model's calibration in the derivation cohort (Figure 4A) and the validation cohort (Figure 4B). ROC curve analysis was conducted to measure the model's discrimination in the derivation and validation cohorts; the areas under the curve (AUC) were 0.860 and 0.847, respectively (Figure 5). In addition, the DCA curves showed that the nomogram provided a higher clinical net benefit (Figure 6).

Discussion
In the current study, using a large, multicenter, well-described cohort of 4,711 patients, we applied LASSO regression and multivariate logistic regression analysis to develop and validate a prediction nomogram. The nomogram was validated internally by 1,000 bootstrap resamples and with an internal validation cohort, maintaining adequate calibration and discrimination, which may enable physicians to predict the mortality of patients with COVID-19.
In this research, clinical demographics, characteristics, and laboratory tests were collected and analyzed to investigate the risk of fatal outcomes in patients with COVID-19. Compared with other diseases, COVID-19 can progress more severely and more rapidly, and deterioration may not be identified promptly (13). The early symptoms of COVID-19 are insidious and variable, which makes early detection more difficult (14). Especially during the severe COVID-19 epidemic, when many non-respiratory physicians were involved in fighting it, a more straightforward method that does not require respiratory specialists and radiologists to evaluate infiltration of multiple lung lobes was practical (15). To make the prognostic nomogram rapid and easy to use in busy clinical work, we focused only on clinical features and laboratory tests.
The mortality rate in our study was 24.4%, within the range reported in recent studies (16). Age was one of the most important predictors of mortality in the nomogram, and the deceased group was older than the surviving group. Previous studies have also identified older age as a factor in early risk evaluation for severe COVID-19 (17,18). The association between age and severe COVID-19 might be related to angiotensin-converting enzyme 2 (ACE2) (19). As previously described, ACE2 has essential protective functions and can attenuate several detrimental processes, such as inflammation, vasoconstriction, and thrombosis. However, SARS-CoV-2 markedly downregulates ACE2 upon cell entry, which might be especially detrimental in older people because of age-related baseline ACE2 deficiency (20). In contrast, several studies have suggested that age is not an independent indicator of mortality or severe COVID-19 (21). As those studies noted, this might be because age-related comorbidities, rather than age itself, affect mortality (22).
Inflammation-related factors, including PCT, CRP, and D-dimer, were associated with COVID-19 mortality in our model, which is consistent with previous research (23). PCT and CRP are common inflammatory markers in infectious diseases (24). D-dimer is not only a fibrin degradation product but also a thromboinflammatory marker (25). A high prevalence of pulmonary embolism and venous thromboembolism has been reported in patients with COVID-19 (26). Moreover, in addition to macrovascular thrombosis, microthrombotic events in the lungs have been observed at autopsy (27). A thromboinflammatory process in the pulmonary capillaries might be the main cause of microthrombosis in the lung, inducing COVID-19-associated coagulopathy, which is characterized by elevated procoagulant biomarkers such as fibrinogen together with a substantial rise in D-dimer (28).
Among the other laboratory parameters in the nomogram, blood glucose, BUN, troponin, and AST, which indicate multi-organ dysfunction, were major predictors of mortality in COVID-19 patients (29). A Chinese meta-analysis demonstrated that diabetes mellitus was associated with an increased risk of severe disease or death in COVID-19 patients, although the extent to which diabetes independently contributed to this risk remained unclear (30). In our model, the glucose level, rather than diabetes status, contributed to COVID-19 mortality. Controlling glucose levels is therefore vital for both diabetic and non-diabetic patients. In other nomogram models, a high direct bilirubin level was confirmed as an independent indicator of mortality in COVID-19 (31), whereas another study with a larger population found that AST elevation, rather than bilirubin, was more closely related to COVID-19 mortality risk (32). SARS-CoV-2 mainly attacks the respiratory system, but previous research has also shown evidence of damage to other organs, such as the liver (33). Liver dysfunction in COVID-19 patients might be mainly related to an organ-specific immune response (34), and systemic cytokine storm, hypoxemia, and medications can aggravate it (35). AST has been found to be elevated in patients with severe COVID-19 (36). This is consistent with our findings, which indicate that AST is a critical biomarker of clinical outcomes. In addition, a recent study pointed out that acute kidney injury was closely related to severe infection and death in COVID-19 patients (37), and the combination of BUN and D-dimer predicted mortality in 305 COVID-19 patients with a mortality rate of 27.9% (38).

FIGURE 2
Forest plot for multivariate logistic regression analyses of predictors of mortality in patients with coronavirus disease 2019 (COVID-19).
Notably, Asian race contributed to the mortality risk score in our model. Although we did not observe a significant difference in mortality within our Asian population, as the database contained only 97 Asian patients, Asian race was retained in the nomogram. Larger datasets with more Asian patients may improve the prediction model for Asian populations.
In our research, a practical nomogram based on variables easily accessible in routine clinical work can provide a more accurate evaluation and prediction of mortality in COVID-19 patients. Clinicians can use this intuitive nomogram to draw a few lines and quickly estimate a patient's prognosis. If patients at high risk of death can be identified properly and rapidly, they may be more likely to benefit from close clinical attention and nutritional support in nursing care, which could ultimately improve recovery. In addition, our model could help physicians allocate medical resources rationally to reduce COVID-19 mortality when resources are scarce.

Limitations
Our study has several limitations. First, we found that Asian race might be an independent risk factor for COVID-19 mortality. However, because the Asian population in this database was small, further research with a larger Asian population is warranted to validate this finding. Second, the model

Conclusion
In conclusion, we developed an early warning model based on variables accessible from routine clinical tests to predict COVID-19 mortality. This nomogram could be conveniently used to help identify patients who might develop severe disease at an early stage of COVID-19. Further research is warranted to validate the prognostic ability of the nomogram.

FIGURE 1
Risk factor selection using the LASSO model. (A) Selection of the optimal penalty parameter (lambda) for the LASSO model by cross-validation. Partial likelihood deviance (binomial deviance) is plotted against log(lambda). Dotted vertical lines are drawn at the optimal values using the minimum criterion and one standard error of the minimum criterion (1-SE criterion). (B) LASSO coefficient profiles of the 33 characteristics, plotted against the log(lambda) sequence. The vertical line is drawn at the value selected by 10-fold cross-validation, where the optimal lambda resulted in non-zero coefficients for 16 features.

FIGURE 4
Calibration curves of the nomogram in the derivation cohort (A) and validation cohort (B).

FIGURE 5
Receiver operating characteristic curves of the nomogram in the derivation cohort (red) and validation cohort (black).

FIGURE 6
Decision curve analysis of the nomogram (A), in the derivation cohort (B), and in the validation cohort (C).
¹ doi.org/10.5061/dryad.7d7wm37sz

The original database was made available for secondary analyses addressing different scientific hypotheses. The original research was granted exempt status, and the requirement for informed consent was waived by the Ethics Committee for Clinical Research of the Albert Einstein College of Medicine, Montefiore Medical Center (9). The original database comprised consecutive hospitalized patients with moderate or severe COVID-19 from four hospitals in the Montefiore Health System, admitted between March 1st and April 16th, 2020. It included patients of multiple ethnicities, among them 1,743 Black, 466 White, 121 Asian, and 1,753 Latino patients. COVID-19 was diagnosed according to World Health Organization interim guidance and confirmed by a positive real-time reverse transcriptase PCR assay for SARS-CoV-2 RNA (10).

TABLE 1
Baseline clinical characteristics of the derivation and validation cohorts.

TABLE 2
Clinical characteristics of the derivation cohort.