Development of an Early Warning Model for Predicting the Death Risk of Coronavirus Disease 2019 Based on Data Immediately Available on Admission

Introduction: COVID-19 has overloaded medical facilities worldwide, leaving some potentially high-risk patients trapped in outpatient clinics without sufficient treatment. However, a simple and effective tool to identify these patients early is still lacking. Methods: A retrospective cohort study was conducted to develop an early warning model for predicting the death risk of COVID-19 based on data immediately available on admission. Seventy-five percent of the cases were used to construct the prediction model, and the remaining 25% were used to verify it. Results: From March 1, 2020, to April 16, 2020, a total of 4,711 COVID-19 patients were included in our study. The mean age was 63.37 ± 16.70 years, and 1,148 patients (24.37%) died. Age, SpO2, body temperature (T), and mean arterial pressure (MAP) were selected for constructing the model on the basis of univariate analysis, multivariate analysis, and a review of the literature. We compared five common methods for constructing the model and found that the full model had the best specificity and higher accuracy. The area under the ROC curve (AUC), specificity, sensitivity, and accuracy of the full model were 0.798 (0.779, 0.816), 0.804, 0.656, and 0.768, respectively, in the training cohort and 0.783 (0.751, 0.815), 0.800, 0.616, and 0.755, respectively, in the validation cohort. Visualization tools of the prediction model included a nomogram and an online dynamic nomogram (https://wanghai.shinyapps.io/dynnomapp/). Conclusion: We developed a prediction model that might aid in the early identification of COVID-19 patients with a high probability of mortality on admission. However, further research is required to determine whether this tool can be applied to outpatient or home-based COVID-19 patients.


INTRODUCTION
As of April 3, 2021, 129 million people had been infected with COVID-19 and 2.82 million had died since the outbreak began in 2019, and the number of confirmed cases was still growing by hundreds of thousands every day (1), leaving medical institutions worldwide overburdened (2). Because of the substantial growth in COVID-19 cases, several nations experienced serious shortages of regular hospital beds and ICU beds (3). As a result, a substantial proportion of COVID-19 patients were trapped in outpatient clinics or at home, unable to receive proper therapy (4); some of these patients had a potentially high risk of death. Identifying COVID-19 patients with a high risk of death early and effectively is therefore a major challenge. Although more than 100 prediction models for the prognosis of COVID-19 have been published (5,6), relatively few are early warning models for the severity of COVID-19. Qing-Lei Gao built an early death risk prediction tool for COVID-19 through machine learning (7,8). Although the model had high prediction accuracy, the modified model comprised 14 variables, most of which were laboratory indicators that are hard to obtain immediately on admission. Early warning on admission (7) therefore could not be achieved, and because that study did not provide a visual prediction tool, its operability was poor. Furthermore, several researchers investigated other scoring systems, such as qSOFA, SOFA, the early warning score (EWS), and the National Early Warning Score 2 (NEWS2), for early warning of the severity of COVID-19. Among them, NEWS2 had a higher warning value for the severity of COVID-19 (9-11). However, the studies of NEWS2 had small sample sizes, and the score contains eight variables, which makes it more difficult to use and limits its clinical application value.
To summarize, almost all current prediction models and existing illness severity scores require laboratory indicators and a large number of items. As a result, completing a COVID-19 severity evaluation and early warning in a timely manner is difficult, and whichever prediction model or illness severity score is used, it is inconvenient to apply on admission. Therefore, it is necessary to develop a more straightforward tool for predicting the death risk of COVID-19.

Study Design
A retrospective cohort study.

Objective
To develop a simple and effective model, based on data immediately available on admission, for early prediction of the death risk of COVID-19.

Setting
Four hospitals in New York City.

Diagnosis of COVID-19
SARS-CoV-2 RNA was detected by RT-PCR, and patients who tested positive were diagnosed with COVID-19.

Inclusion Criteria
(a) Patients diagnosed with COVID-19 and older than 18 years; (b) For patients admitted to hospital more than once, only the last admission was included in the analysis.

Exclusion Criteria
(a) Patients who were evaluated in the emergency room but not admitted to the hospital; (b) Patients who died in the emergency room.

Participants
Patients with COVID-19 diagnosed by RT-PCR from March 1, 2020, to April 16, 2020, were enrolled. Follow-up ended on May 7, 2020, and its duration varied from 3 weeks to 80 days. A total of 4,711 confirmed COVID-19 cases met the inclusion and exclusion criteria and were included in this study.

Ethics Statement
New ethics approval was not required because the original authors had obtained ethical approval when conducting the original study. Informed consent was also not required because our work was a retrospective study that reused existing data, and the patient information was anonymous.

Data Immediately Available on Admission Included
(a) Demographic data included only age and race; other demographic data were not provided in the data set and therefore could not be included in our analysis; (b) Past medical history included myocardial infarction, congestive heart failure, cerebrovascular disease, diabetes, dementia, and chronic obstructive pulmonary disease (COPD); (c) Vital signs on admission included SpO2, mean arterial pressure (MAP), and body temperature (T). All of the above variables were collected on admission.
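The data set description does not state how MAP was derived. For readers who need to reproduce this admission variable from raw blood pressure readings, the following is a minimal sketch that assumes the common bedside approximation MAP ≈ DBP + (SBP − DBP)/3. The function name and the example readings are illustrative and are not taken from the study.

```python
def mean_arterial_pressure(systolic_mmhg: float, diastolic_mmhg: float) -> float:
    """Approximate MAP (mmHg) as diastolic pressure plus one third of the pulse pressure.

    Assumption: this is the standard bedside approximation, not necessarily
    the method used to populate the MAP column in the study data set.
    """
    if systolic_mmhg < diastolic_mmhg:
        raise ValueError("systolic pressure must not be lower than diastolic pressure")
    pulse_pressure = systolic_mmhg - diastolic_mmhg
    return diastolic_mmhg + pulse_pressure / 3.0


# Illustrative reading of 120/80 mmHg.
example_map = mean_arterial_pressure(120, 80)
print(round(example_map, 1))
```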

Collection of Outcome Indicators
Death-related data were collected from hospital death registrations and the national death registry.

Selection of Predictor Variables
Variables for model construction were selected in the following three ways, and the corresponding models were then constructed: (a) All variables that could be obtained immediately on admission were included in the construction and verification of the prediction model; (b) All variables that could be obtained immediately on admission were entered into the multivariate analysis, and those with a P-value <0.05 were included in the construction and verification of the model; (c) Based on a review of the literature, we further constructed a more concise prediction model.

Statistical Analysis
(a) Measurement data were expressed as mean ± SD (x ± s), and count data as n (%). (b) Seventy-five percent of the sample was used to construct the prediction model, and the remaining 25% was used to verify it. (c) The following methods were used to construct and verify the prediction model: the multivariable fractional polynomial model (MFP model), the full model, the stepwise selected model (stepwise model), bootstrap full (bootstrap resampling 500 times), and bootstrap stepwise (bootstrap resampling 500 times). (d) The corresponding nomogram was constructed based on the best of these models, and we then used the "DynNom" package to construct the corresponding online dynamic nomogram (15). (e) Very few values were missing for the variables included in our study, so missing values received no special handling during model building. Statistical analysis was performed using Empower Stats version 2020 epidemiology software (www.empowerstats.com) and R software.
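The 75%/25% division described in (b) can be sketched as a seeded random split. This is an illustration only: the study used Empower Stats and R, and the source does not report the seed, the exact cohort sizes, or whether the split was stratified, so the function name, the seed, and the resulting sizes below are assumptions.

```python
import random


def split_cohort(records, train_fraction=0.75, seed=2020):
    """Randomly split records into a training part and a validation part.

    The seed is arbitrary and only makes this sketch reproducible.
    """
    shuffled = list(records)              # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical patient identifiers standing in for the 4,711 included cases.
patient_ids = list(range(4711))
train_ids, validation_ids = split_cohort(patient_ids)
print(len(train_ids), len(validation_ids))
```

Fixing the seed keeps the two cohorts identical across reruns, which matters when several candidate models are compared on the same validation set.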

Univariate Analysis Results
Univariate analysis was performed for the following variables: age, SpO2, MAP, T, Black, Asian, White, Latino, myocardial infarction, congestive heart failure, cerebrovascular disease, diabetes, dementia, and COPD. Age, SpO2, MAP, White, and COPD were associated with patient prognosis (Table 2).

The Result of Multivariate Logistic Regression Analysis
Multivariate logistic regression analysis was performed for the following variables: age, SpO2, MAP, T, Black, Asian, White, Latino, myocardial infarction, congestive heart failure, cerebrovascular disease, diabetes, dementia, and COPD (Table 2).

The Construction and Verification of the Prediction Model
Seventy-five percent of the sample was used to construct the prediction model (Table 3; Figure 1). The remaining 25% was used to verify the prediction model. Firstly, age, SpO2, MAP, T, Black, Asian, White, Latino, myocardial infarction, congestive heart failure, cerebrovascular disease, diabetes, dementia, and COPD were all included for verifying the prediction model; the AUCs of the MFP model, full model, stepwise model, bootstrap full, and bootstrap stepwise are reported in Table 3 and Figures 1 and 2.
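For readers less familiar with the reported performance measures, the sketch below shows how AUC, sensitivity, specificity, and accuracy can be computed from predicted risks and observed outcomes. The toy data, the 0.5 cut-off, and all function names are assumptions made for illustration; the source does not report the cut-off used for the full model.

```python
def auc_from_scores(scores_died, scores_survived):
    """AUC as the share of (died, survived) pairs in which the patient who
    died received the higher predicted risk; ties count as half a pair."""
    concordant = 0.0
    for p_died in scores_died:
        for p_survived in scores_survived:
            if p_died > p_survived:
                concordant += 1.0
            elif p_died == p_survived:
                concordant += 0.5
    return concordant / (len(scores_died) * len(scores_survived))


def threshold_metrics(y_true, y_prob, threshold=0.5):
    """Sensitivity, specificity, and accuracy when risk >= threshold is flagged as high risk."""
    tp = fp = tn = fn = 0
    for died, prob in zip(y_true, y_prob):
        flagged = prob >= threshold
        if died and flagged:
            tp += 1
        elif died:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Accuracy implied by sensitivity, specificity, and outcome prevalence."""
    return sensitivity * prevalence + specificity * (1.0 - prevalence)


# Toy data invented for this sketch: 1 = died, 0 = survived.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_prob = [0.9, 0.6, 0.2, 0.7, 0.3, 0.2, 0.1, 0.1]

toy_auc = auc_from_scores(
    [p for y, p in zip(y_true, y_prob) if y == 1],
    [p for y, p in zip(y_true, y_prob) if y == 0],
)
toy_metrics = threshold_metrics(y_true, y_prob)

# Consistency check on the reported training-cohort figures, assuming the
# training-cohort mortality equals the overall mortality of 24.37%.
implied_accuracy = accuracy_from_rates(0.656, 0.804, 0.2437)
print(toy_auc, toy_metrics, implied_accuracy)
```

Under the stated prevalence assumption, the reported training-cohort sensitivity (0.656) and specificity (0.804) imply an accuracy of about 0.768, which agrees with the reported value.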

Visualization Tool Construction
We found that the prediction model constructed from age, SpO2, MAP, and T had a predictive value similar to that of the prediction models constructed from the other variable sets. Further, the full model had the highest specificity and similar accuracy compared with the MFP model, stepwise model, bootstrap full, and bootstrap stepwise. As a result, we chose the full model as our target prediction model. Based on this model, the corresponding nomogram was constructed, and we then used the "DynNom" package to construct the corresponding online dynamic nomogram (https://wanghai.shinyapps.io/dynnomapp/) (see Table 3; Figure 3).
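A nomogram is a graphical rendering of the fitted regression equation. Assuming the full model is an ordinary logistic regression on the four selected predictors (the fitted coefficients are not given in this text, so the symbols below are placeholders rather than reported values), the predicted probability of death takes the form:

```latex
\[
\hat{p}(\text{death}) =
\frac{1}{1 + \exp\!\bigl[-\bigl(\beta_0 + \beta_1\,\mathrm{Age} + \beta_2\,\mathrm{SpO_2}
+ \beta_3\,\mathrm{MAP} + \beta_4\,T\bigr)\bigr]}
\]
```

Each axis of the nomogram corresponds to one term of the linear predictor, and the total-points axis maps their sum to the predicted probability.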

DISCUSSION
We constructed a prediction model with high predictive value from age, SpO2, MAP, and T; most importantly, the model had high specificity and was simple and easy to use. All the variables included in the prediction model (age, SpO2, MAP, and T) have been confirmed to be closely related to the prognosis of COVID-19. According to some studies, variations in COVID-19 mortality risk across ethnic groups might be due to economic and cultural differences (16). However, because the data set lacked economic and cultural information, it was difficult to adjust for the associated factors to establish whether the death risk of Asians was indeed higher than that of other ethnic groups. Furthermore, research has indicated that the death risk of Asians is not higher than that of other ethnic groups (17). For these reasons, we did not include Asian race as a variable in the construction and validation of the prediction model.
A large number of studies have found that age is an independent risk factor for COVID-19 mortality. In Wuhan, an ambispective cohort study involving 548 COVID-19 patients (including 269 severe cases) found that the older the patients, the higher the risk of COVID-19 severity and fatality (18). Another study, which included 221 COVID-19 patients, systematically explored the relationship between age and the clinical manifestations and prognosis of COVID-19. The study found that elderly patients were more likely to develop bacterial co-infection, and that the severity of the disease was associated with lower serum albumin levels, higher urea nitrogen levels, higher lactate dehydrogenase levels, and higher levels of inflammatory factors, as well as with the use of glucocorticoids and ventilator-assisted therapy (19). According to Massimo Volpe's research, elderly patients had a higher Charlson comorbidity index and higher mortality (20). Wenru Su et al. discovered that the expression of SARS-CoV-2 susceptibility genes in circulating immune cells was increased in older individuals, as were immune system abnormalities (21). To summarize, elderly patients often had more complications, were more likely to develop bacterial infection, hypoproteinemia, and immune disorders, and had more severe disease and higher mortality. In this study, mortality increased with age, consistent with the above studies.
COVID-19 mostly harms the respiratory system, with acute respiratory distress syndrome being a deadly consequence (22,23). Ruiguang Zhang et al. found that patients with hypoxemia (SpO2 <90%) had higher levels of IL-6, IL-10, LDH, and C-reactive protein and higher mortality. These results were consistent with our study.
MAP is one of the indexes reflecting tissue perfusion. A large number of studies have shown that MAP on admission is strongly associated with patient prognosis: the higher the MAP on admission, the lower the risk of mortality (13,24).
Fever is one of the most prevalent signs of COVID-19. Dong Chen et al. discovered that around 36% of hospitalized COVID-19 patients had a fever, and the higher their body temperature, the worse their prognosis (25). Furthermore, Yongxi Zhang et al. found that patients with refractory COVID-19 had a higher body temperature (26).
To sum up, the variables included in the early warning model (age, SpO2, MAP, and T) have been widely confirmed to be closely related to the prognosis of COVID-19, which is consistent with our results, so it was reasonable to use these four variables to construct the prediction model.
Compared with previously published prediction models (such as EWS and NEWS2), our prediction model had a relatively low predictive value for the severity of COVID-19. However, those models contain more variables, some of which cannot be obtained in a short time, which makes them more difficult to use (9-11) and less suitable for early warning of COVID-19 severity. Nevertheless, our model cannot replace those models for the subsequent prediction of the prognosis of COVID-19 patients. In clinical practice, our model might be used for early warning and combined with other models to minimize delays in identifying severely ill patients.

The Application Value of This Model
(a) We constructed a straightforward prediction tool: besides the traditional nomogram, we built a web version of the prediction tool to help doctors or patients predict the death risk of COVID-19 anytime and anywhere. (b) The variables in our model can be obtained within a few minutes, without a long wait for laboratory test results, so the death risk of COVID-19 can be estimated at an early stage. (c) The prediction model had high specificity and relatively low sensitivity, which helps doctors identify patients with a high risk of death at an early stage, optimize the allocation of medical resources, and alleviate the current shortage of medical resources. (d) The calibration curve showed that the predicted probability was greater than the observed probability in both the training and validation cohorts. Although our model overestimated the risk (27), it would help physicians prepare in advance for patients who are likely to develop severe disease and ultimately improve patients' prognosis. (e) The dynamic nomogram is a web-based application (28) that integrates age, SpO2, T, and MAP. Users select values of these four variables with the mouse and then click the Predict button to calculate the probability of mortality for a COVID-19 patient.
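The comparison in (d) between predicted and observed probabilities can be sketched as a simple binned calibration table. This is an illustration under stated assumptions: the equal-width binning, the function name, and the toy data are not from the study, whose calibration curve was produced with its own software.

```python
def calibration_table(y_true, y_prob, n_bins=5):
    """Group patients into equal-width predicted-risk bins and return, for each
    non-empty bin, (mean predicted risk, observed death rate, number of patients)."""
    bins = [[] for _ in range(n_bins)]
    for died, prob in zip(y_true, y_prob):
        index = min(int(prob * n_bins), n_bins - 1)   # prob == 1.0 goes in the last bin
        bins[index].append((died, prob))
    rows = []
    for members in bins:
        if members:
            n = len(members)
            mean_predicted = sum(prob for _, prob in members) / n
            observed_rate = sum(died for died, _ in members) / n
            rows.append((mean_predicted, observed_rate, n))
    return rows


# Toy data invented for this sketch: 1 = died, 0 = survived.
y_true = [0, 0, 0, 1, 0, 1, 1, 1]
y_prob = [0.10, 0.15, 0.30, 0.35, 0.70, 0.75, 0.90, 0.95]

rows = calibration_table(y_true, y_prob)
for mean_predicted, observed_rate, n in rows:
    # A positive gap means the model overestimates risk in that bin.
    print(round(mean_predicted, 3), observed_rate, n, round(mean_predicted - observed_rate, 3))
```

A model that overestimates risk, as reported for this study, shows mean predicted risks above the observed death rates across the bins.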

Limitations of Research
(a) Because this was a retrospective study, further prospective studies are needed to verify the predictive value of our model. (b) All cases included in this study were hospitalized patients, which may limit the population to which the model applies. (c) Our study lacked external validation, so the scope of application of the model needs to be further verified, and the model should be applied cautiously.

CONCLUSION
We developed a prediction model that might aid in the early identification of COVID-19 patients with a high probability of mortality on admission. However, further research is required to determine whether this tool can be applied to outpatient or home-based COVID-19 patients.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: 10.5061/dryad.7d7wm37sz.

AUTHOR CONTRIBUTIONS
HW conceived of the study and drafted the manuscript. HA, YF, QL, and RC participated in the statistical analysis. XM, Y-fM, ZW, TL, and YL participated in the design of the study. KQ, CL, and JZ participated in its design and coordination and helped to draft the manuscript. All authors contributed to the article and approved the submitted version.