Characteristics of 523 COVID-19 Cases in Henan Province and a Death Prediction Model

Certain high-risk factors related to death from COVID-19 have been reported; however, there have been few studies on a death prediction model. This study was conducted to delineate the clinical characteristics of patients with coronavirus disease 2019 (COVID-19) of different severity and to establish a death prediction model. In this multi-center, retrospective, observational study, we enrolled 523 COVID-19 cases discharged before February 20, 2020 in Henan Province, China, compared clinical data, screened for high-risk fatal factors, built a death prediction model, and validated the model in 429 mild cases, six fatal cases discharged after February 16, 2020 from Henan, and 14 cases from Wuhan. Of the 523 cases, 429 were mild, 78 were severe survivors, and 16 were non-survivors. The non-survivors, with a median age of 71 years, were older and had more comorbidities than the mild cases and severe survivors. Non-survivors had a relative delay in hospitalization, with higher white blood cell count, neutrophil percentage, D-dimer, LDH, BNP, and PCT levels and lower proportions of eosinophils and lymphocytes and lower albumin. Discriminative models were constructed using random forest with the 16 non-survivors and 78 severe survivors. Age was the leading risk factor for poor prognosis, with an AUC of 0.907 (95% CI 0.831–0.983). A mixed model combining age, demographics, symptoms, and laboratory findings at admission performed better (p = 0.021), with a generalized AUC of 0.9852 (95% CI 0.961–1). We chose 0.441 as the death prediction threshold (with 0.85 sensitivity and 0.987 specificity) and successfully validated the model in 429 mild cases, six fatal cases discharged after February 16, 2020 from Henan, and 14 cases from Wuhan. The mixed model can accurately predict clinical outcomes of COVID-19 patients.


INTRODUCTION
In late December 2019, several cases of unexplained pneumonia were found in Wuhan City, Hubei Province, China. On January 7, 2020, a new coronavirus was detected in the laboratory and the whole genome sequence of the virus was obtained. On January 12, 2020, the World Health Organization temporarily named this new virus 2019 novel coronavirus (2019-nCoV). On February 11, 2020, the World Health Organization named the disease COVID-19, and at the same time the International Committee on Taxonomy of Viruses named the new coronavirus "SARS-CoV-2." Although the fatality rate of SARS-CoV-2 is not as high as that of SARS and MERS, it is more infectious than other viruses, including influenza virus (1-3). The basic reproduction number (R0) is estimated to range from 2 to 5 (4,5). China has effectively controlled the epidemic by adopting strict prevention and control measures, but outside China the epidemic is still spreading. The number of infections caused by SARS-CoV-2 is large and no specific therapeutic is available yet, which is the main cause of so many deaths. SARS-CoV-2 can cause pneumonia and systemic inflammation, leading to multiple organ failure in high-risk patients. A growing number of studies have focused on the high-risk factors for death. Demographic factors, advanced age, underlying comorbidities, and D-dimer exceeding 1 µg/mL have been confirmed as risk factors for death in adult patients (6). In the absence of vaccines and specific antiviral drugs, targeted supportive therapy may help relieve symptoms and protect organ function (7). How to quickly identify high-risk patients in the early stage of the disease and actively adopt supportive treatment to reduce mortality is an urgent clinical problem. Cao et al. (6) reported some characteristics and the early clinical course of severe and fatal cases, which improved our understanding of the characteristics of patients who died.
However, there are no relevant studies on the application of models to predict COVID-19 death. A predictive model built from admission characteristics and laboratory test results can estimate the probability of overall mortality due to SARS-CoV-2, identify high-risk patients as early as possible, and allow supportive treatment to be given promptly to reduce mortality.
In this study, we collected data on 523 discharged cases of novel coronavirus infection in Henan Province, China and compared the demographics, clinical characteristics, laboratory tests, and imaging findings between the mild cases, severe survivors, and non-survivors. We established a death prediction model using the admission data of the severe survivors and non-survivors.

Study Design and Participants
From January 22, 2020 to February 20, 2020, a total of 717 patients with confirmed COVID-19 were discharged in 18 cities of Henan Province, China, of whom 19 died. We designed a data collection table covering age, gender, epidemiological history, past history, clinical symptoms, laboratory examination, and chest CT, and recorded the treatment process and clinical outcome. Data on 556 patients with novel coronavirus pneumonia discharged before February 20, 2020 were collected. All data were checked by two physicians (AL and XM), and a third researcher (QZ) adjudicated any difference in interpretation between the two primary reviewers. For different interpretations and missing data, we contacted the doctor who filled out the form and the patient or their family members to review and supplement the record. After excluding 18 cases under the age of 18, 10 cases missing key information, and five cases transferred to other hospitals with no end point, 523 cases were included for statistical analysis. Of the 19 deaths, three had missing data, leaving 16 non-survivors in the analysis. According to the Guidance for Corona Virus Disease 2019 (6th edition) released by the National Health Commission of China, the enrolled cases were categorized as mild or severe (8). There were no deaths among the mild cases. According to the clinical outcome, we divided the severe cases into severe survivors and non-survivors. Up to April 1, 22 patients had died of COVID-19 in Henan Province. We managed to collect data on another six fatal cases from Henan Province and 14 cases from the Fourth People's Hospital of Wuhan to validate the predictive power of the model. The flow diagram of included patients is shown in Figure 1.

Definition
The incubation period was defined as the interval between the potential earliest date of contact with the transmission source (wildlife or a person who was a suspected or confirmed case) and the potential earliest date of symptom onset (i.e., cough, fever, fatigue, or myalgia). We excluded cases with an incubation period of <1 day and cases of continuous exposure, because exposure in those patients was ongoing. Fever was defined as an axillary temperature of 37.3 °C or higher. Lymphopenia was defined as a lymphocyte count of <1,200 per cubic millimeter. Thrombocytopenia was defined as a platelet count of <100,000 per cubic millimeter. Chest CT findings were classified as normal, mild, moderate, or severe infection according to the extent of lesions: lesions covering < 15% were mild, 15-40% moderate, and > 40% severe.
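The definitions above can be expressed as simple rules. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and treating a lesion extent of exactly 0% as "normal" is an assumption, since the text does not state how a normal scan was quantified.

```python
def classify_ct_extent(lesion_percent):
    """Map the percentage of lung affected on chest CT to the study's categories."""
    if lesion_percent < 0 or lesion_percent > 100:
        raise ValueError("lesion_percent must be between 0 and 100")
    if lesion_percent == 0:
        return "normal"    # assumption: no visible lesion counts as normal
    if lesion_percent < 15:
        return "mild"      # lesions < 15%
    if lesion_percent <= 40:
        return "moderate"  # lesions 15-40%
    return "severe"        # lesions > 40%


def has_fever(axillary_temp_c):
    """Fever: axillary temperature of 37.3 degrees C or higher."""
    return axillary_temp_c >= 37.3


def has_lymphopenia(lymphocytes_per_mm3):
    """Lymphopenia: lymphocyte count below 1,200 per cubic millimeter."""
    return lymphocytes_per_mm3 < 1200


def has_thrombocytopenia(platelets_per_mm3):
    """Thrombocytopenia: platelet count below 100,000 per cubic millimeter."""
    return platelets_per_mm3 < 100000


if __name__ == "__main__":
    for extent in (0, 10, 15, 40, 55):
        print(extent, classify_ct_extent(extent))
```

Keeping each definition in its own function makes the cut-offs easy to audit against the text and to change if a different guideline edition is used.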

Statistical Analysis
Statistical analyses of cohort characteristics were performed in R version 3.6.1. Participants' demographics, laboratory findings, and questionnaire data were summarized and compared with standard statistical tests. Categorical variables were shown as counts and percentages [n (%)], and associations were tested using Fisher's exact test. Continuous variables were shown as median (interquartile range, IQR), and differences between groups were analyzed with a non-parametric test (Wilcoxon's rank-sum test). A single-sided p < 0.05 was considered statistically significant. Discriminative models were constructed using random forest with leave-one-out cross-validation, and features were selected using embedded backward selection. Missing data were filled with the median value of the corresponding cohort (severe death, severe survival, and mild) for model construction and validation. Receiver operating characteristic (ROC) curves and precision-recall curves were visualized using the R packages "pROC" and "precrec," respectively.
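Two of the steps described above, filling missing values with the cohort median and leave-one-out cross-validation, can be illustrated without any modelling library. This is a sketch in Python rather than the R code used in the study; the function names, the record layout, and the "ldh" values are hypothetical and chosen only for illustration.

```python
from statistics import median


def impute_by_cohort_median(rows, feature, cohort_key="cohort"):
    """Return copies of rows in which missing values (None) of `feature`
    are replaced by the median of observed values in the same cohort."""
    observed = {}
    for row in rows:
        if row[feature] is not None:
            observed.setdefault(row[cohort_key], []).append(row[feature])
    medians = {cohort: median(values) for cohort, values in observed.items()}

    filled = []
    for row in rows:
        new_row = dict(row)  # copy so the input records are not modified
        if new_row[feature] is None:
            # A cohort with no observed values at all stays missing (None).
            new_row[feature] = medians.get(new_row[cohort_key])
        filled.append(new_row)
    return filled


def leave_one_out_splits(n_samples):
    """Yield (train_indices, test_index) pairs, holding out each sample once."""
    for held_out in range(n_samples):
        train = [i for i in range(n_samples) if i != held_out]
        yield train, held_out


# Hypothetical records: "ldh" is an illustrative laboratory feature.
patients = [
    {"cohort": "mild", "ldh": 200},
    {"cohort": "mild", "ldh": None},
    {"cohort": "mild", "ldh": 300},
    {"cohort": "severe_death", "ldh": 600},
    {"cohort": "severe_death", "ldh": None},
]
filled = impute_by_cohort_median(patients, "ldh")
splits = list(leave_one_out_splits(len(filled)))

if __name__ == "__main__":
    print([row["ldh"] for row in filled])
    print(len(splits), "leave-one-out folds")
```

In a full pipeline, each fold would train a classifier on the `train` indices and predict the single held-out case, so that every patient receives one out-of-sample death probability.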

RESULTS
Clinical Characteristics of the Study Patients According to Disease Severity and Clinical Outcome in Severe
Muscle and joint pain, runny nose, diarrhea, dizziness, and headache were rare. Fever, cough, dyspnea, gasping, chest tightness, nasal congestion, and muscle and joint pain had a significantly higher incidence in severe cases; the incidence of chest tightness in non-survivors was higher than that in severe survivors. The non-survivors had more symptoms at onset. Four patients (0.82%) had a respiratory rate > 24 breaths/min, one of whom died; 8 (4.85%) had a pulse oxygen saturation < 90%, all of them severe cases. The median body temperature was 37.2 °C (IQR 36.7-37.9); 293 (59.07%) had a body temperature < 37.5 °C and 16 (3.23%) had a body temperature > 39 °C, and 80% of non-survivors had a body temperature < 37.5 °C upon admission.
The median duration from onset of symptoms to the first visit to a doctor was 2 days (IQR 0-5), and from onset of symptoms to first hospitalization 4 days (IQR 2-7), compared with 8 days (IQR 6-10) in non-survivors. The median incubation period was 5 days (IQR 1-9), with no significant difference between the groups. Table 2 shows the imaging and laboratory examination results. Of all the cases, 419 patients had detailed chest CT data on initial admission: 17 (4.06%) were normal; 224 (53.46%) had chest CT lesions < 15%; 154 (36.75%) had chest CT lesions between 15 and 40%; and 24 (5.73%) had chest CT lesions > 40%, of which 15 were severe cases. Among the non-survivors, 100% of patients had a chest CT lesion area of more than 15% on the first scan.

Radiographic and Laboratory Findings on Admission
In the first nucleic acid test, 323 (65.25%) were confirmed positive for SARS-CoV-2. The leucocyte count in non-survivors (8.66 × 10⁹/L [IQR 7-12.335]) was significantly higher than that in mild cases and severe survivors. Lymphocytopenia was more common in severe than in mild cases (39.24 vs. 18.16%). A decrease in eosinophil count was seen in 96.65% of patients. The level of D-dimer at admission was significantly higher in severe patients. Alanine aminotransferase, lactate dehydrogenase, and creatine kinase were significantly higher in severe than in mild cases, and the elevation was more pronounced in the non-survivors. The incidence of renal impairment was higher in the non-survivors. The incidence of hypoxia on arterial blood gas analysis and of respiratory alkalosis on admission was higher in the non-survivors than in the mild cases and severe survivors. Three hundred and seventy-two people were tested for C-reactive protein (CRP) upon admission.
Two hundred and eleven (56.72%) had CRP > 10 mg/L. The rate of elevation in severe cases (85.51%) was significantly higher than that in mild cases (50.17%). Two hundred and thirty-five patients were tested for procalcitonin (PCT) upon admission, and 100% of the non-survivors had elevated PCT. Non-survivors had more laboratory abnormalities than mild cases and severe survivors.

Treatments During the Hospitalization
Two hundred and seventeen (41.49%) patients received respiratory support during hospitalization, including 18 (4.2%) mild patients who received oxygen by nasal catheter, as shown in Table 3. The rate of respiratory support in severe cases was higher than that in mild cases, and the non-survivors all received mechanical ventilation, of whom six received noninvasive mechanical ventilation and 11 received invasive mechanical ventilation. Nine severe patients received ECMO treatment, and none survived. Thirty-nine (52.7%) of the severe survivors were treated with CRRT, whereas only 5 (33.33%) of the non-survivors received it. In terms of drug treatment, antiviral treatment was commonly used in each group. A higher proportion of severe than mild cases received antibiotics, and a higher proportion of non-survivors than survivors received carbapenem and glycopeptide antibiotics. One hundred and twelve (21.41%) received glucocorticoid therapy, and a higher proportion of non-survivors than severe survivors received glucocorticoids (62.5 vs. 41.03%).

Death Prediction Model
We constructed classification models to evaluate death risk for severe patients. Model performance was assessed by receiver operating characteristic (ROC) curve analysis using the area under the curve (AUC). Considering that age is among the leading risk factors for poor prognosis in several studies (3, 6, 7, 9-11), we first constructed a model using age alone, which achieved an AUC of 0.907 (95% CI 0.831-0.983) for discriminating non-survivors from survivors among severe COVID-19 patients. Mixed models combining age, demographics, symptoms, and laboratory tests at first admission to hospital performed better (p = 0.021) and achieved an AUC of 0.984 (95% CI 0.961-1) for discriminating non-survivors from survivors among severe COVID-19 patients (Figures 2A,B). Considering the small sample size of fatal cases, we randomly chose 40 samples from the severe cases and then calculated the generalized AUC using death probabilities; the median generalized AUC was 0.9852 (Figure 2C). Pulse oxygen, age, creatinine, creatine kinase, and D-dimer were the most important features (Table 4). We chose 0.441 as the death prediction threshold (with 0.85 sensitivity and 0.987 specificity), then used six additional fatal cases (Henan), 429 mild cases, and 14 cases (Wuhan) as an independent validation cohort. Four of the six fatal cases (66.67%) were classified as deaths, and the majority of predicted death probabilities in the mild Henan cases and the Wuhan cases were below 0.441 (Figure 2D). Summary characteristics of the six additional fatal cases from Henan and the 14 Wuhan cases are outlined in Table 5.
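The performance measures reported above can be computed directly from predicted death probabilities. The sketch below is illustrative only: the function names and example values are hypothetical, and classifying a probability at or above the threshold as a predicted death is an assumption, since the text does not state whether the cut-off is inclusive. The AUC is computed as the fraction of (non-survivor, survivor) pairs in which the non-survivor receives the higher probability, with ties counted as one half.

```python
def sensitivity_specificity(probabilities, died, threshold=0.441):
    """Sensitivity and specificity when probability >= threshold predicts death."""
    tp = fn = tn = fp = 0
    for prob, is_dead in zip(probabilities, died):
        predicted_death = prob >= threshold  # assumption: inclusive cut-off
        if is_dead and predicted_death:
            tp += 1
        elif is_dead:
            fn += 1
        elif predicted_death:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one death and one survivor")
    return tp / (tp + fn), tn / (tn + fp)


def auc(probabilities, died):
    """AUC as the probability that a non-survivor outranks a survivor."""
    pos = [p for p, d in zip(probabilities, died) if d]
    neg = [p for p, d in zip(probabilities, died) if not d]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted probabilities and outcomes, for illustration only.
example_probs = [0.9, 0.6, 0.3, 0.5, 0.2, 0.1]
example_died = [True, True, True, False, False, False]
sens, spec = sensitivity_specificity(example_probs, example_died)
example_auc = auc(example_probs, example_died)

if __name__ == "__main__":
    print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={example_auc:.3f}")
```

Applied to the validation cohort of six additional fatal cases, four predictions reaching the threshold correspond to a sensitivity of 4/6, or about 66.67%.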

DISCUSSION
Henan Province borders Hubei Province, China, and has a large population of 95.593 million people. As of April 1, 2020, there were 1,273 confirmed COVID-19 cases in Henan, the second highest number in China outside Hubei Province. We collected data on 523 confirmed COVID-19 cases who had been discharged from 18 cities in Henan Province before February 20, 2020 and conducted statistical analysis. Our data showed that the main epidemiological characteristics of novel coronavirus pneumonia in Henan Province were imported and clustered cases, similar to other provinces and cities outside Hubei in China. Among the 523 cases, there were 289 males (55.26%) and 234 females (44.74%). Other reports also showed a higher percentage of males (9,12,13), suggesting that males were more susceptible. Our study suggested that people of all ages were generally susceptible, with people aged 18-64 accounting for 87.96%, which was consistent with the Chinese CDC report (3). In our study, there were 16 fatal cases before February 20, 2020, and 87.5% of the deaths were ≥65 years old, with a median age of 71 years, while the median age of the mild cases and severe survivors was 42 and 50 years, respectively. The most common comorbidities in the non-survivors were hypertension (46.67%), coronary heart disease (33.33%), diabetes (33.33%), and COPD (33.33%). The average number of comorbidities in non-survivors was 1.94. Several studies of severe novel coronavirus pneumonia in China suggested that advanced age and comorbidities were high-risk factors for progression to severe illness and death in COVID-19 patients (10,13,14). In our study, advanced age was the biggest risk factor for death, which was consistent with those findings. A study from Italy involving 1,043 critically ill COVID-19 cases showed similar results, but male patients accounted for a higher proportion (82%) (9).
The median incubation period of the 523 cases in Henan Province was 5 days, and there was no significant difference between mild and severe cases. The median time from the onset of symptoms to hospitalization in the non-survivors was 8 days, significantly longer than in the severe survivors, suggesting that a delay in hospitalization might be one of the factors leading to death. Fever (88.74%), cough (62.3%), fatigue (39.58%), and expectoration (28.75%) were the most common symptoms. Despite having more symptoms, 60.87% of the severe cases and 80% of the non-survivors had a temperature below 37.5 °C at the time of admission. Zhong et al.'s study of 1,099 cases of COVID-19 also found that 52% of patients did not have fever when they became ill (12). The lack of fever made it difficult to identify COVID-19 patients and could also be one of the factors that delayed the visit to the doctor. Another study on refractory COVID-19 also found that the refractory pneumonia cases had a significantly lower incidence of fever than the common pneumonia cases, suggesting that a slow or poor response to SARS-CoV-2 was more likely to cause severe illness (15). Compared with the mild cases and severe survivors, the non-survivors had higher leucocyte count, neutrophil percentage, D-dimer, LDH, BNP, and PCT levels, while the proportions of eosinophils and lymphocytes and the albumin level were lower, which was consistent with other studies. The elevated white blood cell count, neutrophil percentage, and PCT suggested that the non-survivors might have had bacterial infection at hospitalization. Low albumin indicated that the patient was seriously depleted and poorly nourished. D-dimer elevation has been confirmed in multiple studies as a high-risk factor for severe illness and death (10,16,17), which was consistent with our study.
Chen et al.'s study found that among non-survivors 56% had an increased leucocyte count and 91% had lymphopenia, while among severe survivors 4% had an increased leucocyte count and 47% had lymphopenia (10). Zhang et al.'s study found that most COVID-19 cases had lymphopenia (75.4%) and eosinopenia (52.9%), and that lymphopenia and eosinopenia were associated with disease severity (17). In our study, eosinopenia was common in all groups, and there was no significant difference between the non-survivors and the severe survivors, but the eosinophil count of most severe survivors had returned to normal at discharge, while that of the non-survivors continued to decrease. Liu et al. also found that an increase in eosinophils might be an indicator of disease improvement (18).
In the non-survivors, 100% of the patients had a chest CT pneumonia area > 15% at admission, which was more severe on imaging than in the mild cases and severe survivors. In terms of respiratory support, the rate of mechanical ventilation in the non-survivors was significantly higher than that in the mild cases and the severe survivors, which also suggested that the lung function of the non-survivors was more seriously impaired. In the non-survivors, the percentage receiving invasive mechanical ventilation was 68.75%, higher than in other reports from Wuhan, China, but lower than that reported from the United States (71%) and Italy (88%), and Henan Province's mortality rate was also lower than that of the United States and Italy (9,19). In addition to the aging factor, the difference in fatality rate between Italy and Henan Province could be due to the relatively smaller number of COVID-19 cases in Henan Province and its relatively more sufficient medical resources. Nine patients were treated with extracorporeal membrane oxygenation (ECMO), but none survived. Research showed that ECMO could reduce mortality of patients with H1N1-related ARDS and MERS-related ARDS (20, 21), but there was no large-scale Glucocorticoids had been widely used in SARS-CoV and MERS-CoV, but studies showed that the application of glucocorticoids prolonged the clearance time of the virus and significantly increased the probability of mental illness (24). Similarly, there was no evidence that glucocorticoids improved the prognosis of patients with COVID-19. Whether glucocorticoids can improve the prognosis of COVID-19 still requires long-term follow-up and further research. In our study, some independent risk factors for death were found, and we developed, for the first time, a random forest model to accurately predict clinical outcomes of patients with COVID-19 based on a combination of age, demographic features, symptoms, and clinical tests at admission.
Old age was the most important risk factor for poor prognosis of COVID-19 patients. The mixed model built with random forest performed well in predicting survival and death, with an AUC of 0.984 (95% confidence interval 0.961-1), which is helpful for further understanding and improving clinical strategies against COVID-19. We also found that the predicted value was positively correlated with the severity of COVID-19. Of the 14 confirmed cases from Wuhan, seven were mild, seven were severe, 13 were cured and discharged, and one was referred to another hospital due to critical illness. When the death prediction model was applied to the Wuhan data, those with a predicted value >0.3 were all critically ill, and their respiratory support intensity was higher than that of the other 10 cases. The predicted value of the case transferred to another hospital due to critical illness was 0.673; unfortunately, we failed to follow up on the clinical outcome. The death prediction model we established has also been validated in mild cases and six other fatal cases in Henan Province. The predicted probability of death for all mild survivors was below 0.3, and four of the six fatal cases (66.67%) were classified as deaths.
Fatal cases are rare among mild patients; thus, we excluded mild cases from the death prediction models. Several studies have constructed models for early identification of cases at high risk of progression to severe COVID-19 (11) or of improved prognosis (25). However, fatal cases typically progressed rapidly and died in hospital within a short time, even though plenty of medical support was available in Henan Province. To the best of our knowledge, this is the first death prediction model for COVID-19 established by random forest. The model can accurately predict the prognosis of patients with COVID-19. Our study provides a new method for the evaluation of disease severity. Early identification of high-risk COVID-19 cases and early supportive therapy are critical to the prognosis.
There are some limitations to our study. Firstly, this is a retrospective study. Documentation of the history, symptoms, or laboratory findings was incomplete in some cases, even after we sought feedback and attempted to collect the data again. Secondly, although this random forest model was validated in mild cases and additional fatal cases in Henan Province and in 14 cases from Wuhan and showed good predictive performance, there were few validation cases from outside Henan Province. Thirdly, the imaging assessment lacked objective standards, and the investigators' judgment was subjective, which might lead to some bias.

DATA AVAILABILITY STATEMENT
The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation. The datasets generated for this study can be found here: https://github.com/xiaoshubaba/COVID-Henan.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Committee of The First Affiliated Hospital of Zhengzhou University. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.