Assessment of risk scores to predict mortality of COVID-19 patients admitted to the intensive care unit

Objectives: To assess the ABC2-SPH score in predicting COVID-19 in-hospital mortality, during intensive care unit (ICU) admission, and to compare its performance with other scores (SOFA, SAPS-3, NEWS2, 4C Mortality Score, SOARS, CURB-65, modified CHA2DS2-VASc, and a novel severity score). Materials and methods: Consecutive patients (≥ 18 years) with laboratory-confirmed COVID-19 admitted to ICUs of 25 hospitals, located in 17 Brazilian cities, from October 2020 to March 2022, were included. Overall performance of the scores was evaluated using the Brier score. ABC2-SPH was used as the reference score, and comparisons between ABC2-SPH and the other scores were adjusted using the Bonferroni correction. The primary outcome was in-hospital mortality. Results: ABC2-SPH had an area under the curve of 0.716 (95% CI 0.693–0.738), significantly higher than CURB-65, SOFA, NEWS2, SOARS, and modified CHA2DS2-VASc scores. There was no statistically significant difference between ABC2-SPH and SAPS-3, 4C Mortality Score, and the novel severity score. Conclusion: ABC2-SPH was superior to other risk scores, but it still did not demonstrate an excellent predictive ability for mortality in critically ill COVID-19 patients. Our results indicate the need to develop a new score for this subset of patients.


Introduction
Since its onset, the COVID-19 pandemic has caused the collapse of healthcare systems around the world, with excessive demand for intensive care beds and mechanical ventilators (1,2). Increasing cases and the widespread dissemination of SARS-CoV-2 created the perfect scenario for the acquisition of advantageous mutations, modifying viral transmissibility and disease severity, and allowing escape from natural or vaccine-mediated immunity (3,4).
In this context, a rapid, objective, and reliable evaluation of critically ill patients is fundamental for efficient triage, treatment, and resource allocation. Patients with COVID-19 may deteriorate rapidly after a period of reasonably mild symptoms, reinforcing the need for early risk stratification (5,6).
Our research group has developed the ABC2-SPH score, which is the only score developed and validated in Brazilian COVID-19 patients. It was developed under strict methodological criteria and uses a few easily obtained clinical and laboratory data at hospital presentation to predict in-hospital mortality. The ABC2-SPH score has shown high accuracy in discriminating between high-risk and non-high-risk patients, superior to several other scores in a large sample of Brazilian patients (7). Nevertheless, this score has not yet been validated for use at ICU admission.
Therefore, our aim was to assess the ABC2-SPH score, applied at intensive care unit (ICU) admission, in predicting COVID-19 in-hospital mortality, and to compare its performance with other scores: Sequential Organ Failure Assessment (SOFA), Simplified Acute Physiology Score III (SAPS-3), National Early Warning Score 2 (NEWS2), 4C Mortality Score, SOARS, CURB-65, modified CHA2DS2-VASc, and a novel severity score.

Materials and methods
This study is part of the Brazilian COVID-19 Registry, a retrospective multicenter cohort, which included data from 25 hospitals in Brazil, in 17 cities, with a total of 752 ICU beds, described in detail elsewhere (7).

Study subjects
Consecutive patients (aged ≥18 years) with laboratory-confirmed COVID-19 (positive SARS-CoV-2 RT-PCR or rapid antigen test), according to World Health Organization guidance, admitted to the ICU of one of the participating hospitals between 4 October 2020 and 13 March 2022 were included. Patients with missing data in any of the variables used for the ABC2-SPH score, pregnant patients, and those who were admitted for other reasons and developed COVID-19 during their hospital stay were not included in this analysis (Figure 1).

Data collection
Demographic information, clinical characteristics, laboratory findings, therapeutic interventions, and outcomes were collected by trained researchers from patient charts and entered into the Research Electronic Data Capture (REDCap) electronic platform, hosted at the Telehealth Center of the Hospital das Clínicas of Universidade Federal de Minas Gerais (UFMG) (8-10). For analysis, only the first ICU admission was considered if the patient had two distinct admissions in the same hospital stay. Periodic data quality checks were performed to ensure data accuracy. Values likely related to data entry errors were identified using code developed in R, based on expert-guided rules. Those data were sent to each center for checking and correction (7).

Sample size
Standardized methodology from the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) checklist (11) recommends that, ideally, at least 250 events (in this case, deaths) and 250 non-events be included for score validation. In the present analysis, there was no formal sample size calculation. Instead, all eligible patients were included, yielding a sample size that met those requirements.

ABC2-SPH
The ABC2-SPH score was developed, validated, and reported following guidance from the TRIPOD checklist (11,12) and the Prediction model Risk Of Bias Assessment Tool (PROBAST) (13).
The score was derived from a population of 3,978 hospital inpatients, from 36 hospitals, using data upon hospital presentation. Validation was conducted on 1,054 inpatient records from the same institutions (temporal validation) and also on patients from the Vall d'Hebron University Hospital cohort (external validation) (7,14).

Comparison with other risk scores
The accuracy of the ABC2-SPH score was compared with that of other scores developed specifically for COVID-19. Additionally, we compared the ABC2-SPH score with scores developed for other conditions, such as pneumonia and sepsis, applied in severely ill or ICU patients, and with early warning scores. The scores used for such comparisons were chosen based on two conditions: (1) they had already been evaluated for COVID-19 in other studies, and (2) they used parameters that were available within our database, with accessible methods for calculation (described in a previous publication). They are SOFA (15), SAPS-3 (16,17), NEWS2 (18), 4C Mortality Score (19,20), SOARS (21), CURB-65 (22), and a novel severity score developed by Altschul et al. (23). A modified version of the CHA2DS2-VASc score (scoring for male sex instead of female), tested in a previous publication to assess mortality in ICU COVID-19 patients, was included in the comparison as well (24). Model comparisons were performed using AUROC and decision curve analysis.

Outcome
The primary outcome was all-cause in-hospital mortality (considering the entire period of hospitalization).

Statistical analysis
Continuous variables were summarized as medians and interquartile ranges (IQR), and categorical variables as counts and percentages. Data were imputed for variables with up to 30% missing values. This study reported 95% confidence intervals (CI), and a p-value < 0.05 was considered statistically significant. Statistical analysis was performed using the free software R (version 4.0.2) and the packages tidyverse, gt, gtsummary, ggplot2, and rms (25). ABC2-SPH was used as the reference score for every comparison since it is the only mortality risk score for COVID-19 tested and validated in the Brazilian population (7). Comparisons between ABC2-SPH and the other scores were adjusted for multiple testing using the Bonferroni correction.
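As a minimal sketch of the Bonferroni correction described above (written in Python rather than the R used by the authors), the nominal alpha of 0.05 is divided by the number of comparisons against the reference score. The count of eight comparators is taken from the list of scores in this paper; the function names are illustrative, and the resulting threshold is an assumption about how the correction was applied, not a value reported in the text.

```python
def bonferroni_alpha(alpha: float, n_comparisons: int) -> float:
    """Return the per-comparison significance threshold under Bonferroni."""
    if n_comparisons < 1:
        raise ValueError("n_comparisons must be at least 1")
    return alpha / n_comparisons


def flag_significant(p_values: dict, alpha: float = 0.05) -> dict:
    """Map each comparison name to True if its p-value is below the corrected alpha."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return {name: p < threshold for name, p in p_values.items()}


# Scores compared against the ABC2-SPH reference in this study.
COMPARATORS = [
    "SOFA", "SAPS-3", "NEWS2", "4C Mortality Score",
    "SOARS", "CURB-65", "modified CHA2DS2-VASc", "Altschul severity score",
]

# Assumed threshold: 0.05 split evenly across the eight comparisons.
corrected_alpha = bonferroni_alpha(0.05, len(COMPARATORS))
```

With eight comparisons, a p-value would need to fall below 0.05 / 8 to be flagged, which is stricter than the nominal 0.05 and guards against false positives from repeated testing.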

Performance measures
The area under the receiver operating characteristic curve (AUROC) described the models' discrimination. Confidence intervals for AUROC were obtained across 2,000 bootstrap samples.
Overall performance of the scores was evaluated using the Brier score (26). Only the ABC2-SPH, SAPS-3, and 4C Mortality scores provided data that allowed calibration. Calibration was assessed by plotting the predicted mortality probabilities against the observed mortality, testing whether the intercept equals zero and the slope equals one.
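The following Python sketch illustrates one way to compute an AUROC and a bootstrap confidence interval of the kind described above. It is not the authors' R code: the rank-based (Mann-Whitney) AUROC and the percentile bootstrap are assumptions, since the text does not state which estimator or interval method was used, and all function names are hypothetical.

```python
import random


def auroc(y_true, scores):
    """AUROC via the Mann-Whitney U statistic, using average ranks for ties.

    y_true holds 0 (survived) or 1 (died); scores holds the risk score values.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes are required")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rank_sum_pos = 0.0
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied scores.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            if y_true[order[k]] == 1:
                rank_sum_pos += avg_rank
        i = j + 1
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


def bootstrap_ci(y_true, scores, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for the AUROC (resampling patients with replacement)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((1 - level) / 2 * n_boot)]
    upper = stats[int((1 + level) / 2 * n_boot) - 1]
    return lower, upper
```

Calling `bootstrap_ci(outcomes, score_values)` with the default `n_boot=2000` mirrors the number of bootstrap samples reported in the text; the seed is fixed only so that the sketch is reproducible.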
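To make the two performance measures above concrete, the sketch below computes a Brier score and a simple binned calibration table in Python. The binned table is a simplified stand-in for the calibration plot, not the intercept and slope test described in the text, and the function names and the choice of five bins are illustrative assumptions.

```python
def brier_score(y_true, probs):
    """Mean squared difference between predicted probability and outcome (0 or 1).

    Lower is better; 0 means perfect predictions.
    """
    if len(y_true) != len(probs) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def calibration_table(y_true, probs, n_bins=5):
    """Group patients by predicted risk and compare predicted with observed mortality.

    Returns a list of (mean predicted probability, observed mortality) per bin.
    A well-calibrated score has the two values close to each other in every bin.
    """
    pairs = sorted(zip(probs, y_true))
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # fewer patients than bins
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        rows.append((mean_pred, observed))
    return rows
```

In such a table, a score that underestimates mortality at low predicted risk shows observed mortality above the mean prediction in the first bins, and one that overestimates it at high predicted risk shows the reverse in the last bins.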
We further performed a subgroup analysis comprising the worst phase of the pandemic in Brazil (between 1 March 2021, and 30 April 2021), according to epidemiological data provided by the Brazilian Ministry of Health (27).

Results
A total of 3,037 patients were included; 55.9% were men, the median age was 61 (IQR 50-70) years, and overall mortality was 50.0%. Compared with patients who were discharged alive from the hospital, those who died were older and had a higher prevalence of underlying comorbidities such as hypertension, coronary artery disease, heart failure, chronic obstructive pulmonary disease, and cancer, as well as lower platelet levels and higher urea and C-reactive protein levels at ICU admission (Supplementary Table S1).
Table 1 and Figure 2 show the discrimination ability expressed as the AUROC for each of the scores evaluated, while Table 2 depicts the results of the statistical comparison between these scores and ABC2-SPH, selected as the reference score.
As seen in Table 2, ABC2-SPH (AUROC: 0.716 [95% CI 0.693-0.738]) had higher discrimination than the CURB-65, SOFA, NEWS2, SOARS, and modified CHA2DS2-VASc scores. There was no statistically significant difference between ABC2-SPH and SAPS-3, the 4C Mortality Score, and the novel score by Altschul et al. Even though the AUROC of SAPS-3 was the second lowest in absolute terms (0.614, 95% CI 0.566-0.663), the difference between it and the ABC2-SPH was not statistically significant.
The calibration curve indicates that the ABC2-SPH underestimated mortality at lower ranges of the score and overestimated it at the higher ones. In other words, the less severely ill patients had worse outcomes than the score predicted, as seen in Figure 3A. SAPS-3 showed an even greater underestimation of mortality at lower ranges and overestimation at the higher ranges (Figure 3B). The 4C Mortality Score, on the other hand, underestimated mortality across all ranges of the score (Figure 3C). Calibration curves could not be produced for the remaining scores because it was not possible to access their original derivation data.

Discussion
In the present study, ABC2-SPH presented a reasonable performance in predicting COVID-19 in-hospital mortality when applied at ICU admission, and it was significantly better than CURB-65, SOFA, NEWS2, SOARS, and the modified version of CHA2DS2-VASc. When comparing the performance of the ABC2-SPH with that of SAPS-3, the 4C Mortality Score, and the score by Altschul et al., we did not observe significant differences.
In the context of the COVID-19 pandemic, many new risk scores were developed and others were tested, or even adapted. A modified version of the CHA2DS2-VASc score (giving 1 point for male sex and 0 points for female sex, considering male sex a risk factor for COVID-19) was evaluated in 209 intensive care patients, with the rationale that endothelial dysfunction and thrombosis are important components of COVID-19 pathophysiology, but it had only fair results (24).
Most of the studies carried out to test or develop risk scores for COVID-19 patients at ICU admission used small samples, increasing the imprecision and compromising the external validity of the results. For instance, a prospective study compared different early warning scores, applied at admission to the ICU, to predict mortality in 140 critically ill patients with laboratory-confirmed COVID-19 (18). The overall performance was intermediate, and the confidence intervals were wide, conferring significant imprecision on the results. CRB-65, the best discriminatory tool in that study, showed an AUC of 0.720 (95% CI 0.630-0.811).
In a larger study, the performance of SAPS-3 was evaluated in 30,571 COVID-19 patients admitted to ICUs in Brazil. The model's discrimination was excellent, with an AUROC of 0.835 (95% CI 0.828-0.841). However, the mortality was considerably lower than in our cohort (15.0% vs. 50.0%), and also lower than in other studies with critically ill COVID-19 patients from various countries, which reported mortality rates between 26 and 50% (28-33). The low mortality rate may have contributed to the superior performance of SAPS-3 in that specific study. Still, the calibration was inappropriate, with an underestimation of mortality in lower to intermediate-risk groups, and an overestimation in the higher-risk group (16).
An Italian group developed and internally validated a prediction model for 28-day mortality of critically ill COVID-19 patients admitted to the ICU. This study used clinical variables (age, obesity, procalcitonin, SOFA score, PaO2/FiO2 ratio), with an excellent discriminatory capacity of 0.821 (95% CI 0.766-0.876) and 0.822 (95% CI 0.770-0.873) in the original and bootstrap models, respectively (34). Nevertheless, some limitations should be mentioned: the model lacks external validation, the authors included a relatively small sample of participants, and the inclusion of serum procalcitonin (a less available laboratory test) limits the widespread use of this score. In a multicenter cohort in Italy, a machine learning (ML) approach was applied for the development and validation of a predictive model, utilizing many clinical variables. The performance was better when the variables were collected both at ICU admission and during ICU stay (even with more than 85% of missing data) and was less satisfactory when considering only the variables collected at ICU admission that had less than 85% of missing data (35). The sample was modest for an ML approach, with only 1,293 patients for score development and fewer than 100 events in the external validation datasets. Furthermore, there was no information on the precision of the results, as the authors did not provide confidence intervals.

Knight et al. (2020) developed and validated the 4C Mortality Score, which uses eight variables readily available at hospital admission, with reasonable discrimination for mortality (AUC 0.774, 95% CI 0.767-0.782) and excellent calibration. Nevertheless, this score was intended to be used at hospital admission, not necessarily at ICU admission, and has not been validated for such use (20).
A multicenter retrospective cohort study carried out in Spain and conducted on patients transferred by ambulance to an emergency department evaluated the performance of NEWS2. The NEWS2 score provided an AUROC ranging from 0.825 for 1-day mortality to 0.777 for 90-day mortality. Nevertheless, the hospitalization rate of the 2,961 patients included was 78.6%, while patients who required ICU admission represented only 5.5% of the total participants, and no subgroup analysis was made (36).
The validation of the ABC2-SPH in a large cohort of patients admitted to the ICU due to COVID-19 complications could be helpful, given that other scores proved to be inaccurate in this scenario. Nevertheless, despite its excellent discrimination for mortality at hospital admission, its results were only reasonable when applied at ICU admission. The AUROC of 0.716 (95% CI 0.693-0.738) was considerably lower than that observed in the original study (7). The same happened with the widely used SAPS-3, SOFA, and NEWS2, as described above, which performed worse than ABC2-SPH.
We initially hypothesized that one of the reasons that could explain such unsatisfactory performances is that our cohort was composed exclusively of patients from Brazilian hospitals, including patients admitted during the worst wave of the pandemic in Brazil (27). This could have affected the performance of the scores, since the collapse of the health system may have led many patients to be admitted to ICUs at late phases of the disease, making their recovery more difficult. Another possibility is that, under the severe saturation of ICUs during the worst waves, the most critically ill patients were not admitted to the ICU, with priority given to those with a better prognosis. Nevertheless, in a subgroup analysis of the patients evaluated during the worst phase of the pandemic in Brazil, between 1 March 2021 and 30 April 2021, there was no significant difference in the performance of ABC2-SPH (Supplementary Table S2).
Some aspects of each score may have had a negative impact on their performance in this study. ABC2-SPH, for instance, uses the SF ratio (SpO2/FiO2) as one of its parameters: the lower the ratio, the higher the score, indicating a higher probability of death. Nevertheless, patients admitted to the ICU are frequently on mechanical ventilation (38.1% of all patients evaluated: 49.1% among those who died and 27.3% among the survivors), which may lead to an inappropriate degree of hyperoxia that does not necessarily reflect a less severe clinical state, and this could potentially mislead the score.
Besides that, of all the parameters included in ABC2-SPH, involving different organ systems, only the SF ratio is directly related to the respiratory system, whose failure is the main cause of death in COVID-19 patients (37). Perhaps the inclusion of more parameters related to the respiratory system, such as the severity of lung involvement on computed tomography, could improve the accuracy of the score. The use of imaging methods might cause some mistrust, since their interpretation is operator-dependent, but the development of machine-learning techniques could eventually overcome this issue.
On the other hand, SOFA includes the mean arterial pressure as one of its parameters, giving it the same weight as the PaO2/FiO2 ratio in the score (0 to 4 points). Nevertheless, unlike respiratory impairment, hypotension does not seem to be a core driver of COVID-19 mortality in the absence of a specific cause.
Likewise, SAPS-3 uses many different parameters which might not be as relevant for COVID-19 mortality. Age just above 40 years already scores 5 points, enough to almost double the probability of death. In contrast, according to our database, the risk of death in the age group of 40-49 years is 33.5%, compared to 25.6% among those aged 18-29 years. The risk of death, in reality, only doubles in the age group of 60-69 years (54.1%) (Supplementary Table S3). Furthermore, SAPS-3 includes a large number of variables that do not apply to our set of patients, such as the reason for ICU admission (in this study, admission for some reason other than COVID-19 was an exclusion criterion). And, as in SOFA, mean arterial pressure is weighted as heavily as the PaO2/FiO2 ratio.
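The age comparison above can be checked with a few lines of arithmetic. The Python sketch below takes the three mortality percentages quoted in the text and expresses each as a ratio to the 18-29 age group; treating that group as the reference is the only assumption, and the variable names are illustrative.

```python
# Mortality (%) by age group, as quoted in the text (Supplementary Table S3).
risk_by_age = {"18-29": 25.6, "40-49": 33.5, "60-69": 54.1}

# Risk ratio of each group relative to the youngest group.
reference = risk_by_age["18-29"]
risk_ratio = {age: round(risk / reference, 2) for age, risk in risk_by_age.items()}

print(risk_ratio)  # {'18-29': 1.0, '40-49': 1.31, '60-69': 2.11}
```

The ratio for the 40-49 group is about 1.3, well short of the near doubling implied by the 5 points SAPS-3 assigns to age just above 40 years, whereas the ratio exceeds 2 only in the 60-69 group.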
Therefore, we hypothesized that such imbalances between clinical importance and the weight of each variable included in the scores could be a reason for such unsatisfactory performances.
This study has limitations that deserve comment. Hospitals from different regional settings and of different sizes were included in the study to increase external validity. However, infrastructure imbalances between them may have impacted the results. In addition, some of the scores ended up with fewer participants than others due to incomplete data, since data were imputed only for variables with up to 30% missing values. SAPS-3, as mentioned above, is an example of that. Furthermore, the scores chosen to be included in the analysis were limited to those whose parameters were available within our database, leaving some others out of the study.
Further periodic adjustments, similar to those applied to other risk scores that are subject to continuous updates (such as APACHE and SAPS), should also be considered for ABC2-SPH.

Conclusion
In this study, applying ABC2-SPH at ICU admission had a reasonable performance in predicting in-hospital mortality of critically ill COVID-19 patients, superior to other risk scores. In order [...] to excellent performance, nevertheless, it may be necessary to develop a new score for this specific subset of patients.

FIGURE 1
Flowchart of COVID-19 patients included in the study.

FIGURE 2
Discrimination of ABC2-SPH and other scores in this cohort.

TABLE 1
Discrimination ability for each score applied in the database of COVID-19 patients admitted to the intensive care unit.
*Complete case analysis. Data were imputed for variables with up to 30% missing values.

TABLE 2
Comparison between ABC2-SPH and other scores.
AUROC, area under the ROC curve. *Due to the multiple comparisons, alpha was corrected using the Bonferroni method. **ABC2-SPH has higher discrimination ability. The bold values indicate which p-values are lower than the alpha, meaning that the AUROC of ABC2-SPH is larger than that of the score being compared.

Frontiers in Medicine, frontiersin.org, doi: 10.3389/fmed.2023.1130218