Assessment of the ABC2-SPH risk score to predict invasive mechanical ventilation in COVID-19 patients and comparison to other scores

Background Predicting the need for invasive mechanical ventilation (IMV) is important for the allocation of human and technological resources, improvement of surveillance, and use of effective therapeutic measures. This study aimed (i) to assess whether the ABC2-SPH score is able to predict the receipt of IMV in COVID-19 patients; (ii) to compare its performance with other existing scores; and (iii) to perform score recalibration and assess whether recalibration improved prediction.
Methods Retrospective observational cohort study that included adult patients with laboratory-confirmed COVID-19 admitted to 32 hospitals in 14 Brazilian cities. The study was conducted in two stages: (i) for the assessment of the ABC2-SPH score and comparison with other available scores, patients hospitalized from July 31, 2020, to March 31, 2022, were included; (ii) for ABC2-SPH score recalibration and comparison with other existing scores, patients admitted from January 1, 2021, to March 31, 2022, were enrolled. For both stages, the area under the receiver operating characteristic curve (AUROC) was calculated for all scores, while a calibration plot was assessed only for the ABC2-SPH score. Comparisons between ABC2-SPH and the other scores followed the DeLong test recommendations. Logistic recalibration methods were used to improve results and adapt the score to the studied sample.
Results Overall, 9,350 patients were included in the study; the median age was 58.5 (IQR 47.0–69.0) years, and 45.4% were women. Of those, 33.5% were admitted to the ICU, 25.2% received IMV, and 17.8% died. The ABC2-SPH score showed a significantly greater discriminatory capacity than the CURB-65, STSS, and SUM scores, with better results when only patients younger than 80 years were considered (AUROC 0.714 [95% CI 0.698–0.731]). After recalibration of the ABC2-SPH score, we observed improvements in calibration (slope = 1.135, intercept = 0.242) and overall performance (Brier score = 0.127).
Conclusion The ABC2-SPHr risk score demonstrated a good performance to predict the need for mechanical ventilation in COVID-19 hospitalized patients under 80 years of age.


Background
Since its inception, the COVID-19 pandemic has triggered an unprecedented crisis in health systems worldwide, with increased demand for intensive care unit (ICU) beds and mechanical ventilation (1). Although studies highlight the substantial impact of vaccination on the trajectory of the pandemic, with up to 90% protection against COVID-19-associated invasive mechanical ventilation (IMV) and death among adults (2,3), it is estimated that the mortality rate associated with IMV still exceeds 30% (4). A recent systematic review and meta-analysis found a pooled IMV mortality rate of 43% (95% CI 0.29-0.58) (1). Knowledge of the characteristics and outcomes of COVID-19 patients admitted to the ICU and receiving IMV, as well as analysis of their regional variability, is critically important for patient management and allocation of resources (1). Therefore, it may be helpful to predict which patients are more likely to progress to IMV, to support more assertive health decisions.
Although different prognostic scores have been proposed to predict IMV among COVID-19 patients, the majority of them present methodological limitations, restricting their clinical applicability (for more details, see Supplementary Table S1). Furthermore, most scores were developed in high-income countries, without external validation in low- and middle-income countries.
In this context, the ABC2-SPH risk score for predicting in-hospital mortality was rigorously developed and validated in Brazilian patients, with high discrimination (5). This score is the only mortality risk score for COVID-19 tested and validated in the Brazilian population (5). It predicts in-hospital mortality in patients with COVID-19 using easily accessible variables on admission: Age, BUN (blood urea nitrogen), Comorbidities, C-reactive protein, SpO2/FiO2 ratio, Platelet count, and Heart rate. The score ranges from 0 to 20, with the following risk groups: low (0-1), intermediate (2-4), high (5-8), and very high (≥9). It is freely available as an online risk calculator. It was developed in a cohort of 3,978 patients admitted to 36 hospitals in five Brazilian states. The validation was carried out on 1,054 patients admitted to the same institutions (temporal validation) and also in a cohort of 474 Spanish patients (external validation). It has shown good overall performance for temporal (AUROC = 0.859 [95% CI 0.833 to 0.885], Brier = 0.108 and calibration [slope = 1.138, intercept = 0.114, value of p = 0.184]) and external validation (AUROC = 0.894 [95% CI 0.870 to 0.919], Brier = 0.093) (5). However, evidence of its accuracy for IMV prediction is still lacking. Therefore, our aims were: (i) to assess whether the ABC2-SPH score is able to predict IMV in COVID-19 patients; (ii) to compare its performance with other existing scores; and (iii) to perform score recalibration and assess whether recalibration improved prediction.
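The risk groups described above can be written as a small lookup. This is only an illustrative sketch: the function name `abc2_sph_risk_group` is hypothetical, and it assumes the total score has already been computed from the admission variables (the per-variable point weights are not given in this text and are not reproduced here).

```python
def abc2_sph_risk_group(score: int) -> str:
    """Map a total ABC2-SPH score (0 to 20) to the risk group named in the text.

    Hypothetical helper for illustration; the cut-offs are the ones stated
    in the paragraph above: low 0-1, intermediate 2-4, high 5-8, very high >= 9.
    """
    if not 0 <= score <= 20:
        raise ValueError("ABC2-SPH score must be between 0 and 20")
    if score <= 1:
        return "low"
    if score <= 4:
        return "intermediate"
    if score <= 8:
        return "high"
    return "very high"


if __name__ == "__main__":
    for s in (0, 3, 6, 12):
        print(s, abc2_sph_risk_group(s))
```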

Study design
This study is a substudy of the retrospective multicenter cohort Brazilian COVID-19 Registry, conducted in 32 Brazilian hospitals, in 14 cities from five Brazilian states (Minas Gerais, Pernambuco, Rio Grande do Sul, Santa Catarina and São Paulo), described in detail elsewhere (6). The study was approved by the National Commission for Research Ethics (CAAE 30350820.5.1001.0008), and individual informed consent was waived due to the pandemic circumstances and the analysis of unidentified data.

Study population
The cohort study included consecutive adult patients (≥18 years-old) with laboratory-confirmed COVID-19, according to World Health Organization guidance (7), admitted to one of the participating hospitals. For the assessment of the ABC2-SPH score and the comparison with other scores, patients admitted from July 31, 2020, to March 31, 2022, were included. For ABC2-SPH score recalibration and comparison with other scores, patients admitted from January 1, 2021, to March 31, 2022, were enrolled. However, for recalibration, only patients younger than 80 years were included, since mortality under mechanical ventilation is particularly high at older ages, which has supported recommendations for conservative treatment of elderly and/or frail patients (8-10).
Patients with at least one of the following conditions were excluded: (i) pregnant women; (ii) "do not resuscitate" order; (iii) patients who manifested COVID-19 while admitted for other conditions; (iv) those transferred to other hospitals who had no defined outcome (discharged or death); (v) patients who were already on IMV at hospital presentation; and (vi) exclusively for score recalibration, patients ≥80 years old (Figure 1).

Data collection
Medical records were reviewed to collect data concerning the patients' characteristics, including age, sex, pre-existing comorbid medical conditions and medications taken at home; COVID-19-associated symptoms at hospital presentation; clinical assessment upon hospital presentation; laboratory results; inpatient medication, treatment, and outcomes. The data collection instrument was designed with reference to COVID-19 guidelines from the World Health Organization and the Brazilian Ministry of Health, as previously described (6).
A detailed guidance manual for data collection was developed, containing the definitions used in the study (Supplementary material). It was provided to all participating centers, and online training was mandatory before local research personnel were allowed to start collecting study data.
Data were collected by trained researchers from the medical records, using Research Electronic Data Capture (REDCap®) (version 7.3.1) (11, 12), hosted at the Telehealth Center of the University Hospital of the Universidade Federal de Minas Gerais (13). To ensure reliability and monitor data, a script was developed in R that periodically checked for possible data entry errors. When errors were detected, the analysts notified the participating center for correction.

Outcomes
The primary outcome was IMV during hospitalization.

Sample size
Model validation followed guidance from the Transparent Reporting of a Multivariable Prediction Model for Individual Prediction or Diagnosis (TRIPOD) checklist (14,15) and the Prediction model Risk Of Bias Assessment Tool (PROBAST) (16). The TRIPOD checklist ideally recommends at least 100 events (such as deaths) and 100 non-events for score validation. In the present analysis, the sample size was not calculated, since all patients eligible by the inclusion criteria were enrolled.
The statistical analysis was divided into two stages: (i) evaluation of the ABC2-SPH risk score for predicting IMV in COVID-19 patients and comparison with other available scores; and (ii) recalibration of the ABC2-SPH score, as well as comparison with other scores.
The main characteristics of the scores are listed in Supplementary Table S1. The comparison of ABC2-SPH (5) with other scores (17-24) was performed using the number of complete cases for each score (non-imputed database) through a procedure for unpaired receiver operating characteristic (ROC) curves that is an extension of the DeLong et al. recommendations (25). This procedure was implemented

Score recalibration
The score was recalibrated in an attempt to improve the prediction of IMV risk among patients with COVID-19. The sample included COVID-19 patients under 80 years of age, divided into derivation (from January 1 to April 30, 2021) and validation cohorts (from May 1, 2021, to March 31, 2022), resulting in approximately 75% and 25% of the sample, respectively. This division guarantees the minimum of 100 events in the validation cohort, as recommended by the TRIPOD checklist (14,15).
The recalibration methods consisted of fitting a logistic regression model [for more details, see Steyerberg et al. (26)] in the derivation sample and the evaluation of the method was done in the validation sample.
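As a rough illustration of logistic recalibration, the sketch below refits an intercept a and slope b in logit(P(IMV)) = a + b · logit(p), where p is the probability predicted by the original score. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the Newton-Raphson fitting routine, and the synthetic data are all hypothetical, and it assumes the original score output is available as a probability for each patient.

```python
import numpy as np


def logit(p):
    """Log-odds of a probability, clipped away from 0 and 1 for stability."""
    p = np.clip(np.asarray(p, dtype=float), 1e-6, 1 - 1e-6)
    return np.log(p / (1 - p))


def fit_recalibration(p_orig, y, n_iter=50, tol=1e-10):
    """Fit logit(P(y=1)) = a + b * logit(p_orig) by Newton-Raphson.

    Returns (a, b). Illustrative helper: a = 0 and b = 1 would mean the
    original predictions were already well calibrated in this sample.
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), logit(p_orig)])
    beta = np.zeros(2)
    for _ in range(n_iter):
        mu = 1 / (1 + np.exp(-X @ beta))       # current fitted probabilities
        w = mu * (1 - mu)                      # logistic variance weights
        grad = X.T @ (y - mu)                  # score vector
        hess = X.T @ (X * w[:, None])          # observed information matrix
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return float(beta[0]), float(beta[1])


def recalibrate(p_orig, intercept, slope):
    """Apply a fitted intercept and slope to new original predictions."""
    return 1 / (1 + np.exp(-(intercept + slope * logit(p_orig))))


if __name__ == "__main__":
    # Synthetic derivation and validation samples (not study data).
    rng = np.random.default_rng(0)
    p_der = rng.uniform(0.05, 0.95, 5000)
    y_der = rng.binomial(1, recalibrate(p_der, 0.5, 0.6))
    a, b = fit_recalibration(p_der, y_der)
    p_val = rng.uniform(0.05, 0.95, 5)
    print(a, b, recalibrate(p_val, a, b))
```

The intercept and slope are estimated in the derivation sample only and then applied unchanged to the validation sample, which mirrors the derivation/validation split described above.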

Missing data
To handle missing values, multiple imputation by chained equations (MICE) was used, under the missing at random assumption. The imputation technique included all variables with up to 30% missing values. The prediction of missing values was performed using all variables included in the analysis. Invasive mechanical ventilation was not imputed, and was not used as a predictor in the MICE model in the validation dataset. The predictive mean matching (PMM) method was used for continuous predictors and polytomous regression for categorical variables. Ten imputed datasets were obtained with 10 iterations, and their results were combined following Rubin's rules (27).
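For readers unfamiliar with Rubin's rules, the following sketch pools one estimate across imputed datasets: the pooled estimate is the mean of the per-dataset estimates, and the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name `pool_rubin` and the example numbers are hypothetical; the text does not state which quantities were pooled or on which scale.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m estimates and their within-imputation variances (Rubin's rules).

    Returns (pooled_estimate, total_variance). Illustrative helper only.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates and matching variances")
    pooled = mean(estimates)              # average of the m estimates
    within = mean(variances)              # average within-imputation variance
    between = variance(estimates)         # sample variance across imputations
    total = within + (1 + 1 / m) * between
    return pooled, total


if __name__ == "__main__":
    # Made-up estimates from three imputed datasets (not study data).
    est, var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
    print(est, var)
```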

Performance measures
Model discrimination was assessed by the area under the ROC curve (AUROC), with the 95% confidence interval (95% CI) calculated by bootstrap resampling with 2,000 samples. A value of 0.5 indicates no predictive ability, 0.60 to 0.69 is considered poor, 0.70 to 0.89 good, and 0.90 to 1.0 excellent (28).
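The sketch below shows one way an AUROC and a bootstrap confidence interval could be computed. It is an assumption-laden illustration: the text does not say which bootstrap interval was used, so a simple percentile interval is assumed here, and the function names and synthetic data are hypothetical.

```python
import numpy as np


def auroc(y_true, score):
    """AUROC from the rank-sum (Mann-Whitney) statistic; ties get average ranks."""
    y = np.asarray(y_true).astype(bool)
    s = np.asarray(score, dtype=float)
    n_pos, n_neg = int(y.sum()), int((~y).sum())
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both outcome classes are required")
    _, inverse, counts = np.unique(s, return_inverse=True, return_counts=True)
    last_rank = np.cumsum(counts)                 # 1-based rank of last tied item
    avg_rank = last_rank - (counts - 1) / 2.0     # average rank within each tie group
    ranks = avg_rank[inverse]
    return (ranks[y].sum() - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def bootstrap_auroc_ci(y_true, score, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate plus percentile bootstrap CI (interval type is an assumption)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y_true)
    s = np.asarray(score, dtype=float)
    n = len(y)
    stats = []
    while len(stats) < n_boot:
        idx = rng.integers(0, n, size=n)          # resample patients with replacement
        if y[idx].min() == y[idx].max():
            continue                              # skip resamples with a single class
        stats.append(auroc(y[idx], s[idx]))
    lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return auroc(y, s), float(lo), float(hi)


if __name__ == "__main__":
    # Synthetic outcome and score (not study data).
    rng = np.random.default_rng(1)
    y = rng.binomial(1, 0.25, 400)
    s = y * 0.8 + rng.normal(size=400)
    print(bootstrap_auroc_ci(y, s, n_boot=500))
```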
The accuracy of the predictive model was assessed using the Brier score, a measure that quantifies how close predictions are to the truth (29). The score ranges between 0 and 1, with smaller values indicating superior model performance. Results were stratified by age group (<60, 60-69, 70-79 and ≥80 years-old), sex, and presence or absence of key comorbidities before recalibration, to assess score performance in different subgroups.
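The Brier score is the mean squared difference between the predicted probability and the observed binary outcome. A minimal sketch follows; the function name and the example values are illustrative only.

```python
def brier_score(y_true, p_pred):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(y_true) == 0 or len(y_true) != len(p_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


if __name__ == "__main__":
    # Made-up outcomes and predictions (not study data).
    print(brier_score([1, 0, 0, 1], [0.9, 0.2, 0.1, 0.6]))
```

As a point of reference, a model that assigns every patient the same probability, equal to the observed event rate, scores event rate × (1 − event rate).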
Calibration was assessed graphically by plotting the predicted IMV probabilities against the observed IMV, testing intercept equals zero and slope equals one, simultaneously.

ABC 2 -SPH assessment and comparison with other risk scores
Overall, 9,350 patients were included in the study; the median age was 58.5 (IQR 47.0-69.0) years, and 45.4% were women. Of those, 33.5% were admitted to the ICU, 25.2% received IMV, and 17.8% died. Patients who received IMV were older; had a higher frequency of hypertension, diabetes, obesity, chronic kidney disease, rheumatologic disease and previous transplant; a higher number of comorbidities; and a higher frequency of ICU admission, dialysis, thromboembolism and mortality, when compared to those who did not receive IMV (Table 1). They also had a higher frequency of dyspnea, cough, fever, nausea, and arthralgia; of clinical findings such as fever, tachycardia and arterial hypotension (Supplementary Table S4); and of laboratory findings such as neutrophilia, lymphopenia, thrombocytopenia and increased lactate, D-dimer and C-reactive protein, when compared to those who did not receive IMV (Supplementary Tables S4, S5).
The AUROC for the ABC2-SPH score was 0.677 (0.661-0.694), and the Brier score was 0.196. Subject-specific risks were calculated, and patients were classified according to ABC2-SPH risk groups (Table 2). The score's performance was worse among older patients, especially octogenarians, and among patients with chronic obstructive pulmonary disease (Supplementary Table S1).
For the comparison with other scores, the main characteristics of each score are shown in Supplementary Table S2. When compared with other scores in a complete case analysis, the ABC2-SPH score achieved a significantly higher discriminatory capacity than the CURB-65, STSS, and SUM scores (Table 3; Figure 2A). When assessing specifically the sample <80 years, the ABC2-SPH score still achieved a significantly higher discriminatory capacity than the CURB-65, STSS, SOFA and SUM scores (Supplementary Table S3).
The calibration curve indicates that the ABC2-SPH underestimated IMV at lower ranges of the score and overestimated it at the higher ones, as observed in Figure 2B (slope = 0.557, intercept = −0.097, value of p < 0.001).

ABC 2 -SPH score recalibration
When assessing specifically the sample of patients used for score recalibration (<80 years-old admitted to hospital with COVID-19, from January 1, 2021, to March 31, 2022), patients from the validation cohort had a slightly lower age, frequency of hypertension and inotropic requirement; a slightly higher frequency of atrial fibrillation and COPD; a higher frequency of smoking; and a lower frequency of outcomes than the derivation cohort (Table 4; Supplementary Table S6). As for laboratory findings, there were no clinically relevant differences (Supplementary Table S7).
When assessing score performance in this sample before calibration (Table 5), the AUROC for ABC2-SPH was superior to that of the other assessed scores. The recalibrated ABC2-SPH score, named the ABC2-SPHr score, obtained good overall performance (Brier score = 0.132) and calibration (slope = 1.048, intercept = 0.378, value of p < 0.001) (Figure 3) in the validation subsample.

Discussion
The original ABC2-SPH score presented poor discrimination to predict IMV in COVID-19 patients, with an AUROC lower than 0.70, and poor calibration (slope = 0.550, intercept = −0.031, value of p = <0.00). When compared with other scores, it showed a significantly greater discriminatory capacity than the CURB-65 (19), STSS (22), and SUM (23) scores. When assessing data from patients <80 years-old
Since December 2019, over 6.9 million deaths related to COVID-19 have been reported worldwide (30). The unprecedented spread of the virus and the high proportion of severely ill patients created widespread disarray. In this context, the medical community encountered saturated hospitals and strained resources, especially related to IMV, as well as the need to provide accurate information on morbidity and prognosis of the disease to patients and families. Based on this, important ethical questions about intensive care rationing in ICUs have been raised (31). Therefore, it may be helpful to predict which patients are more likely to progress to IMV, in order to support more assertive health decisions for better allocation of human and technological resources, improvement of surveillance, and use of effective therapeutic measures.
Despite severity scores being commonly used in hospital settings [such as SOFA (21), STSS (22), and others], the pandemic required new tools specific for COVID-19, in addition to validation of previous clinical scores for rapid, easy, and precise triage. Despite an increasing number of studies relating to various aspects of severe COVID-19 and its ICU management, only the COVID-IRS (15) and SUM (20) scores were specifically developed for the prediction of IMV in COVID-19 patients. In the original studies of scores developed specifically for COVID-19 patients, the majority presented good discrimination (18,20,23,24), but the analyses predated vaccination and lacked validation in the Brazilian population. Thus, it becomes useful to validate and recalibrate the ABC2-SPH, the only score developed and validated in the Brazilian population, with high accuracy in predicting hospital mortality.
In the present study, all available scores, such as CALL (17), COVID-IRS (18), CURB-65 (19), PREDI-CO (20), SOFA (21), STSS (22), SUM (23), and 4C Mortality (24), in addition to ABC2-SPH itself, performed worse in our Brazilian cohort than in their original cohorts. The differences in predictive ability may be at least partly explained by differences between the population included in the study and the original derivation cohorts (i.e., geographically distant, ethnically different, with the prevalence of distinct comorbidities, in different health systems and cultures), as already observed by other authors (26,32), and also by the fact that some of these scores assessed composite outcomes, in an attempt to overcome the limitation of a small sample size. As mentioned earlier, the TRIPOD guidelines (14,15) and the PROBAST checklist (16) recommend at least 100 events for score development. The CALL score derivation (17), for example, had only 40 events, even using a composite outcome, whereas the COVID-IRS score (18) had only 72 events and defined the cut-offs based on the data, which may have led to model overfitting (29), limiting their generalization to other cohorts.
Patients who received IMV were predominantly older, women, and had a higher prevalence of underlying comorbidities, as previously described (33-40), of which hypertension, diabetes mellitus, obesity, and chronic kidney disease were the most prevalent. Regarding clinical outcomes, our invasively ventilated patients presented a notably higher requirement for dialysis, and higher rates of venous thromboembolism and in-hospital mortality. Overall, in-hospital mortality was 64.0%, similar to that observed for invasively ventilated patients in studies from Argentina (57.7%) (41), Mexico (73.7%) (42), and China (49%) (43).
Considering the finitude of human and logistical resources, ICUs were completely saturated during the worst waves. This required classifying patients according to their probability of survival under critical care treatment, in order to prioritize the initiation and continuation of critical care for those with the highest probability of benefiting from it, which became an ethical necessity to reduce deaths (31). In this context, the discrimination ability of the ABC2-SPH score was worse in elderly patients, especially octogenarians. When considering only patients younger than 80 years, we observed, as expected, a better AUROC of 0.714 (0.698-0.731).
Medical predictive analytics have increased in popularity in recent years to support clinical decision-making in various situations and clinical conditions. However, medical manuscripts usually focus the assessment on the AUROC only (also known as the C-statistic), and it is often underreported that estimated risks may be unreliable even when the algorithms have good discrimination, especially if calibration is not adequate (44). A recent systematic review mentioned the hundreds of prediction models for COVID-19 as a typical example, most of which are deemed useless due to inappropriate derivation and assessment, with calibration being ignored in the great majority of them (45). This is of utmost importance, as poorly calibrated algorithms may be misleading and potentially harmful for clinical decision-making (44). When assessing the original ABC2-SPH score, there was a systematic miscalibration, with observed rates much higher than the predicted probabilities at low scores (i.e., the score underestimated IMV), and observed rates much lower than the predicted probabilities at high scores (i.e., the score overestimated IMV). To improve prediction, we performed the recalibration of the ABC2-SPH score, correcting the intercept and the slope of the model to adapt it to patients at risk of IMV (32), with substantial improvement in overall performance and calibration. Thus, the ABC2-SPHr score can be used as a tool to stratify the risk of IMV in Brazilian COVID-19 patients <80 years-old into low, intermediate, high, and very high. Nevertheless, it is important to highlight that prediction models are population-specific and may produce different results in different populations (14). Therefore, it is necessary to perform external validation of the ABC2-SPHr score for use in other populations.
p-value of the comparison between ABC2-SPH and each score. Due to the multiple comparisons, alpha was corrected using the Bonferroni method, to 0.00625. *ABC2-SPH has higher discrimination ability. AUROC: area under the receiver operating characteristic curve. The main information for each score is shown in Supplementary Table S1. Bold values are statistically significant.
10.3389/fmed.2023.1259055 Frontiers in Medicine frontiersin.org

Strengths and limitations of the study
Our study contributes to the literature because it is a multicenter study, with a large sample of patients from 32 Brazilian hospitals (including public, private, and philanthropic), from different regions and degrees of complexity, which validated and recalibrated the ABC2-SPH score for prediction of IMV in COVID-19 patients under the age of 80. Additionally, we included comparisons with existing risk stratification scores, ensuring

Next steps
Like other viruses, SARS-CoV-2 evolves over time. The majority of mutations in the SARS-CoV-2 genome have no impact on viral function, but certain variants have garnered widespread attention because of their rapid emergence within populations and evidence of transmission or clinical implications. These are considered variants of concern. The World Health Organization (WHO) has also designated labels for notable variants based on the Greek alphabet: Alpha, Beta, Gamma, Delta and Omicron (46). The Omicron variant and its sublineages have been increasing in prevalence worldwide (47). In August 2023, the World Health Organization classified the EG.5 coronavirus strain as a "variant of interest," although it did not seem to add public health risks relative to the other currently circulating Omicron descendent lineages (48). Thus, the current global epidemiology of SARS-CoV-2 is characterized by the continued spread of the Omicron variant. These findings underscore the importance of vaccination to prevent both moderate and severe COVID-19 and to reduce variant circulation (49). Currently, around 70% of people worldwide have received at least one vaccine dose, out of a total of 13.3 billion doses administered globally, but there is still great vaccine inequality between countries (30, 50). Therefore, the future severity of the pandemic is not yet known.
As COVID-19 is a dynamic disease, further assessments of the model are required. The outbreak of COVID-19 was accompanied by an unprecedented explosion of scientific evidence, and a living review has found almost 600 prognostic models to predict diverse outcomes in patients with confirmed COVID-19 (51). In the aforementioned systematic review on the methodology of prediction models, Binuya et al. (45) have discussed that the incessant de novo derivation of models instead of refinement of an existing one is a widely recognized issue, and a huge waste of information from previous modeling studies (and, we could infer, also a waste of time and money). If a reasonable prediction model is available and produces accurate estimates, the consensus is to build upon such a model and check whether some adjustments ("model updating") may improve its fit or performance in new data, for example, with recalibration or by incorporating a novel marker into the model (45). Thus, further studies should take this into account.

Conclusion
The ABC2-SPH risk score demonstrated a poor to fair performance in predicting the need for mechanical ventilation in COVID-19 hospitalized patients. However, when compared with other scores, it showed a significantly greater discriminatory capacity than the CURB-65, STSS, and SUM scores. This result was enhanced after recalibration, yielding a prognostic score that more accurately estimates the probability of IMV in patients aged <80 years old, besides having better discrimination ability than the CURB-65, SOFA, STSS, and SUM scores. Thus, the ABC2-SPHr risk score is a rapid and easy assessment tool to assist clinicians in decision-making when initiating advanced ventilatory support, and therefore to ensure early life-saving interventions.
p-value of the comparison between ABC2-SPH and each score. Due to the multiple comparisons, alpha was corrected using the Bonferroni method, to 0.0125. AUROC, area under the receiver operating characteristic curve. The main information for each score is shown in Supplementary Table S1. *It was not possible to test the CALL, COVID-IRS, PREDI-CO and 4C Mortality scores, as the number of patients and events was lower than recommended by the TRIPOD checklist (14,15). Bold values are statistically significant.
Calibration plot of the score recalibration.

FIGURE 1 Flowchart of the study conducted in two stages: (A) the first stage, aimed to assess the ABC2-SPH risk score to predict invasive mechanical ventilation in COVID-19 patients and compare it with other available scores; and (B) the second stage, aimed to perform ABC2-SPH score recalibration, as well as to compare it with other scores.

FIGURE 2 (A) Area under the receiver operating characteristic curves (AUROC) of ABC2-SPH and other scores in this cohort. The main information for each score is shown in Supplementary Table S1. (B) Calibration of ABC2-SPH score.

TABLE 2
Predicted and observed invasive mechanical ventilation (IMV) rates with the ABC2-SPH score.

TABLE 3
Discrimination ability for each score to predict invasive mechanical ventilation applied in the database of COVID-19 patients (complete case analysis) and comparison of the ABC2-SPH and other existing scores.

The present study presents limitations that should be addressed. Although all hospitals reported an adequate supply of IMV during the study period, we cannot assure that all patients who required IMV in fact received it. Therefore, the outcome for this analysis was receipt of IMV, not IMV requirement. That is also why we opted to recalibrate the score excluding octogenarians, as doctors frequently adopt conservative treatment for elderly and/or frail patients, which includes avoiding intubation, and this could be observed in a worse AUROC in this stratum. Additionally, the scores were calculated based on data from a retrospective, observational, and non-randomized study, with data collected from medical records. Therefore, some variables were not found uniformly, generating missing data. However, in order to reduce this impact, our data were collected by researchers with extensive training and accompanied closely by a professional with important experience in research. Furthermore, information that depends on a more accurate clinical history, such as the description of comorbidities and details of symptoms, may not have been obtained.

TABLE 4
Demographic data, clinical characteristics and outcomes of derivation and validation cohorts of patients <80 years-old admitted to hospital with COVID-19, from January 1, 2021, to March 31, 2022, used for score recalibration.

TABLE 5
Discrimination ability for each score to predict invasive mechanical ventilation applied in the database of COVID-19 patients <80 years-old admitted to hospital with COVID-19, from January 1, 2021, to March 31, 2022 (complete case analysis).