Development and validation of a prediction model for mechanical ventilation based on comorbidities in hospitalized patients with COVID-19

Background Timely recognition of respiratory failure and the need for mechanical ventilation is crucial in managing patients with coronavirus disease 2019 (COVID-19) and reducing hospital mortality. A risk stratification tool could help prevent clinical deterioration in patients with COVID-19 and optimize the allocation of scarce resources. Therefore, we aimed to develop a prediction model for early identification of patients with COVID-19 who may require mechanical ventilation.
Methods We included patients with COVID-19 hospitalized in the United States. Demographic and clinical data were extracted from the records of the Healthcare Cost and Utilization Project State Inpatient Database in 2020. The model was constructed using the least absolute shrinkage and selection operator and multivariable logistic regression. The model’s performance was evaluated based on discrimination, calibration, and clinical utility.
Results The training set comprised 73,957 patients (5,971 requiring mechanical ventilation), whereas the validation set included 10,428 patients (887 requiring mechanical ventilation). The prediction model, incorporating age, sex, and 11 comorbidities (deficiency anemias, congestive heart failure, coagulopathy, dementia, diabetes with chronic complications, complicated hypertension, neurological disorders not affecting movement, obesity, pulmonary circulation disease, severe renal failure, and weight loss), demonstrated moderate discrimination (area under the curve, 0.715; 95% confidence interval, 0.709–0.722), good calibration (Brier score = 0.070, slope = 1, intercept = 0), and a clinical net benefit at threshold probabilities ranging from 2 to 34% in the training set. Similar performance was observed in the validation set.
Conclusion A robust prognostic model using predictors readily available at hospital admission was developed for the early identification of patients with COVID-19 who may require mechanical ventilation. Application of this model could support clinical decision-making to optimize patient management and resource allocation.


Introduction
Coronavirus disease 2019 (COVID-19), caused by the novel severe acute respiratory syndrome coronavirus 2, is associated with a high mortality rate in patients who progress to respiratory failure (1). Approximately 14-33% of patients with COVID-19 progress to respiratory failure and require mechanical ventilation (2)(3)(4). Escalating COVID-19 case numbers pose enormous challenges for healthcare systems and strain the availability of mechanical ventilation. Delayed recognition of respiratory failure and of the need for mechanical ventilation can increase the risk of hospital mortality (5). Therefore, the development of a risk stratification tool that enables early identification of patients with COVID-19 who may require mechanical ventilation is essential to optimize resource allocation and prevent clinical deterioration.
Although various risk stratification models have been developed to identify patients at high risk of severe outcomes (6)(7)(8)(9), most of these models are at high risk of bias (10). Furthermore, these models have major limitations, such as inadequate sample sizes and inappropriate model evaluation, which can lead to overfitting and optimistic estimates of model performance (11). Moreover, many prediction models have incorporated abnormal imaging findings and certain biochemical results as predictor variables because of their significant association with mechanical ventilation (3,12). However, these parameters are frequently unavailable at the time of hospital admission, which limits the clinical utility of such models for early identification of high-risk patients. Hence, it is imperative to develop a prediction model that addresses these concerns by employing an adequate sample size, appropriate evaluation techniques, and predictor variables that are routinely recorded upon hospital admission.
Previous studies have confirmed the associations of demographic characteristics and comorbidities (such as diabetes, renal disease, and neurologic disorders) with the need for mechanical ventilation in patients with COVID-19. Risk stratification models based on variables that are readily available at hospital admission could serve as valuable tools for facilitating clinical triage, allocating limited resources judiciously, and reducing hospital mortality, particularly for patients with rapid progression of critical illness.
The primary objective of this study was to develop and validate a prediction model for patients with COVID-19, aimed at accurately identifying those individuals who would ultimately require mechanical ventilation, using demographic characteristics and comorbidity variables as key predictors.

Study design and participants
This retrospective study included patients admitted with COVID-19, using the 2020 Healthcare Cost and Utilization Project (HCUP) State Inpatient Database (SID) of the United States (US), which contains the universe of the State's hospital inpatient discharge records. All data users adhered to a Data Use Agreement, and the need for informed consent was waived because individual information was de-identified. The Ethics Committee of the Naval Medical University approved this study (No. 2021LL024). Model development, validation, and reporting were conducted in adherence with the guidelines of the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) (15).
Patients hospitalized with an admitting diagnosis of COVID-19 were included in this study. COVID-19 hospitalization cases were identified based on the International Classification of Diseases, 10th Revision, Clinical Modification (ICD-10-CM) code U071 (16). Patients were excluded if they were aged <18 years or had a length of stay (LOS) <2 days. Records with missing values were excluded, as only six of the 94,631 patients had missing data.
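The inclusion and exclusion criteria above can be sketched as a simple record filter. This is a minimal illustration rather than the study's actual code: the dictionary layout and the field names `dx_admitting`, `age`, and `los` are hypothetical and do not correspond to real HCUP SID column names.

```python
COVID_CODE = "U071"  # ICD-10-CM code used in the text to identify COVID-19


def is_eligible(rec):
    """Apply the stated criteria: COVID-19 admission, age >= 18, LOS >= 2 days, no missing values.

    `rec` is a dict with hypothetical keys: dx_admitting, age, los.
    """
    if any(rec.get(key) is None for key in ("dx_admitting", "age", "los")):
        return False  # records with missing values are excluded
    return rec["dx_admitting"] == COVID_CODE and rec["age"] >= 18 and rec["los"] >= 2


# Made-up records for illustration only.
records = [
    {"dx_admitting": "U071", "age": 67, "los": 6},
    {"dx_admitting": "U071", "age": 15, "los": 4},    # excluded: age < 18
    {"dx_admitting": "U071", "age": 70, "los": 1},    # excluded: LOS < 2 days
    {"dx_admitting": "U071", "age": None, "los": 3},  # excluded: missing value
]
eligible = [r for r in records if is_eligible(r)]
```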

Outcomes
The primary outcome was the need for mechanical ventilation support, which was identified based on the ICD-10 Procedure Coding System (PCS) codes 5A1935Z, 5A1945Z, and 5A1955Z (17).
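As a rough sketch of how this outcome definition could be applied, the function below flags a discharge record when any of its procedure codes matches one of the three ICD-10-PCS codes named above. The input format (a list of code strings per record) is an assumption made for illustration; the actual layout of procedure fields in the HCUP SID may differ.

```python
# The three ICD-10-PCS codes listed in the text for mechanical ventilation.
MV_PCS_CODES = frozenset({"5A1935Z", "5A1945Z", "5A1955Z"})


def received_mechanical_ventilation(procedure_codes):
    """Return True if any procedure code on the record is one of the three PCS codes.

    `procedure_codes` is assumed to be an iterable of strings (or None for empty slots).
    """
    return any(
        code.strip().upper() in MV_PCS_CODES
        for code in procedure_codes
        if code  # skip None and empty strings
    )


# "ZZZZZZZ" is a made-up placeholder, not a real procedure code.
example = received_mechanical_ventilation(["ZZZZZZZ", "5A1945Z"])
```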

Predictor variables
Age, sex, and the Elixhauser Comorbidity Index (ECI) were selected as potential predictor variables because of their significant association with clinical outcomes in patients with COVID-19, as established in previous studies (3,14). The ECI, encompassing 38 binary comorbidity variables, has been shown to be significantly associated with mortality and resource use in the hospital setting (18). For analysis, age was categorized into four groups (<60, 60-69, 70-79, and ≥80 years) to simplify calculation and interpretation.

Model development
Patients included in the HCUP SID dataset from Florida in 2020 were allocated to the training set, which was used to develop the model. Under the rule of thumb of at least 10 events per candidate predictor parameter, the 5,971 outcome events in the training set were sufficient for developing a robust model (19).
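The sample-size check can be illustrated with a short calculation. The count of candidate parameters below is an assumption, since the text does not report it: three indicator variables for the four age groups, one for sex, and 38 ECI comorbidity variables.

```python
def events_per_parameter(n_events, n_parameters):
    """Number of outcome events available per candidate predictor parameter."""
    if n_parameters <= 0:
        raise ValueError("n_parameters must be positive")
    return n_events / n_parameters


n_events = 5971  # outcome events in the training set, as reported in the text

# Assumed parameter count (not stated in the source):
# 3 age-group indicators + 1 sex indicator + 38 ECI comorbidities.
n_candidate_parameters = 3 + 1 + 38

epp = events_per_parameter(n_events, n_candidate_parameters)
meets_rule_of_ten = epp >= 10
print(f"{epp:.1f} events per candidate parameter")
```

Under this assumed count, the training set provides roughly 142 events per candidate parameter, well above the threshold of 10.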
To address potential issues of overfitting and collinearity among variables, feature selection was performed using the least absolute shrinkage and selection operator (LASSO) technique with 10-fold cross-validation (20). The optimal lambda value for the LASSO regression was selected using the one standard error rule. Predictor variables identified through LASSO regression were further evaluated using multivariable logistic regression with the Enter method. A nomogram for predicting the need for mechanical ventilation support in patients with COVID-19 was constructed based on the results of the multivariable logistic regression.
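The one standard error rule mentioned above can be sketched as follows: among all candidate lambda values whose mean cross-validated error lies within one standard error of the minimum, the largest lambda (the most heavily penalized, and therefore sparsest, model) is chosen. This mirrors the common convention, for example the `lambda.1se` value reported by `cv.glmnet` in R's glmnet package; the source does not state which implementation was used, and the numbers below are made up for illustration.

```python
def lambda_one_se(lambdas, cv_mean, cv_se):
    """Pick lambda by the one-standard-error rule.

    lambdas: candidate penalty values
    cv_mean: mean cross-validated error for each lambda
    cv_se:   standard error of that mean for each lambda
    """
    if not (len(lambdas) == len(cv_mean) == len(cv_se)) or not lambdas:
        raise ValueError("inputs must be non-empty and of equal length")
    best = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[best] + cv_se[best]
    # Largest lambda whose error is still within one SE of the minimum.
    return max(lam for lam, err in zip(lambdas, cv_mean) if err <= threshold)


# Hypothetical cross-validation results (not from the study).
lambdas = [0.001, 0.01, 0.05, 0.1]
cv_mean = [0.520, 0.515, 0.518, 0.540]
cv_se = [0.004, 0.004, 0.004, 0.005]
chosen = lambda_one_se(lambdas, cv_mean, cv_se)
```

In this toy example the minimum error occurs at lambda = 0.01, but lambda = 0.05 is selected because its error is within one standard error of that minimum.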
The area under a receiver operating characteristic (ROC) curve was used to assess the discrimination of the model. The optimal cut-off point was determined by identifying the threshold that maximized the Youden index. The agreement between the predicted and observed applications of mechanical ventilation was assessed using a calibration curve. Additionally, decision curve analysis (DCA) was performed to compare the clinical utility of the nomogram and the default strategies of "treat all" or "treat none" by calculating the net benefits at different threshold probabilities.
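Two of the quantities described above have simple closed forms. The Youden index at a cut-off is sensitivity + specificity - 1, and the net benefit at threshold probability p_t is TP/N - (FP/N) × p_t/(1 - p_t), with "treat none" fixed at zero. The sketch below implements both in plain Python on made-up data; it is an illustration of the definitions, not the study's R code, and it assumes both outcome classes are present.

```python
def confusion(y_true, y_prob, threshold):
    """Counts (tp, fp, fn, tn) when predicting positive if prob >= threshold."""
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    fn = sum(1 for y, p in zip(y_true, y_prob) if p < threshold and y == 1)
    tn = sum(1 for y, p in zip(y_true, y_prob) if p < threshold and y == 0)
    return tp, fp, fn, tn


def youden_index(y_true, y_prob, threshold):
    tp, fp, fn, tn = confusion(y_true, y_prob, threshold)
    return tp / (tp + fn) + tn / (tn + fp) - 1  # sensitivity + specificity - 1


def best_cutoff(y_true, y_prob):
    """Cut-off (among observed probabilities) that maximizes the Youden index."""
    return max(sorted(set(y_prob)), key=lambda t: youden_index(y_true, y_prob, t))


def net_benefit(y_true, y_prob, threshold):
    tp, fp, _, _ = confusion(y_true, y_prob, threshold)
    n = len(y_true)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Made-up outcomes and predicted probabilities, for illustration only.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.02, 0.03, 0.05, 0.06, 0.08, 0.10, 0.12, 0.15, 0.30, 0.40]

cutoff = best_cutoff(y_true, y_prob)
nb_model = net_benefit(y_true, y_prob, 0.10)
nb_all = net_benefit_treat_all(y_true, 0.10)
```

A model is clinically useful at a given threshold when its net benefit exceeds both the "treat all" value and zero, which is the comparison the decision curve displays across thresholds.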

Model validation
To validate the prediction model, patients included in the HCUP SID of Kentucky in 2020 were allocated to the validation set. The discrimination, calibration, and clinical utility of the prediction model were evaluated by ROC analysis, the calibration curve, and DCA, respectively.

Sensitivity analysis
Sensitivity analyses were conducted to evaluate the discriminatory performance of the prediction model under different scenarios. First, because patients aged <18 years or with an LOS <2 days were excluded from the main analysis, a sensitivity analysis was conducted using the complete data, including these patients. Second, considering evidence of variability in the risk of mechanical ventilation among patients with COVID-19 across ethnicities (12), an additional sensitivity analysis was performed to examine the model's performance within different ethnic groups.

Statistical analysis
Continuous variables were presented as either mean (standard deviation) or median (interquartile range, IQR), whereas categorical variables were expressed as percentages. The Kruskal-Wallis test, Chi-square test, or Fisher's exact test was used, as appropriate, to compare the demographic and clinical characteristics of patients who required mechanical ventilation and those who did not. Multivariable logistic regression analyses were conducted, and the results were reported as coefficients and odds ratios (OR) with corresponding 95% confidence intervals (CI). Results were considered statistically significant at p < 0.05. R software (version 4.3.0) was used to perform all statistical analyses.
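For readers less familiar with how logistic regression coefficients map to the reported quantities, the following minimal sketch converts a coefficient and its standard error into an OR with a Wald-type 95% CI, using OR = exp(β) and CI = exp(β ± 1.96 × SE). The input values are hypothetical and chosen only for illustration; whether the study used Wald intervals is not stated in the text.

```python
import math


def odds_ratio_with_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) for a logistic regression coefficient.

    beta: estimated log-odds coefficient
    se:   standard error of beta
    z:    normal quantile for the interval (1.96 for a 95% Wald CI)
    """
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical coefficient and standard error (not reported by the study).
or_, lower, upper = odds_ratio_with_ci(beta=-0.673, se=0.050)
print(f"OR {or_:.2f} (95% CI, {lower:.2f}-{upper:.2f})")
```

A negative coefficient yields an OR below 1, meaning lower odds of the outcome when the predictor is present.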

Baseline characteristics
A total of 94,631 patients hospitalized with COVID-19 were screened, resulting in the inclusion of 73,957 patients in the training set and 10,428 patients in the validation set (Figure 1). The median age of the patients was 67 years (IQR, 54-78), and 47.27% were female. Among the patients, 44.02% belonged to the white ethnic group, whereas 54.74% belonged to non-white ethnic groups. The median LOS was 6 days (IQR, 4-11), and 39.45% of patients had more than three comorbidities. Additionally, 8.13% of the patients received mechanical ventilation. The most prevalent comorbidities were uncomplicated hypertension (44.27%), obesity (27.83%), and diabetes with chronic complications (26.27%). Compared with patients who did not receive mechanical ventilation, those who did were more likely to be older and male and to have a higher burden of comorbidities. Details regarding the baseline characteristics of the patients in the training and validation sets are presented in Table 1.
FIGURE 1 Flow chart of study participants in the training and validation sets.
Frontiers in Public Health 04 frontiersin.org
In the training set, the nomogram exhibited a discriminatory performance for distinguishing patients who required mechanical ventilation from those who did not, with an area under the curve (AUC) of 0.715 (95% CI, 0.709-0.722). The cut-off value of 0.071 provided maximal discrimination, with a specificity of 0.647 and a sensitivity of 0.678 ( Figure 4A). Furthermore, the calibration curve plotting the actual probability against the predicted probability demonstrated good calibration (Brier score = 0.070, slope = 1, intercept = 0) ( Figure 5A). The DCA demonstrated that the nomogram had a superior clinical net benefit with a threshold probability range of 2-34%, when compared to the strategies of "treat all" or "treat none" (Figure 6A).
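As a reading aid for the calibration metric, the Brier score is the mean squared difference between predicted probabilities and observed binary outcomes, so lower values are better. The sketch below computes it on made-up data and, as a point of reference, also computes the score that a constant prediction equal to the training-set event rate (5,971 of 73,957) would obtain, which is p(1 - p). The reference calculation is an added illustration and is not reported in the study.

```python
def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Made-up example, for illustration only.
toy_score = brier_score([0, 0, 1, 1], [0.1, 0.2, 0.7, 0.9])

# Reference value: predicting the overall event rate for every patient
# gives a Brier score of p * (1 - p).
event_rate = 5971 / 73957  # counts reported for the training set
reference_score = event_rate * (1 - event_rate)
```

The Brier score depends on the event rate, so it is best read alongside the calibration slope and intercept rather than on its own.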

Validation of the nomogram
The validation set consisted of 10,428 patients, of whom 887 required mechanical ventilation. In this set, the nomogram displayed comparable discrimination, with an AUC of 0.722 (95% CI, 0.704-0.739) (Figure 4B). Using the cut-off value of 0.071 identified in the training set, the specificity and sensitivity in the validation set were 0.656 and 0.684, respectively (Figure 4B). The calibration curve also demonstrated good agreement in the validation set (Brier score = 0.073, slope = 1.022, intercept = 0.073) (Figure 5B). The DCA illustrated that the clinical net benefit of the nomogram was higher than that of the default strategies of "treat all" or "treat none," with a threshold probability range of 3-42% (Figure 6B).

Discussion
In this study, we developed and evaluated a risk stratification model for predicting the need for mechanical ventilation in a large cohort including 84,025 patients hospitalized with COVID-19. The present model incorporates age, sex, and 11 comorbidities.
Nomogram for predicting mechanical ventilation requirement in patients with COVID-19.
Compared with the present prediction model, which was constructed using a hybrid method combining LASSO regression and multivariable logistic regression, several existing models using machine learning techniques have shown moderate to good discrimination (AUC, 0.65-0.94) (6,21-24). However, these existing models were commonly developed in cohorts with small sample sizes or in patients already admitted to intensive care units, leading to optimistic estimates of model performance or limiting their application to general hospitalized patients. Our prediction model is more interpretable and easier to use at the bedside, without the need for an application or a website that hosts the calculator.
Age has consistently been identified as a strong predictor of adverse outcomes in patients with COVID-19 (25). In the current study, the risk of mechanical ventilation increased with age, except in patients aged ≥80 years. A study conducted in Japan found that patients aged ≥75 years had a lower rate of requirement for mechanical ventilation support compared with those aged 65-74 years (26). Similarly, another study using data from Korea also suggested that patients aged ≥80 years were less likely to receive mechanical ventilation (27). Considering the higher proportions of do-not-intubate (DNI) orders in older patients with COVID-19 (8), these results potentially reflect a clinical decision made in advance, rather than a lower risk of severe respiratory failure. Other factors, such as medical resource availability, the potential harms and benefits of mechanical ventilation, and expected prognosis, also contribute to clinical decision-making for older patients (28, 29).
Comorbidities play a crucial role in predicting the prognosis of patients with COVID-19 (13). In risk stratification models, the predictive effect of comorbidities has usually been represented in one of two forms: individual comorbidities with unequal weights, or an equally weighted count of comorbidities (30,31). In our cohort, we observed that weight loss and neurological disorders unrelated to movement exhibited the greatest ORs, which were significantly higher than those of the other comorbidities. In contrast with the other comorbidities, dementia showed an inverse association with mechanical ventilation (OR, 0.51; 95% CI, 0.46-0.56). Therefore, in our final prediction model, we assigned weighted scores to individual comorbidities, rather than using an unweighted count of comorbidities. The association between dementia and decreased risk of mechanical ventilation may be explained by the fact that patients with dementia are more likely to have advance care planning (ACP) or a do-not-resuscitate (DNR) order, leading to a lower treatment intensity (32). After risk adjustment for ACP, dementia showed no significant effect on the likelihood of receiving mechanical ventilation (33).
FIGURE 4 Discrimination of the nomogram for predicting mechanical ventilation requirement in patients with COVID-19. Receiver operating characteristic curves of the nomogram in the training (A) and validation sets (B).
The timely prediction of adverse patient outcomes is of paramount importance for effective allocation of healthcare resources and prevention of clinical deterioration. However, prediction models that perform differently across subpopulations might introduce unfairness in clinical decision-making and exacerbate health inequity (34). Underdiagnosis of mechanical ventilation requirement can result in delayed medical intervention, while overdiagnosis can lead to inappropriately aggressive treatment (35). The present model demonstrated similar specificity and sensitivity in the white ethnic group and non-white ethnic groups, suggesting that its application is unlikely to introduce unfairness in clinical decision-making.
One of the major strengths of our study was the use of a large, representative dataset to develop a model with a sufficient sample size, thereby reducing the risk of bias. Additionally, we implemented LASSO regression for feature selection, a robust method that effectively mitigates multicollinearity within the model. Moreover, we performed sensitivity analyses to evaluate the model's discriminatory performance in different subpopulations and confirm its robustness.
However, this study has some limitations that should be acknowledged. First, the HCUP SID database did not provide details on DNR or DNI orders, which are particularly relevant for older patients and those with dementia. This limitation might have influenced the assessment of the true requirement for mechanical ventilation in our study. Patient treatment preferences should be considered in future studies to improve model performance. Second, the HCUP SID database did not record imaging and laboratory results, which were common predictor variables in other prediction models (8,9). Consequently, it was impossible to compare the performance of those models with ours in the present datasets. Third, all the patients included in this study were from the US in 2020, which might limit the generalizability of the present model to a broader population. Moreover, the introduction of vaccines and antiviral agents, as well as the emergence of new COVID-19 variants, could influence the risk of adverse outcomes (36)(37)(38). Hence, future validation using data from patients with COVID-19 from different pandemic periods and regions should be conducted to confirm the stability and generalizability of this prediction model.

Conclusion
This study presented a robust prediction model incorporating age, sex, and a set of comorbidities to assess the risk of receiving mechanical ventilation in hospitalized patients with COVID-19. The risk stratification model performed well in terms of discrimination, calibration, and clinical utility. Because it relies on predictor variables readily available at hospital admission, the model can facilitate early identification of patients at high risk of mechanical ventilation and help front-line clinicians optimize patient management and resource allocation during surges in infections, when the supply of mechanical ventilators is limited.

Data availability statement
Publicly available datasets were analyzed in this study. These data can be found at: www.hcup-us.ahrq.gov.

Ethics statement
The studies involving human participants were reviewed and approved by the Ethics Committee of the Naval Medical University. Written informed consent for participation was not required for this study in accordance with the national legislation and the institutional requirements.

Author contributions
YZ, Y-JZ, and D-JZ contributed to the study design, data acquisition, statistical analysis, and manuscript preparation. B-YY and T-TL contributed to the study design and model development. L-YW contributed to the data acquisition and manuscript preparation. L-LZ contributed to the study conception, design, data interpretation, manuscript editing, and funding acquisition. All authors read and approved the final manuscript.

Funding
This work was supported by a grant from the National Science Foundation of China (no. 72174204) and Military Key Disciplines Construction Project (no. 03).