Machine learning to identify risk factors associated with the development of ventilated hospital-acquired pneumonia and mortality: implications for antibiotic therapy selection

Background Among patients with nosocomial bacterial pneumonia, those who decompensated to requiring mechanical ventilation (vHABP) faced the highest mortality, followed by those with ventilator-associated pneumonia (VABP) and non-ventilated hospital-acquired pneumonia (nvHABP). The objectives of this study were to identify risk factors associated with the development and mortality of vHABP and to evaluate antibiotic management. Methods A multicenter retrospective cohort study of adult inpatients with nosocomial pneumonia during 2014–2019 was performed. Groups were stratified by vHABP, nvHABP, and VABP and compared on demographics, clinical characteristics, treatment, and outcomes. Multivariable models were generated via machine learning to identify risk factors for progression to vHABP as well as pneumonia-associated mortality for each cohort. Results 457 patients (32% nvHABP, 37% vHABP, and 31% VABP) were evaluated. The vHABP and nvHABP groups were similar in age (median age 66.4 years), with 77% having multiple comorbidities, but more vHABP patients had liver disease (18.2% vs. 7.7%, p = 0.005), had alcohol use disorder (27% vs. 7.1%, p < 0.0001), and were hospitalized within the past 30 days (30.4% vs. 19.5%, p = 0.02). An immediate need for ventilatory support occurred in 70% of vHABP patients on the day of diagnosis. Mortality was the highest in vHABP followed by VABP and nvHABP groups (44.6% vs. 36% vs. 14.3%, p < 0.0001). Nearly all (96%) vHABP patients had positive cultures, with Gram-negative pathogens accounting for 58.8% whereby 33.0% were resistant to extended-spectrum β-lactams (ESBLs), ceftriaxone (17.5%), fluoroquinolones (20.6%), and carbapenems (12.4%). Up to half of the vHABP patients with ESBL-Enterobacterales or P. aeruginosa did not receive an effective empiric regimen; a more than 50% increase in mortality rate was observed among patients in whom effective therapy was initiated past the day of pneumonia diagnosis. Risk factors associated with vHABP development were alcohol use disorder, APACHE II score, vasopressor therapy prior to infection, and culture positive for ESBL-Enterobacterales, whereas history of hospitalization in the past 30 days, active malignancy, isolation of ceftriaxone-resistant pathogens or Pseudomonas aeruginosa, and vasopressor therapy were risk factors for vHABP-associated mortality. Conclusion Patients with vHABP experienced an acute and severe decompensation upon diagnosis. The risk factors identified in this study could provide actionable data for clinicians to identify those at risk for vHABP at the onset of pneumonia and to target antimicrobial stewardship efforts to improve treatment success.


Background
Nosocomial pneumonia (NP) is a leading hospital-acquired infection that accounts for 22% of cases and is associated with prolonged hospitalization and significant mortality (1). Nosocomial pneumonia may be grouped into three subtypes: ventilator-associated bacterial pneumonia (VABP), and ventilated (vHABP) and non-ventilated hospital-acquired bacterial pneumonia (nvHABP) (2,3). VABP accounts for roughly half of the cases of NP while the remaining half is equally divided between nvHABP and vHABP (4). Up to 40% of mechanically ventilated patients develop VABP with an all-cause mortality risk of 20%-50% (5,6). Thus, concerted efforts have focused on preventive measures to target reduction of VABP occurrence to zero (7).
Although HABP is generally considered less severe, more than 50% of patients develop serious complications including respiratory failure, septic shock, and empyema (8). Notably, mortality has been shown to be highest for HABP patients who progress to vHABP compared to VABP and nvHABP (4,6,9). A recent multicenter retrospective study using administrative data to compare the epidemiology and clinical outcome of patients with nvHABP, vHABP, and VABP found that more vHABP patients required ICU admission and vasopressor therapy, had a prolonged hospitalization, and were more likely to be discharged to hospice among survivors. In another single-center retrospective study, Motowski et al. compared patients with ventilated pneumonias (vHABP vs. VABP) and similarly found that vHABP was associated with significantly higher 30-day and in-hospital all-cause mortality and longer length of stay (10). The growing evidence surrounding vHABP-associated morbidity and mortality supports further investigation to identify risk factors associated with the development of vHABP and death in order to facilitate early recognition of at-risk individuals and to help guide antibiotic management.
Recently, machine learning (ML) has been adopted into medical research as a method of minimizing bias and improving the accuracy of predictive models. ML is a branch of artificial intelligence that applies statistical techniques to produce a trained model fitted to a given data set. Among the ML algorithms, random forests are an increasingly popular statistical method of classification and regression. Random forests are a combination of tree predictors such that each tree depends on the values of random vectors sampled independently and with the same distribution for all trees in the forest (11). A few studies have applied machine learning to predict the risk of developing pneumonia, but they have not explored risk factors associated with poor outcomes including disease progression and mortality (12,13). Thus, our study objectives were to use a machine learning approach to identify risk factors, present prior to or at onset of HABP diagnosis, that are associated with progression to vHABP and mortality, and to evaluate empiric antibiotic management.

Study population and design
This was a retrospective cohort study conducted at two sites: Huntington Hospital and Los Angeles General Medical Center-University of Southern California. This study was conducted in accordance with the amended Declaration of Helsinki. The study protocol was approved by the institutional review boards (IRB) at both centers (Advarra IRB Pro00045861; University of Southern California IRB: HS-20-00663). Informed consent was waived.
Eligible patients were hospitalized adults (≥18 years) who developed NP between March 2014 and December 2019. Hospitalized patients with a secondary diagnosis of pneumonia by ICD-9 and ICD-10 codes were screened for inclusion; those with a primary diagnosis of community-acquired pneumonia were excluded. Pneumonia diagnosis was confirmed with documentation of new or progressive radiographic infiltrate in addition to clinical findings suggestive of infection such as new-onset fever, purulent sputum, leukocytosis, and decline in oxygenation (14). Pregnant patients and patients with pneumonia of non-bacterial etiology were excluded. Hospital-acquired pneumonia was defined as pneumonia developing >48 h from admission. Non-ventilated HABP (nvHABP) was defined as HABP without the need for endotracheal intubation but allowing for use of non-invasive ventilation (e.g., nasal cannula, high flow nasal cannula, bi-level positive airway pressure, etc.) during the course of infection, whereas ventilated HABP (vHABP) was defined as HABP subsequently requiring endotracheal intubation at any time during the course of infection (including at onset). Ventilator-associated pneumonia was defined as pneumonia developing >48 h after endotracheal intubation (2). Due to a significantly higher proportion of patients with nvHABP at one study site, patients who met inclusion

Clinical evaluation
Patients' medical records were reviewed for pertinent demographic, laboratory, and clinical information as follows: age, gender, comorbid conditions, social history, residence prior to admission, receipt of immunosuppressive therapy, hospitalization within the past 30 days or receipt of antibiotics within the past 90 days, severity of illness (Acute Physiology and Chronic Health Evaluation, APACHE II score), intensive care unit (ICU) admission, and need for and duration of vasopressor therapy and mechanical ventilation, vital signs, daily labs, culture and sensitivity results, clinical management (oxygen supplementation, antibiotic therapy), and outcomes (hospital and ICU lengths of stay and all-cause in-hospital mortality).

Study definitions and endpoints
The APACHE II score was calculated at onset of pneumonia diagnosis. Empiric therapy was defined as any antibiotic administered prior to or without knowledge of pathogen identity and/or susceptibility. Effective therapy was any antibiotic regimen containing at least one agent with documented in vitro activity against the isolated pathogen from the respiratory culture. The primary endpoints were risk of development of vHABP and in-hospital mortality. Study data were managed using REDCap, a secure web-based platform designed for data capture in research studies (15).
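The definition of effective therapy above amounts to a membership check, and the timing of effective therapy can be bucketed relative to the day of culture. The following is a minimal Python sketch of that logic, not the study's actual data pipeline; the function names, the data layout, the example drug names, and the label of the last time bucket are all hypothetical.

```python
def is_effective(regimen, active_agents):
    """Return True if at least one agent in the regimen has documented
    in vitro activity against the isolated pathogen."""
    return any(agent in active_agents for agent in regimen)


def delay_category(days_after_culture):
    """Bucket time to effective therapy relative to the culture day.

    Day 0 covers therapy started on or before the culture day (the
    reference group). The final bucket label is an assumption.
    """
    if days_after_culture <= 0:
        return "day 0"
    if days_after_culture <= 2:
        return "days 1-2"
    return "later (assumed bucket)"


# Hypothetical example: the isolate is susceptible to cefepime and meropenem.
active = {"cefepime", "meropenem"}
print(is_effective(["vancomycin", "cefepime"], active))  # True
print(is_effective(["ceftriaxone"], active))             # False
print(delay_category(1))                                 # days 1-2
```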

Data analysis
Patients were grouped by subtypes of NP (vHABP, nvHABP, and VABP). Our primary analysis was to compare those who developed vHABP vs. nvHABP on demographics, comorbidities, and clinical and microbiological features at time of pneumonia diagnosis as well as empiric treatment to identify predisposing risk factors for developing vHABP and vHABP-associated mortality. The VABP group was included for relative comparison. Descriptive analysis was performed using the Mann-Whitney U test or Student's t-test for continuous variables and the chi-square or Fisher's exact test for categorical variables where appropriate. Odds ratios (OR) with 95% confidence intervals (CI) were calculated. A modified Poisson regression analysis using robust error variance was used to analyze time to receipt of effective therapy and identify the incremental risk for in-hospital mortality, with day 0 (i.e., effective therapy started before or on the day the respiratory culture was taken) as the reference group.
A supervised machine learning algorithm, the Random Forests (RF) method, was employed in this study. Breiman in 2001 defined a random forest as a classifier consisting of a collection of tree-structured classifiers {h(x, Θ_k), k = 1, …} where the {Θ_k} are independently and identically distributed random vectors and each tree casts a unit vote for the most popular class at input x. For the kth tree, a random vector Θ_k is generated, independent of the past random vectors Θ_1, …, Θ_(k-1) but with the same distribution (11). The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation between them. Combining trees grown using random features can produce improved accuracy (11). Rodriguez-Galiano et al. provided a flowchart illustration of the RF method (16). The RF method performs both classification and regression prediction. It enables a more robust, accurate, and stable prediction than Classification and Regression Trees (CART) by building multiple decision trees and merging the predictions by averaging the posterior probabilities for interval targets or voting for class targets (17).
A SAS High Performance procedure, HPFOREST, was applied to create random forest models in a high-performance environment. The data were split proportionally into a training set [i.e., input data or inBag fraction (16)] and "out-of-bag" (OOB) data to measure the accuracy of the model and reduce the misclassification rate. The training set for a tree was a sample without replacement from all available observations. Averaging over trees from different training samples reduced the dependence of the predictions on any particular training sample. The OOB sample, a set of observations not used in building the current tree, was used to estimate the prediction error, evaluate variable importance, and monitor correlation (11,16). The importance of each variable was determined by the difference between the misclassification rates for the modified and original OOB data divided by the standard error, and variables were ranked from most to least important (17).
About 50 clinical factors, assessed as continuous or categorical variables with the potential to impact primary or secondary outcomes, were selected as the input to the random forest ensembles. These included age, gender, race, APACHE II score, Charlson Comorbidity Index, alcohol use disorder, malignancy, liver disease and 14 other frequently occurring comorbid conditions, receipt of vasopressor therapy, isolation of P. aeruginosa, resistance phenotype of the respiratory pathogen, ICU admission prior to pneumonia diagnosis, and empiric antibiotic therapy. About 14-18 factors preselected by the random forest method were then entered one by one, in order of their importance (i.e., ranking), into a logistic regression with forward variable selection. The area under the receiver operating characteristic (ROC) curve was estimated and compared to assess which factors were highly influential in the model prediction. Those factors were then further explored using a backward selection logistic regression model. Interaction effects were not included. The final multivariable logistic regression models included only significant predictors for the major endpoints: risk of development of vHABP and pneumonia-associated mortality. All variables that had less than 5% of values missing were included as candidates in the machine learning-based models (18). All statistical tests were 2-tailed and a p-value < 0.05 was considered significant. Statistical analyses were performed using SAS software, version 9.4 (SAS Institute Inc., Cary, NC, United States).
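The variable-importance idea described in this section (compare the misclassification rate on held-out data before and after scrambling one variable) can be illustrated with a small Python sketch. This is not the HPFOREST implementation: it uses a fixed toy classifier in place of a fitted forest, a single held-out set instead of per-tree OOB samples, and it omits the division by the standard error. All names and data are hypothetical.

```python
import random


def misclassification_rate(predict, rows, labels):
    """Fraction of rows whose predicted class differs from the true label."""
    wrong = sum(1 for row, y in zip(rows, labels) if predict(row) != y)
    return wrong / len(rows)


def permutation_importance(predict, rows, labels, n_features, repeats=20, seed=0):
    """Mean increase in misclassification after shuffling one feature at a time."""
    rng = random.Random(seed)
    baseline = misclassification_rate(predict, rows, labels)
    importances = []
    for j in range(n_features):
        increases = []
        for _ in range(repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)  # break the link between feature j and the outcome
            permuted = [row[:j] + [v] + row[j + 1:] for row, v in zip(rows, column)]
            increases.append(misclassification_rate(predict, permuted, labels) - baseline)
        importances.append(sum(increases) / repeats)
    return importances


# Toy held-out data (hypothetical): feature 0 drives the outcome, feature 1 is noise.
rows = [[i / 20, (i * 7 % 20) / 20] for i in range(20)]
labels = [1 if row[0] >= 0.5 else 0 for row in rows]


def toy_model(row):
    """Stand-in for a fitted forest: classifies on feature 0 only."""
    return 1 if row[0] >= 0.5 else 0


importance = permutation_importance(toy_model, rows, labels, n_features=2)
# Rank variables from most to least important.
ranking = sorted(range(2), key=lambda j: importance[j], reverse=True)
print(importance, ranking)
```

In this toy setup the noise feature receives an importance of exactly zero because the classifier never reads it, while scrambling the informative feature raises the error rate, so it ranks first.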

Clinical characteristics and outcomes
Pneumonia onset from admission was a median of 6 days for both the vHABP and nvHABP groups (Table 2). Notably, the majority of patients in the vHABP group (70.3%, 104/148) had an immediate need for ventilatory support, occurring on the day of diagnosis [median 0 days (IQR: 0, 1)], suggesting an acute and severe decompensation. There was also a higher prevalence of ICU admission (52% vs. 27.8%, p < 0.0001) and vasopressor use prior to pneumonia diagnosis (48.6% vs. 4.7%, p < 0.0001) in the vHABP compared to the nvHABP group. Additionally, patients in the vHABP group had a significantly higher APACHE II score (median 25.0 vs. 12.0, p < 0.0001) on the day of pneumonia diagnosis compared to the nvHABP group. Overall, significantly more vHABP patients required ICU level of care (100% vs. 37.9%, p < 0.0001) with over half requiring vasopressor therapy (57.4% vs. 6.5%, p < 0.0001) during infection when compared to the nvHABP group (Table 2). Patients with vHABP had worse outcomes than those with nvHABP: longer post-infection ICU stay (median 10 d vs. 3 d, p < 0.0001) and overall length of stay (median 24 d vs. 13.5 d, p < 0.0001), and 3-fold higher in-hospital mortality rate (44.6% vs. 14.3%, p < 0.0001). Compared to the vHABP group, patients in the VABP group had similar severity of underlying illness (median APACHE II score 23.5). Despite the VABP group requiring a longer duration of ICU stay (12.5 d vs. 10 d) and prolonged duration of mechanical ventilation (8 d vs. 6 d), in-hospital mortality (36% vs. 44.6%) remained lower than in those with vHABP.
Among those with a positive respiratory culture, the proportion of patients receiving an effective empiric regimen in the vHABP group was 80% for MRSA; however, nearly half of the patients with ESBL-Enterobacterales (39.5%) and P. aeruginosa (47.4%) did not receive effective empiric therapy. In comparison, a higher proportion of patients in the nvHABP group (87.5%) with P. aeruginosa received an effective empiric regimen. Overall, patients with vHABP were less likely to receive an effective regimen within 48 h of pneumonia diagnosis compared to those with nvHABP (67.7% vs. 78.7%, p = 0.17), though the difference was not statistically significant. Importantly, mortality risk increased 1.55-fold (95% CI, 0.98-2.46, p = 0.06) for those who received effective empiric therapy 1-2 days after the day of diagnosis (Table 4). Despite lower rates of effective empiric regimens against MRSA (45.5%), ESBL-Enterobacterales (29.4%), and P. aeruginosa (29.2%), the overall mortality rate was lower in the VABP than in the vHABP group.
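As an arithmetic check on how a ratio estimate, its confidence interval, and its p-value relate, the reported relative risk can be back-calculated in a few lines of Python. This is a sketch under the assumption (not stated in the text) that the interval is a Wald-type 95% CI computed on the log scale; the function name is hypothetical.

```python
import math


def p_from_ratio_ci(estimate, lower, upper, z_crit=1.96):
    """Approximate two-sided p-value for a ratio measure from its 95% CI,
    assuming a Wald interval on the log scale."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)  # SE of log(ratio)
    z = math.log(estimate) / se
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area


# Relative risk for effective therapy started 1-2 days after diagnosis.
p = p_from_ratio_ci(1.55, 0.98, 2.46)
print(round(p, 2))  # 0.06
```

Under that assumption the back-calculated value agrees with the reported p = 0.06, and the interval's lower bound just below 1 tells the same story: the estimate narrowly misses the 0.05 threshold.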

Discussion
This is a retrospective cohort analysis of patients with HABP differentiated into nvHABP and vHABP to determine risk factors associated with the development of vHABP and vHABP-associated mortality using a machine learning approach. The advantage of using the random forest algorithm in machine learning over traditional methods to identify predictive risk factors is that the former yields improved accuracy and precision while minimizing bias, supporting its use as a promising alternative to traditional predictive tools (19-22). Although risk factors for vHABP-associated mortality and morbidity have been identified previously, our study provides unbiased confirmation for several known as well as newly identified risk factors associated with immune-disrupting chronic conditions and antimicrobial resistance. Consistent with prior published literature, our findings also confirm that vHABP is associated with significantly worse outcomes than either nvHABP or VABP. Importantly, our study provides actionable information prior to onset of pneumonia that could facilitate early recognition of those at risk for developing vHABP and potential treatment considerations to improve outcomes.
Overall, baseline characteristics were similar among study groups. One notable exception when comparing the vHABP and nvHABP groups is that a significantly greater proportion of the vHABP group had liver disease (18.2% vs. 7.7%, p = 0.005) and alcohol use disorder (27% vs. 7.1%, p < 0.0001). It is well established that patients with severe liver disease have compromised immune function, thereby increasing the risk and severity of infection (23). Pneumonia is a frequent complication, particularly among patients with cirrhosis (24, 25). Additionally, several studies have linked alcohol use disorder with poor outcomes among patients with community-acquired pneumonias (26-28). Both liver disease and alcohol use disorder were independently associated with poor outcomes in this study. However, only alcohol use disorder was selected by the Random Forest algorithm, as it is likely the stronger predictor of mortality compared to liver disease despite significant correlation between the two factors as determined by post hoc analysis. Notably, the vHABP group experienced significantly worse outcomes compared to the nvHABP group: longer post-infection ICU stay (median 10 vs. 3 d, p < 0.0001), higher utilization of vasopressors during infection (57.4% vs. 6.5%, p < 0.0001), longer length of hospital stay (24 vs. 13.5 d, p < 0.0001), and higher in-hospital mortality (44.6% vs. 14.3%, p < 0.0001). Although duration of mechanical ventilation, ICU stay, and hospitalization were relatively longer in the VABP compared to the vHABP group, the latter had a numerically higher mortality rate (44.6% vs. 36%, p = 0.12). We speculate that mortality among vHABP patients may be attributable in part to advanced age coupled with a lower immunological reserve for containing the infection among those with      patients with recent hospitalization are at increased risk of acquiring multi-drug resistant infections in which the probability of receiving initial ineffective therapy is high. Multiple studies have shown that delays in effective therapy negatively impacted outcomes including length of stay and survival among patients with multi-drug resistant Enterobacterales and P. aeruginosa (30-32). More vHABP patients were hospitalized in the 30 days before admission compared to nvHABP patients (30.4% vs. 19.5%, p = 0.02). Accordingly, culture positivity was nearly 2-fold higher in the vHABP compared to the nvHABP group with a numerically higher prevalence of ceftriaxone-, carbapenem-, and ESBL-resistant phenotypes. To our knowledge, this is the first study comparing resistance phenotypes across the three different classifications of nosocomial pneumonia. The major pathogens of concern among culture-positive patients with vHABP were P. aeruginosa, ESBL-Enterobacterales, and S. aureus. As confirmed by our machine learning-derived multivariable model, isolation of an ESBL-producing organism was a significant predictor for vHABP development (OR 3.35, 95% CI: 1.37 to 8.2; p = 0.008) while isolation of P. aeruginosa (OR 3.08, 95% CI: 0.97 to 9.72; p = 0.06) and ceftriaxone resistance (OR 3.24, 95% CI: 1.02 to 10.26; p = 0.04) were associated with vHABP-associated mortality.
With respect to empiric therapy, more patients with MRSA isolation received an effective regimen (vHABP: 80%, nvHABP: 60%, and VABP: 45.5%) compared to those with isolation of ESBL-Enterobacterales or P. aeruginosa. It is notable that patients with nvHABP receiving empiric vancomycin therapy had nearly 3-fold higher risk of mortality (OR 2.94, 95% CI: 1.06 to 8.17; p = 0.04), which could potentially serve as a surrogate marker for a subpopulation with more complex underlying disease in whom broad antimicrobial coverage was initiated. For patients with vHABP involving ESBL-Enterobacterales, nearly 40% did not receive an effective empiric regimen. In addition, despite the high rate of empiric antipseudomonal coverage in all 3 groups, nearly half of the patients (47.4%) with vHABP involving P. aeruginosa did not receive an effective empiric regimen. Considering that 70% of our vHABP group experienced an acute rapid respiratory decompensation requiring ventilatory support within 24 h of pneumonia diagnosis, prompt initiation of an effective empiric regimen is of paramount importance. As expected, delays in receipt of effective therapy significantly increased the risk of mortality. For patients in the vHABP group, we observed over 50% increase in mortality rate when effective therapy was not initiated on or before the day of pneumonia diagnosis. Given the global concern of rising multidrug resistance, our findings underscore the need to provide empiric coverage that encompasses ESBL-producing organisms and P. aeruginosa in patients at risk for developing vHABP considering the high prevalence of recent healthcare exposure in this subpopulation.
Our study had several limitations. First, our cohort may be subject to selection bias. Patients were initially screened based on ICD-9 and ICD-10 codes. Although all related codes were included in the screening criteria, there may be patients with nosocomial pneumonia that were missed in the initial screening. Second, this was a retrospective study conducted over a 5-year period at 2 different institutions. We acknowledge that the standard of care may have changed over the study period and that practice standards may differ between the two study sites. Notably, cefepime and piperacillin-tazobactam are differentially preferred as empiric agents of choice at the two institutions; however, both agents empirically cover P. aeruginosa (risk factor for vHABP-associated mortality) and the choice of agent was not identified as a significant risk factor for mortality on the multivariable model. Rather, resistance against either agent such as with ESBL-producing organisms was a significant predictor for development of vHABP, which may contribute to the negative consequences from the delayed receipt of effective therapy. As the aim of this study was to identify predisposing or early risk factors that would distinguish at-risk patients for developing vHABP, we did not report on definitive therapy since by the time culture and sensitivities were reported, patients had already progressed to needing ventilatory support in the vHABP group. Lastly, we acknowledge that the current study represents an initial derivation study and that our models have not been externally validated, which is necessary to confirm our results.
Our study further confirms the increase in morbidity and mortality associated with vHABP from previous studies. While the nvHABP and vHABP groups differed in various aspects of patient characteristics, clinical presentation, and microbiology based on univariate analysis, only a handful of variables were identified as

Conclusion
Taken together, alcohol use disorder, APACHE II score at pneumonia diagnosis, isolation of ESBL-producing pathogens, and need for vasopressor therapy prior to infection were risk factors associated with the development of vHABP. Among those who developed vHABP, prior hospitalization within the past 30 days, active malignancy, isolation of P. aeruginosa or ceftriaxone-resistant pathogens, and vasopressor therapy during infection increased the risk of death after controlling for age. As such, patients who have any of these risk factors should be monitored closely and have a lower threshold for escalation of therapy. Considering that isolation of a ceftriaxone-resistant organism or P. aeruginosa carries a risk for vHABP or in-hospital mortality respectively, it may be prudent to initiate empiric therapy against those organisms in patients who developed nosocomial pneumonia and require vasopressors, were recently hospitalized, or had a history of malignancy. Although these factors were identified by a machine learning-derived model, external validation is needed to confirm the reliability of our results in real-world applications.

Risk for multidrug resistance, n (%)
a p-values are for the comparison of vHABP and nvHABP groups only.

TABLE 2
Clinical characteristics and outcome.

TABLE 4
Relative risk of in-hospital mortality by time of delay to effective therapy for vHABP patients with positive cultures (n = 99).
a Row percentage in each category of time to receipt of effective therapy. b p-value associated with relative risk.

TABLE 3
Microbiology and empiric therapy.

Time to receipt of effective regimen
Extended-spectrum β-lactams include ceftriaxone, cefepime, and/or piperacillin-tazobactam. Regimens containing at least one agent with documented in vitro activity against the organism(s) isolated for patients with culture-positive pneumonia.
a p-values are for the comparison of vHABP and nvHABP groups only. b Ceftriaxone-resistance only includes Enterobacterales in which susceptibility was performed. c d Fluoroquinolones include ciprofloxacin and levofloxacin. f

TABLE 6
Predictors of in-hospital mortality from multivariable analyses.
ESBL, Extended-spectrum β-lactamase; APACHE II, Acute Physiology and Chronic Health Evaluation II.

TABLE 5
Predictors of vHABP development from multivariable analysis.