Identifying the risk factors of ICU-acquired fungal infections: clinical evidence from using machine learning

Background Fungal infections are associated with high morbidity and mortality in the intensive care unit (ICU), but their diagnosis is difficult. In this study, machine learning with Random Forest was applied to design and define a predictive model of ICU-acquired fungi (ICU-AF) in the early stage of fungal infections.
Objectives This study aimed to provide evidence for the early warning and management of fungal infections.
Methods We analyzed the data of patients with culture-positive fungi during their admission to seven ICUs of the First Affiliated Hospital of Chongqing Medical University from January 1, 2015, to December 31, 2019. Patients whose first culture was positive for fungi longer than 48 h after ICU admission were included in the ICU-AF cohort. A predictive model of ICU-AF was obtained using the Least Absolute Shrinkage and Selection Operator (LASSO) and machine learning, and the relationship between the features within the model and the disease severity and mortality of patients was analyzed. Finally, the relationships between the ICU-AF model, antifungal therapy, and empirical antifungal therapy were analyzed.
Results A total of 1,434 cases were ultimately included. We used LASSO dimensionality reduction for all features and selected six features with importance ≥0.05 in the optimal model, namely, times of arterial catheter, enteral nutrition, corticosteroids, broad-spectrum antibiotics, urinary catheter, and invasive mechanical ventilation. The area under the curve of the model for predicting ICU-AF was 0.981 in the test set, with a sensitivity of 0.960 and specificity of 0.990. The times of arterial catheter (p = 0.011, OR = 1.057, 95% CI = 1.053–1.104) and invasive mechanical ventilation (p = 0.007, OR = 1.056, 95% CI = 1.015–1.098) were independent risk factors for antifungal therapy in ICU-AF. The times of arterial catheter (p = 0.004, OR = 1.098, 95% CI = 0.855–0.970) were an independent risk factor for empirical antifungal therapy.
Conclusion The most important risk factors for ICU-AF are the six time-related features of clinical parameters (arterial catheter, enteral nutrition, corticosteroids, broad-spectrum antibiotics, urinary catheter, and invasive mechanical ventilation), which provide early warning for the occurrence of fungal infection. Furthermore, this model can help ICU physicians to assess whether empiric antifungal therapy should be administered to ICU patients who are susceptible to fungal infections.


Introduction
Infections have long been a key medical issue in intensive care units (ICUs). In a recent survey of a worldwide sample of ICU patients, the prevalence of suspected or proven infection was 8,135 out of 15,202 (54%), and the in-hospital mortality rate was 2,404 out of 7,936 (30%) (1). Fungi are opportunistic pathogens that normally colonize the skin and mucous membranes of ICU patients (2). The entry of fungi into the body results in fungal infection when the body's defense barrier or immune system is disrupted (3,4). Although the prevalence of fungal infection in the ICU has decreased over the past decade from 963 out of 4,947 (19%) to 864 out of 5,259 (16%), fungi remain the third most common pathogen in the ICU (1,5). One study reported that invasive fungal infections have a mortality rate of more than 30% in critically ill patients (6). The mortality rate after Candida infection is more than 40% (7,8). Furthermore, the mortality rate attributable to invasive aspergillosis is >42% (9). Fungal infections occur at different sites with varying rates. The mortality rate of patients with candidemia was 28%, which was higher than that of patients with abdominal invasive candidiasis (16%) and with infection at non-abdominal sterile sites (10%) (10). Therefore, it is important to focus on the early characteristics of fungal infections to reduce the infection and mortality rates in the ICU.
A multicenter study of ICU infections worldwide found that 1,706 out of 8,135 (16%) infections were ICU-acquired (1); such infections are classified as hospital-acquired infections (HAPs) (11). The mortality rate of ICU-acquired infections (461 out of 1,706, 27%) was higher than that of community-acquired infections (697 out of 3,474, 20%) and hospital-acquired infections (661 out of 2,724, 24.9%) (1). Among the 848 (30%) cases of fungal infections, 255 were "ICU-acquired fungal infections (ICU-AFIs)," which are attributed to the special pathophysiology of critically ill patients during their ICU stay (1,8). Mainstream diagnostic methods classify fungal infection as proven, probable, or possible (12). However, diagnosing fungal infection is difficult. The false-negative rate for ICU-acquired candidemia, which is a common fungal infection in the ICU, can reach 60% (13). Early identification is further limited because, at initial diagnosis, it is often unclear whether a fungal sample originated in the ICU (255 out of 848, 30%) or in other medical units (300 out of 848, 35%) (1). It is therefore very difficult for clinicians to accurately confirm and treat ICU-AFI, and distinguishing between ICU-acquired fungi (ICU-AF) and non-ICU-acquired fungi (non-ICU-AF) is beneficial for the early management of ICU-AFI.
In the real world, studies on the same target may yield different results owing to multiple confounding factors. Recent studies by Poissy and Keighley on the risk factors of candidemia in the ICU produced conflicting conclusions regarding urinary catheters and liver disease (14,15). The present study is a retrospective clinical cohort study that used machine learning (ML) to identify the origin of ICU-AFIs and to build a model for predicting ICU-AF risk.

Study design
This study was approved by the Institutional Ethics Committee of the First Affiliated Hospital of Chongqing Medical University (reference number: 2021-366). The ethics committee waived the requirement for informed consent because of the retrospective nature of this study. Patient data were sourced from medical record systems and analyzed anonymously to protect patient privacy.
We included a cohort of patients who had culture-positive fungi during their admission to seven ICUs (GICU, general ICU; SICU, surgical ICU; RICU, respiratory ICU; NICU, neurology ICU; NSICU, neurosurgery ICU; CSICU, cardiothoracic surgery ICU; CCU, cardiovascular ICU) at the First Affiliated Hospital of Chongqing Medical University from January 1, 2015, to December 31, 2019. Culture-positive fungi refer to specimens obtained from ICU patients that were cultured positive for fungi by laboratory physicians in the microbiology laboratory and for which an official report was issued. Subsequently, all patient data were extracted from our internal electronic medical records (Table 1), including basic information (age, gender, and comorbidities), characteristics of fungi (microbiology and time to positivity in the ICU), laboratory results (all obtained within 24 h after the positive fungal culture), and clinical data (days in the ICU, department, Acute Physiology and Chronic Health Evaluation (APACHE) II Score, diagnosis on ICU admission, and clinical characteristics). The 28-day mortality rates after ICU admission were recorded. Data were collected by three investigators and checked by two other investigators to avoid bias. Notably, these features were chosen on the basis of their availability in all patients rather than on any a priori assumptions about their ability to predict fungal acquisition, although the goal of our prediction model was to select the most influential factors in the collected data for the prediction of ICU-AF.

Definition
According to guidelines, infection after 48 h of hospitalization is defined as HAP (11). Therefore, we assigned patients whose first culture was positive for fungi longer than 48 h after ICU admission to the ICU-AF cohort and those whose first positive culture occurred less than 48 h after admission to the non-ICU-AF cohort. The cohort process and exclusion criteria are shown in Figure 1. All antifungal treatment decisions were jointly made by two or more deputy chief physicians with >15 years of clinical experience in critical care medicine. Among these decisions, empirical antifungal therapy prior to fungal culture was based on guidelines (6).
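The 48-h rule above can be illustrated with a minimal Python sketch. The function name `classify_cohort` and its arguments are hypothetical and are not taken from the study's code; in particular, the handling of a culture at exactly 48 h is an assumption, because the text defines only "longer than" and "less than" 48 h.

```python
from datetime import datetime, timedelta

# Threshold taken from the definition in the text (48 h after ICU admission).
ICU_ACQUIRED_THRESHOLD = timedelta(hours=48)


def classify_cohort(icu_admission: datetime, first_positive_culture: datetime) -> str:
    """Assign a case to the ICU-AF or non-ICU-AF cohort (illustrative only)."""
    elapsed = first_positive_culture - icu_admission
    if elapsed < timedelta(0):
        raise ValueError("first positive culture precedes ICU admission")
    # Assumption: a culture at exactly 48 h is treated as non-ICU-AF,
    # since the source text does not define this boundary case.
    return "ICU-AF" if elapsed > ICU_ACQUIRED_THRESHOLD else "non-ICU-AF"


admission = datetime(2019, 1, 1, 8, 0)
print(classify_cohort(admission, datetime(2019, 1, 4, 8, 0)))  # 72 h after admission
print(classify_cohort(admission, datetime(2019, 1, 2, 8, 0)))  # 24 h after admission
```

In practice the boundary convention would need to be fixed in advance and applied identically to every case.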

Machine learning
ML methods are computer algorithms that automatically recognize complex patterns on the basis of empirical data. The goal is to enable algorithms to learn from past or present data and to use this knowledge to make predictions or decisions regarding unknown future events (16). In the current study, we used the random forest (RF) ML algorithm. It is a "tree-based" algorithm in which multiple decision trees are constructed using random classifications of independent features that are used to predict outcome labels for random subsets of samples (17). On the one hand, the RF technique is a regression tree technique that uses bootstrap aggregation and randomization of predictor variables to achieve a high degree of predictive accuracy and is often used in medical field analysis to construct classification prediction models (18-20). On the other hand, RF may be more suitable for feature selection during classification tasks in bioinformatics and related sciences, where it has a relatively low tendency to overfit and produces more robust results (21).

Data set division
We randomly divided the 1,434 cases into two halves, with 50% of the cases used as the training set and the rest as the test set. We also ensured that there was no gender or age bias between the training set and the test set.
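A random, equal split of this kind can be sketched as follows. This is an illustration using only the Python standard library; the helper `split_half`, the integer case identifiers, and the seed value are assumptions and do not reproduce the study's actual procedure. The balance check for age and gender is not shown.

```python
import random


def split_half(case_ids, seed=0):
    """Shuffle case identifiers reproducibly and split them into two halves."""
    rng = random.Random(seed)      # fixed seed so the split can be repeated
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical identifiers 0..1433 standing in for the 1,434 cases.
train_ids, test_ids = split_half(range(1434), seed=42)
print(len(train_ids), len(test_ids))
```

Fixing the seed makes the partition reproducible, which matters when the same split is reused for feature selection and final testing.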

Feature extraction
For the training set, we first used the Least Absolute Shrinkage and Selection Operator (LASSO) to reduce the dimension of features according to whether the patient was ICU-AF. LASSO performs feature selection during model construction by penalizing the respective regression coefficients. As this penalty increases, more regression coefficients shrink to zero, thus resulting in a more regularized model (22). In this process, 49 significant features with nonzero coefficients were obtained. We then used them in the RF prediction model. By using a ten-fold cross-validation analysis, we selected the best model parameter on the basis of the accuracy of each fold of the model. At the same time, we ranked the features in this model by importance and, in reference to previous articles, set a threshold of 0.05 to select features (23,24). These features were retained, and the random forest model was trained to predict patient ICU-AF by using ten-fold cross-validation. Finally, the model was tested using the test set. Both dimensionality reduction and ten-fold cross-validation were used to prevent overfitting. Overfitting can occur when excessive features affect the predictive performance of a model. However, the use of nested k-fold cross-validation allows us to perform model training independently of hyperparameter optimization, which prevents overfitting or incorrect generalization estimates (25). The R language was used to

Model performance
In the ten-fold cross-validation of the model, we trained different model parameters and selected the parameters from the fold with the best accuracy for application to the test set. The ability of the model to discriminate ICU-acquired from non-ICU-acquired fungi was determined using the area under the curve (AUC), and the stability of the model was determined on the basis of sensitivity and specificity. From our learning models, we chose the model with the best discrimination ability.
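The performance measures named above can be computed as in the following sketch. The labels and scores are a small made-up example, not study data, and the function names are hypothetical. The AUC is computed through its rank interpretation (the probability that a randomly chosen positive case scores higher than a randomly chosen negative case), which is one of several equivalent ways to obtain it.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels coded 1/0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Toy example (illustrative values only): 1 = ICU-AF, 0 = non-ICU-AF.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]   # assumed cut-off of 0.5

sens, spec = sensitivity_specificity(y_true, y_pred)
print(round(sens, 3), round(spec, 3), round(auc(y_true, scores), 3))
```

Sensitivity and specificity depend on the chosen cut-off, whereas the AUC summarizes discrimination across all cut-offs.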

Statistical analysis
All statistical analyses were performed using Stata 24.0 software. After dividing the data into training and test sets, we used analysis of variance (ANOVA) to analyze whether there was a difference in the age distribution between the two sets, and the chi-square test to analyze whether there was a difference in the gender distribution between them. In ML, models constructed from selected features are expected to perform well in predicting patient outcomes; AUC, sensitivity, specificity, and accuracy are used only to determine the performance of the models (26). Therefore, many previous studies have combined ML with logistic regression methods (27,28). In our research, factors associated with antifungal therapy and empirical antifungal therapy for acquired fungi were analyzed using univariate and multivariate conditional logistic regression models for all features of the ML model. Results are reported as odds ratios (ORs) with 95% confidence intervals (CIs), and p < 0.05 was considered significant.
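For readers less familiar with how an OR and its 95% CI follow from a logistic regression fit, the relationship can be sketched as below. The sketch assumes a Wald-type interval, OR = exp(β) with limits exp(β ± 1.96 × SE); the study's software may use a different method, and the coefficient and standard error shown are made-up values rather than study results.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Convert a logistic regression coefficient and its standard error
    into an odds ratio with a Wald-type confidence interval."""
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return odds_ratio, lower, upper


# Hypothetical coefficient for a one-day increase in a time-related feature.
or_, lo, hi = odds_ratio_ci(beta=0.05, se=0.02)
print(f"OR = {or_:.3f}, 95% CI = {lo:.3f}-{hi:.3f}")
```

By construction, the point estimate of a Wald-type interval always lies between its lower and upper limits, and an interval that excludes 1 corresponds to p < 0.05.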

Cohort characteristics
We initially enrolled 2,147 cases. A total of 1,434 cases with complete data were obtained after exclusion and screening (Figure 1, step 1). The cases were randomly and equally divided into the training (n = 717) and test (n = 717) sets. The distribution of outcome labels for patients in the training and test sets showed no significant differences (p = 0.37, chi-square test, not shown). The features of the two data sets are shown in Table 1; age (ANOVA, p = 0.60, not shown) and gender (chi-square test, p = 0.15, not shown) showed no statistically significant difference. On the basis of whether the fungi were ICU-acquired, LASSO was performed to reduce the dimension of features. Thereafter, by using ten-fold cross-validation, the average accuracy of the random forest model was 0.907 ± 0.042; the third fold had the highest accuracy, at 0.972 (Figure 2). We took the third-fold model parameter as our optimal model parameter. We selected six features with importance ≥0.05 in the optimal model, namely, times of arterial catheter, times of enteral nutrition, times of corticosteroids, times of broad-spectrum antibiotics, times of urinary catheter, and times of invasive mechanical ventilation (Figure 3A).

The role of each feature
By using these features for ML analysis and testing on an independent test set, the results showed that the AUC for predicting ICU-AF was 0.981 in the test set, with a sensitivity of 0.960 and specificity of 0.990 (Figure 3B). Disease severity in ICU patients was represented by the APACHE II Score, which was analyzed separately against the duration of each of these six features. Only the times of invasive mechanical ventilation showed a significant linear correlation with the APACHE II Score (p = 0.031) (Figure 3C). The duration of these features showed no significant association with 28-day mortality (Figure 3D).

Risk factors associated with antifungal therapy and empirical antifungal therapy
In the univariate and multivariate conditional logistic regression analyses of antifungal therapy in ICU-AF, among these six features, times of arterial catheter (p = 0.011, OR = 1.057, 95% CI = 1.053-1.104) and times of invasive mechanical ventilation (p = 0.007, OR = 1.056, 95% CI = 1.015-1.098) were independent risk factors for antifungal therapy in ICU-AF (Table 2). In the sample on antifungal therapy, times of arterial catheter (p = 0.004, OR = 1.098, 95% CI = 0.855-0.970) was an independent risk factor for empirical antifungal therapy (Table 3).

Discussion
This retrospective clinical cohort study spanned 5 years, included 1,434 cases with complete data, and identified six risk factors for ICU-AF using ML. Fungal infection, which is difficult to treat and carries a poor prognosis, is an important component of ICU infections (1,29). He et al. used ML to establish predictive models for secondary candidemia in ICU patients with systemic inflammatory response syndrome (SIRS). These models have a potential guiding role in the antifungal treatment of critically ill patients with SIRS (30). Researchers often focus on the pathogenic and non-pathogenic states of fungi, which are known as "infection" and "colonization," respectively (31,32). Once a fungal infection emerges in critically ill patients in the ICU, colonization poses a high risk to individuals with immune disorders. Many studies have considered fungal colonization, including multi-site colonization and the colonization of special strains, as a risk factor for fungal infection (33,34). The risk of fungal infection increased significantly after fungal colonization in ICU patients. One study found that 93 out of 137 (68%) patients with candidemia had Candida colonization (30). The preconception was that fungal infection is opportunistic. However, the sensitivity of ICU blood cultures for invasive candidiasis (including intra-abdominal candidiasis) is approximately 40% (13). Up to 70% of patients with candidemia do not receive early empiric antifungal therapy (35). Generally, doctors in the ICU pay close attention to patients who already carry the "fungi" label, but preparation for new-onset cases is insufficient. This could increase the risk for patients in the ICU. One study showed that a 12-h delay in starting antifungal therapy was associated with a 2.09-fold increase in mortality (36). Identifying the types of ICU patients who are at high risk for acquired fungal infections is an important part of critical illness warnings.
This study moves the warning line for fungal infection forward to before colonization, to a stage termed ICU-AF and defined as fungi cultured after 48 h in the ICU. LASSO dimensionality reduction and ML methods were used to analyze patients admitted to the ICU over the past 5 years. Compared with non-acquired fungi, six features, namely times of arterial catheter, times of enteral nutrition, times of corticosteroids, times of broad-spectrum antibiotics, times of urinary catheter, and times of invasive mechanical ventilation, showed high significance in ICU-AF. These features are considered high-risk factors for fungal infection in the ICU (7,37-42). The current study used ML to show that ICU-AF has a higher risk of occurrence when ICU patients exhibit the above six features. However, the utility of risk factors in ICU-AF patients depends on differentiating between the dimensions of time, frequency, and intensity. ICU-AF is expected to provide an early warning for antifungal therapy or even empirical antifungal therapy.
Logistic regression analysis showed that the times of arterial catheter and invasive mechanical ventilation were independent risk factors for antifungal therapy in ICU-AF, and the time of arterial catheter was an independent risk factor for empirical antifungal therapy in ICU-AF. In the ML-based early warning of ICU-AF, the times of arterial catheter and invasive mechanical ventilation can therefore be used to alert critical care physicians as to whether antifungal therapy is needed. Patients with arterial catheters may require early empirical antifungal therapy.

Strengths and limitations
This study applied an unconventional method to study susceptibility to ICU-AF. First, we used efficient ML methods to analyze clinical data to reduce the bias of manual analysis. Second, we focused on the early warning of fungal infection, namely, ICU-AF, and this approach is more in line with the needs of treating ICU patients. Finally, we investigated the role of the ICU-AF early warning model in antifungal therapy and empirical antifungal therapy for guiding the management of ICU-AF.
This study has the following limitations. First, this was a retrospective, single-center study. It should be noted that this single-center study involved seven different ICU wards (GICU, RICU, SICU, NICU, NSICU, CSICU, and CCU), and some specific characteristics (such as major abdominal surgery and disturbance of consciousness) were diverse. However, even across seven different ICUs, each patient had these six features. As routine treatment procedures for patients in the ICU, the durations of these six features were obtained from detailed nursing records and reflected the length of time that patients received treatment in the real world, as well as the homogeneity of fungal infection risk factors across all ICUs. Second, there were more than 40 salient features in the optimal model (Figure 3A). However, we selected only six features with importance ≥0.05. When multiple features appear in the results, it is crucial to extract better and more convenient feature models for clinical applications. In addition to using 0.05 as a threshold to screen six features, we also explored the important role of these six features in ICU-AF on the basis of clinical practice. Other features (such as central venous catheter, abdominal surgery, and SOFA scores) that were reported to be connected with ICU-AF (43,44) could have hidden roles and still have potential value for future discussion. The prevailing view supports that the six features analyzed in the current study are good predictors of ICU-AF (7,37-42). Controlling these six operations is an effective way to reduce ICU-AF. Blaize et al. found that controlling the use of corticosteroids could reduce the risk of invasive pulmonary fungal infections in COVID-19 patients admitted to the ICU (45). Third, there is the question of whether ICU physicians can distinguish fungal colonization from fungal infection. The AUC of the optimal model for the fungal infection test obtained in this study was 0.670 (Supplementary Figure S1). In the clinical cohort, these were indistinguishable at the time of diagnosis; thus, we moved the field of view forward to the acquired fungus. Finally, increasing the amount of training data can, in most cases, provide more information, allow more diverse learning, and increase the chances of achieving better results. Some important studies use 70% or 80% of samples in the training set (46-48). We randomly divided the 1,434 cases, with 50% of the cases used as the training set and the rest as the test set, to improve the efficiency of model validation. We also ensured that there was no gender or age bias between the training set and the test set. Although this ratio is also commonly used for dividing datasets in previous studies, such as in some studies on tumor diseases (49,50), we will continue to collect data and expand the sample size in future research to increase the proportion of samples in the training set.
In summary, ML classifier models in clinical cohorts have the potential to predict the risk of ICU-AFI. The most important risk factors for ICU-AF are the six time-related clinical parameters (arterial catheter, enteral nutrition, corticosteroids, broad-spectrum antibiotics, urinary catheter, and invasive mechanical ventilation) that provide early warnings for the early prevention of fungal infection. Furthermore, this model, although it needs further clinical validation, has the potential to help ICU physicians assess whether empiric antifungal therapy should be administered to ICU patients who are susceptible to fungal infections.

FIGURE 2 Accuracy of models in ten-fold cross-validation.

TABLE 3 (footnote) OR stands for odds ratio, CI for confidence interval. The samples for logistic regression analysis were from all ICU-AF samples with antifungal therapy. Variables were selected with importance ≥0.05 in the training set.

FIGURE 1 Flowchart for enrollment and screening. Step 1: Preliminarily screen the samples according to the inclusion and exclusion criteria. Step 2: Use LASSO dimensionality reduction for all 61 features and select 6 features with importance ≥0.05. Step 3: All samples are randomly and equally divided into training set and test set. Max-min scale: normalization for continuous features; the formula is $x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$. One-hot: setting unordered classification features to mutually exclusive dummy features.
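The two preprocessing steps named in the Figure 1 caption, max-min scaling and one-hot encoding, can be sketched in plain Python as follows. The function names and example values are hypothetical, and the treatment of a constant feature (all values equal) is an assumption that the caption does not address.

```python
def min_max_scale(values):
    """Rescale a continuous feature to [0, 1] via (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature is mapped to 0.0 to avoid division by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(labels):
    """Encode an unordered categorical feature as mutually exclusive 0/1 columns."""
    categories = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return rows, categories


# Illustrative values only: days of a time-related feature and an ICU ward label.
print(min_max_scale([2, 4, 6]))
rows, categories = one_hot(["GICU", "SICU", "GICU"])
print(categories, rows)
```

Scaling parameters (minimum and maximum) are normally estimated on the training set and then reused for the test set, so that no information leaks from the test data.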

FIGURE 3 (A) The 49 features with the highest relative gain for the model predicting ICU-AF and the 6 features with importance ≥0.05. (B) Receiver operating characteristic (ROC) curve of the models. (C) Scatter plot with linear regression line of best fit of APACHE II score analyzed separately against the six features. r²: represents the degree of feature fitting; p < 0.05 was considered significant. (D) Heatmap of different features in dead vs. surviving patients; colors in the heatmap indicate the time (days) for the corresponding feature. AUC, area under the curve. APACHE II, Acute Physiology and Chronic Health Evaluation II.

TABLE 1
The characteristics of training set and test set.
draw the density map between each feature and APACHE II, and the lm function was used to fit the regression model. The "pheatmap" package was used to draw the heatmap displaying sample survival and feature performance.

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This work was funded by the 2022 Chongqing Medical University Graduate Smart Medicine Special Research and Development Plan (Project Number: YJSZHYX202222 to Y-sZ), the Innovation Project for Doctoral Students at the First Affiliated Hospital of Chongqing Medical University (CYYY-BSYJSCXXM-202209 to Y-sZ), the Chongqing medical scientific research project (Joint project of Chongqing Health Commission and Science and Technology Bureau, 2023ZDXM004 to FX), the Clinical Medicine Postgraduate Joint Training Base of Chongqing Medical University-the First Affiliated Hospital of Chongqing Medical University (lpjd202001 to FX), and the project of Chongqing talents (cstc2022ycjh-bgzxm0131 to FX).

TABLE 2
Independent risk factors associated with antifungal therapy according to ICU-AF. OR stands for odds ratio, CI for confidence interval. The samples for logistic regression analysis were from all ICU-AF samples. Variables were selected with importance ≥0.05 in the training set.

TABLE 3
Independent risk factors associated with empirical antifungal therapy according to ICU-AF.