Deciphering the microbial landscape of lower respiratory tract infections: insights from metagenomics and machine learning

Background Lower respiratory tract infections represent prevalent ailments. Nonetheless, current comprehension of the microbial ecosystems within the lower respiratory tract remains incomplete and necessitates further comprehensive assessment. Leveraging the advancements in metagenomic next-generation sequencing (mNGS) technology alongside the emergence of machine learning, it is now viable to compare the attributes of lower respiratory tract microbial communities among patients across diverse age groups, diseases, and infection types. Method We collected bronchoalveolar lavage fluid samples from 138 patients diagnosed with lower respiratory tract infections and conducted mNGS to characterize the lung microbiota. Employing various machine learning algorithms, we investigated the correlation of key bacteria in patients with concurrent bronchiectasis and developed a predictive model for hospitalization duration based on these identified key bacteria. Result We observed variations in microbial communities across different age groups, diseases, and infection types. In the elderly group, Pseudomonas aeruginosa exhibited the highest relative abundance, followed by Corynebacterium striatum and Acinetobacter baumannii. Methylobacterium and Prevotella emerged as the dominant genera at the genus level in the younger group, while Mycobacterium tuberculosis and Haemophilus influenzae were prevalent species. Within the bronchiectasis group, dominant bacteria included Pseudomonas aeruginosa, Haemophilus influenzae, and Klebsiella pneumoniae. Significant differences in the presence of Pseudomonas phage JBD93 were noted between the bronchiectasis group and the control group. In the group with concomitant fungal infections, the most abundant genera were Acinetobacter and Pseudomonas, with Acinetobacter baumannii and Pseudomonas aeruginosa as the predominant species. 
Notable differences were observed in the presence of Human gammaherpesvirus 4, Human betaherpesvirus 5, Candida albicans, Aspergillus oryzae, and Aspergillus fumigatus between the group with concomitant fungal infections and the bacterial group. Machine learning algorithms were utilized to select bacteria and clinical indicators associated with hospitalization duration, confirming the excellent performance of bacteria in predicting hospitalization time. Conclusion Our study provided a comprehensive description of the microbial characteristics among patients with lower respiratory tract infections, offering insights from various perspectives. Additionally, we investigated the advanced predictive capability of microbial community features in determining the hospitalization duration of these patients.


Introduction
Lower respiratory tract infections are prevalent worldwide, encompassing a spectrum of severity from acute bronchitis to severe pneumonia (Mizgerd, 2006, 2008). These infections can be attributed to single or multiple microorganisms, exhibiting a range of virulence from commensal to highly pathogenic (De Roux, 2006; Webster and Govorkova, 2006; Weber et al., 2007). Accurate identification of the causative microorganisms is imperative for effective treatment and prevention of complications. The rapid advancement of mNGS technology offers a more sensitive detection method for pathogenic microorganisms compared to traditional microbiological techniques (Miao et al., 2018). The lung microbiota plays a vital role in maintaining respiratory health and influencing the severity of lower respiratory tract diseases (Fenn et al., 2022). Although studies have explored the characteristics of lung microbiota across different severity levels of lung infections using metagenomics (Zhan et al., 2023), multidimensional analysis of lung microbiota characteristics remains limited.
Machine learning algorithms are designed to automatically analyze data, uncover patterns, and predict unknown data based on these patterns. They exhibit robust fitting and generalization capabilities, particularly for classification tasks involving complex features. Integration of machine learning into medicine holds the promise of delivering more accurate diagnoses and personalized treatments for patients (Weiss et al., 2012, 2015). By combining machine learning with mNGS, our objective is to address practical clinical challenges from a microbial standpoint. This involves comprehensively elucidating patient microbiota characteristics and leveraging machine learning predictive models.
In this study, to explore the characteristics of patients with lower respiratory tract infections, we grouped the patients by different criteria, such as age and comorbid conditions. We analyzed the microbial differences among the groups defined by each classification criterion and investigated the correlations and predictive capabilities of the microbiota using machine learning, focusing specifically on the prediction of hospitalization duration.

Study population
This study enrolled 157 patients with lower respiratory tract infections treated at the Respiratory and Critical Care Medicine Department of Chengdu Third People's Hospital from March 1 to June 30, 2023. Following a thorough evaluation by two experienced clinicians, 138 patients were selected for inclusion. The research methodology entailed prospective specimen collection and subsequent blinded retrospective analysis, adhering to the principles of the Helsinki Declaration. Participants provided written informed consent, and the study's protocols received approval from the Chengdu Third People's Hospital Institutional Review Board, ensuring compliance with all pertinent ethical standards. Lower respiratory tract infections were characterized according to the criteria outlined by Huang et al. (2018).

Specimen collection
Bronchoalveolar lavage fluid (BALF) samples were collected from a total of 138 patients to analyze the respiratory tract microbiota, with each sample labeled according to patient details. The collection procedure involved the following steps. In cases of localized lesions, the segment containing the lesion was chosen. For diffuse lesions, the most severe segment was selected (Pan et al., 2022). The bronchoscope tip was positioned at the opening of the target bronchial segment or subsegment. Sterile physiological saline at 37°C or room temperature was injected rapidly through the operating channel in several aliquots of 20-50 mL each, for a total volume of 60-120 mL. Immediately after saline injection, appropriate negative pressure was applied to aspirate the bronchoalveolar lavage fluid, aiming for a recommended total recovery rate of ≥30%. The collected fluid comprised approximately 10 mL of secretions from the bronchial terminals and alveoli. Any potentially contaminated portion at the front end was discarded, and the remaining portion, at least approximately 5 mL, was promptly collected into a test tube. The collected BALF samples were stored at -80°C. All samples were obtained from the area of lung infiltration, with priority given to the site of most severe infiltration in cases of multiple infiltrated areas.

DNA extraction, library preparation, and sequencing
Initially, cell membrane lysis and host DNA depletion were performed on BALF samples. Following this, 250 µL of the post-lysis supernatant was transferred into a 1.5 mL centrifuge tube and combined with 300 µL of a lysis buffer mixture, followed by homogenization through vortexing. After a brief centrifugation, the mixture underwent a 10-minute incubation at 70°C. DNA extraction was performed using a magnetic bead mixture consisting of 350 µL isopropanol and 15 µL magnetic beads. The DNA concentrations were quantified using the Qubit dsDNA HS Assay Kit (Thermo Fisher Scientific). Subsequently, the DNA underwent fragmentation, end repair, and sequencing adapter ligation following the library construction protocol. Finally, the libraries were sequenced using the Vision 1000 high-throughput sequencing platform, targeting an output of 20 million 50 bp single-end reads per sample.

Bioinformatics analysis
Data quality control and species classification: To ensure the accuracy and reliability of subsequent analyses, raw sequencing data were filtered and processed with fastp to obtain quality-controlled data. Bowtie2 aligned the reads to the host genome, and host-aligned reads were removed. Kraken2 annotated and classified all remaining reads to characterize species composition and diversity. Bracken re-estimated species-level abundance, and interference from background contaminating DNA was excluded using the negative controls in each mNGS run. The top 15 species by abundance were visualized using the R package ggplot2, with statistical testing via the Wilcoxon rank-sum test.
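The ranking step described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline (which used Bracken output and ggplot2 in R); the function names `relative_abundance` and `top_taxa`, the choice of mean relative abundance as the ranking criterion, and the read counts are all assumptions made for illustration.

```python
from collections import Counter


def relative_abundance(counts):
    """Convert a taxon -> read count mapping into taxon -> fraction of reads."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no classified reads")
    return {taxon: n / total for taxon, n in counts.items()}


def top_taxa(samples, k=15):
    """Rank taxa by mean relative abundance across samples; keep the top k.

    Ranking by the mean is an assumption; the paper does not state the rule.
    """
    sums = Counter()
    for counts in samples:
        for taxon, frac in relative_abundance(counts).items():
            sums[taxon] += frac
    means = {taxon: s / len(samples) for taxon, s in sums.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)[:k]


# Hypothetical read counts for two samples (illustrative values only).
samples = [
    {"Pseudomonas aeruginosa": 600,
     "Acinetobacter baumannii": 300,
     "Corynebacterium striatum": 100},
    {"Pseudomonas aeruginosa": 100,
     "Haemophilus influenzae": 900},
]
ranking = top_taxa(samples, k=2)
for taxon, mean_fraction in ranking:
    print(f"{taxon}: {mean_fraction:.2f}")
```

Taxa absent from a sample contribute zero to the mean, so a species that dominates a single sample can still rank below one that is moderately abundant everywhere.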
Alpha diversity analysis: Alpha diversity describes the richness and diversity of the microbial community within each sample and is summarized by several statistical indices. The Chao index (based on the Chao1 algorithm) and the ACE index estimate total species richness. The Shannon and Simpson indices assess community diversity. Alpha diversity indices were calculated for each sample using the R package vegan.
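As a worked illustration of three of these indices, the sketch below computes Shannon, Simpson, and Chao1 from a vector of per-species read counts in plain Python. The study itself used the R package vegan; the choices here (natural logarithm for Shannon, the Gini-Simpson form 1 - Σp², and the bias-corrected Chao1 formula) are assumptions about which variants are meant, and ACE is omitted for brevity.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over species with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts):
    """Gini-Simpson index, 1 - sum(p_i^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of species seen exactly once and exactly twice.
    """
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Hypothetical counts: a perfectly even community of four species.
even_sample = [10, 10, 10, 10]
print(shannon(even_sample))   # equals ln(4) for four equally abundant species
print(simpson(even_sample))   # 1 - 4 * 0.25**2
print(chao1([5, 1, 1, 2, 0])) # 4 observed species, two singletons, one doubleton
```

Richness estimators such as Chao1 depend only on the rare species, whereas Shannon and Simpson also weigh evenness, which is why the two families can disagree between groups.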
Beta diversity analysis: Beta diversity compares the microbial community compositions of different samples or groups. Principal Coordinate Analysis (PCoA) extracts the main axes capturing differences between samples and visualizes them on a two-dimensional plot. Dissimilarity between samples was measured with the Bray-Curtis and UniFrac distances, and PCoA was performed in the R package vegan. The p-value for the PCoA analysis was calculated using adonis, and boxplot p-values using the Wilcoxon rank-sum test.
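For reference, the Bray-Curtis dissimilarity between two samples with abundances x and y is Σ|x_i - y_i| / Σ(x_i + y_i). The sketch below is a plain-Python illustration of that formula and of the pairwise matrix that PCoA takes as input; it is not the vegan implementation used in the study, and the function names and sample values are hypothetical. The PCoA step itself (eigendecomposition of the double-centered distance matrix) and UniFrac, which needs a phylogenetic tree, are not shown.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("abundance vectors must cover the same taxa")
    denominator = sum(a + b for a, b in zip(x, y))
    if denominator == 0:
        return 0.0  # two empty samples are treated as identical
    return sum(abs(a - b) for a, b in zip(x, y)) / denominator


def distance_matrix(samples):
    """Symmetric matrix of pairwise Bray-Curtis dissimilarities."""
    n = len(samples)
    return [[bray_curtis(samples[i], samples[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical abundance vectors over the same three taxa.
samples = [
    [1, 2, 3],
    [3, 2, 1],
    [0, 0, 6],
]
matrix = distance_matrix(samples)
for row in matrix:
    print([round(value, 3) for value in row])
```

The value is 0 for identical samples and 1 for samples that share no taxa, which makes it convenient for ordination of abundance data.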
Differential species analysis: LEfSe (Linear discriminant analysis Effect Size) identifies biomarkers in high-dimensional data. It first detects species with significant abundance differences between groups using the Kruskal-Wallis rank-sum test. Subsequently, the Wilcoxon rank-sum test assesses the consistency of differential species across subgroups. Finally, linear discriminant analysis (LDA) estimates the effect size of each differentially abundant species. Differential species between groups were calculated using LEfSe software with thresholds of LDA ≥ 2 and p-value ≤ 0.05.

Statistical analysis
In the clinical data, continuous variables with a normal distribution are presented as mean ± standard deviation (SD), whereas non-normally distributed variables are presented as median (Q1, Q3). These variables were compared using Student's t-test and the Wilcoxon rank-sum test, respectively. Categorical variables are presented as percentages and were compared using the Chi-square test or Fisher's exact test. A two-sided p-value < 0.05 was regarded as statistically significant in all instances. Machine learning feature selection and statistical analysis were carried out using R version 4.2.3 and Python version 3.10.9.
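Because the Wilcoxon rank-sum test is used throughout the analysis, a small self-contained sketch may help make it concrete. This is an illustrative implementation using the normal approximation with a tie correction and no continuity correction, which is an assumption; it is not the routine the authors ran in R or Python, and exact small-sample p-values would differ slightly. The helper names and the example values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks of the values, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (U statistic for x, two-sided p-value) by normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    pooled = list(x) + list(y)
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction: subtract sum(t^3 - t) over groups of tied values.
    tie_sizes = {}
    for value in pooled:
        tie_sizes[value] = tie_sizes.get(value, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_sizes.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u1, 1.0  # all values identical: no evidence of a difference
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of N(0, 1)
    return u1, p_value


# Hypothetical index values for two fully separated groups.
group_a = [1, 2, 3, 4, 5]
group_b = [6, 7, 8, 9, 10]
u_stat, p_value = rank_sum_test(group_a, group_b)
print(u_stat, round(p_value, 4))
```

The test compares ranks rather than raw values, which is why it suits relative abundances and diversity indices that are not normally distributed.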

Sample characteristics
The workflow is depicted in Figure 1. In this study, a cohort of 138 patients diagnosed with lower respiratory tract infections was recruited. Based on their clinical characteristics, patients were stratified in three ways: by age (>65 years or ≤65 years), by presence or absence of bronchiectasis, and by presence or absence of fungal infections (Table 1). Specifically, the cohort comprised 72 patients aged over 65 and 66 patients aged 65 or younger. Among these, 23 patients exhibited bronchiectasis while 115 did not. Furthermore, fungal infections were present in 50 patients, while 88 patients did not exhibit fungal infections. The spectrum of lower respiratory tract infections included community-acquired pneumonia, acute exacerbation of chronic obstructive pulmonary disease, bronchiectasis, obstructive pneumonia, acute bronchitis, and asthma.

Age groups
In the age group comparison (Figure 2A), the top 15 microorganisms at the genus level in the elderly group were ranked by their relative abundance. These microorganisms included Pseudomonas, Streptococcus, Corynebacterium, Methylobacterium, Acinetobacter, Prevotella, Xanthomonas, and Aspergillus. At the species level, prevalent species were identified as Pseudomonas aeruginosa, Corynebacterium striatum, Acinetobacter baumannii, Aspergillus fumigatus, Human alphaherpesvirus 1, and Parvimonas micra. In the younger group, the predominant 15 microorganisms, including bacteria, fungi, and viruses, consisted of Methylobacterium, Streptococcus, Prevotella, Haemophilus, and Mycobacterium at the genus level, and Mycobacterium tuberculosis, Haemophilus influenzae, and Xanthomonas campestris at the species level. Pseudomonas was most abundant at the genus level in the elderly group, while Methylobacterium was highest in the younger group. At the species level, Mycobacterium tuberculosis was highest in the younger group. Although there was no statistically significant difference in α-diversity between the younger and elderly groups (Figure 2B), β-diversity analysis, including PCA, PCoA, and NMDS (Figure 2C), revealed significant differences, indicating that the grouping was meaningful. LEfSe analysis (Figure 2D) showed more common pathogenic microbial species in the elderly group, such as Human gammaherpesvirus 4 (EBV), Human betaherpesvirus 5, and Enterococcus faecalis, while the majority of the younger group consisted of oral and upper respiratory symbiotic bacteria.

Groups with and without concomitant bronchiectasis
In the comparison between patients with and without bronchiectasis (Figure 3A), the bronchiectasis group exhibited a higher abundance of Pseudomonas, Haemophilus, Acinetobacter, Streptococcus, Klebsiella, Aspergillus, and Alphainfluenzavirus among the top 15 microorganisms at the genus level, ranked by relative abundance. At the species level, dominant species included Pseudomonas aeruginosa, Haemophilus influenzae, Klebsiella pneumoniae, Acinetobacter baumannii, Aspergillus fumigatus, and Influenza A virus. Conversely, in the non-bronchiectasis group, the top 15 microorganisms at the genus level included Methylobacterium, Streptococcus, Pseudomonas, Prevotella, Acinetobacter, Aspergillus, and Alphainfluenzavirus. At the species level, prevailing species were Pseudomonas aeruginosa, Mycobacterium tuberculosis, Xanthomonas campestris, Corynebacterium striatum, Acinetobacter baumannii, Aspergillus fumigatus, and Influenza A virus. Notably, the bronchiectasis group exhibited higher abundance of Pseudomonas and Pseudomonas aeruginosa at the genus and species levels, respectively, followed by Haemophilus (Haemophilus influenzae). The non-bronchiectasis group showed a higher abundance of Methylobacterium and Streptococcus at the genus level, and Pseudomonas aeruginosa and Mycobacterium tuberculosis at the species level. Both groups had a similar prevalence of fungal and viral infections. Analysis of α-diversity (Figure 3B) revealed that the bronchiectasis group had lower Shannon and Simpson indices compared to the non-bronchiectasis group, indicating decreased microbial diversity. Furthermore, the analysis of β-diversity (Figure 3C), including PCA, PCoA, and NMDS, showed statistically significant differences (p<0.05) between the two groups, highlighting distinct microbial compositions. Results from LEfSe analysis (Figure 3D) indicated significant differences in microbial composition, with an enrichment of Cupriavidus sp. ISTL7, Pseudomonas phage JBD93, and Mycolicibacterium neoaurum in the bronchiectasis group. In contrast, the non-bronchiectasis group predominantly consisted of normal skin and oral flora, including Methylobacterium, Streptococcus oralis, and Phyllobacterium sp. 628.

Groups with and without concomitant fungal infection
In the comparison of fungal and bacterial infections (Figure 4A), we observed that the fungal infection group exhibited a higher abundance of Acinetobacter, Pseudomonas, Klebsiella, and Stenotrophomonas at the genus level. Aspergillus predominated among fungal infections, while viral infections included Simplexvirus. At the species level, dominant microorganisms in the fungal infection group included Acinetobacter baumannii, Pseudomonas aeruginosa, Aspergillus fumigatus, and Corynebacterium striatum, while viral infections featured Human alphaherpesvirus 1. In contrast, the group without fungal infections showed a higher abundance of Methylobacterium at the genus level, and Pseudomonas aeruginosa and Mycobacterium tuberculosis at the species level. α-diversity analysis (Figure 4B) indicated that the non-fungal infection group had higher richness compared to the fungal infection group, suggesting a greater number of microbial taxa and a more diverse and stable microbial ecosystem. β-diversity analysis (Figure 4C) also revealed significant differences in microbial composition between the two groups, indicating that the grouping was meaningful. Moreover, LEfSe analysis (Figure 4D) demonstrated distinct microbial compositions in the fungal infection group compared to the non-fungal infection group, with notable bacteria such as Pseudomonas aeruginosa, Acinetobacter, and Chryseobacterium bernardetii, and viruses including Human gammaherpesvirus 4 (EBV) and Human betaherpesvirus 5. Additionally, there were differential abundances of fungal species such as Candida albicans, Aspergillus oryzae, and Aspergillus fumigatus in the fungal infection group. In contrast, the non-fungal infection group was characterized by higher abundances of Methylobacterium at the genus level, and Haemophilus influenzae, Cutibacterium acnes, Filifactor alocis, and Labrys sp. KNU 23 at the species level.

Correlation analyses
Next, correlation analysis was conducted on the top 30 bacteria in the six groups based on thresholds of LDA ≥ 2 and p-value ≤ 0.05. Positive correlations were observed between Methylobacterium, Neisseria, and Capnocytophaga at the species level in the young group, while negative correlations were noted between Methylobacterium and both Human gammaherpesvirus 4 (EBV) and Enterococcus faecium in the elderly group (Figure 5A). Furthermore, the analysis (Figure 5B) revealed a positive correlation between Mycolicibacterium neoaurum and Cupriavidus sp. ISTL7 in the group with concomitant bronchiectasis, while the group without bronchiectasis showed positive correlations among most symbiotic bacteria. Finally, significant negative correlations were identified between Acinetobacter bacteria (Acinetobacter sp. FDAARGOS 494, Acinetobacter sp. FDAARGOS 560) and certain species of Methylobacterium, Stenotrophomonas maltophilia, Cutibacterium acnes, Phyllobacterium sp. 628, Labrys sp. KNU 23, and Pseudomonas aeruginosa in the group with concomitant fungal infection (Figure 5C).
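The correlation coefficient behind these heatmaps is not specified in the text. As one plausible reading, the sketch below computes a Spearman rank correlation between the abundances of two taxa across samples in plain Python; the use of Spearman rather than Pearson is an assumption, and the function names and abundance values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks of the values, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; returns NaN if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical relative abundances of two taxa across four samples.
taxon_a = [0.10, 0.25, 0.40, 0.55]
taxon_b = [0.02, 0.05, 0.30, 0.90]
rho = spearman(taxon_a, taxon_b)
print(round(rho, 3))
```

Rank-based correlation is a common choice for abundance data because it is insensitive to the heavy right skew typical of read counts.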
To explore the association between clinical medication, clinical indicators, and bacterial presence in clinical diagnosis and treatment, a machine learning approach was employed to identify key variables for the bronchiectasis group based on patients' clinical characteristics. A total of 31 key bacteria (Figure 5D), 14 clinical indicators, and 14 medications were identified through feature selection. Subsequently, correlation analysis was performed. Significant associations were observed between Luteitalea pratensis and key markers of inflammation, including C-reactive protein, procalcitonin, and lymphocyte percentage. Hymenobacter sp. APR13 exhibited correlations with C-reactive protein and procalcitonin, while Staphylococcus phage StB12 was associated with erythrocyte sedimentation rate (ESR). Streptococcus virus 2972 showed a correlation with procalcitonin, and Leptotrichia buccalis demonstrated a correlation with interleukin-6. In terms of liver function markers, Gimesia fumaroli displayed correlations with aspartate aminotransferase and total cholesterol. Prosthecochloris sp. HL-130-GSB and Spirosoma aerolatum were related to aspartate aminotransferase. Additionally, correlations were observed among Elizabethkingia anophelis, Fusobacterium necrophorum, and magnesium, globulin, and creatinine. In the correlation between bacteria and drugs (Figure 5E

Machine learning prediction models
To evaluate the predictive value of bacteria for hospitalization duration in patients with lower respiratory tract infections, we divided all patients into a short-stay group and a long-stay group based on the median length of hospital stay, 13 days. We then used machine learning techniques to develop a predictive model incorporating the selected variables (Supplementary Tables S4, S5). Notably, of the random forest models constructed separately on clinical indicators and on bacteria, the model based on bacteria demonstrated superior predictive performance (Figure 6).
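The labeling and evaluation steps around the classifier can be sketched as follows. This is an illustration only: the random forest itself is not reproduced, the rule that a stay strictly longer than the median counts as "long" is an assumption (the text does not say which group a 13-day stay falls into), and the stays and predictions below are hypothetical.

```python
import statistics


def median_split(days):
    """Label each stay 1 (long) if above the median, else 0 (short).

    Assigning stays equal to the median to the short group is an assumption.
    """
    cutoff = statistics.median(days)
    return [1 if d > cutoff else 0 for d in days], cutoff


def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = long stay)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical lengths of stay (days) and hypothetical model predictions.
stays = [5, 10, 13, 20, 30]
labels, cutoff = median_split(stays)

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(labels, cutoff)
print(metrics)
```

Splitting at the median yields roughly balanced classes, so accuracy is a reasonable headline metric here; with an imbalanced split, precision and recall would carry more of the interpretation.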

Age
Pseudomonas aeruginosa is a Gram-negative bacterium that can survive in various environments and is widely distributed (Silby et al., 2011). As the age of patients infected with Pseudomonas aeruginosa increases, the number of drug-resistant strains also increases (Hu et al., 2019). In the elderly group, the most prevalent genus in BALF was Pseudomonas, with Pseudomonas aeruginosa being the dominant species. In contrast, the microbial abundance in the young group was noticeably different. Mycobacterium tuberculosis, the causative agent of tuberculosis, shows a significant correlation with age, with a higher likelihood of transmission and infection in younger individuals (Borgdorff et al., 2001; Yang et al., 2012). Haemophilus influenzae is a common inhabitant of the oral cavity and respiratory tract; compared with the old group, the respiratory microbiota of the young group was characteristically dominated by Streptococcus and Haemophilus (Haemophilus influenzae). Additionally, Corynebacterium striatum, a highly abundant species in the genus Corynebacterium, is normally found at various sites such as human skin and the nasopharyngeal mucosa (Funke et al., 1997). There are reports suggesting that Corynebacterium striatum is increasingly recognized as an infection-related bacterium (Díez-Aguilar et al., 2013; Yang et al., 2018). In 2018, researchers in the United States reported three cases of community-acquired pneumonia (CAP) with strains predominantly belonging to the genus Corynebacterium, indirectly indicating a close association between the genus Corynebacterium and lower respiratory tract infections (Yang et al., 2018). Notably, most elderly individuals have compromised immune function, which is a prerequisite for Corynebacterium infection (Nudel et al., 2018; Lee et al., 2022). Acinetobacter baumannii is the most common bacterium in mechanically ventilated patients (Xie et al., 2018), and the rising antimicrobial resistance of Acinetobacter has led to a broader recognition that it is no longer exclusively a nosocomial pathogen in elderly individuals. It is extensively prevalent in long-term acute care facilities, nursing homes, and the community (Sengstock et al., 2010). Our results show that the majority of patients infected with Acinetobacter baumannii are in the old group. In the old group, Methylobacterium mesophilicum and Rothia mucilaginosa are opportunistic pathogens associated with immunodeficiency (Sanders et al., 2000; Chavan et al., 2013; Maraki and Papadakis, 2015), although only a few cases have been reported (Sanders et al., 2000; Engler and Norton, 2001; Maraki and Papadakis, 2015). While individual cases of Rothia mucilaginosa infection have been reported in patients with normal immune function (Baeza Martínez et al., 2014), it is not a primary consideration for lower respiratory tract infections in patients. As for Prevotella, it primarily resides in the intestines and oral cavity (Tett et al., 2021). Although the diversity of Prevotella is related to the host's diet and lifestyle, it also plays a critical role in human health and disease (Tett et al., 2021). Related studies have shown that Prevotella can regulate inflammatory responses (Marietta et al., 2016) and some lung inflammation (Bernasconi et al., 2016), and may be associated with respiratory dysbiosis (Welp and Bomberger, 2020). In terms of fungi, we also observed Aspergillus fumigatus, a representative of the Aspergillus genus. Aspergillus fumigatus is the most common species in the Aspergillus genus, but fungal infections often have few notable characteristics and the pathogen may not be detectable for a long time. Therefore, diagnosing fungal infections in patients with normal immune function can be challenging, especially as age increases and immune function declines (Zhang et al., 2014; Zhou et al., 2023). A multicenter retrospective study has shown that virus reactivation is associated with an increased risk of mortality in patients (Huang et al., 2023). Our LEfSe analysis reveals that the old group has a greater number of common pathogenic microorganism species compared to the young group, such as Human gammaherpesvirus 4 (EBV) and Human betaherpesvirus 5.
Figure 6. A comparison of machine learning models under different predictor variables was conducted. Based on microbial and clinical indicators, machine learning models using random forest were constructed to predict patients' length of hospital stay. "Accuracy" refers to the proportion of correctly classified samples among the total number of samples. The Receiver Operating Characteristic (ROC) curve is a comprehensive indicator that reflects the sensitivity and specificity of continuous variables. "Precision," also known as positive predictive value, represents the proportion of true positive samples among predicted positive samples. "Recall," also known as sensitivity, represents the proportion of true positive samples among all positive samples. The F1 score is the harmonic mean of precision and recall. DeLong test: p = 0.0284629 (<0.05), a statistically significant difference.
Li et al. 10.3389/fcimb.2024.1385562 Frontiers in Cellular and Infection Microbiology frontiersin.org
In the young group, the majority of microorganisms are symbiotic bacteria in the oral cavity and upper respiratory tract, whereas a negative correlation exists between Methylobacterium and Human gammaherpesvirus 4 (EBV) and Enterococcus faecium in the old group.

Bronchiectasis
Pseudomonas aeruginosa is among the most commonly isolated pathogens in the sputum of bronchiectasis patients in clinical settings, whether in the stable or exacerbation phase of the disease (Tunney et al., 2013; Lin et al., 2016). Moreover, Pseudomonas aeruginosa is a significant risk factor for the severity and prognosis of bronchiectasis (Loebinger et al., 2009; Wang et al., 2018). In the group with concomitant bronchiectasis, Pseudomonas is the most abundant genus, with Pseudomonas aeruginosa being the predominant species. In the group without bronchiectasis, the genus Methylobacterium is an opportunistic pathogen (Sanders et al., 2000) that commonly colonizes various parts of the human body (Kaye et al., 1992; Liu et al., 1997). Additionally, the relative abundance of Pseudomonas and Pseudomonas aeruginosa in this group is lower than in the group with concomitant bronchiectasis. Haemophilus influenzae is significantly associated with the severity of bronchiectasis (Purcell et al., 2014). Within the group with concomitant bronchiectasis, we also found that its relative abundance ranks second. Additionally, Klebsiella pneumoniae is a bacterium that distinguishes the group with concomitant bronchiectasis from the group without, which aligns with previous studies (Huang et al., 2020). Moreover, our study identified significant differences in Acinetobacter baumannii, a bacterium of the Acinetobacter genus, between the groups with and without bronchiectasis. This disparity extends beyond relative abundance to include variations in the presence of other bacteria within the Acinetobacter genus across these groups. In terms of fungi and viruses, although there is no significant difference at the species level between the two groups, the relative abundance of common fungi (such as Aspergillus) and Alphainfluenzavirus was higher in the group with concomitant bronchiectasis than in the group without. In terms of α-diversity, many studies have indicated that the occurrence of diseases can lead to a reduction in microbial diversity (Oriano et al., 2020; Liu et al., 2023; Wu et al., 2023), and our research similarly demonstrates a decrease in microbial diversity in the group with concomitant bronchiectasis. Particularly noteworthy in the LEfSe analysis are the significant difference in bacteria, including Cupriavidus sp. ISTL7, which is commonly found in the human environment (Gupta et al., 2019, 2021), and the significant difference in Pseudomonas phage JBD93, which indirectly reflects the changes in Pseudomonas aeruginosa in this group. This resembles an arms race between bacteria and phages, and phages may directly participate in interactions with immune cells and play a role in immune regulation (Lepage et al., 2008; Letarov and Kulikov, 2009). The rise of bacterial drug resistance has sparked considerable interest in the relationship between bacteriophages and Pseudomonas aeruginosa (Haddock et al., 2023), along with a renewed focus on phage therapy (Fujiki et al., 2023). Mycolicibacterium neoaurum, an opportunistic mycobacterium, was previously found mainly in immunocompromised individuals (Pang et al., 2022). However, there have been more reports of infections in patients who have undergone invasive medical examinations or surgeries (Shapiro et al., 2023). A mouse experiment demonstrated that Mycolicibacterium neoaurum enhances the suppressive activity of regulatory T cells (Tregs) and increases the mortality rate in cases of Salmonella coinfection (Wang et al., 2020). We also found a positive correlation between Mycolicibacterium neoaurum and Cupriavidus sp. ISTL7 in the group with concomitant bronchiectasis.

Fungus
In the group with concomitant fungal infection, in addition to Aspergillus, especially Aspergillus fumigatus, which has a relatively high abundance, Acinetobacter and Pseudomonas, including Acinetobacter baumannii and Pseudomonas aeruginosa, show the highest abundance. This is consistent with previous studies (Zhao et al., 2021). Acinetobacter infection and fungal infection have been reported to be correlated (Thoma et al., 2022). Previous studies have also reported that colonization of the respiratory tract by Candida increases the risk of Pseudomonas ventilator-associated pneumonia (Azoulay et al., 2006). In the LEfSe analysis of this group, Candida albicans, Aspergillus oryzae, and Aspergillus fumigatus were found to be significantly different from the group without fungal infection. Therefore, we speculate that there is a strong correlation between Acinetobacter, Pseudomonas aeruginosa, and fungal infections. Generally, in healthy individuals, innate immunity serves as a barrier against Aspergillus infection. However, individuals with compromised immune function are more vulnerable to Aspergillus infection, particularly in combination with viral infections (Huang et al., 2023). At the viral species level, Human gammaherpesvirus 4 (EBV) and Human betaherpesvirus 5 distinguish this group from the group without fungal infection. Methylobacterium is the most abundant genus in the group without fungal infection. Meanwhile, the group without fungal infection has a higher richness than the group with concomitant fungal infection. A microbial ecosystem with higher richness is usually considered more diverse and stable. Additionally, there are species differences between the two groups. The correlation heatmap demonstrates a significant negative correlation between Acinetobacter bacteria (Acinetobacter sp. FDAARGOS 494, Acinetobacter sp. FDAARGOS 560) and some common environmental bacteria, suggesting possible competition between Acinetobacter and these bacteria (Littman and Pamer, 2011).

Correlation analysis
We further analyzed the correlation between medications and bronchiectasis-associated bacteria. Leptotrichia buccalis is a normal oral bacterium, and there have been isolated reports of it causing severe infectious cavitary pneumonia and sepsis in immunocompromised patients (Morgenstein et al., 1980). The genus Staphylococcus is common in the human living environment, and Staphylococcus phage StB12 is closely associated with Staphylococcus. It participates in the encoding and evolution of virulence genes and antibiotic resistance in Staphylococcus (Brüssow et al., 2004). Our study shows a correlation between the use of glucocorticoids and penicillin-like drugs, such as piperacillin, and the prevalence of these phages. While phage therapy has a longstanding history and has been extensively explored in medicine (Samson et al., 2023), it remains uncertain whether these drugs are indirectly or directly related to bacteriophages. For cephalosporin drugs and voriconazole, Fusobacterium was the main related bacterium. Fusobacterium nucleatum culture supernatant has been reported to induce the expression of the SARS-CoV-2 receptor ACE2 and the production of interleukins IL-6 and IL-8 in alveolar epithelial cells, exacerbating SARS-CoV-2 infection (Takahashi et al., 2021). There is also evidence to suggest that Fusobacterium may have a potential role in protecting the oral mucosa from SARS-CoV-2 infection (Nardelli et al., 2021). In our study, we demonstrated an association between piperacillin, voriconazole, and the presence of Fusobacterium; further experimental research is required to determine whether Fusobacterium is an enemy or a friend in patients with bronchiectasis.

We also analyzed the correlation between clinical indicators and bronchiectasis-associated bacteria. Previous studies have found a correlation between CRP and Pneumocystis jirovecii in patients with non-HIV immunodeficiency (Zhao et al., 2022), as well as a correlation between monocyte count and fungal infections (Wang et al., 2022). Magnesium is usually considered to have anti-inflammatory effects (Tam et al., 2003), but animal experiments suggest that magnesium may inhibit the neutrophil oxidative burst, which may be harmful in chronic diseases (Bussière et al., 2002). However, those reports discussed only bacteria such as Pseudomonas aeruginosa, and we speculate that some common environmental bacteria, including Fusobacterium and Prevotella, are also involved.
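
As an illustration of how a taxon-by-indicator correlation matrix of this kind can be computed, the following is a minimal sketch that assumes Spearman rank correlation; the correlation method, the taxon names, and all values are hypothetical placeholders rather than the procedure or data of this study.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical relative abundances (one value per patient) for two taxa.
abundance = {
    "taxon_A": [0.01, 0.05, 0.20, 0.02, 0.11, 0.30],
    "taxon_B": [0.40, 0.22, 0.05, 0.35, 0.10, 0.02],
}
# Hypothetical clinical indicators for the same six patients.
indicators = {
    "CRP": [4.0, 12.0, 55.0, 6.0, 30.0, 80.0],
    "monocyte_count": [0.6, 0.5, 0.4, 0.7, 0.5, 0.3],
}

# Taxon-by-indicator matrix of correlation coefficients.
matrix = {
    taxon: {name: spearman(values, indicator) for name, indicator in indicators.items()}
    for taxon, values in abundance.items()
}

for taxon, row in matrix.items():
    print(taxon, {name: round(rho, 2) for name, rho in row.items()})
```

A rank-based coefficient is used in this sketch because relative abundances are typically skewed; in practice, the coefficients would be accompanied by significance testing and correction for multiple comparisons.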

Machine learning
Machine learning has shown promising predictive efficacy in disease diagnosis (Lai et al., 2024), complication prediction (Pax et al., 2024), and the forecasting of factors such as bacterial drug resistance, including predictive models for bacteriophage therapy of Escherichia coli urinary tract infections (Hu et al., 2023; Dixit et al., 2024; Keith et al., 2024; Nsubuga et al., 2024). Additionally, numerous clinical machine learning models have been developed to predict disease prognosis and survival time from large-scale clinical features (Kogan et al., 2022; Li et al., 2022; Tang et al., 2022; Li et al., 2023), demonstrating excellent predictive performance. However, in the actual treatment of patients with lower respiratory tract infections (Sethi, 2010; Jain et al., 2015), the complex relationships between microorganisms must be considered. Microorganisms are important reference factors, and it is crucial to understand the relationship between microorganisms and length of hospital stay during diagnosis and treatment. The relationship between bacteria and length of hospital stay remains understudied. Our machine learning prediction model revealed that incorporating specific bacteria as predictors of the length of hospitalization in lower respiratory tract infections significantly improved predictive accuracy. This insight offers a fresh perspective on patient care, and we anticipate that advances in the precise detection of microorganisms will allow treatment strategies to be further individualized.
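
The comparison described above, clinical predictors alone versus clinical plus bacterial predictors, can be sketched as follows. This is only an illustration under stated assumptions: it substitutes ordinary least squares with 5-fold cross-validation for the models used in this study, and the data, feature counts, and effect sizes are entirely synthetic.

```python
import random

def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_least_squares(rows, targets):
    """Ordinary least squares with an intercept, via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)

def predict(coef, row):
    """Intercept plus the weighted sum of the feature values."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))

def cross_validated_mae(rows, targets, k=5):
    """Mean absolute error on held-out samples over k interleaved folds."""
    errors = []
    for fold in range(k):
        train = [i for i in range(len(rows)) if i % k != fold]
        held_out = [i for i in range(len(rows)) if i % k == fold]
        coef = fit_least_squares([rows[i] for i in train], [targets[i] for i in train])
        errors += [abs(predict(coef, rows[i]) - targets[i]) for i in held_out]
    return sum(errors) / len(errors)

# Synthetic cohort (assumption): two clinical features and one bacterial
# abundance feature; length of stay depends on both kinds of feature.
rng = random.Random(0)
clinical = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(100)]
bacterial = [[rng.gauss(0, 1)] for _ in range(100)]
stay_days = [
    10 + 1.5 * c[0] + 3.0 * b[0] + rng.gauss(0, 1)
    for c, b in zip(clinical, bacterial)
]

combined = [c + b for c, b in zip(clinical, bacterial)]
mae_clinical = cross_validated_mae(clinical, stay_days)
mae_combined = cross_validated_mae(combined, stay_days)
print(f"MAE, clinical only: {mae_clinical:.2f} days")
print(f"MAE, clinical + bacterial: {mae_combined:.2f} days")
```

Because the synthetic length of stay is constructed to depend on the bacterial feature, the combined model has the lower cross-validated error by design; with real data, the size of any improvement has to be estimated on held-out patients.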

Conclusion
In conclusion, we first examined microbial community characteristics according to age, the presence of bronchiectasis-related infection, and fungal infection. Furthermore, we examined the correlation between the microbial community and clinical indicators, as well as treatment medications, in the bronchiectasis group. Finally, leveraging machine learning techniques, we compared specific microbial features with clinical attributes to assess how well they predict the duration of patient hospital stay. These findings elucidate the variances in microbial community characteristics in lower respiratory tract infections across diverse conditions and underscore the potential of bacterial features in forecasting the length of patient hospitalization.

The studies were conducted in accordance with the local legislation and institutional requirements. The participants provided their written informed consent to participate in this study. Written informed consent was obtained from the minor(s)' legal guardian/next of kin for the publication of any potentially identifiable images or data included in this article.

FIGURE 1 Study design and flow diagram.
FIGURE 2 Microbial characteristics across age groups in patients with pulmonary infections. (A) The distribution of the top 15 microbial taxa at the genus level and species level. (B) Analysis of microbial alpha diversity. (C) Analysis of microbial beta diversity. (D) LEfSe analysis was performed on young and old groups of microorganisms to demonstrate the distribution of the top 30 microorganisms at the species level.
FIGURE 3 Microbial characteristics in patients with and without concomitant bronchiectasis in lower respiratory tract infections. (A) The distribution of the top 15 microbial taxa at the genus level and species level. (B) Analysis of microbial alpha diversity. (C) Analysis of microbial beta diversity. (D) LEfSe analysis was performed on two groups of microorganisms to demonstrate the distribution of the top 30 microorganisms at the species level.
FIGURE 4 Microbial characteristics in patients with and without concomitant fungal infections in lower respiratory tract infections. (A) The distribution of the top 15 microbial taxa at the genus level and species level. (B) Analysis of microbial alpha diversity. (C) Analysis of microbial beta diversity. (D) LEfSe analysis was performed on two groups of microorganisms to demonstrate the distribution of the top 30 microorganisms at the species level.
FIGURE 5 Correlation analysis was performed between two groups within each cluster and the key bacteria in the bronchiectasis group. (A-C) Microbial correlations among the control groups of different age groups, bronchiectasis groups, and fungal groups were analyzed based on the top 30 microorganisms identified through LEfSe analysis. (D) A total of 31 bacterial features were selected based on RFECV feature selection from the bronchiectasis group and the control group. (E) Based on the clinical data characteristics of the patients, feature variables were selected through machine learning. After screening, 31 key bacteria, 14 clinical indicators, and 14 medications were identified in the bronchiectasis group. Correlation analysis was then conducted. Left panel: Analysis of the correlation between clinical indicators and key bacteria. Right panel: Analysis of the correlation between medications and key bacteria.