Application of mNGS in the study of pulmonary microbiome in pneumoconiosis complicated with pulmonary infection patients and exploration of potential biomarkers

Background Pneumoconiosis patients have a high prevalence of pulmonary infections, which can complicate diagnosis and treatment, yet no comprehensive study of the microbiome of patients with pneumoconiosis has been reported. Metagenomic next-generation sequencing (mNGS) helps fill this gap by profiling the lung microbiota of the pneumoconiosis population while supporting accurate diagnosis. Methods We retrospectively analyzed 44 patients with suspected pneumoconiosis complicated with pulmonary infection between Jan 2020 and Nov 2022. Bronchoalveolar lavage fluid (BALF) specimens from all 44 patients were collected and tested using mNGS. Results In the lung microbiome of pneumoconiosis patients with pulmonary infection (P group), the most frequently detected bacterial and fungal genera were Streptococcus and Aspergillus, and the most frequently detected species were Streptococcus pneumoniae and Aspergillus flavus, respectively; the most frequently detected DNA virus was Human gammaherpesvirus 4. The α diversity of the pulmonary flora did not differ significantly between the P group and the group of non-pneumoconiosis patients with pulmonary infection (Non-P group), whereas β diversity did (P < 0.01); the differential species between the two groups were Mycobacterium colombiense and Fusobacterium nucleatum. In addition, we observed a high distribution of Malassezia and Pneumocystis in the P group, and herpesviruses were detected in the majority of samples. Conclusions We present a comprehensive lung microbiome profile of pneumoconiosis patients and compare it with that of non-pneumoconiosis patients with pulmonary infection. This provides a basis for a better understanding of the relationship between pneumoconiosis and microorganisms, and for the search for potential biomarkers.


Introduction
Pneumoconiosis is a group of lung diseases caused by the inhalation of inorganic mineral particles, usually as a result of occupational exposure. Its main pathological features include chronic lung inflammation and progressive pulmonary fibrosis (Perret et al., 2017), which can lead to respiratory and/or cardiac failure and eventually death. Pneumoconiosis is prevalent worldwide, with more than 60,000 new cases reported in 2017. With the development and optimization of the industry in recent years, the pneumoconiosis population has decreased from 23.33% before 1970 to 2.29% in 2020. However, the mortality rate of pneumoconiosis remains relatively high (GBD 2017 Disease and Injury Incidence and Prevalence Collaborators, 2018;GBD 2013 Mortality and Causes of Death Collaborators, 2015), making it a serious threat to global public health.
Patients with pneumoconiosis are susceptible to invasion by microorganisms such as Mycobacterium tuberculosis (Jun et al., 2013), nontuberculous mycobacteria (NTM) (McGrath and Bardsley, 2009) and Aspergillus (Vangara et al., 2022), leading to pulmonary infection. Many patients with advanced pneumoconiosis die of respiratory failure due to pulmonary infections (Barnes et al., 2019;Qi et al., 2021). Traditional etiologic methods such as microscopy, smear, and culture suffer from low sensitivity, subjective interpretation, and susceptibility to contamination, which can lead to missed or false detections and affect patient outcomes (Dahyot et al., 2017). For patients with pulmonary infections, and especially for those with pre-existing lung damage such as pneumoconiosis, it is very important to identify the etiology and select targeted drugs. Many studies have revealed that the abundance and composition of microbial communities vary across body habitats, with strong links to health status and human disease (Dickson et al., 2020;Wu et al., 2020). However, current analyses of bacterial community diversity in pneumoconiosis mostly rely on sputum culture and 16S rRNA sequencing, which are not sufficient for microbiome analysis and in most cases cannot identify microorganisms to the species level (Mingjing Chen et al., 2017;Zhimin Ma, 2020;Druzhinin et al., 2022).
Metagenomic next-generation sequencing (mNGS) offers broad coverage and unbiased detection that does not require the pathogen to be predicted in advance, and it can simultaneously identify bacteria, fungi and viruses in a single sample (Chiu and Miller, 2019;Chen et al., 2021;D'Humières et al., 2021). It has been widely used in clinical practice in recent years, playing an important role in assisting clinical diagnosis, guiding rational drug use, reducing patient burden, and improving clinical outcomes (Qian et al., 2020). In addition, mNGS does not require culture, its pathogen detection results are typically available within 24-48 hours, and it is less affected by prior antibiotic use than culture (Miao et al., 2018). Early diagnosis of pulmonary infection in pneumoconiosis patients is very important because of the poor prognosis (Barnes et al., 2019;Qi et al., 2021), yet the use of mNGS has not been reported for these patients. This study retrospectively examines the characteristics of the pulmonary microbiome (bacterial, fungal, viral) in pneumoconiosis patients with pulmonary infection (P group), compares it with that of non-pneumoconiosis patients with pulmonary infection (Non-P group), analyzes the differential microbiome, and explores potential diagnostic biomarkers of pneumoconiosis.

Study population
Patients with suspected pneumoconiosis complicated with pulmonary infection were recruited. The diagnostic criteria for pulmonary infection are shown in Figure 1 (Shi et al., 2019), and pneumoconiosis was diagnosed according to the Chinese diagnostic standard GBZ 70-2015 and the International Labor Organization's classification standard for pneumoconiosis (Honma et al., 2004). Recruitment was carried out at a single site, West China Fourth Hospital, Sichuan University, Chengdu, between Jan 2020 and Nov 2022. Patients who were under 18 years of age, from whom bronchoalveolar lavage fluid (BALF) could not be obtained, or who had incomplete information were excluded from the study. In addition, some of the collected samples had been tested by G test, GM test or culture before mNGS. Demographic data, underlying diseases and clinical features of the enrolled patients are listed in Table 1.

Specimen collection
BALF was obtained from 44 participants for the purpose of making an etiologic diagnosis of each patient's infection. Samples were collected according to standard procedures (Levy et al., 2018). After local anesthesia of the patient's throat, a fiberoptic bronchoscope was introduced, and the lung was lavaged several times through the bronchoscope with sterile saline at room temperature, 20-60 mL each time. From the recovered fluid, 10 mL was taken: 2 mL was placed into a sampling tube containing RNA protection solution (Sigma-Aldrich) and the rest into a sterile, nucleic acid-free DNA sampling tube, and the samples were stored immediately at -80°C.

Sample DNA and RNA extraction
BALF DNA was extracted using previously described methods (Mac Aogaín et al., 2021;Ju et al., 2022). Briefly, 50 mL of proteinase K was added to 1 mL of BALF sample, and the mixture was digested at 60°C for 20 min and then left at 4°C for 5 min to lower the reaction temperature. The sample was transferred to a sterile test tube and centrifuged briefly, followed by DNA extraction using the TIANamp Magnetic DNA Kit (DP710-t2, Tiangen, China) according to the manufacturer's protocol. Sputum was liquefied with 0.1% DTT (dithiothreitol) for 20 min at 56°C before extraction. The QIAamp Viral RNA Mini Kit (Qiagen) was used to extract RNA from the BALF (Langelier et al., 2018).
DNA libraries were prepared using the KAPA Hyper Prep Kit (KAPA Biosystems) according to the manufacturer's protocol and were constructed after Qubit quantification. For RNA samples, rRNA was removed from total RNA, and libraries were constructed after purification as described for DNA library construction. An Agilent 2100 was used for quality control, and the libraries were then sequenced on the Dif seq platform with 50 bp paired-end sequencing (Dinfectome Medical Technology Inc, Nanjing, China).

Bioinformatics analysis
For pathogen identification, we used an in-house bioinformatics pipeline (Zeng et al., 2022). Briefly, low-quality reads, adapter-contaminated reads, duplicated reads and short reads (length < 36 bp) were removed to generate high-quality sequencing data. Sequences from the human host were identified by mapping to the human reference genome (hs37d5) using the bowtie2 software (Langmead and Salzberg, 2012). Reads that could not be mapped to the human genome were retained and aligned to the microorganism genome database for pathogen identification. Our microorganism genome database contained the genome sequences of bacteria, fungi, viruses, and parasites (downloadable from https://www.ncbi.nlm.nih.gov/) (Wood et al., 2019).
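The read-cleaning step can be illustrated with a minimal Python sketch. It is not the in-house pipeline: the 36 bp cutoff comes from the text, while the mean-quality threshold of 20, the Phred+33 encoding, and exact-sequence deduplication are assumptions made for illustration, and adapter trimming is left out. The names `Read` and `filter_reads` are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple


class Read(NamedTuple):
    name: str
    seq: str
    qual: str  # quality string, assumed Phred+33 encoded


MIN_LENGTH = 36        # from the text: reads shorter than 36 bp are removed
MIN_MEAN_QUALITY = 20  # assumption: the paper does not state a threshold


def mean_quality(qual: str) -> float:
    """Mean Phred score of a Phred+33 encoded quality string."""
    return sum(ord(c) - 33 for c in qual) / len(qual) if qual else 0.0


def filter_reads(reads: Iterable[Read]) -> Iterator[Read]:
    """Yield reads that pass the length, quality and duplicate filters."""
    seen = set()  # sequences already emitted (exact-match deduplication)
    for read in reads:
        if len(read.seq) < MIN_LENGTH:
            continue  # short read
        if mean_quality(read.qual) < MIN_MEAN_QUALITY:
            continue  # low-quality read
        if read.seq in seen:
            continue  # duplicated read
        seen.add(read.seq)
        yield read
```

In a real workflow the surviving reads would then be mapped against the host genome (the paper uses bowtie2 against hs37d5), and only the unmapped reads would be passed on to microbial classification.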

Interpretation and reporting
The mNGS pathogen detection pipeline has been described in previous studies (Miao et al., 2018;Miller et al., 2019;Qian et al., 2020;Zeng et al., 2022;Chen et al., 2023;Xu et al., 2023), and the criteria for a positive detection were as follows: 1) for Mycobacterium tuberculosis, Nocardia and Legionella pneumophila, at least one species-specific read; 2) for other bacteria, fungi, viruses and parasites, at least three unique reads; 3) a pathogen was excluded if the ratio of its reads per million in a given sample to that in the NTC was < 10.
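The three reporting criteria can be expressed as a small decision function. The sketch below is illustrative only: NTC is read here as the no-template control (the text does not spell it out), a taxon with zero reads in the NTC is assumed to pass the ratio filter, and the `Detection` record and `is_positive` function are hypothetical names.

```python
from dataclasses import dataclass

# Taxa for which a single species-specific read suffices (criterion 1).
LOW_THRESHOLD_TAXA = (
    "Mycobacterium tuberculosis",
    "Nocardia",
    "Legionella pneumophila",
)


@dataclass
class Detection:
    taxon: str
    unique_reads: int
    rpm_sample: float  # reads per million in the patient sample
    rpm_ntc: float     # reads per million in the NTC run


def is_positive(d: Detection) -> bool:
    """Apply the read-count and sample-to-NTC ratio criteria."""
    # Criteria 1 and 2: minimum number of unique reads.
    min_reads = 1 if d.taxon.startswith(LOW_THRESHOLD_TAXA) else 3
    if d.unique_reads < min_reads:
        return False
    # Criterion 3: exclude if sample RPM / NTC RPM < 10.
    # Assumption: a taxon absent from the NTC cannot fail this filter.
    if d.rpm_ntc > 0 and d.rpm_sample / d.rpm_ntc < 10:
        return False
    return True
```

For example, a Streptococcus pneumoniae call with ten unique reads is still rejected when its sample RPM is only five times the NTC RPM, since the ratio filter is applied after the read-count check.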

Statistics analysis
The statistical analysis was carried out using the R software (v4.2.1) (R Core Team, 2021). Alpha diversity was estimated by the Shannon and Simpson indices based on the taxonomic profile of each sample. Beta diversity was assessed by the Bray-Curtis distance. PERMANOVA was performed using the R package "vegan" to compare Bray-Curtis distances between the P and Non-P groups. All tests were two-tailed, and differences were regarded as significant at P < 0.05. Differential relative abundance of taxonomic groups at the genus/species level between groups was tested using the Kruskal-Wallis rank sum test (R function "kruskal.test") (Kruskal and Wallis, 1952).
[Figure caption: Inclusion and exclusion flowchart of study.]
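As an illustration of the diversity measures used in the statistical analysis, the following Python sketch computes the Shannon index, the Simpson index and the Bray-Curtis dissimilarity from per-taxon counts. The study itself used R; this sketch assumes the natural logarithm for Shannon and the 1 - sum(p^2) form for Simpson, which match the defaults of vegan's `diversity` function, and it assumes non-empty samples.

```python
import math
from typing import Sequence


def shannon(counts: Sequence[float]) -> float:
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson(counts: Sequence[float]) -> float:
    """Simpson diversity 1 - sum(p_i^2)."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity between two samples over the same taxa."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator
```

Shannon and Simpson summarize diversity within one sample (alpha diversity), whereas Bray-Curtis compares two samples (beta diversity): it is 0 for identical profiles and 1 when the samples share no taxa.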

General information of study participants
A total of 120 patients with suspected pulmonary infection and pneumoconiosis were screened, and 44 eligible patients were included in the final analysis: 25 patients with pneumoconiosis and 19 without. All 44 underwent bronchoscopy to obtain BALF. All participants in the study were male; no female patients with pneumoconiosis were enrolled, owing to the occupational characteristics of the disease. According to clinical data, the main types of dust causing pneumoconiosis were production dust (indoor work) and mineral dust (coal mining and drilling-related work), and the average duration of exposure to such work was 10.76 years.

Characteristics of the pulmonary microbiome of pneumoconiosis patients
We plotted bar charts of the top 10 genera and top 20 species based on the frequency of species detection in pneumoconiosis patients. In BALF samples, 521 bacterial species, 78 fungal species, and 17 viral species were detected in the

Microbiota analysis between P and Non-P groups
Analysis of microbiome differences in pneumoconiosis patients and non-pneumoconiosis patients will help understand the relationship between microbes and pneumoconiosis and identify biomarkers relevant to pneumoconiosis diagnosis.
Bar graphs were plotted based on the relative abundance of detected species, as shown in Figure 3. At the genus level, Streptococcus had the highest relative abundance in both the P and Non-P groups. Among the top 10 genera by relative abundance, Streptococcus, Prevotella, Mycobacterium and Rothia were more abundant in the P group than in the Non-P group, the relative abundance of Corynebacterium was essentially equal between the two groups, and all other genera were more abundant in the Non-P group. At the species level, among the top 10 species by relative abundance, Prevotella melaninogenica, Rothia mucilaginosa, Streptococcus oralis and Streptococcus mitis were more abundant in the P group than in the Non-P group, while the remaining species were more abundant in the Non-P group. Among these species, Pseudomonas aeruginosa is usually associated with poor patient prognosis, while Abiotrophia defectiva, a normal inhabitant of the oral cavity, genitourinary tract and intestinal tract, can sometimes cause serious infections in humans.
To analyze the differences in species diversity between the groups, α diversity and β diversity were assessed. There was no significant difference in the ACE, Chao1, Shannon or Simpson indices between the two groups (P > 0.05; only the Shannon index results are shown), indicating similar within-sample species diversity. Between-group differences in community composition were analyzed with β diversity, which gave P < 0.01, suggesting that community composition differed markedly between the groups and that the grouping was meaningful, as shown in Figure 4.
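The β diversity comparison rests on PERMANOVA, which asks whether between-group distances are larger than expected if group labels were assigned at random. The sketch below is a plain-Python illustration of the one-way pseudo-F permutation test on a precomputed distance matrix. It is not vegan's implementation; the function names, the 999 permutations and the fixed seed are assumptions for illustration, and it assumes within-group distances are not all zero.

```python
import random
from typing import List, Sequence, Tuple


def pseudo_f(dist: Sequence[Sequence[float]], labels: Sequence[str]) -> float:
    """One-way PERMANOVA pseudo-F from a symmetric distance matrix."""
    n = len(labels)
    groups = sorted(set(labels))
    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: same quantity per group, divided by group size.
    ss_within = 0.0
    for g in groups:
        idx: List[int] = [i for i, lab in enumerate(labels) if lab == g]
        pairs = sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += pairs / len(idx)
    ss_among = ss_total - ss_within
    a = len(groups)
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(dist: Sequence[Sequence[float]], labels: Sequence[str],
              n_perm: int = 999, seed: int = 0) -> Tuple[float, float]:
    """Return (observed pseudo-F, permutation p-value)."""
    rng = random.Random(seed)
    f_obs = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign group labels at random
        # Small relative tolerance so label-swapped ties count as >= observed.
        if pseudo_f(dist, shuffled) >= f_obs * (1 - 1e-9):
            hits += 1
    return f_obs, (hits + 1) / (n_perm + 1)
```

With Bray-Curtis distances as input, a small p-value from this test corresponds to the kind of result reported above (P < 0.01), meaning that samples cluster by group more strongly than random labelings would produce.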
We tested for differences between the P and Non-P groups at the phylum, genus and species levels. No significant differences were found at the phylum or genus level, but the distribution of species differed markedly. Mycobacterium colombiense (M. colombiense) and Fusobacterium nucleatum (F. nucleatum) differed clearly in their presence (Figure 5A), with the former detected mainly in pneumoconiosis patients and the latter mainly in non-pneumoconiosis patients. LEfSe analysis was also used to explore taxa that differed between groups (Figure 5B). Only three taxa differed between the two groups: one at the genus level, Capnocytophaga, which was enriched in the P group, and two at the species level (the two species mentioned above).
[Figure caption: Comparison of the relative abundance of microorganisms between P and Non-P groups. (A) Distribution of bacteria at the genus level in the P and Non-P groups. (B) Distribution of bacteria at the species level in the P and Non-P groups.]
Spearman correlation analysis was performed to explore the correlation between clinical parameters, such as patient age, pneumoconiosis years and inflammatory indicators at admission, and the significantly different species together with the top 18 species by relative abundance (20 species in total, Figure 6). Prevotella, Actinomyces and Rothia are common colonizing organisms of the mouth. Prevotella melaninogenica, Prevotella pallens, Actinomyces odontolyticus, Rothia mucilaginosa and other oral bacteria were significantly negatively correlated with patient age, pneumoconiosis years and lymphocyte count, which may mean that the abundance of these microorganisms decreases as pneumoconiosis progresses. M. colombiense was positively correlated with years of pneumoconiosis-related work, suggesting that the likelihood of M. colombiense infection increases with the progression of pneumoconiosis. We also observed that the relative abundance of Pseudomonas aeruginosa was positively correlated with the length of hospitalization of pneumoconiosis patients, which seemed somewhat unusual and might be related to the small number of patients enrolled.
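For readers unfamiliar with the method, Spearman's coefficient is the Pearson correlation computed on ranks, with tied values sharing their average rank. The sketch below is a plain-Python illustration under that definition; it is not the authors' code, the function names are hypothetical, and it assumes neither variable is constant.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because it works on ranks, the coefficient captures monotonic rather than strictly linear relationships, which suits skewed quantities such as relative abundances and inflammatory markers.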

Comparison of fungi and virus detection in P and Non-P
The mNGS technology can identify bacteria, fungi and viruses in the same sample, which helps reveal a more complete microbiome signature. The top 20 genera/species by relative abundance detected in the P group were plotted, as shown in Figure 7. At the genus level, the top four genera detected were Aspergillus, Candida, Malassezia and Pneumocystis. Of these, Malassezia and Pneumocystis were more abundant in the P group, while Aspergillus and Candida were more dominant in the Non-P group. At the species level, among the top five detected species, Aspergillus sydowii, Aspergillus versicolor and Candida albicans were higher in the Non-P group than in the P group, while Aureobasidium melanogenum and Clavispora lusitaniae were higher in the P group. The viruses detected are displayed in Figure 7C, with more viruses detected in the P group. Human gammaherpesvirus 4, Human betaherpesvirus 5 and Influenza A virus were mainly detected in the Non-P group, while Human gammaherpesvirus 4, Human betaherpesvirus 5, Human betaherpesvirus 7 and Human betaherpesvirus 6A were mainly detected in the P group. The Human gammaherpesvirus and Human betaherpesvirus species mentioned above belong to the same family, Herpesviridae.

Discussion
In this study, mNGS was applied to BALF samples to comprehensively characterize the pulmonary microbiome of pneumoconiosis patients, including bacteria, fungi and viruses, and to compare the lung microbiome of the P and Non-P groups in search of potential biomarkers. To our knowledge, this is the first study to investigate the lung microbiome of pneumoconiosis patients using a comprehensive and systematic mNGS approach, and the first to reveal differences in the lung microbiome between patients with and without pneumoconiosis.
Because pneumoconiosis is a chronic progressive disease that usually damages the respiratory mucosa, pneumoconiosis patients have a high probability of lower respiratory tract infection (Xin and Zhang, 2017). Our study is the first to use mNGS to reveal the lung flora of pneumoconiosis patients with pulmonary infection. In a previous study, Druzhinin et al. employed 16S sequencing to analyze the microbial composition of sputum samples from patients with coal workers' pneumoconiosis (CWP) and observed a significant increase in the abundance of Streptococcus compared to the healthy group (Druzhinin et al., 2022). In addition, Li et al. analyzed the intestinal flora of pneumoconiosis patients and demonstrated a marked increase in Prevotella abundance in the pneumoconiosis group compared to the control group. Similarly, we observed a higher abundance of Streptococcus and Prevotella in BALF samples from the P group than from the Non-P group; however, the differences between the groups were not significant. This discrepancy may be related to differences in sample type, and to the fact that sputum specimens are more susceptible than BALF samples to contamination by oral colonizing flora.
Infections caused by fungi are gradually increasing in the clinic due to the irrational use of antibiotics and the increased use of hormonal drugs. Aspergillus is one of the main pathogens causing invasive fungal disease as well as chronic pulmonary aspergillosis, may worsen symptoms in advanced chronic obstructive pulmonary disease (COPD) (Hammond et al., 2020), and is associated with high mortality (Vandewoude et al., 2004). Aspergillus fumigatus is the most common agent of invasive aspergillosis and has been widely studied and reviewed (Dewi et al., 2021;Deng et al., 2023). However, Aspergillus flavus was the most frequently detected fungus in our study of the pulmonary microbiome of pneumoconiosis patients. It can produce aflatoxins, the most carcinogenic mycotoxins, and cause aspergillosis in immunocompromised patients. Moreover, in vivo experimental studies have shown that this fungus is more virulent than Aspergillus fumigatus and other Aspergillus species in terms of time to death and initial inoculum in normal and immunocompromised mice (Rudramurthy et al., 2019). The G test is widely used for invasive fungal infections (Lu et al., 2011;Li et al., 2015), while the GM test can further identify invasive aspergillosis for early diagnosis (Guo et al., 2010). In our study, some of the BALF samples were subjected to both the G test and the GM test; their negative results point to the limitations of these traditional testing methods, and culture of BALF samples also appeared unsatisfactory.
[Figure caption: Clinical and microbial correlation analysis. Work years, patient's years of pneumoconiosis-related work; P-years, pneumoconiosis years; WBC, white blood cell count; PCT, procalcitonin; CRP, C-reactive protein; NEUT, neutrophil count; LYC, lymphocyte count. The symbol * represents significance at p < 0.05.]
Owing to the particular nature of the pneumoconiosis patient population, most patients had been on long-term antibiotic and antifungal medication before the relevant tests, which we speculate may be one reason for the unsatisfactory results of the traditional tests. Some studies have reported that the detection rate of mNGS is less affected by antibiotic use than that of traditional testing modalities (Miao et al., 2018;Diao et al., 2021). In addition, according to guidelines and expert consensus, mNGS is performed when conventional tests fail to identify the pathogen, which may reflect the high cost of sequencing (Chinese Thoracic Society, 2023). We expect that a reduction in the cost of mNGS in the future will make this tool more accessible, especially in low-resource settings where the burden of infectious diseases is high and the availability of many pathogen-specific assays is low (Ramachandran et al., 2022).
[Figure caption: Analysis of viruses and fungi in P and Non-P groups. (A) Distribution of fungi at the genus level in the P and Non-P groups. (B) Distribution of fungi at the species level in the P and Non-P groups. (C) Distribution of viruses at the genus level in the P and Non-P groups.]
The high detection rate of Mycobacterium in pneumoconiosis patients, including Mycobacterium tuberculosis and NTM, has been confirmed in a large number of studies (Kim et al., 2009). M. colombiense, which we found mainly in patients with pneumoconiosis, is an emerging species in the Mycobacterium avium complex, characterized by acid-fastness, non-motility, rod shape, and slow growth. It was first isolated and described by Murcia in 2006 and can be isolated from blood, sputum, and lymph nodes (Murcia et al., 2006;Tang et al., 2023). The bacterium is prone to cause severe pulmonary infection in immunodeficient or immunosuppressed patients (Yu and Jiang, 2021), disseminated disease (Pena et al., 2019) and ganglionar mycobacteriosis (Larry et al., 2019), and disseminated disease in immunocompetent patients has also been reported (Esparcia et al., 2008;Tang et al., 2023). Cases have been reported in Europe, America, and Asia (Vuorenmaa et al., 2009;Poulin et al., 2013;Gao et al., 2014). However, this bacterium receives little attention and is often overlooked in clinical diagnosis (Van Ingen et al., 2018). Our study identified for the first time that M. colombiense was substantially enriched in BALF samples of the P group, which may be related to the lung damage of these patients. The detection of this bacterium requires special attention, as it could be a potential biomarker to distinguish pneumoconiosis from non-pneumoconiosis. This result has not been reported in previous studies of flora associated with pneumoconiosis (Druzhinin et al., 2022), which may be due to differences in sample types. Although our study is the first to evaluate the lung microbiota of pneumoconiosis patients with pulmonary infection and reveals a notable enrichment of M. colombiense in the P group, further validation with larger sample sizes is still needed to characterize the lung microbiota of these patients.
A growing number of studies have linked viruses to human diseases. Viruses may cause serious respiratory diseases, tumors, and neuropsychiatric diseases in humans (Gaglia and Munger, 2018;Bjornevik et al., 2022;Domingo and Rovira, 2020), and respiratory tract viral infection is one of the most common human diseases worldwide. We found more virus species in pneumoconiosis patients in this study, suggesting that such patients may be more susceptible to viral attack; the viruses detected were mainly Human gammaherpesvirus 4 and Human gammaherpesvirus-like viruses. Like other herpesviruses, these are double-stranded linear DNA viruses with a biphasic lifecycle: they are carried for life after infection and replicate extensively when immunity is low or compromised, leading to disease. Studies have shown that Herpesviridae reactivation is associated with worse clinical outcomes, possibly as a direct cause or as a manifestation of disease exacerbation (Huang and He, 2020). We only describe the lung viruses found in pneumoconiosis patients; the relationship between viruses and the development, diagnosis and treatment of pneumoconiosis remains to be explored in further studies.
Overall, our study analyzed the differences in pulmonary microorganisms between pneumoconiosis and non-pneumoconiosis patients with pulmonary infection and screened for differential flora between the two groups, namely M. colombiense, F. nucleatum and the genus Capnocytophaga. These taxa could serve as potential biomarkers for the diagnosis of pneumoconiosis with pulmonary infection. In addition, M. colombiense was found to be positively correlated with the number of years of pneumoconiosis-related work, tentatively suggesting a correlation between pneumoconiosis and microorganisms. This study contributes to the understanding of the relationship between microorganisms and pneumoconiosis and provides potential biomarkers for the diagnosis of pneumoconiosis with pulmonary infection, as well as basic data for investigating the pathogenesis of the disease.
This study has some limitations. First, it is a single-center study, and the patients enrolled represent only the lung microbiome of pneumoconiosis patients around that center. In addition, the number of patients in this cross-sectional study is relatively small, because the number of pneumoconiosis patients has declined and the patients are scattered across different hospitals. More centers and more patients are therefore needed to study the lung microbiome of pneumoconiosis in depth.

Conclusion
In this study, mNGS was used to characterize the lung microbiome of patients with pneumoconiosis. Among the bacteria in the lungs of pneumoconiosis patients, Streptococcus was the most frequently detected genus, with Streptococcus pneumoniae as the main species. Among the fungi, Aspergillus was the most frequently detected genus, with Aspergillus flavus as the main species, and the most frequently detected virus was Human gammaherpesvirus 4. The P and Non-P groups differed at the species level in M. colombiense and F. nucleatum, with the former mainly detected in pneumoconiosis patients and the latter mainly in non-pneumoconiosis patients. These microbiome characteristics and differences between pneumoconiosis and non-pneumoconiosis patients with pulmonary infection provide a basis for better understanding the relationship between pneumoconiosis and microorganisms, and for discovering potential biomarkers.

Data availability statement
The data presented in the study are deposited in the SRA (https://www.ncbi.nlm.nih.gov/sra/) repository, accession number PRJNA985087.

Ethics statement
The studies involving human participants were reviewed and approved by the Ethics Committee of West China Fourth Hospital, Sichuan University. The patients/participants provided their written informed consent to participate in this study.

Author contributions
MZ and XY designed the study and drafted the manuscript. LX collected the patients' samples and clinical information. ZZS performed the mNGS sequencing and analyzed the data. All the authors read and approved the final manuscript.