Improving Pulmonary Infection Diagnosis with Metagenomic Next Generation Sequencing

Pulmonary infections are among the most common and important infectious diseases due to their high morbidity and mortality, especially in older and immunocompromised individuals. However, due to the limitations in sensitivity and the long turn-around time (TAT) of conventional diagnostic methods, pathogen detection and identification methods for pulmonary infection with greater diagnostic efficiency are urgently needed. In recent years, unbiased metagenomic next generation sequencing (mNGS) has been widely used to detect different types of infectious pathogens, and is especially useful for the detection of rare and newly emergent pathogens, showing better diagnostic performance than traditional methods. There has been limited research exploring the application of mNGS for the diagnosis of pulmonary infections. In this study we evaluated the diagnostic efficiency and clinical impact of mNGS on pulmonary infections. A total of 100 respiratory samples were collected from patients diagnosed with pulmonary infection in Shanghai, China. Conventional methods, including culture and standard polymerase chain reaction (PCR) panel analysis for respiratory tract viruses, and mNGS were used for the pathogen detection in respiratory samples. The difference in the diagnostic yield between conventional methods and mNGS demonstrated that mNGS had higher sensitivity than traditional culture for the detection of pathogenic bacteria and fungi (95% vs 54%; p<0.001). Although mNGS had lower sensitivity than PCR for diagnosing viral infections, it identified 14 viral species that were not detected using conventional methods, including multiple subtypes of human herpesvirus. mNGS detected viruses with a genome coverage >95% and a sequencing depth >100× and provided reliable phylogenetic and epidemiological information. mNGS offered extra benefits, including a shorter TAT. 
As a complementary approach to conventional methods, mNGS could help improve the identification of respiratory infection agents. We recommend the timely use of mNGS when infection with mixed or rare pathogens is suspected, especially in immunocompromised individuals or individuals with severe conditions that require urgent treatment.

INTRODUCTION
Pulmonary infection is a type of respiratory tract infection (RTI) that may lead to various complications and is associated with a high mortality rate worldwide (Magill et al., 2014). Pulmonary infections are caused by a wide variety of pathogens, including bacteria, fungi, viruses, and parasites, alone or in combination. Therefore, accurate and timely diagnosis of the cause of the infection is crucial to enable the appropriate treatment of pulmonary infection and improved outcomes, especially among patients who need combined treatment for coinfections (Hardak et al., 2016).
In current clinical practice, conventional methods for diagnosing the cause of infection include microbial culture, serology, antigen/antibody assays, and polymerase chain reaction (PCR)-based nucleic acid detection (Loeffelholz and Chonmaitree, 2010; Labelle et al., 2010). Nevertheless, the diagnostic efficiency of these methods is hindered by the high diversity of RTI pathogens, as well as the presence of commensal microbiota and pathobionts in the respiratory tract (Huffnagle et al., 2017). For example, microbial culture has a long turn-around time (TAT) and is unable to detect viruses and parasites, while antigen/antibody assays may have limited sensitivity (Loeffelholz and Chonmaitree, 2010). Although conventional PCR-based nucleic acid detection has high sensitivity and specificity, it detects a limited range of microorganisms, which may not include the pathogen responsible for the infection. Therefore, pathogen detection and identification methods for pulmonary infection with higher diagnostic efficiency are urgently needed to overcome the limitations in sensitivity, specificity, TAT, and diagnostic spectrum.
Unbiased metagenomic next generation sequencing (mNGS) has been used for the detection of infectious pathogens, especially for detecting rare or newly emergent pathogens (Lu et al., 2020), and exhibits better diagnostic performance than traditional methods (Miao et al., 2018; Zhang et al., 2019). mNGS analysis is capable of simultaneously detecting thousands of pathogens using a diverse range of specimen types (Wilson et al., 2019; Zhang et al., 2019; Xing et al., 2020), and has the potential to substantially increase diagnostic efficiency. Miao et al. reported that mNGS had higher sensitivity and specificity than microbial culture, especially for the detection of Mycobacterium tuberculosis, viruses, anaerobic bacteria, and fungi (Miao et al., 2018). Another study by Zhou et al. demonstrated that the performance of mNGS was less affected by prior antibiotic exposure than that of culture (Zhou et al., 2019). Although mNGS has been applied in the diagnosis of RTI using a range of specimen types (Zhou et al., 2019), there have been few comprehensive studies of the performance and value of mNGS for diagnosing pulmonary infections.
In this study, we compared the diagnostic yield between mNGS and conventional methods, and evaluated the clinical impact of mNGS in the diagnosis of pulmonary infections.

Study Design and Participants
This single-center prospective observational study was conducted in Huashan Hospital, Shanghai, China from November 1, 2016 to August 1, 2017. Patients who met the following criteria were enrolled: (1) exhibited typical clinical signs of pulmonary infection such as fever, cough, expectoration, and respiratory failure; and (2) the diagnosis of pulmonary infection was supported by radiological evidence, including the result of chest X-ray or computed tomography scan. Those in whom the diagnosis of an infection was ruled out and those who were lost to follow-up were excluded from the cohort. The recruitment process is shown in Figure 1. Respiratory tract samples, including nasopharyngeal swabs (NPS), sputum, and bronchoalveolar lavage fluid (BALF), were collected from patients within 24 hours (NPS and sputum) or 48 hours (BALF) of admission or disease onset.
The samples were sent for culture and smear tests (culture media: blood agar, chocolate agar, and MacConkey agar plates for bacteria; Sabouraud dextrose agar for fungi; Roche medium for Mycobacterium; incubation at 35°C with 5% CO2). Other conventional pathogen detection tests, such as the FilmArray Respiratory Panel (FA-RP), were conducted as required. Duplicate specimens were later submitted for mNGS analysis.
Informed consent was obtained from each patient prior to enrolment. The study was approved by the ethics review committee of Huashan Hospital (No. Ky2017-338). Patients' medical records were reviewed to collect baseline information, including age, sex, presence of immunosuppressive conditions, onset site, whole blood cell count, C-reactive protein and procalcitonin levels, smear, culture, and other microbiological results, and the patients' treatment regimen.

Metagenomic Next Generation Sequencing and Data Analysis
NPS, sputum, and BALF samples from patients were collected according to standard operating procedures. Each NPS tip was immersed in 3 mL of preservation medium (UTM-RT transport medium, COPAN Diagnostics Inc, CA, USA). DNA extraction was conducted for each sample, while RNA extraction and reverse transcription were performed at the physician's discretion according to the patient's manifestations, particularly if a viral infection was suspected.
For DNA extraction, 1.5 mL microcentrifuge tubes, each containing 0.6 mL of sample or preservation medium and 1 g of 0.5 mm glass beads, were attached to a horizontal platform on a vortex mixer and agitated vigorously at 2800-3200 rpm for 30 min. A 0.3 mL aliquot was then transferred to a new microcentrifuge tube, and total DNA was extracted using the TIANamp Micro DNA Kit (DP316, TIANGEN BIOTECH) according to the manufacturer's recommendations. For RNA extraction, total RNA was extracted from 0.3 mL of sample or preservation medium using the QIAamp Viral RNA Mini Kit (52904, QIAGEN). Complementary DNA (cDNA) was generated from the RNA templates by reverse transcription using the SuperScript™ II Reverse Transcription Kit (18064-014, Invitrogen), followed by second-strand synthesis. The total DNA or cDNA was subjected to library construction through DNA fragmentation (150 bp), end repair, adapter ligation, and unbiased PCR amplification. An Agilent 2100 was used for quality control of the DNA libraries (200-300 bp). Libraries that passed quality control were sequenced on the BGISEQ-50 platform (Jeon et al., 2014).
High-quality sequences were generated by removing low-quality reads (< 35 bp) and computationally subtracting human host sequences, identified by mapping to the human reference genome (hg19) with Burrows-Wheeler Alignment (0.7.10-r789) (Li and Durbin, 2009). Following the removal of low-complexity reads with prinseq (version 0.20.4), the remaining sequences were classified by aligning them to PMDB (PMseq metagenomic Database, version 3.0, a database established locally at BGI) consisting of 2,700 whole genome sequences of viral taxa, 1,494 bacterial genomes or scaffolds, 73 fungi related to human infection, and 47 parasites associated with human diseases, which were downloaded from NCBI (ftp://ftp.ncbi.nlm.nih.gov/genomes/).
Using JX625134.1 as the reference genome, two adenovirus B1 genomes were assembled with SPAdes-3.12.0. Thirty-six human adenovirus B genomes of high identity (percent identity ≥ 96.9%) with the assembled adenovirus B1 genomes, and thirteen human adenovirus reference genomes of different serotypes from the NCBI Reference Sequence database were downloaded for phylogenetic analysis as outgroup. Single-copy genes were identified following genome annotation (by Prokka v1.12), gene alignment (by blastall v2.2.25), and clustering. Human adenovirus genomes were aligned with MUSCLE v3.7 (Edgar, 2004) using the single-copy genes as conserved regions. Phylogenetic analyses of the conserved regions were conducted with PhyML v3.0 (Guindon et al., 2010) using the maximum likelihood method, with the HKY85 substitution model and a gamma distribution of rates as the chosen parameters. The reliability of each node was estimated by the aLRT method with the non-parametric SH branch support mode.
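As an illustration of the first filtering step only, the following minimal Python sketch drops reads shorter than 35 bp from FASTQ records. It is not the pipeline used in this study: the function names `parse_fastq` and `filter_short_reads` and the example reads are hypothetical, the parser assumes well-formed four-line FASTQ records, and host subtraction, low-complexity filtering, and database alignment are not covered.

```python
def parse_fastq(lines):
    """Yield (name, sequence, quality) from well-formed four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header.strip()[1:], seq, qual


def filter_short_reads(records, min_length=35):
    """Keep only reads with at least min_length bases (reads < 35 bp are dropped)."""
    for name, seq, qual in records:
        if len(seq) >= min_length:
            yield name, seq, qual


# Hypothetical input: one 40 bp read and one 20 bp read
example_fastq = (
    "@read_long\n" + "ACGT" * 10 + "\n+\n" + "I" * 40 + "\n"
    "@read_short\n" + "ACGT" * 5 + "\n+\n" + "I" * 20 + "\n"
)
kept = list(filter_short_reads(parse_fastq(example_fastq.splitlines())))
print([name for name, _, _ in kept])
```

Writing the filter as a generator keeps memory use flat for large read sets, which is one reasonable design choice among several.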

Criteria for a Positive mNGS Result
The sequencing results of each sample were organized into two tables, one for bacteria/fungi and one for viruses. The specifically mapped read number (SMRN) of each microbial taxon was normalized to SMRN per 20 million (M) total sequencing reads (SDSMRN, standardized SMRN).

$$\mathrm{SDSMRN} = \frac{\mathrm{SMRN}}{\text{Total reads}} \times 20\ \text{million}$$
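The normalization can be sketched in a few lines of Python. The function name `sdsmrn` and the example read counts are hypothetical and chosen only for illustration.

```python
def sdsmrn(smrn: int, total_reads: int, scale: int = 20_000_000) -> float:
    """Standardize a specifically mapped read number (SMRN) to reads per 20 million."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    # Multiply before dividing to limit floating-point rounding
    return smrn * scale / total_reads


# Hypothetical example: 12 specifically mapped reads in a library of 24 million reads
example_value = sdsmrn(12, 24_000_000)
print(example_value)
```

Standardizing to a fixed library size makes read counts comparable between samples sequenced to different total depths.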
A virus was considered positively detected if: 1) it was among the top 3 viruses with the highest SDSMRN; and 2) it had an SDSMRN > 5. A bacterial/fungal species was considered positively detected if: 1) it belonged to the top 10 genera with the highest SDSMRN; 2) it ranked first within its genus; 3) it had an SDSMRN > 1; and 4) it was a commonly reported pulmonary infectious pathogen.
However, there were several exceptions for certain pathogens. For the detection of Mycobacterium spp., Nocardia spp., Brucella spp., etc., because of the difficulty of DNA extraction and the low possibility of contamination, the pathogen was considered detected if: 1) its genus was among the top 20 with the highest SDSMRN; 2) it ranked first within its genus; and 3) it had an SDSMRN > 1. For the detection of pathogens within the Enterobacteriaceae family, only the species with the highest SDSMRN was considered a positive detection.
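The criteria above can be expressed as a small rule set. The sketch below is illustrative rather than the implementation used in this study, and it rests on several assumptions: the `Taxon` class, the function names, and all example values are hypothetical; genera are ranked by the summed SDSMRN of their species, which the text does not specify; the list of "commonly reported" pathogens is supplied by the caller; the relaxed-rule genus set contains only the three genera named explicitly (the source list ends with "etc."); and the Enterobacteriaceae rule is omitted because it needs family-level annotation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Taxon:
    species: str
    genus: str
    sdsmrn: float


# Genera given the relaxed top-20 rule; only the three named in the text are listed
RELAXED_GENERA = frozenset({"Mycobacterium", "Nocardia", "Brucella"})


def positive_viruses(viruses):
    """Top 3 viruses by SDSMRN that also have SDSMRN > 5."""
    top3 = sorted(viruses, key=lambda t: t.sdsmrn, reverse=True)[:3]
    return [t.species for t in top3 if t.sdsmrn > 5]


def positive_bacteria_fungi(taxa, known_pathogens):
    """Apply the genus-rank, first-in-genus, SDSMRN > 1 and known-pathogen rules."""
    genus_total = defaultdict(float)  # assumption: genus score = sum over its species
    best = {}  # highest-SDSMRN species within each genus
    for t in taxa:
        genus_total[t.genus] += t.sdsmrn
        if t.genus not in best or t.sdsmrn > best[t.genus].sdsmrn:
            best[t.genus] = t
    ranked = sorted(genus_total, key=genus_total.get, reverse=True)
    rank = {genus: i + 1 for i, genus in enumerate(ranked)}
    hits = []
    for genus, t in best.items():
        if t.sdsmrn <= 1:
            continue
        if genus in RELAXED_GENERA:
            if rank[genus] <= 20:
                hits.append(t.species)
        elif rank[genus] <= 10 and t.species in known_pathogens:
            hits.append(t.species)
    return hits


# Hypothetical example data
viruses = [
    Taxon("Human adenovirus B1", "Mastadenovirus", 120.0),
    Taxon("Human betaherpesvirus 5", "Cytomegalovirus", 3.0),
]
bacteria = [
    Taxon("Streptococcus pneumoniae", "Streptococcus", 40.0),
    Taxon("Streptococcus mitis", "Streptococcus", 15.0),
    Taxon("Mycobacterium tuberculosis", "Mycobacterium", 2.0),
    Taxon("Rothia mucilaginosa", "Rothia", 30.0),
]
known = {"Streptococcus pneumoniae", "Mycobacterium tuberculosis"}
virus_hits = positive_viruses(viruses)
bacteria_hits = positive_bacteria_fungi(bacteria, known)
print(virus_hits, bacteria_hits)
```

In this example the second Streptococcus species is excluded because it does not rank first within its genus, and Rothia is excluded because it is not on the caller-supplied pathogen list.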

Conventional Microbiological Analysis
Samples underwent conventional microbiological testing and mNGS in parallel. All of the collected samples were sent to the clinical laboratory of Huashan Hospital for culture and smear tests for bacteria, fungi, and mycobacteria. Blood agar plates, chocolate agar plates, and MacConkey agar were used for bacterial culture at 35°C with 5% CO2. Roche medium was used for mycobacteria. Sabouraud dextrose agar was used for fungi, with incubation at 37°C and 25°C. A FilmArray Respiratory Panel (FA-RP, Biofire, Salt Lake City, UT, USA) was employed for nucleic acid detection if the suspected pathogens were within its detection targets, which consisted of adenovirus, coronavirus (strains HKU1, NL63, 229E, OC43), human metapneumovirus, rhinovirus/enterovirus, influenza (strains A, A/H1, A/H3, A/H1-2009, B), parainfluenza virus (strains 1, 2, 3, 4), and respiratory syncytial virus (RSV), as well as the bacterial respiratory pathogens Mycoplasma, Bordetella pertussis, and Chlamydophila.

Statistical Analysis
The chi-square test was applied to assess the pathogen-specific diagnostic performance of each method, reported as sensitivity, specificity, positive predictive value, and negative predictive value with their 95% confidence intervals (95% CI). Statistical analysis and figure preparation were performed using SPSS 20.0 and GraphPad Prism 5. P values < 0.05 were considered statistically significant.
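For readers who wish to reproduce this kind of summary outside SPSS, the four performance measures can be computed from a 2×2 table as in the sketch below. The text does not state which confidence interval method was used; the Wilson score interval here is an assumption, as are the function names and the example counts, which are not data from this study. The chi-square test itself is not included.

```python
import math


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a proportion (assumed CI method, z=1.96 for 95%)."""
    if n == 0:
        return (float("nan"), float("nan"))
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return (max(0.0, center - half), min(1.0, center + half))


def diagnostic_metrics(tp, fp, fn, tn):
    """Return {metric: (estimate, (ci_low, ci_high))} from 2x2 table counts."""
    pairs = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }
    return {
        name: (k / n if n else float("nan"), wilson_ci(k, n))
        for name, (k, n) in pairs.items()
    }


# Hypothetical counts, for illustration only
metrics = diagnostic_metrics(tp=40, fp=5, fn=10, tn=45)
for name, (estimate, (low, high)) in metrics.items():
    print(f"{name}: {estimate:.3f} (95% CI {low:.3f}-{high:.3f})")
```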

General Characteristics of the Enrolled Cohort
A total of 111 patients with suspected pulmonary infections consented to sample collection and were clinically screened (Figure 1). Of these patients, 4 were lost to follow-up, 1 did not receive any mNGS results, and 6 were confirmed to have non-infectious conditions, resulting in the final enrollment of 100 patients. Their baseline characteristics are shown in Table 1. Respiratory samples from these patients were tested using next generation sequencing as well as traditional methods.

Extra Detection of Pathogens by mNGS
To explore the "false-positive" results of mNGS against conventional methods, patients' clinical data were analyzed thoroughly by two experienced physicians. In total, mNGS identified 183 culture-negative pathogens, including Neisseria

Application of the mNGS Data in Genomic Analysis
In addition to pathogen detection, mNGS data could also provide genetic information for epidemiological analysis. As viral genomes are relatively small, good genome coverage and sequencing depth were achieved for certain viral strains, allowing further analysis. In this study, a total of 45 viruses were reported positive by mNGS, of which 13.3% (6/45) had sufficient genome coverage (over 90%) and sequencing depth (over 30×) for whole-genome assembly and genomic analysis. These 6 viral strains consisted of adenovirus B1 (n=4), influenza virus A (n=1) and HPV-4 (n=4). As representative examples, the adenovirus B1 strains detected in 2 samples with high genome coverage (over 95%) and sequencing depth (over 180×) were used for genome assembly and analysis. Both adenovirus B1 genomes were assembled from samples collected in 2017. Phylogenetic analysis of the assembled adenovirus B1 genomes, their closely related genomes, and the reference genomes revealed that six species of Mastadenovirus (Human mastadenovirus A-F) formed six branches (Figure 3). The Human adenovirus B branch could be classified into three clades: the two strains in this study (ADV-17S0835897 and ADV-17S0836382), 30 Human adenovirus 7 strains, and the Human adenovirus 7 reference genome formed Clade 1; six Human adenovirus 3 strains and the Human adenovirus 3 reference genome formed Clade 2; and the Human adenovirus 35 and Human adenovirus 11 reference genomes formed Clade 3. ADV-17S0835897 and ADV-17S0836382 had high genetic similarity with a strain (MG696148) described as a possible cause of a cluster of severe acute respiratory infections in Jiangxi province, China, and with three strains (KP896479, KP896480 and KP896481) related to an outbreak of febrile respiratory illness in Hubei, China. Since the patients from whom ADV-17S0835897 and ADV-17S0836382 were detected had no contact history with each other, the high genetic similarity of these two strains indicated that there might have been an epidemic adenovirus B strain in 2017.
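The coverage and depth thresholds used to select genomes for assembly can be illustrated with a short Python sketch. It assumes that genome coverage means the fraction of reference positions covered by at least one read and that sequencing depth means the mean per-base depth over the reference; the text does not give these definitions explicitly. The function names and example depth profiles are hypothetical.

```python
def coverage_and_depth(per_base_depth):
    """Return (fraction of positions with depth >= 1, mean depth over all positions)."""
    n = len(per_base_depth)
    if n == 0:
        raise ValueError("per_base_depth must not be empty")
    covered = sum(1 for d in per_base_depth if d > 0)
    return covered / n, sum(per_base_depth) / n


def qualifies_for_assembly(per_base_depth, min_coverage=0.90, min_depth=30.0):
    """Check the 'over 90% coverage and over 30x depth' rule described in the text."""
    coverage, depth = coverage_and_depth(per_base_depth)
    return coverage > min_coverage and depth > min_depth


# Hypothetical depth profiles over a 100-position reference
well_covered = [40] * 95 + [0] * 5      # 95% coverage, mean depth 38
poorly_covered = [40] * 80 + [0] * 20   # 80% coverage, mean depth 32
print(qualifies_for_assembly(well_covered), qualifies_for_assembly(poorly_covered))
```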

DISCUSSION
Accurate and fast pathogen detection is essential for the management of RTIs. Although previous studies have reported the use of mNGS for the identification of respiratory pathogens, few studies have comprehensively evaluated the overall diagnostic performance of mNGS in RTI. This cross-sectional study evaluated the diagnostic yield and extra diagnostic value of mNGS in RTI.
FIGURE 3 | Phylogenetic analysis of the representative adenovirus B1 genomes. This analysis involved 2 newly assembled adenovirus B1 genomes, 36 published human adenovirus B genomes, and 13 human adenoviruses from NCBI Reference Sequence database. The two adenovirus B1 genomes (ADV-17S0835897 and ADV-17S0836382) were located in the same branch, and had high genetic similarity with strains identified in China.
Although mNGS is a good supplement to the current pathogen detection methods, its diagnostic performance has limitations. mNGS had lower sensitivity than PCR in the detection of certain viruses, such as influenza A and rhinovirus. When FA-RP was used as the reference, the sensitivity and specificity of mNGS were 50% and 100%, respectively, and mNGS failed to identify 14 viruses. Other studies have had similar findings (Prachayangprecha et al., 2014; Thorburn et al., 2015). Thorburn et al. reported that mNGS had a sensitivity and specificity of 78% and 80%, respectively, with RT-PCR as the reference, and attributed the limited sensitivity of mNGS to the low abundance of the 11 undetected respiratory viruses, which is consistent with the findings of Prachayangprecha et al. These similar conclusions from different studies suggest that mNGS might not be as sensitive as PCR for detecting respiratory viruses. As the sensitivity of mNGS is significantly affected by sequencing depth, increasing the total number of sequencing reads per sample could theoretically improve its sensitivity, although at a higher cost. On the other hand, mNGS detected an additional 14 viruses that were beyond the detection targets of FA-RP. Among them, 3 CMV detections were confirmed to have caused infection in immunocompromised patients, which indicates the potential of mNGS for detecting rare and unexpected viruses, especially under special circumstances. In conclusion, although mNGS with the current depth and procedure cannot replace PCR for the diagnosis of common viral RTIs, its unbiased detection enables the identification of viral pathogens that are undetectable using conventional PCR panels.
In the detection of pathogenic bacteria and fungi, although the performance of mNGS varied across different pathogens, it detected significantly more potential pathogens than culture in this study. In our practice, a total of 183 culture-negative pathogens were identified by mNGS. Among them, 16 bacterial/fungal pathogens were considered highly clinically relevant, including fastidious pathogens such as M. tuberculosis and Nocardia spp., which may require a long incubation time, as well as pathogens that cannot be cultured under standard conditions, such as Mycoplasma pneumoniae. With its relatively short TAT and untargeted nature, mNGS was capable of detecting those pathogens quickly. Considering these results, mNGS may serve as an important supplement to conventional culture, and improve pathogen detection and disease management in patients with complex infectious conditions (Parize et al., 2017; Pan et al., 2019; Wang et al., 2020). Similar conclusions have been drawn in previous studies. When used as a supplementary method to culture, mNGS increases the diagnostic yield in focal and central nervous system infections (Zhang Y. et al., 2020).
Nevertheless, mNGS was unable to detect certain culturable pathogens. In this study, mNGS missed 34 bacteria/fungi that tested positive by culture, and performed poorly in the detection of pathogens such as Klebsiella pneumoniae. This is possibly due to interference from the commensal microbiome in the respiratory tract (Wypych et al., 2019). Another limitation of mNGS is that it is unable to discriminate the pathogenicity status of the pathobionts detected. In this study, mNGS identified 225 pathogenic bacteria or fungi and 26 viruses in 100 samples, most of which belonged to the respiratory tract commensal microbiome or were contaminants and were not clinically relevant. Such large amounts of information could be confusing and even misleading to physicians making clinical decisions. Furthermore, a standard criterion for the interpretation of mNGS results, such as the definition of "positive or negative", is still lacking, which may also affect the clinical use of mNGS. In conclusion, although mNGS is not suitable for use as the sole diagnostic method for RTIs, it could improve diagnostic efficiency and serve as a supplementary method to culture. However, the interpretation of mNGS results currently remains difficult.
mNGS is capable of providing genetic and genomic information of significance for epidemiologic analysis. Since mNGS can provide information on the genetic sequence of the detected pathogens, its application for identifying newly emergent and rare pathogens has been widely acknowledged (Lu et al., 2020; Zhu et al., 2020). With genome assembly and genomic analysis, the genetic information provided by mNGS could be further applied in evolutionary and epidemiologic studies. Our results showed that at a total sequencing depth of 20M reads/sample, over 10% of the detected respiratory viruses had adequate genome coverage and depth for further genomic and epidemiologic analysis. Since fungi and bacteria have larger genomes, our study did not achieve whole genome assembly of fungal and bacterial pathogens. However, the assembly of marker genes or partial genomes with mNGS data for genomic analysis has been reported (Zhu et al., 2018).
This study had limitations. First, the sample size was limited, which might have affected the accuracy of the evaluation of the performance of mNGS. Second, the sample types varied, and included NPS, sputum, and BALF. The lack of standardization of the sample collection method and site could also have affected the mNGS results. To further evaluate the application of mNGS in the diagnosis of pulmonary infections, multicenter prospective studies with a larger number of participants are required. In addition, the impact of the sample collection method and sample type on mNGS performance needs further evaluation.
In conclusion, mNGS currently cannot replace conventional methods of pathogen detection, but its unbiased detection and the genetic information it provides contribute additional diagnostic yield, making it suitable for use as a supplementary method.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found at: https://db.cngb.org/cnsa/, CNP0001450.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Ethics Review Committee of Huashan Hospital, Fudan University. The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
Y-YQ, H-YW, and YZ collected and analyzed medical data of the patients, and wrote and revised the manuscript. H-CZ, Y-MZ, XZ, and YY participated in the treatment of the patients during hospitalization and data collection. PC and H-LW participated in the next generation sequencing and data analysis. J-LJ, J-WA, and W-HZ made a critical contribution to the treatment plan of the patient and made a critical revision of the manuscript for important intellectual content.