Microbiomic Analysis on Low Abundant Respiratory Biomass Samples; Improved Recovery of Microbial DNA From Bronchoalveolar Lavage Fluid

In recent years, the study of the commensal microbiota has driven a remarkable paradigm shift in our understanding of human physiology. However, intrinsic technical difficulties associated with investigating the microbiome of some body niches are hampering the development of new knowledge. This is particularly the case when investigating the functional role played by the human microbiota in modulating the physiology of key organ systems. A major hurdle in investigating specific microbial communities is their low bacterial density and the resulting susceptibility to bias caused by environmental contamination. To prevent such inaccuracies due to background processing noise, harmonized tools for microbiome and bioinformatics practices have been recommended globally. The fact that the impact of this undesirable variability is negatively correlated with the DNA concentration in the sample highlights the need to improve existing DNA isolation protocols. In this report, we developed and tested a protocol to more efficiently recover bacterial DNA from low volumes of bronchoalveolar lavage fluid obtained from infants and adults. We compared the efficiency of the described method with that of a commercially available kit for microbiome analysis in body fluids. We show that this new methodological approach performs better in terms of extraction efficiency. In contrast to the extracts obtained with the commercial kit, the DNA extracts obtained with this new protocol were clearly distinguishable from the negative extraction controls in terms of 16S copy number and microbial community profiles. Altogether, we describe a cost-efficient protocol that can facilitate microbiome research in low-biomass human niches.

INTRODUCTION
In recent years, the study of the microbial communities colonizing human body surfaces has revolutionized many fundamental aspects of medicine and human biology. These communities of microorganisms are now considered critical regulators of normal physiology and important indicators of pathological processes linked to human disease (Cho and Blaser, 2012; Knight et al., 2017). Exciting developments have highlighted the importance of the homeostatic interaction between the host and its microbiota in health and wellbeing. The potential role that disturbing these coordinated interactions could play in many human conditions is now center stage. Furthermore, the promise that complementary therapies aimed at reestablishing balanced communication between host and microbiota could cure disease is gaining traction globally (Cho and Blaser, 2012; Knight et al., 2017; van de Guchte et al., 2018).
Because of the long-held assumption that they constitute a sterile environment, the lower airways, and specifically the lungs, have been largely neglected in microbiome research. This has delayed the realization that chronic respiratory conditions are greatly influenced not only by the local microbiota of the airways, but also by communities inhabiting more distant niches such as the gut (Marsland et al., 2015; Dickson et al., 2016b; Budden et al., 2017, 2019; Faner et al., 2017). Additional pitfalls in the systematic investigation of the respiratory microbiota stem from the low microbial biomass characteristic of this environment, and from the susceptibility to microbial contamination from the oropharynx during sampling of the lower respiratory tract. Thus, the existence of a true microbial ecosystem in the lungs is still viewed with skepticism, and it is thought that the lung microbiota may represent a transient population rather than settled residents (Dickson et al., 2015).
With the exception of infective processes involving a high microbial burden, the bacterial biomass present in biological specimens from the lungs (usually bronchoalveolar lavage fluid, BALF) is low (Dickson et al., 2016a). This, together with the high sensitivity of current genomic technologies, makes BALF specimens highly susceptible to confounding by contaminant DNA (Salter et al., 2014). The disturbing possibility that many published studies could be dominated, or at least influenced, by environmental contaminants has stimulated thinking about strategies and best practices to avoid and mitigate the pernicious effect of artificial communities (Eisenhofer et al., 2019). A closely related problem is that most of the taxa widely recognized as typical environmental contaminants are also genuine members of the lower respiratory microbiota (Morris et al., 2013; Salter et al., 2014). These issues represent a major obstacle to building a robust foundation for our understanding of the role played by the microbiota in promoting a healthy respiratory system, especially for those processes occurring early in life. Importantly, susceptibility to variation introduced by contaminants is concentration-dependent (Salter et al., 2014). Thus, one strategy to minimize the effect of environmental DNA contamination is to develop efficient protocols that improve DNA recovery from these low-biomass specimens.
Different DNA extraction techniques have been used to profile microbial communities from BALF samples: commercial silica column-based kits (Charlson et al., 2011; Willner et al., 2012; Han et al., 2014; Renwick et al., 2014; May et al., 2015; Dickson et al., 2016b; Laguna et al., 2016; Marsh et al., 2016; Wen et al., 2016; Zemanick et al., 2017; Ahmed et al., 2018; Pattaroni et al., 2018; Kyo et al., 2019; Schneeberger et al., 2019; Tong et al., 2019; Wang et al., 2019), magnetic bead-based procedures (Erb-Downward et al., 2011; Pragman et al., 2012; Bernasconi et al., 2016; Frayman et al., 2017; Kloepfer et al., 2018; Esther et al., 2019; Gomes et al., 2019), homemade protocols (Borewicz et al., 2013; Jorth et al., 2019), and salt-extraction approaches or the use of cationic detergents in combination with the classical phenol-chloroform extraction technique (Willner et al., 2012). However, few studies have compared the efficiency of these protocols, particularly using samples from young children (Willner et al., 2012). BALF specimens from very young individuals constitute a rare opportunity to evaluate early coordinated host-microbiota interactions (Pattaroni et al., 2018). In many cases these unique samples are volume-limited, requiring an efficient DNA extraction protocol to minimize the risk of introducing unwanted variability (Salter et al., 2014).
In this report, we present an efficient extraction protocol to recover bacterial DNA from BALF supernatants sampled from cystic fibrosis (CF) infants and chronic obstructive pulmonary disease (COPD) adults. Our approach combines an optimized mixture of hydrolytic enzymes to improve the digestion of cell-wall components (Tighe et al., 2017) with the polyethylene glycol (PEG)-induced ψ condensation of DNA in the presence of NaCl (Cheng et al., 2015). We also compare the performance of this new method with that of a commercial kit optimized for extracting DNA from swabs and body fluids (QIAamp DNA Microbiome Kit, Qiagen, 51704), using a customized TaqMan® qPCR assay. To demonstrate the suitability of the DNA isolated with this new protocol for downstream high-throughput applications, we submitted the DNA extracts to 16S rRNA gene amplicon sequencing. In sharp contrast with the column-based DNA extracts, the communities associated with the PEG protocol were clearly distinguishable from the corresponding negative extraction controls, suggesting that the choice of DNA extraction protocol has dramatic effects on the study of communities with low microbial density.

MATERIALS AND METHODS
Patient Samples
Two different cohorts of BALF specimens were used in this study. The first cohort consisted of 5 BALF samples from adults [age range 45-82 years; median 55, interquartile range (IQR) 48-76]. Four specimens were from patients diagnosed with chronic obstructive pulmonary disease (COPD, GOLD stage 1-4). The fifth sample was from a healthy smoker. Subjects in this cohort had no acute infection at the time of BALF sampling and were asked to confirm that they had experienced no symptoms of infection and had not used antibiotics in the previous 6 weeks.
The protocol used for collecting the BALF specimens in the COPD cohort was as previously described (Lokwani et al., 2019).
The second cohort was a subset of 6 infants diagnosed with cystic fibrosis (CF) enrolled in the AREST CF surveillance program. These patients were asymptomatic at the time of BALF acquisition. The infants ranged in age from 0.28 to 0.91 years (median age 0.63; IQR 0.36-0.79) at the time of the bronchial wash collection and were sampled during the annual follow-up visit as previously described (Mott et al., 2012; Caparros-Martin et al., 2020).

Pre-processing of the BALF Aliquots
All the steps described in this paper were performed under sterile conditions in a laminar flow cabinet located in a pre-PCR room. Each BALF aliquot (1 mL) was centrifuged at 20,000 × g for 30 min at 4 °C. The supernatant was discarded and the pellets were carefully resuspended in 100 µL of HyClone Phosphate Buffered Saline solution (PBS) pH 7.5 without EDTA (GE Healthcare, Life Sciences, United Kingdom) by pipetting up and down using filter barrier tips.

DNA Extraction Procedures Using Commercially Available Kits
We used a column-based commercial kit from Qiagen, the QIAamp DNA Microbiome Kit (Qiagen, 51704). The purification principle of this protocol is based on the ionic interactions between the DNA and the silica columns. Extraction was performed strictly following the manufacturer's instructions, except that the elution volume was set to 25 µL and a bead-beating pre-treatment was added to improve cell lysis. For this purpose, resuspended cell pellets were bead-beaten with 0.1 g of zirconia/silica beads (0.1 mm diameter; Daintree Scientific) in a cell disrupter (Mini beadbeater-16™, Biospec) using 4 pulses of 1 min each. We also tested whether host DNA depletion increased the efficiency of recovering bacterial DNA from BALF samples. Removal of host DNA was performed following the protocols and reagents provided with the QIAamp DNA Microbiome Kit.

Polyethylene Glycol-Based DNA Extraction Protocol
All the reagents used in this protocol are sterile-grade products purchased from either SIGMA or Bioworld. Any solution described in this protocol (e.g., enzyme solutions) was prepared under sterile conditions using these commercial reagents. Resuspended pellets were incubated for 4 h at 35 °C with 20 µL of MetaPolyzyme solution (10 mg/mL in HyClone PBS pH 7.5 without EDTA) (MAC4L, Sigma-Aldrich) (Tighe et al., 2017). Then, a second incubation with 10 µL of proteinase K (10 ng/mL, Sigma-Aldrich) was performed for 1 h at 56 °C. After the second incubation, samples were stored at −80 °C overnight. Twelve hours later, tubes were thawed on ice and cells were disrupted using a bead beater as described above. Tubes were then briefly centrifuged (5,000 × g for 1 min at 4 °C). Supernatants were carefully removed and placed in a new tube. The pellets were resuspended in 100 µL of PBS solution and an additional cycle of bead beating and centrifugation was performed. Combined supernatants were extracted with one volume of 24:1 chloroform:isoamyl alcohol solution (25666, Sigma-Aldrich). After vigorous vortexing, tubes were centrifuged (20,000 × g at 4 °C for 10 min). The nucleic acid-containing aqueous phase was carefully removed and placed in a new tube. A back-extraction was performed by adding 50 µL of PBS to the organic (chloroform:isoamyl alcohol) phase. After vortexing and centrifugation, the second and first aqueous phases of the same sample were combined. Nucleic acids were precipitated with 1.5 volumes of sterile 30% polyethylene glycol 8,000 solution in 1.6 M NaCl pH 6.7 (41620040-1, Bioworld) for 2 h on ice. After this incubation, samples were centrifuged (20,000 × g at 4 °C for 30 min). The supernatant was discarded and the pellet was washed with 800 µL of a 70% ethanol solution prepared with 200 proof ethyl alcohol (SIGMA, E7023) and sterile molecular biology-grade water (W4502, Sigma-Aldrich). The pellet was then air dried and resuspended in 25 µL of filtered (0.1 µm) sterile molecular biology-grade water (W4502, Sigma-Aldrich).
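As a quick arithmetic check on the precipitation step, the following sketch computes the final PEG and NaCl concentrations reached when 1.5 volumes of the 30% PEG 8,000 / 1.6 M NaCl stock are added to one volume of aqueous phase. It is written in Python purely for illustration; the function name is hypothetical, and the calculation assumes that the aqueous phase contributes no PEG and a negligible amount of NaCl (the NaCl already present in PBS is ignored).

```python
def final_concentration(stock_conc, stock_volumes, sample_volumes=1.0):
    """Concentration after mixing `stock_volumes` of stock with `sample_volumes` of sample.

    Assumption (ours, for illustration): the sample itself contributes none
    of the solute, so this is a simple dilution of the stock.
    """
    return stock_conc * stock_volumes / (stock_volumes + sample_volumes)


# Values taken from the protocol text: 1.5 volumes of 30% PEG 8,000 in 1.6 M NaCl.
peg_final = final_concentration(30.0, 1.5)   # percent PEG in the mixture
nacl_final = final_concentration(1.6, 1.5)   # mol/L NaCl in the mixture

print(f"final PEG: {peg_final:.1f}%")
print(f"final NaCl: {nacl_final:.2f} M")
```

Under these assumptions the precipitation mixture ends up at 18% PEG and 0.96 M NaCl.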

16S rRNA Gene PCR Amplification
The concentration of DNA was evaluated using a NanoDrop spectrophotometer. PCR amplification was performed as follows. One microliter of DNA was used to amplify a fragment of the 16S rRNA gene using the primers 63F (5′-CAGGCCTAACACATGCAAGTC-3′) and 1387R (5′-GGGCGGWGTGTACAAGGC-3′) at a final concentration of 0.4 pmol µL⁻¹ (Marchesi et al., 1998). For amplification we used MyTaq DNA polymerase (Bioline, London, United Kingdom), and nuclease-free water was added to a final volume of 25 µL. PCR conditions were as follows: initial denaturation at 95 °C for 3 min, then 30 cycles (95 °C for 30 s, 60 °C for 30 s, and 72 °C for 60 s), followed by a final extension step at 72 °C for 5 min. Negative controls, containing all the components except DNA template, were run in parallel. We also included technical controls consisting of a DNA extraction negative control. PCR products were subjected to 1% (w/v) agarose gel electrophoresis to confirm the amplification of a single product of the expected size.

16S rRNA Gene-Based BALF-Associated Microbiota Profiling
For 16S rRNA amplicon-based microbial profiling, DNA extracts were used to generate amplicons using a MetaVx™ Library Preparation kit (GENEWIZ, Inc., South Plainfield, NJ, United States) at Genewiz. The V3-V4 hypervariable region of the 16S rRNA gene was targeted and amplified using forward and reverse primers containing the sequences 5′-CCTACGGRRBGCASCAGKVRVGAAT and 5′-GGACTACNVGGGTWTCTAATCC, respectively. The DNA amplicon libraries were validated using an Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, United States) and quantified using a Qubit 2.0 Fluorometer. Then, the libraries were multiplexed and sequenced on a MiSeq® instrument (Illumina, San Diego, CA, United States) using a 2 × 300 paired-end configuration according to the manufacturer's recommendations.
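Both V3-V4 primers contain IUPAC ambiguity codes, so each one stands for a pool of oligonucleotides rather than a single sequence. The sketch below, written in Python for illustration only, counts and enumerates the sequences encoded by a degenerate primer. The helper names are hypothetical; the lookup table is the standard IUPAC nucleotide code and the two primer strings are copied from the text above.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(IUPAC[base])
    return count


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in primer))]


FORWARD = "CCTACGGRRBGCASCAGKVRVGAAT"
REVERSE = "GGACTACNVGGGTWTCTAATCC"

print("forward variants:", degeneracy(FORWARD))
print("reverse variants:", degeneracy(REVERSE))
```

Enumerating with `expand` is only practical for modestly degenerate primers; `degeneracy` gives the pool size without building the list.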
Pre-processing of the sequencing data was carried out using custom shell scripts. Briefly, the quality of the raw data was evaluated using FastQC and MultiQC (Ewels et al., 2016). Based on the generated quality reports, raw data were trimmed and cropped using Trimmomatic to remove low-quality reads and any remaining Illumina adapters (Bolger et al., 2014). Paired-end reads were then joined using BBMerge (Bushnell et al., 2017). After pre-processing the sequencing data, we obtained more than 6 million high-quality paired-end reads (length 250 bp; average quality Q30). For Operational Taxonomic Unit (OTU) assignment, reads were clustered at a 98% threshold using the SILVAngs analytical pipeline as previously described (Quast et al., 2013). For taxonomic classification, a local BLAST search was performed against the non-redundant version of the SILVA SSU Reference (release 132)¹ using blastn (Camacho et al., 2009). OTUs representing less than 0.01% across all samples were considered low-count taxa and were removed. This approach has previously been used to avoid inflation of diversity estimates due to OTUs representing potential PCR-related artifacts (Le Cao et al., 2016). After pre-processing the taxonomic table, we took advantage of the functions implemented in the R package decontam to identify and eliminate OTUs likely to have originated from contamination, using the negative extraction controls (Davis et al., 2018). These putative contaminants are provided in Supplementary Table 5 and were fully removed from the corresponding biological specimens. After removing background contaminants, OTU profiles were transformed to relative proportions. Compositional data were then subjected to centered log-ratio transformation to map the data into Cartesian space before performing multivariate analysis (Le Cao et al., 2016; Rohart et al., 2017).
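The table-level steps in this paragraph (low-count filtering, conversion to relative proportions, and centered log-ratio transformation) can be sketched as follows. This is a minimal Python illustration, not the R code used in the study: the function names, the toy counts, and the small offset added before taking logarithms are all assumptions, and the 0.01% cutoff is interpreted here as an OTU's share of all reads summed over all samples.

```python
import math


def filter_low_count(table, threshold=0.0001):
    """Drop OTUs whose summed count is below `threshold` (0.01%) of all reads.

    `table` maps an OTU name to a list of counts, one per sample.
    """
    grand_total = sum(sum(counts) for counts in table.values())
    return {otu: counts for otu, counts in table.items()
            if sum(counts) / grand_total >= threshold}


def to_proportions(table):
    """Convert counts to per-sample relative proportions."""
    n_samples = len(next(iter(table.values())))
    totals = [sum(counts[j] for counts in table.values()) for j in range(n_samples)]
    return {otu: [counts[j] / totals[j] for j in range(n_samples)]
            for otu, counts in table.items()}


def clr(composition, offset=1e-6):
    """Centered log-ratio of one sample; `offset` (an assumption) avoids log(0)."""
    logs = [math.log(value + offset) for value in composition]
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


# Hypothetical counts for three OTUs in two samples.
counts = {"OTU_a": [12000, 8000], "OTU_b": [3000, 7000], "OTU_c": [1, 0]}
kept = filter_low_count(counts)
props = to_proportions(kept)
clr_sample_1 = clr([props[otu][0] for otu in kept])
print(sorted(kept), clr_sample_1)
```

The transformation is applied within each sample across OTUs, so the transformed values of a sample always sum to zero.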
Principal component analysis was calculated using the pca function as implemented in the R package mixOmics, which uses singular value decomposition for spectral decomposition (Rohart et al., 2017). OTU profiles were quantified by multiplying the compositional taxonomic table by the total bacterial biomass value obtained through qPCR as previously described (Jorth et al., 2019). Analysis of similarities (ANOSIM) was performed on this quantified OTU table using a Bray-Curtis distance metric.
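To make the quantification step concrete, the sketch below scales a compositional profile by a qPCR-derived total and computes the Bray-Curtis dissimilarity between two quantified profiles. It is a Python illustration under stated assumptions: the function names and the numbers are hypothetical, and the study itself performed these analyses in R.

```python
def quantify(proportions, total_copies):
    """Scale relative proportions by the total 16S copy number of the sample."""
    return [p * total_copies for p in proportions]


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(a + b for a, b in zip(x, y))
    return numerator / denominator if denominator else 0.0


# Hypothetical example: identical composition, tenfold difference in biomass.
composition = [0.5, 0.3, 0.2]
high_biomass = quantify(composition, 10000)
low_biomass = quantify(composition, 1000)

print(bray_curtis(high_biomass, low_biomass))
```

The example shows why quantified profiles behave differently from proportions: two samples with the same composition are identical on the relative scale, yet their Bray-Curtis dissimilarity on the quantified scale is well above zero because total biomass differs.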
We used R (version 3.6.1) for data analysis, using both built-in and custom-made functions (R Core Team, 2017). When using R packages, functions were applied following the recommendations of the authors in the package vignettes. Lung microbiota composition was evaluated using the MixMC framework implemented in the R package mixOmics (Le Cao et al., 2016; Rohart et al., 2017). Other packages used in this study were vegan for Procrustes and ANOSIM analyses (Oksanen et al., 2016), decontam (Davis et al., 2018), and PairedData (Champely, 2018) and ggplot2 (Wickham, 2016), the latter two for visualization.

Absolute Quantification of Bacterial Biomass Using Quantitative PCR
For quantitation of bacterial DNA we used a Custom TaqMan® Gene Expression Assay design based on a previously reported universal probe for Bacteria (Nadkarni et al., 2002). The Stenotrophomonas maltophilia-specific qPCR assay was redesigned from that published by Fraser et al. (2019) to fit the TaqMan assay conditions recommended by Applied Biosystems. Both TaqMan® probes were labeled with a fluorescein dye (FAM) on the 5′ end and a minor groove binder (MGB) quencher on the 3′ end. Details about the experimental conditions are provided in Supplementary Table 8 following the MIQE guidelines (Bustin et al., 2009). The PCR reaction mix consisted of 10 µL of TaqMan® Fast Advanced Master Mix (2X), 1 µL of TaqMan® Assay (20X), 7 µL of nuclease-free water, and 2 µL of DNA extract per reaction. Assays were performed in TaqMan® Fast 96-well plates and were run on a ViiA® 7 Real-Time PCR System instrument under the recommended conditions (50 °C for 2 min; AmpliTaq™ Fast DNA Polymerase activation at 95 °C for 5 min; then 40 cycles of 95 °C for 1 s and 60 °C for 20 s). Absolute quantification was achieved using a standard curve approach. A serial dilution of microbial DNA standard from Pseudomonas aeruginosa (SIGMA, MBD0014) or from Stenotrophomonas maltophilia (Minerva biolabs, DSM50170) was used for the universal probe and for the Stenotrophomonas maltophilia-specific TaqMan® assays, respectively.
¹ http://www.arb-silva.de
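The standard curve approach can be illustrated with a short Python sketch: a straight line is fitted to Ct against log10 copy number for the dilution series, and the fit is inverted to estimate copies in an unknown. The standard values below are invented for illustration and the function names are hypothetical; only the 2 µL template volume comes from the reaction mix described above.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve: copies per reaction for a measured Ct."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency implied by the slope (1.0 corresponds to perfect doubling)."""
    return 10 ** (-1 / slope) - 1


# Hypothetical tenfold dilution series of a DNA standard.
log10_copies = [2.0, 3.0, 4.0, 5.0, 6.0]
ct_values = [33.0, 29.5, 26.0, 22.5, 19.0]

slope, intercept = fit_line(log10_copies, ct_values)
copies_per_reaction = copies_from_ct(26.0, slope, intercept)
copies_per_ul_extract = copies_per_reaction / 2  # 2 µL of DNA extract per reaction

print(slope, intercept, amplification_efficiency(slope))
print(copies_per_reaction, copies_per_ul_extract)
```

Dividing by the template volume expresses the result per microliter of extract, which makes values comparable across extracts eluted in the same volume.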

Ethics, Consent and Permissions
Ethical approval related to the CF patients was granted to the AREST CF program by the Princess Margaret Hospital for Children, Perth, ethics committee (Ref. 1762/EPP; date of approval December 10th, 2009). The ethical aspects related to the COPD cohort were approved by the Hunter New England LHD Ethics Committee [Mechanisms of Inflammatory airways disease (05/08/10/3.09)]. Informed consent for publication and participation in this study was obtained from the patients or their guardians.

Data Availability
All the data required to reproduce the results of this study are within the manuscript and its supporting information files. The code used for the analysis in this manuscript can be made available for appropriate scientific purposes upon request to the authors. Demultiplexed FASTQ files are deposited in the Sequence Read Archive under BioProject PRJNA636842.

RESULTS
Microbiome-Related Rationale for Developing a New DNA Extraction Protocol
Several methods have been described in the literature to isolate bacterial DNA from BALF (Supplementary Table 1; Charlson et al., 2011; Erb-Downward et al., 2011; Pragman et al., 2012; Willner et al., 2012; Borewicz et al., 2013; Han et al., 2014; Renwick et al., 2014; May et al., 2015; Bernasconi et al., 2016; Dickson et al., 2016a,b; Laguna et al., 2016; Marsh et al., 2016; Wen et al., 2016; Frayman et al., 2017; Zemanick et al., 2017; Ahmed et al., 2018; Kloepfer et al., 2018; Pattaroni et al., 2018; Esther et al., 2019; Gomes et al., 2019; Jorth et al., 2019; Kyo et al., 2019; Schneeberger et al., 2019; Tong et al., 2019; Wang et al., 2019). In some instances, successful recovery of DNA has been associated with the use of large volumes of starting material. However, this is not always possible, particularly in the case of infants. Work in our laboratory using either silica column- or magnetic bead-based procedures to isolate DNA from low volumes (∼1 mL) of bronchial washing supernatants has shown suboptimal bacterial DNA extraction yields unless the sample was associated with an evident bronchial infection. This situation is unsatisfactory when the amount of biological material that can be retrieved through bronchial washing is limited, as is normally the case in newborn and infant cohorts. Polyethylene glycol (PEG)-based methods have successfully been used to improve the efficiency of recovering bacteriophages, exosomes, or edible nanoparticles (Castro-Mejia et al., 2015; Deregibus et al., 2016; Garcia-Romero et al., 2019; Kalarikkal et al., 2020). These methods have also been shown to improve the yield and purity of microbial DNA recovered from different low-biomass ecological niches (Arbeli and Fuentes, 2007). Mechanistically, PEG induces the condensation of DNA molecules in the presence of salts, a process also called ψ condensation (Cheng et al., 2015). In view of these positive results, we reasoned that combining an improved cell lysis step with PEG-induced DNA precipitation would significantly improve the recovery of bacterial DNA from reduced volumes of low-biomass BALF. Details of this protocol are provided in the materials and methods section and Supplementary Figure 1.
Frontiers in Microbiology | www.frontiersin.org

The PEG-Based Protocol Shows Better Recovery of Bacterial DNA Compared to a Column-Based Kit
We evaluated the efficiency of three different extraction protocols in isolating microbial DNA from reduced volumes of low-biomass BALF samples. Two or three equal-volume aliquots (1 mL) per patient were processed using (a) the PEG protocol described in this report, (b) the QIAamp DNA Microbiome Kit (Qiagen, 51704) without host DNA depletion and, for those specimens for which 3 mL of BALF were available, (c) the same kit including the optional treatment to deplete host DNA. Clinical details of the study subjects are provided in Supplementary Table 2.
Levels of bacterial DNA were evaluated using a custom quantitative PCR (qPCR) assay. The qPCR results indicated that the overall performance of the PEG-based protocol was superior to that of the column-based kit (Figure 1 and Supplementary Table 3). Success in the isolation of bacterial nucleic acids was further confirmed by selectively amplifying a specific fragment of the 16S rRNA gene (Supplementary Figure 2). In four of the bronchial wash specimens from CF infants, the number of 16S copies obtained with the PEG protocol was higher than the background present in the negative extraction control (Supplementary Table 3). Interestingly, the amount of bacterial DNA retrieved from these samples using the QIAamp microbiome kit was indistinguishable from that of the corresponding extraction control. This was independent of whether a host depletion step was included. These data suggest that the differences in DNA recovery are likely due to either a suboptimal interaction between the DNA and the silica surfaces or a lysis-related extraction bias. Similar results were obtained for the BALF samples from the adult cohort, with the PEG protocol producing DNA extracts with a higher number of 16S copies than the commercial kit (Figure 1 and Supplementary Table 3). Interestingly, we observed that the number of 16S copies present in the negative extraction controls for the PEG protocol tended to be lower than the background noise obtained with the column-based kit [mean (SD), PEG: 761.12 (158.02); QIAamp microbiome kit: 1218.1 (362.3); Welch t-test p-value 0.19] (Supplementary Table 3).
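For readers unfamiliar with the Welch t-test reported here, the sketch below computes its test statistic and the Welch-Satterthwaite degrees of freedom for two groups with unequal variances. It is a Python illustration with invented numbers, not a reanalysis of the study data (the per-group sample sizes are not given in the text), and the function name is hypothetical. Obtaining a p-value additionally requires the t distribution, which is what R's `t.test` does by default.

```python
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx = variance(x) / len(x)  # squared standard error of the mean of x
    vy = variance(y) / len(y)  # squared standard error of the mean of y
    t_stat = (mean(x) - mean(y)) / (vx + vy) ** 0.5
    dof = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t_stat, dof


# Hypothetical 16S copy numbers in two sets of negative extraction controls.
group_a = [1.0, 2.0, 3.0]
group_b = [2.0, 4.0, 6.0, 8.0]

t_stat, dof = welch_t(group_a, group_b)
print(t_stat, dof)
```

Unlike the pooled-variance t-test, the degrees of freedom here are generally not an integer and shrink when one group is much more variable than the other.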

DNA Extracted With the PEG-Based Protocol Is Suitable for Downstream Sequencing and Microbiome Analysis
Commercial kits are designed to yield high-quality DNA extracts that are virtually free of PCR inhibitors. Such DNA samples are appropriate for high-throughput downstream applications. The absence of PCR inhibitors in the PEG extracts was confirmed by conventional PCR targeting a 1,344 bp fragment of the 16S rRNA gene (Supplementary Figure 2). We also tested whether the DNA obtained with the PEG protocol was suitable for 16S rRNA amplicon sequencing. For this approach, we targeted the V3-V4 hypervariable region of the 16S rRNA gene. Amplicon libraries from 31 samples, including 5 negative extraction controls, were generated and sequenced at Genewiz (Suzhou, China). We obtained 6,679,284 paired-end reads (read length of 250 bp) with high base-calling accuracy (median Phred-like Q-score of 30, IQR 25-37).
After applying the filtering steps described in the methods section, we obtained an OTU profile representing 231 bacterial taxa. As recommended by harmonized global publication data standards (Eisenhofer et al., 2019), we report the taxa detected in the negative extraction controls as well as their absolute abundance (Supplementary Table 4). As expected, negative extraction controls presented significantly lower bacterial biomass than the targeted biological specimens (Figure 1A). We also observed that the number of reads in the blank extraction controls was reduced (controls: median 16,327, IQR 10,849; biological specimens: median 33,786, IQR 21,552; Welch t-test p-value = 0.002). However, total biomass was not linearly correlated with the number of reads, confirming that read number is not a reliable surrogate of bacterial density [F(1, 29) = 1.889, p-value 0.18, multiple R² 0.06]. To evaluate the influence of the environmental background associated with each extraction procedure on the taxonomic profiles of the targeted biological specimens, we also made use of the R package decontam (Davis et al., 2018). We identified and filtered out putative contaminants by applying the "combined" method implemented in the function isContaminant (Davis et al., 2018; Supplementary Table 5).
We next evaluated the microbial communities present in both the clinical samples and the extraction controls using a principal component analysis model (Figure 2). Inspection of the first two components of the model revealed that the samples processed with the column-based kit were located close to their corresponding negative extraction controls. These samples were distributed along the component accounting for the lower proportion of explained variance (component 2), as opposed to the samples extracted with the PEG method (Figure 2). Thus, the microbial communities profiled from the PEG DNA extracts that yielded a number of 16S copies higher than the background noise were clearly distinguishable from the negative extraction controls (analysis of similarities, ANOSIM; global R = 0.7748, p < 0.05). These observations suggest that the biological variability captured by the PEG method is higher than that obtained after processing the samples with the commercial kit. We speculated that this could be a consequence of the higher DNA recovery exhibited by this extraction protocol (Figure 1B and Supplementary Table 3). To test this hypothesis, we calculated the Procrustes distance between paired samples and related this to the absolute difference in bacterial biomass between them using a linear model. The outcome of this analysis indicated that the larger the difference in concentration, the larger the Procrustes distance between paired samples [F(1, 13) = 4.706, p-value 0.049, multiple R² 0.26].
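The global R reported by ANOSIM compares the ranks of between-group dissimilarities with those of within-group dissimilarities. The sketch below computes that statistic in Python for a small invented distance matrix; the function names and data are hypothetical, and the permutation test that yields the p-value (as performed by the `anosim` function in vegan) is deliberately left out.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def anosim_r(dist, groups):
    """ANOSIM R: (mean between-group rank - mean within-group rank) / (M / 2),
    where M is the number of sample pairs."""
    pairs = list(combinations(range(len(groups)), 2))
    ranks = average_ranks([dist[i][j] for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    return (sum(between) / len(between) - sum(within) / len(within)) / (len(pairs) / 2)


# Hypothetical dissimilarities: two extracts and two negative controls.
groups = ["sample", "sample", "control", "control"]
dist = [
    [0.0, 0.1, 0.7, 0.8],
    [0.1, 0.0, 0.9, 0.95],
    [0.7, 0.9, 0.0, 0.2],
    [0.8, 0.95, 0.2, 0.0],
]

print(anosim_r(dist, groups))
```

Values of R near 1 indicate that samples are more similar within groups than between groups, whereas values near 0 indicate no separation, which is how the R values quoted in this section can be read.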
Finally, we analyzed the composition of the resulting microbial communities present in the BALF specimens (Figures 3A,B). According to the PEG extractions, bronchial washings from CF infants were less diverse than BALF specimens from COPD adults [richness, mean (standard deviation, SD), CF cohort: 42.6 (11.4); adult cohort: 76.4 (7.7); Welch t-test p-value 0.0002] (Supplementary Figure 3 and Supplementary Table 6). Bronchial wash-associated communities in CF infants were characterized by the presence of oral flora (Streptococcus and Rothia) and Proteobacteria (Stenotrophomonas and Pseudomonas) (Figure 3A). In contrast, microbial communities associated with BALF from COPD adults exhibited a higher richness (Supplementary Figure 3 and Supplementary Table 6). In general, the communities from adult individuals were dominated by OTUs assigned to taxa that are typical colonizers of the human oral cavity (Aas et al., 2005). We found it interesting that the microbial profiles of COPD patients classified as GOLD 1-3 (samples 18177, 12306, and 11026) were similar, with an evident increase in bacterial load as disease progresses (GOLD 1-4, samples 18177, 12306, 11026, and 19677) (Supplementary Table 3). Conversely, the BALF-associated microbiota of the patient classified as GOLD 4 (19677) was characterized by the presence of opportunistic oral pathogens such as Brucella or Elizabethkingia (Figure 3B). Further analysis using larger cohorts will address whether this microbial succession in the lungs of COPD patients plays a role in disease progression. In the case of samples extracted with the column-based protocol, the OTU profiles were similar to those of the extraction controls (column-based method without host depletion: ANOSIM global R = 0, p-value = 0.46), except for those samples from which a higher concentration of DNA was retrieved (e.g., samples Q19677 and Q12306, Figure 3B and Supplementary Table 3).
In these instances, the resulting communities were similar to those profiled in the PEG-associated DNA extracts (ANOSIM, global R = 0, p-value = 0.66) (Figure 3B). Importantly, the 16S profiles from the PEG extracts in which the bacterial DNA levels were higher than in the controls were consistent with the reported clinical microbiology findings (Supplementary Table 7). In the adult cohort of BALF specimens, we also identified Stenotrophomonas as part of the resulting communities associated with the PEG method. Because of the high prevalence of this pathogen in our cohort, and the ensuing possibility that it could be a contaminant associated with the PEG method, we proceeded to quantify Stenotrophomonas maltophilia DNA using qPCR (Supplementary Table 3 and Figures 3C,D). The results of this specific assay demonstrated the presence of genetic material from Stenotrophomonas maltophilia in BALF (Supplementary Table 3 and Figures 3C,D). Importantly, the concentration of Stenotrophomonas maltophilia DNA in the negative extraction controls was significantly lower [mean (SD), PEG extraction controls: 2.26 (3.2); PEG-extracted samples: 765.57 (784.49); Welch t-test p-value 0.01] (Supplementary Table 3). These data indicate that, in our cohort, the OTU assigned to the Stenotrophomonas taxon is unlikely to be of contaminant origin, except in those extracts in which the quantity of bacterial DNA is similar to that of the negative extraction controls.

DISCUSSION
Microbiome research using low biomass samples requires special attention to prevent the introduction of unacceptable variability in the form of environmental contaminants (Salter et al., 2014). When the microbial density in the biological specimens is low, extraction procedures can easily introduce bias. This bias can mask true biological variability, and its impact appears to depend on the DNA concentration available during analysis (Salter et al., 2014). To overcome this limitation, large volumes of starting material are often needed. This is not always possible, especially when dealing with bronchial washes from infants. In this work, we describe a new method for the isolation of DNA from low biomass BALF samples. We demonstrate that this novel protocol yields a higher number of bacterial 16S copies than the negative extraction control and captures a higher biological variability. Using equal-volume aliquots from the same BALF specimen, we also show that the PEG method performs better than a column-based kit specifically designed for microbiome studies in human body fluids.
Our study provides further "best practice" confirmation of the necessity of sequencing extraction negative controls, especially when using samples in which the expected microbial density is low (Eisenhofer et al., 2019). We observed that the taxonomic profiles of the BALF samples were dependent on the extraction method used. Absolute quantification of the bacterial DNA present in our extracts suggested that the contrasting OTU profiles were likely due to differences in the recovery of bacterial DNA associated with each extraction method. In light of these results, we agree with the recent global recommendation that, when possible, bacterial DNA quantification should be performed to discern true biological variation from environmental noise (Schneeberger et al., 2019).
In this study, our respiratory BALF samples were characterized by the presence of microorganisms associated with the oral cavity. Previous studies have shown that oral flora, which can reach the lower airways through microaspiration, is associated with pulmonary inflammation (Segal et al., 2013; Dickson et al., 2017). In this regard, colonization of the lower airways by oral bacteria may underlie an important mechanism in airway disease progression. We noticed that most of the samples processed with the PEG method, and especially the CF specimens, showed high levels of an OTU assigned to the Stenotrophomonas taxon. Since this organism is commonly identified as an environmental contaminant (Morris et al., 2013; Salter et al., 2014), we performed a species-specific absolute quantification assay. The results of this approach revealed a higher concentration (in some instances up to 1,000 times higher) of genetic material from Stenotrophomonas maltophilia in the biological specimens than in the negative controls, suggesting a true biological signal. Thus, the higher detection of Stenotrophomonas in the PEG extracts is likely associated with the better performance shown by this protocol. Stenotrophomonas maltophilia represents an emerging pathogen in chronic lung disease, which is commonly detected in medical equipment in hospitals and as a transient colonizer in hospitalized patients (Conly and Shafran, 1996; Parkins and Floto, 2015). The association between the presence of this pathogen and the use of anti-Pseudomonas prophylaxis may also help to explain its prevalence in the OTU profiles of the CF cohort (Conly and Shafran, 1996; Parkins and Floto, 2015). However, because of the absence of reported Stenotrophomonas infection in the study cohorts by classical clinical microbiological analysis, we recognize that this OTU could represent a transient colonizer. 
Further studies are required to ascertain whether this observation of a possible emerging pathogen is biologically meaningful, especially in CF infants. On the basis of this observation, we also recommend that, apart from microbial load, quantification of suspected taxa through qPCR should be performed in microbiome studies.
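To make the magnitude of the taxon-specific signal concrete, the short Python sketch below computes the fold difference between the group means reported in the Results for Stenotrophomonas maltophilia DNA (765.57 for PEG-extracted samples and 2.26 for PEG extraction controls, in the units of Supplementary Table 3). The helper name `fold_over_control` is hypothetical, and the calculation is an illustration on group means rather than the per-sample analysis performed in the study.

```python
from math import log10


def fold_over_control(sample_mean, control_mean):
    """Ratio of the sample signal to the negative-control signal."""
    if control_mean <= 0:
        raise ValueError("control mean must be positive to compute a ratio")
    return sample_mean / control_mean


# Group means reported in the Results (units as in Supplementary Table 3).
peg_samples_mean = 765.57
peg_controls_mean = 2.26

fold = fold_over_control(peg_samples_mean, peg_controls_mean)
print(f"fold over control: {fold:.0f}x (log10 = {log10(fold):.2f})")
```

On group means the difference is a few hundred fold, which is compatible with individual samples reaching the roughly 1,000-fold difference mentioned above.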
We also provide two TaqMan® assays based on those previously reported by Nadkarni et al. (2002) and Fraser et al. (2019). We redesigned the published probes to fit the assay conditions recommended by Applied Biosystems, mainly related to the size of the targeted amplicon. Previous studies using BALF samples have quantified bacterial load through qPCR (Erb-Downward et al., 2011; May et al., 2015; Bernasconi et al., 2016; Laguna et al., 2016; Marsh et al., 2016; Zemanick et al., 2017; Pattaroni et al., 2018; Esther et al., 2019; Jorth et al., 2019; Schneeberger et al., 2019). In these reports, the authors targeted DNA fragments of variable length, ranging from 180 to 590 bp. Except for the primers used by Bernasconi et al. (2016) (180 bp), this amplicon length range is far from that recommended for qPCR (50-150 bp), because longer products show suboptimal amplification efficiencies (Debode et al., 2017). Unfortunately, none of those studies reported key experimental parameters recommended for publication of qPCR assays, such as efficiency (Bustin et al., 2009). To ensure reproducibility and provide the reader with all the information necessary to evaluate the quality and interpretation of the data presented, we have described all the experimental information related to the redesigned TaqMan® assays following the MIQE guidelines in Supplementary Table 8 (Bustin et al., 2009).
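Amplification efficiency, one of the parameters referred to above, is conventionally estimated from the slope of a standard curve of Cq against log10 template copy number, using E = 10^(-1/slope) - 1. The following Python sketch illustrates that calculation with the standard library only. The function names and the dilution series are hypothetical and for illustration; they are not the values reported in Supplementary Table 8.

```python
from math import log10


def standard_curve(copies, cq):
    """Least-squares fit of Cq against log10(copy number).

    Returns (slope, intercept).
    """
    x = [log10(c) for c in copies]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(cq) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, cq))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def amplification_efficiency(slope):
    """E = 10^(-1/slope) - 1; E = 1.0 corresponds to doubling every cycle."""
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical ten-fold dilution series (illustration only).
copies = [1e2, 1e3, 1e4, 1e5, 1e6]
cq = [31.9, 28.6, 25.2, 21.9, 18.6]

slope, intercept = standard_curve(copies, cq)
efficiency = amplification_efficiency(slope)
print(f"slope = {slope:.2f}, efficiency = {efficiency:.1%}")
```

A slope near -3.32 corresponds to an efficiency close to 100%; shallower or steeper slopes indicate over- or under-estimated efficiency and are worth reporting alongside the assay.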
We acknowledge some limitations in our study. Firstly, we did not carry out a comprehensive comparison of different DNA extraction protocols. On the basis of the popularity of silica columns in microbiome studies using BALF (Supplementary Table 2), we only evaluated the performance of our PEG protocol against a column-based commercial kit. Thus, we cannot rule out that other extraction technologies may perform similarly to the PEG method. Secondly, we did not use communities of defined, known composition. Thus, we cannot guarantee the precision and dynamic range of the PEG protocol. For these reasons, we have been careful to keep the claims we have made within these limitations.
In summary, we present a new method that, compared to a commercial kit, increases the recovery of bacterial DNA from BALF samples. Although we only tested this method using BALF material, this protocol could also be suitable for DNA isolation from other low-biomass biological specimens such as swabs. We are confident that this approach will further assist researchers in revealing the complex modalities of host-microbiota interaction in body niches characterized by low microbial density and abundance.

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found below: https://www.ncbi.nlm.nih.gov/, BioProject PRJNA636842.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Princess Margaret Hospital for Children, Perth Ethics Committee (Ref. 1762/EPP; Date of approval December 10th, 2009) and Hunter New England LHD Ethics committee [Mechanisms of Inflammatory airways disease (05/08/10/3.09)]. Written informed consent to participate in this study was provided by the participants' legal guardian/next of kin.