Comparison Analysis of Different DNA Extraction Methods on Suitability for Long-Read Metagenomic Nanopore Sequencing

Metagenomic next-generation sequencing (mNGS) is a novel and useful strategy that is increasingly used for pathogen detection in the clinic. Emerging mNGS technologies with long-read capability can decrease sequencing time and increase diagnostic accuracy, which is of great significance for rapid pathogen diagnosis. Reliable DNA extraction is considered critical for the success of sequencing; hence, there is an urgent need for a gentle DNA extraction method that yields unbiased and more intact DNA from all kinds of pathogens. In this study, we systematically compared three DNA extraction methods (enzymatic cell lysis based on MetaPolyzyme, mechanical cell lysis based on bead beating, and a control method without pre-cell lysis) by assessing DNA yield, integrity, and microbial diversity based on long-read nanopore sequencing of urine samples with microbial infections. Compared with the control method, the enzymatic-based method increased the average length of microbial reads by a median of 2.1-fold [interquartile range (IQR), 1.7–2.5; maximum, 4.8] in 18 of the 20 samples and the mapped reads proportion of specific species by a median of 11.8-fold (IQR, 6.9–32.2; maximum, 79.27). Moreover, it provided diagnostic results fully consistent (20 of 20) with clinical culture and more representative microbial profiles (P < 0.05), which together strongly support the excellent performance of the enzymatic-based method in long-read mNGS-based pathogen identification and its potential for diagnosing microbiome-related diseases.


INTRODUCTION
Metagenomic next-generation sequencing (mNGS) is a hypothesis-free and unbiased approach that has the potential to detect all known pathogens as well as those not yet identified. Because of its target-agnostic nature, mNGS enables the discovery of new organisms in clinical samples and is especially suitable for rare, novel, and atypical etiologies of complicated infectious diseases, as well as the molecular diagnosis of polymicrobial infections (Goldberg et al., 2015; Cummings et al., 2016; Gu et al., 2019). Although such an unbiased approach appears highly suitable for pathogen diagnosis, differences in the pathogen lysis method result in different observed pathogen distributions (Mattei et al., 2019). Currently, the most commonly used method is mechanical lysis with hard bead beating, which may result in excessive DNA fragmentation (Salonen et al., 2010). This fragmentation erodes the advantage of long reads offered by emerging sequencing techniques such as Nanopore and PacBio (Rhoads and Au, 2015; Wang et al., 2021). Furthermore, longer sequence reads can increase the taxonomic resolution of sequence classification because they are more readily classified to the species or subspecies level, whereas short reads are often difficult to classify accurately to species and can sometimes result in misdiagnoses (Schlaberg et al., 2017). Therefore, there is still an urgent need for optimized cell wall degradation methods that provide DNA with high integrity from all kinds of pathogens.
Urinary tract infections (UTIs) are among the most common infections in humans and can be caused by a broad range of microorganisms, including bacteria and fungi (Hasman et al., 2014; Zhang et al., 2022). Because of this vast microbial diversity, the optimal DNA extraction method differs with cell wall structure and composition (Maukonen et al., 2012). Therefore, urine metagenomic pathogen diagnosis studies require an optimized DNA extraction method that ensures efficient cell lysis, minimal DNA shearing, and unbiased microbial DNA recovery. In addition, the method needs to generate the most representative distribution of the microbial species present. Notably, urine can be collected non-invasively in large volumes and therefore represents an attractive target for diagnostic assays. Although much attention and effort have been devoted to establishing mNGS-based diagnostic assays for UTI (Imirzalioglu et al., 2008; Schmidt et al., 2017; Li et al., 2020), few studies have evaluated the compatibility of DNA extraction methods with emerging long-read mNGS testing.
In this study, we compared three DNA extraction methods: mechanical lysis, enzymatic lysis, and a control method (DNA extracted directly without pre-cell lysis). Using metagenomic nanopore sequencing as the readout, we assessed the quantity and integrity of the extracted DNA, the recovery of microbial diversity, and the proportion of target microbial reads while keeping all other steps standardized, with the goal of selecting the most compatible DNA extraction method for identifying potential pathogens in long-read mNGS-based pathogen diagnostic analysis.

Study Design
DNA was extracted from the urine samples with three different methods in this study: Method 1, DNA extracted directly with the IndiSpin Pathogen Kit (Indical Bioscience); Method 2, DNA extracted after mechanical lysis; Method 3, DNA extracted after enzymatic lysis. We compared the three DNA extraction methods by evaluating DNA yield and integrity, DNA recovery of specific species, and microbial diversity. An overview of this study is shown in Figure 1.

Subjects and Urine Sample Collection
A clinical diagnosis of UTI was made with reference to the culture result and indicators including a white blood cell count of > 10^7/L, an epithelial cell count of < 10^7/L, fever, dysuria, frequency of urination, and urgency (Willner et al., 2014; Kumar et al., 2015). In addition, the following criteria were used to determine inclusion in this study: the patient had symptoms including urinary urgency, frequent urination, and painful urination; and the culture results were available and positive. Urine samples with less than 1 ml remaining or with more than three species positive in culture were excluded. In total, 20 urine samples collected from 20 adults were included in this study.

FIGURE 1 | Schematic workflow of the study. DNA was extracted from urine samples with three methods and sequenced on the MinION. Created with BioRender.com.

DNA Extraction Methods
Each urine sample was aliquoted (1 ml) into three 1.5-ml Eppendorf tubes (Eppendorf) and centrifuged at 20,000 × g for 5 min to enrich for microbes. Then, 800 µl of supernatant was discarded, and the pellet was resuspended in the residual volume (200 µl) by gentle vortexing to prepare the enriched urine samples. The detailed DNA extraction methods are listed below.

(i) Method 1. DNA Extraction Directly Without Pre-Cell Lysis
One aliquot of each enriched urine sample (200 µl) was used to extract DNA directly with the IndiSpin Pathogen Kit without pre-cell lysis, as a method control. Briefly, 200 µl of urine sample was added to a 20-µl aliquot of Proteinase K, and 100 µl of Buffer VXL containing 1 µg of Carrier RNA was then added to the mixture, which was incubated for 15 min at 20°C-25°C. After this, 350 µl of Buffer ACB was added to the samples and mixed thoroughly by pulse vortexing. Then, all the lysates were transferred to the Mini column and centrifuged at 6,000 × g for 1 min. The collection tubes containing the filtrate were discarded, and the Mini column was placed in a clean collection tube. Six hundred microliters of Buffer AW1 was added to the Mini column to wash the DNA, followed by centrifugation at 6,000 × g for 1 min. The washing step was repeated using 600 µl of Buffer AW2. After this, the membrane was dried by centrifugation at 20,000 × g for 2 min in a clean collection tube. Finally, the DNA was eluted with 100 µl of Buffer AVE. DNA concentrations were measured using a Qubit 4.0 fluorometer with the dsDNA HS Assay kit (Thermo Fisher Scientific).

(ii) Method 2. DNA Extraction With Mechanical Lysis
Mechanical lysis of cell walls was accomplished by bead beating. One aliquot of each enriched urine sample (200 µl) was transferred into a Pathogen Lysis Tube (Qiagen) containing glass beads, and 50 µl of Buffer ATL (containing Reagent DX, Qiagen) was added according to the manufacturer's instructions. The Pathogen Lysis Tubes were then attached to a horizontal platform on a vortex mixer and vortexed for 10 min at maximum speed. After that, the Pathogen Lysis Tubes were removed and briefly spun to collect any drops from the inside of the lid. DNA was extracted from the supernatant using the IndiSpin Pathogen Kit as described in Method 1.

(iii) Method 3. DNA Extraction With Enzymatic Lysis
One aliquot of each enriched urine sample (200 µl) was used to extract DNA by the enzymatic lysis method. Five microliters of lytic enzyme solution (Qiagen) and 10 µl of MetaPolyzyme [Sigma Aldrich; reconstituted in 750 µl of phosphate-buffered saline (PBS)] were added to the 200-µl samples and mixed by gentle pipetting. The mixed samples were incubated at 37°C in a shaker for 1 h to lyse microbial cells. DNA was extracted from each lysed sample using the IndiSpin Pathogen Kit as described in Method 1.

Library Preparation and Sequencing
All mNGS testing of the samples was performed on the MinION platform [Oxford Nanopore Technologies (ONT)]. The samples included in this data set were processed and sequenced regardless of microbial DNA concentration to provide an accurate representation of the data that would likely be obtained from metagenomic analysis of urine in clinical settings. Library preparation was performed using the PCR Barcoding Kit (SQK-PBK004, ONT) according to the manufacturer's instructions, with a 2-min extension and 15 cycles in the PCR amplification step. Up to six barcoded samples were loaded per flow cell for each sequencing run. Full details regarding library preparation are provided in Supplementary Methods.
Nanopore sequencing was performed using R9.4.1 flow cells (FLO-MIN106) on MinION. A total of 75 µl of library DNA was loaded into the flow cell according to the manufacturer's instructions. ONT MinKNOW GUI software (version 4.2.8) was used to collect raw sequencing data.

Bioinformatic Analysis
The raw sequencing data were processed using our automatic bioinformatics pipeline, which is composed of a fixed set of external software (ont-Guppy, bwa, SAMtools, BLASTn). The processing consists of the following steps: (1) adapters were trimmed using ont-Guppy; (2) human host sequences were subtracted by mapping reads to the human reference genome (GRCh38, https://www.ncbi.nlm.nih.gov/datahub/assembly/GCF_000001405.39/) using Burrows-Wheeler alignment with the BWA-MEM algorithm; (3) the output SAM file was indexed and sorted with SAMtools (version 1.7) to generate nonhuman reads; (4) all the nonhuman reads were classified by simultaneous alignment to RefSeq microbial genome databases (ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq) consisting of viruses, bacteria, fungi, and parasites using BLASTn (version 2.10.1); (5) the species classification result was finally output as a .csv file after processing by two custom Python scripts and Linux commands. The automatic bioinformatics pipeline is available at https://github.com/gitzl222/APDNS/.
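For orientation, steps (2) to (4) could be laid out roughly as in the sketch below. It only builds the command lines and does not run them, and it is not the APDNS pipeline itself. The file names, the `build_steps` helper, the `ont2d` preset for `bwa mem`, and the use of the SAM flag filter `-f 4` (unmapped reads) to select nonhuman reads are all assumptions made for illustration; the text above does not state which options were used.

```python
from pathlib import Path
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    argv: List[str]         # command and its arguments
    stdout: Optional[Path]  # file that should capture standard output, if any


def build_steps(reads_fastq, human_index, blast_db, out_dir, threads=4):
    """Return the command lines for host subtraction and classification.

    All paths and option choices are hypothetical; nothing is executed here.
    """
    out = Path(out_dir)
    sam = out / "vs_human.sam"
    nonhuman_bam = out / "nonhuman.bam"
    nonhuman_fasta = out / "nonhuman.fasta"
    hits = out / "blast_hits.tsv"
    return [
        # Step 2: align trimmed reads to the human reference with BWA-MEM.
        Step(["bwa", "mem", "-x", "ont2d", "-t", str(threads),
              str(human_index), str(reads_fastq)], sam),
        # Step 3: keep reads that did not map to the human genome (flag 4).
        Step(["samtools", "view", "-b", "-f", "4",
              "-o", str(nonhuman_bam), str(sam)], None),
        # Convert the nonhuman reads to FASTA for BLASTn.
        Step(["samtools", "fasta", str(nonhuman_bam)], nonhuman_fasta),
        # Step 4: classify nonhuman reads against a microbial database,
        # writing tabular output (format 6) for downstream parsing.
        Step(["blastn", "-query", str(nonhuman_fasta), "-db", str(blast_db),
              "-outfmt", "6", "-num_threads", str(threads),
              "-out", str(hits)], None),
    ]


if __name__ == "__main__":
    for step in build_steps("reads.fastq", "GRCh38.fa", "refseq_microbial", "out"):
        target = f" > {step.stdout}" if step.stdout else ""
        print(" ".join(step.argv) + target)
```

Each step could then be executed with `subprocess.run(step.argv, stdout=handle, check=True)`, opening `step.stdout` for writing when it is set.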

Statistical Analysis
All statistical analyses were performed using GraphPad Prism version 8.4. Normality was tested for all datasets using the D'Agostino-Pearson omnibus normality test, and correlation was analyzed using Pearson correlation. All data were log-transformed and further analyzed using the Kruskal-Wallis test or the two-tailed paired t-test, as appropriate, to calculate the statistical significance of differences between the methods. A P-value less than 0.05 was considered statistically significant.
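As an illustration of the paired comparison on log-transformed data, the sketch below computes the paired t statistic by hand. The analyses in this study were run in GraphPad Prism, so this is only a standard-library approximation of the same idea; the function name and the choice of base-10 logarithms are assumptions, and the P-value step is omitted because it needs the t distribution.

```python
import math
from statistics import mean, stdev


def paired_t_on_log10(a, b):
    """Return (t, degrees_of_freedom) for paired samples after log10 transform.

    a and b hold positive measurements from the same samples under two
    extraction methods, in the same sample order.
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # Difference of logs equals the log of the per-sample ratio.
    diffs = [math.log10(x) - math.log10(y) for x, y in zip(a, b)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("identical paired differences; t is undefined")
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


if __name__ == "__main__":
    # Made-up values, not data from this study.
    print(paired_t_on_log10([10, 100, 1000], [1, 10, 10]))
```

With SciPy available, `scipy.stats.ttest_rel` applied to the log-transformed values would return the same statistic together with a two-tailed P-value.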

DNA Yield and Integrity
The purpose of this study was to select and statistically validate an optimal method for microbial DNA extraction to be applied in long-read mNGS-based pathogen diagnosis of clinical samples. Three extraction methods were compared using clinical urine samples, and DNA yield, DNA integrity, and the abundance of specific species were used as screening criteria to determine the best method. To analyze differences in DNA yield and integrity between the three DNA extraction methods, we measured the DNA concentrations and the average length of microbial reads and found that they varied considerably not only between samples within each method but also between DNA extraction methods (Figure 2). We then tested the statistical significance of these differences.
For DNA yield, we found that DNA concentrations obtained with the mechanical-based method were significantly lower than those obtained with the control method (P < 0.0001, Figure 3A), whereas the enzymatic-based method showed no significant difference from the control method (P > 0.05, Figure 3A). This result indicates that the enzymatic-based method has no additional effect on DNA yield, whereas mechanical lysis with bead beating has a negative effect.
For DNA integrity, we found that the average length of microbial reads generated by the enzymatic-based method was significantly longer than that generated by the control method (P < 0.0001, Figure 3B) and the mechanical-based method (P < 0.01, Figure 3B): it increased by a median of 2.1-fold (IQR, 1.7-2.5; maximum, 4.8) in 18 of the 20 samples and 1.9-fold (IQR, 1.4-2.3; maximum, 5.0) in 16 of the 20 samples, respectively (Table 1).
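The fold-change summaries reported above (median, IQR, maximum, and number of samples with an increase) can be reproduced in outline as follows. This is a minimal sketch: the function name is hypothetical, the quartile method (`inclusive`) is an assumption that may differ from the one used by GraphPad Prism, and the sketch summarizes all samples rather than only those with an increase.

```python
from statistics import median, quantiles


def fold_change_summary(treated, control):
    """Summarize per-sample fold changes of treated over control values."""
    if len(treated) != len(control) or len(treated) < 2:
        raise ValueError("need two equal-length lists with at least 2 samples")
    folds = [t / c for t, c in zip(treated, control)]
    # Quartiles by linear interpolation over the observed range.
    q1, _, q3 = quantiles(folds, n=4, method="inclusive")
    return {
        "median": median(folds),
        "iqr": (q1, q3),
        "max": max(folds),
        "n_increased": sum(f > 1 for f in folds),
    }


if __name__ == "__main__":
    # Made-up mean read lengths (bp), not data from this study.
    print(fold_change_summary([2000, 4000, 6000, 8000, 10000],
                              [2000, 2000, 2000, 2000, 2000]))
```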

Abundance Variation for Specific Species
To determine the suitability of the three DNA extraction methods for mNGS-based pathogen diagnosis, we assessed the consistency between culture and mNGS results and found that the enzymatic-based method provided fully consistent results, whereas the control and mechanical-based methods were consistent in 15 of 20 and 14 of 20 samples, respectively (Table 2). To evaluate the abundance variation of specific species, we next calculated the number and proportion of mapped reads for the specific species identified by culture (Table 2). Comparing sequencing depth across the three DNA extraction methods, we found that the total number of reads generated by MinION per sample showed no significant difference among the three methods (all P > 0.05, Figure 3C). At this comparable sequencing depth, the enzymatic-based method increased the mapped reads proportion of specific species by a median of 11.8-fold (IQR, 6.9-32.2; maximum, 79.27; Figure 3D) in 14 of the 20 samples compared with the control method; the one exception was a sample from an infection with the gram-negative bacterium E. coli (P19). In the remaining five samples (P3, P5, P7, P10, and P14), the number of reads of specific species rose from "no reads detected" with the control method to "large number of reads detected". The mechanical-based method showed a decreased proportion of mapped reads in most (9 of 15) samples, although no significant difference was observed compared with the control method. Furthermore, in the five samples with no reads of specific species detected by the control method, no targeted reads were detected by the mechanical-based DNA extraction method either. Finally, sample P1, which was correctly detected by the control method, was not detected by the mechanical-based method, indicating that this method may lead to loss of microbial DNA sequences.
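The two quantities used in this comparison, culture concordance and the mapped reads proportion of the culture-identified species, can be expressed compactly as below. This is a sketch under stated assumptions: a sample is counted as consistent when every culture-positive species has at least `min_reads` mapped reads (the threshold actually applied is not stated here), and the sample IDs, species labels, and read counts are invented placeholders.

```python
def concordance(culture, mngs_reads, min_reads=1):
    """Return (consistent_samples, total_samples).

    culture: sample ID -> set of species found by culture.
    mngs_reads: sample ID -> {species: mapped read count}.
    """
    consistent = 0
    for sample, species in culture.items():
        reads = mngs_reads.get(sample, {})
        if all(reads.get(sp, 0) >= min_reads for sp in species):
            consistent += 1
    return consistent, len(culture)


def target_read_proportion(reads, culture_species, total_reads):
    """Fraction of all reads in a sample that map to culture-positive species."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return sum(reads.get(sp, 0) for sp in culture_species) / total_reads


if __name__ == "__main__":
    # Placeholder data for illustration only.
    culture = {"S1": {"species_A"}, "S2": {"species_B"}}
    mngs = {"S1": {"species_A": 120, "species_X": 5}, "S2": {"species_X": 40}}
    print(concordance(culture, mngs))
    print(target_read_proportion(mngs["S1"], culture["S1"], 1000))
```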

Impact of DNA Extraction Method on Microbial Diversity Composition
To evaluate the impact of the DNA extraction method on microbial diversity composition, we quantified the relative abundance of microbial taxa per sample based on nanopore sequencing. We first compared the total number of microbial species for each of the three DNA extraction methods (Figure 4A). The total number of microbial species was normalized by the total number of reads per sample, and pairwise comparisons were made across the three DNA extraction methods (see Supplementary Table S1 for raw data). More microbial species were observed in urine samples with the enzymatic-based method than with the control method (P < 0.05), whereas no significant difference in microbial species diversity was observed between the other two methods (P > 0.05). We further evaluated the variation in microbial diversity using alpha and beta diversity. Alpha diversity measured by the Shannon index indicated a significant increase in microbial diversity with the enzymatic-based method compared with the other two methods (all P < 0.05, Figure 4B). Beta diversity was assessed by principal coordinate analysis (PCoA) based on the Bray-Curtis dissimilarity, and the PERMANOVA test showed a significant difference in microbial composition among the three methods (P < 0.05, Figure 4C). To evaluate the microbial DNA extraction efficiency of the three methods, we compared the proportion of total microbial reads per sample for each method (see Supplementary Table S1 for raw data). Similarly, the enzymatic-based method increased the microbial proportion by a median of 9.2-fold (IQR, 3.1-26.0; maximum, 69.0; Figure 5A) compared with the control group, whereas the mechanical-based method showed a median of 0.9-fold (IQR, 0.6-1.2; maximum, 3.3; Figure 5A). In addition, compared with the mechanical-based method, the enzymatic-based method increased the microbial proportion by a median of 11.9-fold (IQR, 3.3-22.1; maximum, 74.5; Figure 5A).
To assess which types of species were most affected by the extraction methods, we investigated the distribution and relative abundance of the most common species (Figures 5, 6). We found that the relative abundance of gram-positive bacteria varied visibly and that of fungal species varied significantly (P < 0.0001, Figure 5B) across methods, whereas the variation in the abundance of gram-negative bacteria was not obvious. These results are in line with previous observations that gram-positive bacteria and fungi are more likely to be affected by DNA extraction methods (McOrist et al., 2002; Santiago et al., 2014; Ackerman et al., 2019). In addition, these results showed low bacterial abundance in samples from patients with fungal infections. Likewise, in samples from patients with bacterial infections (P18 and P19), the fungal abundance was low. We further compared the microbial relative abundance obtained with the three methods using the Kruskal-Wallis test and found significant differences among them (all P < 0.0001).
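For readers less familiar with the two diversity measures named above, the following sketch gives their textbook definitions in plain Python. It assumes per-species read counts as input and the natural logarithm for the Shannon index; whether counts or relative abundances were used, and which log base, is not specified in this section, so both choices are assumptions.

```python
import math


def shannon_index(counts):
    """Shannon index H = -sum(p_i * ln p_i) over species with nonzero counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two samples over the same species list.

    0 means identical profiles; 1 means no shared species.
    """
    if len(u) != len(v):
        raise ValueError("profiles must cover the same species, in the same order")
    denom = sum(u) + sum(v)
    if denom <= 0:
        raise ValueError("at least one profile must contain reads")
    return sum(abs(a - b) for a, b in zip(u, v)) / denom


if __name__ == "__main__":
    # Made-up read counts for three species in two samples.
    print(shannon_index([50, 30, 20]))
    print(bray_curtis([50, 30, 20], [10, 0, 90]))
```

A matrix of pairwise `bray_curtis` values across samples is the input that a PCoA ordination would start from.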

DISCUSSION
In metagenomic sequencing studies, variations in the DNA extraction protocol can have important downstream effects on the observed microbial composition. Maximizing DNA concentration while minimizing fragmentation is a key consideration when selecting an extraction method, both because high-quality libraries are required for sequencing and because protocols that consistently recover low-yield or highly fragmented DNA are likely to skew the observed community composition (Costea et al., 2017). The emergence of new long-read sequencing techniques such as nanopore sequencing has raised the bar for DNA quality and extraction methods. However, there is a paucity of studies evaluating the performance of different DNA extraction methods for long-read mNGS-based pathogen diagnostic testing. In this study, to select the best-suited DNA extraction method to support pathogen diagnosis based on long-read mNGS testing, we systematically compared three DNA extraction methods by assessing DNA yield, integrity, and microbial diversity based on metagenomic nanopore sequencing of urine samples from patients with UTI.
Among the three methods, the DNA concentration obtained with the mechanical-based method was significantly lower than that obtained with the other methods, which may result from the loss of excessively short DNA fragments during DNA purification on the silica membrane (Dilley et al., 2021). However, when we analyzed the correlation between DNA yield and microbial proportion within each DNA extraction method, no correlations were observed (all P > 0.1), as host DNA accounts for a large proportion of the DNA in urine samples. These results are in line with the published literature (Yuan et al., 2012). Therefore, DNA yield alone appears to be an unrepresentative measure of extraction efficiency because microbial DNA accounts for only a small proportion of total DNA in urine samples (Salonen et al., 2010).
For the comparison of microbial read lengths generated by mNGS testing, although no significant difference was observed between the mechanical-based method and the control method, the enzymatic-based method generated much longer reads, indicating that long DNA fragments were released to a greater extent after enzymatic cell lysis, resulting in DNA of high integrity. Hence, these results support the better compatibility of the enzymatic-based method with long-read sequencing technologies. Unexpectedly, this result also seems to show that the mechanical-based method did not cause excessive shearing of DNA. However, it can also be explained by the preference of the silica membrane for capturing longer DNA fragments (Doran and Foran, 2014; Dilley et al., 2021), which is also in line with the lower DNA yield of the mechanical-based method noted above.
Pathogen diagnosis using MinION-based mNGS testing is common in recent studies (Schmidt et al., 2017; Charalampous et al., 2019; Moon et al., 2019). Unbiased cell lysis and complete DNA extraction of all microbial pathogens are crucial to recover specific pathogenic species accurately in mNGS testing, as reads derived from the normal human microbiota may influence pathogen identification (Gu et al., 2019). Among the 20 urine samples with microbial infection, enzymatic-based mNGS testing provided results fully consistent with the pathogens detected by culture. In contrast, the control method and the mechanical-based method missed the pathogen in five and six samples, respectively. All the missed pathogens were fungal, indicating that some fungal cells are more difficult to lyse, again in line with the results of a previous study (Ackerman et al., 2019). In addition, on the basis of the detection results of the control and enzymatic methods, prior cell lysis treatment is necessary and effective for DNA extraction from clinical samples with unknown infectious agents when using sequence-based detection methods, although it increases the turnaround time of DNA extraction.
Because the microorganisms that colonize various anatomical sites of the human body play important roles in human health and disease (Dethlefsen et al., 2007), it is critical to understand the urinary microbiome comprehensively and accurately to develop novel therapies for UTI (Xiao et al., 2016; Neugent et al., 2020). The enzymatic-based method provided the largest normalized species number and microbial proportion among the three methods; in particular, it recovered the highest abundance of fungi and gram-positive microbiota, indicating that it can generate a more representative microbial diversity composition from urine samples. These results indicate that the enzymatic-based method can serve as an unbiased and reliable procedure for DNA extraction in future sequence-based metagenomic analyses.
This study has some limitations. First, it is difficult to assess which DNA extraction method came closest to the biological truth in the absence of a parallel evaluation of the DNA extraction methods with a mock community. Second, we did not investigate additional factors that may affect the metagenomic results, such as reagent and laboratory contamination (Salter et al., 2014).
In conclusion, we demonstrated the excellent performance of the enzymatic-based method for long-read mNGS testing by systematically comparing three DNA extraction methods. We anticipate that procedures for DNA extraction will further improve in the future, and we propose using a combination of lytic enzyme solution and MetaPolyzyme for effective lysis of a range of microbes, including both fungi and bacteria, with minimal shearing. Although we have demonstrated the advantage of the enzymatic-based DNA extraction method only on urine samples, it can probably be extended to other sample types such as stool and bronchoalveolar lavage fluid. By combining reliable organism lysis, unbiased sequencing, and comprehensive reference databases, long-read mNGS testing can be applied in real clinical practice for hypothesis-free and universal pathogen detection, promising to improve the diagnostic accuracy of all microbiological infections.
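The correlation check between DNA yield and microbial read proportion rests on the Pearson coefficient, which can be written out as in the sketch below. The function is a plain restatement of the standard formula, not the GraphPad Prism computation, and it returns only r; the associated P-value is left out.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up DNA yields (ng/µl) and microbial read proportions.
    print(pearson_r([1.2, 3.4, 0.8, 5.1], [0.02, 0.01, 0.30, 0.05]))
```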

DATA AVAILABILITY STATEMENT
The datasets presented in this study can be found in online repositories. The names of the repository/repositories and accession number(s) can be found in the article/Supplementary Material.

ETHICS STATEMENT
The studies involving human participants were reviewed and approved by the Institutional Review Board (IRB) of the Beijing Dongfang Hospital (reference no. JDF-IRB-2020003101). The patients/participants provided their written informed consent to participate in this study.

AUTHOR CONTRIBUTIONS
WH, PL, YJ, and LZ conceived and designed the study. LZ and TC carried out the experimental work and analyzed the data. LZ, TC, and YW conceptualized the experimental methods, performed bioinformatics, and wrote the original draft of the manuscript. PL, WH, and YJ participated in the review and editing of the manuscript. All authors contributed to the article and approved the submitted version.

FUNDING
The project was financed by the National Natural Science Foundation of China (82002115).