The effect of low-abundance OTU filtering methods on the reliability and variability of microbial composition assessed by 16S rRNA amplicon sequencing

PCR amplicon sequencing may lead to detection of spurious operational taxonomic units (OTUs), inflating estimates of gut microbial diversity. There is no consensus in the analytical approach as to what filtering methods should be applied to remove low-abundance OTUs; moreover, few studies have investigated the reliability of OTU detection within replicates. Here, we investigated the reliability of OTU detection (% agreement in detecting OTU in triplicates) and accuracy of their quantification (assessed by coefficient of variation (CV)) in human stool specimens. Stool samples were collected from 12 participants 22–55 years old. We applied several methods for filtering low-abundance OTUs and determined their impact on alpha-diversity and beta-diversity metrics. The reliability of OTU detection without any filtering was only 44.1% (SE=0.9) but increased after filtering low-abundance OTUs. After filtering OTUs with <0.1% abundance in the dataset, the reliability increased to 87.7% (SE=0.6) but at the expense of removing 6.97% reads from the dataset. When filtering was based on individual sample, the reliability increased to 73.1% after filtering OTUs with <10 copies while removing only 1.12% of reads. High abundance OTUs (>10 copies in sample) had lower CV, indicating better accuracy of quantification than low-abundance OTUs. Excluding very low-abundance OTUs had a significant impact on alpha-diversity metrics sensitive to the presence of rare species (observed OTUs, Chao1) but had little impact on relative abundance of major phyla and families and alpha-diversity metrics accounting for both richness and evenness (Shannon, Inverse Simpson). To increase the reliability of microbial composition, we advise removing OTUs with <10 copies in individual samples, particularly in studies where only one subsample per specimen is available for analysis.


Introduction
New technological advancements and computational methods have enabled investigations of microbial communities that are no longer limited to bacterial culturing methods, leading to the discovery of hundreds of new bacterial species in the human gut.Importantly, these studies highlight the significant role that the gut microbiome plays in both heath and disease.However, methodological variations in sample collection protocols, processing, and analytics make the reproducibility of findings across multiple studies or meta-analyses challenging (Lozupone et al., 2013;Goodrich et al., 2014).Currently, there is no consensus regarding best practices for human stool collection and analysis, leading to concerns regarding the reliability and reproducibility of these datasets.Several factors including DNA extraction methods, sequencing technologies, and analytical approaches are known to have a significant impact on the characterization of the stool microbiota (Lauber et al., 2010;Carroll et al., 2012;Goodrich et al., 2014;Sinha et al., 2015;Chiu and Chao, 2016;Sinha et al., 2017;Vogtmann et al., 2017;Antosca et al., 2020;Bartolomaeus et al., 2020), underscoring the importance of using standardized protocols to minimize bias.
In this study, we focused on analytical approaches to deal with low-abundance operational taxonomic units (OTUs) identified through 16S rRNA amplicon sequencing.These low-abundance OTUs, often thought to be spurious, can account for up to 50% of the detected OTUs in a sample, thereby skewing microbial diversity metrics (Eckburg et al., 2005;Gorzelak et al., 2015).The few studies that have examined the reliability (defined as % agreement in detecting OTU in replicates) of microbial composition in subsamples of the same fecal specimens showed that high abundance taxa are more reproducible compared to lowabundance taxa that are only sporadically detected within replicates (Wu et al., 2010;Gorzelak et al., 2015).Given that approximately 80% of bacterial species found in stool correspond to uncultivable species (Eckburg et al., 2005), estimating the microbiota composition relies on sequencing approaches.This is further compounded by PCR, which is used to generate amplicons and can lead to the overestimations of spurious OTUs, further inflating estimates of microbial diversity (Pienaar et al., 2006;Lahr and Katz, 2009;Goodrich et al., 2014;Potapov and Ong, 2017;Ma et al., 2019).One suggested approach to overcome these concerns is to sequence several replicates of the same specimen.While this approach can significantly improve the reliability of OTU detection (Zhou et al., 2008;Zhou et al., 2011), it is often not feasible particularly in studies with large numbers of subjects.
The aim of this study was to test if analyzing a single sample (oneplicate) per individual provides a reliable assessment of microbial composition.Reliability and validity are critical for advancing the use of human microbiome indicators in future human microbiome research and clinical trials.We investigated the reliability and variability of OTU detection within triplicates of the same human stool specimens.We applied several methods for filtering the low-abundance OTUs and determined their impact on alpha-and beta-diversity metrics.We provide an overview of comparative results and recommendations for data analysis that may improve the reliability and reproducibility of microbial composition estimates across studies.

Study participants and sample collection
The study was approved by the Health Sciences Institutional Review Board of the University of Wisconsin-Madison (#2016-0251).Recruitment and stool collection are described elsewhere (Holzhausen et al., 2021).Briefly, the study population consisted of 12 volunteers with a mean age of 35.4 years (SE = 3.1).Each stool specimen was aliquoted into three subsamples (triplicate) and frozen at −80°C within 30 min of stool production.

DNA extraction, PCR, and sequencing
DNA extraction from stool samples has been previously described in detail (Eggers et al., 2018;Eggers et al., 2019;Kates et al., 2020;Holzhausen et al., 2021).Briefly, DNA was extracted from lysed cells using phenol:chloroform:isoamylalcohol followed by isopropanol precipitation in the presence of sodium acetate.DNA was purified using a NucleoSpin Gel and PCR Clean-up Midi kit (Takara Bio USA, Inc., Mountain View, CA) as previously described (Kozich et al., 2013).Amplicons were generated from 25-50 ng of gDNA via PCR with primers (F-GTGCCAGCMGCCGCGGTAA; R-GGACTACHVGGGTWTCTAAT) targeting the V4 region (Kozich et al., 2013) along with sequencing adapters and barcodes to differentiate samples of the bacterial 16S rRNA gene.Amplicons were then subjected to gel electrophoresis on a 1% low melt agarose gel containing SYBR Safe DNA Gel Stain (Invitrogen).Bands of 380 bp were excised and purified using a Zymoclean DNA recovery kit (Zymo Research, Irvine, CA).DNA was sequenced on an Illumina MiSeq using a v2 2x250 bp paired-end sequencing kit (Illumina, San Diego, CA), with a final library concentration of 10 pmol/l and a 10% PhiX Control.

Data processing
Raw sequencing data were processed in mothur (version 1.43.0)following a Standard Operating Procedure for MiSeq data (Kozich et al., 2013).Briefly, contigs (overlapping sequences) were aligned to the SILVA (v132) database, reads of incorrect length were removed, and chimeras (determined by UCHIME) and undesirable reads (e.g., Archea, Eukaryota, chloroplasts, mitochondria, and unknowns) were removed.Sequences were assigned to OTUs with a threshold of 97% similarity using the GreenGenes (version gg_13_8_99) database (DeSantis et al., 2006;Edgar et al., 2011;Quast et al., 2013).Fastq files were submitted to NCBI's Short Read Archive and are publicly available under accession number PRJNA962543.

Statistical analysis
Alpha-diversity was estimated using Observed OTUs, Chao 1, Shannon, and Inverse Simpson metrics (Shannon, 1948;Simpson, 1949;Chao, 1984), which were calculated using the Phyloseq package in R (McMurdie and Holmes, 2013).The precision of OTU quantification was analyzed by coefficient of variation (CV), which is calculated as the ratio of the standard deviation to the mean.A higher CV indicates greater dispersion in the variable.Means were calculated from copy counts in triplicate for each OTU.The effect of different filtering methods on relative sequence abundance was assessed by t-test followed by Mann-Whitney rank sum test.Differences were considered statistically significant at p < 0.05.Beta-diversity was estimated using Bray-Curtis dissimilarity, using the vegan package in R, and PERMANOVA tests were used to calculate proportion of variability explained by individual.

Microbial composition of samples
Among the triplicate samples from 12 individuals, sequencing of the V4 region of the 16S rRNA gene resulted in 1,853,072 total raw reads or an average of 51,474 reads per sample.After filtering chimeras and removing low-quality reads and sequences of incorrect length, there were a total of 40,572 (SD = 10,883) reads per sample (range, 24,776-74,720).These included 3,761 unique OTUs, 80% of which were annotated at the genus level.

Reliability and variability of OTU detection in sample replicates
A coefficient of variation (CV) was calculated for each OTU analyzed within triplicates of the same specimen to estimate the variability of their quantification (Figure 1B); in general, a lower CV indicates less dispersion and variability of quantification (Reed et al., 2002;Hanneman et al., 2011).We found that the CV for OTUs with >1,000 sequences in each replicate was <0.279; for 100-1,000 sequences, 0.273; for 10-100 sequences, 0.332; for 1-10 copies, 0.542; and for OTUs that were present in only one or two of the triplicates with <10 copies, 1.44.
We also found that OTUs with abundances >1%, having on average >371 sequences, were detected with 100% reliability (% OTUs detected in all three replicates) with good quantification accuracy (Figure 1C).OTUs with abundances between 0.1% and 1%, which had 37-371 sequences, were also detected with high reliability (99.5%) and accuracy.In contrast, reliability of detection were decreased (84%), while the variability in quantification increased for lower abundance OTUs detected between 0.01% and 0.1%, with an average number of sequences between 4 and 37.However, for OTUs with <0.01%abundance, which have <4 sequences on average, there was a lesser reliability (22%), and the variability in quantification was high (Figure 1B).

A B C
The distribution of OTUs and reliability of detection.

The effect of filtering methods on reliability of OTU detection
After assessing variability in the unfiltered sample, we then applied different approaches to filter low-abundance OTUs in order to determine which filtering method provided the best reliability of microbiome composition with respect to alpha-diversity metrics.The cutoff criteria for filtering OTUs were based on the OTU abundance: 1) in the whole dataset (sequences from all samples pooled together); 2) within an individual sample (each replicate was treated as an individual sample); or 3) within a triplicate (Table 1).Based on these cutoffs, removing low-abundance OTUs significantly increased the proportion of OTUs reliably detected across replicates from 44% (no OTUs removed) to up to 100% (OTUs filtered out if not present in all three replicates).Although low-abundance OTUs account for the largest proportion of unique OTUs, they represented only a small proportion of the sequences in the sample.Filtering methods excluded between 0.21% and 6.97% of the reads from the analysis (Table 1).

The effect of filtering methods on alpha-diversity metrics
Filtering low-abundance OTUs resulted in the removal of between 29% and 76% of the OTUs from the analysis with significant impact on alpha-diversity metrics (Table 2).As expected, the Chao1 estimator, which accounts for both the presence and abundance of OTUs and is known to be sensitive to communities with many species with low abundance (Kim et al., 2017), was significantly affected by all filtering methods.Other alpha-diversity metrics including Shannon's and Inverse Simpson's, which account for both richness and evenness, were less affected by low-abundance OTU removal.Filtering OTUs with <0.1% abundance in the dataset had the most significant impact on all alpha-diversity metrics.Most filtering methods did not have significant effects on the relative abundance of the five major phyla (Table 3); however, excluding OTUs with <0.1% abundance in the dataset significantly impacted phyla with smaller relative abundances, such as Actinobacteria and Proteobacteria.

The effect of filtering methods on beta-diversity
Filtering low-abundance OTUs resulted in tighter clustering between points (reduced BC dissimilarity) compared to the unfiltered dataset (Figure 2).As expected, filtering based on overall relative abundance <0.1% (Figures 2B, H) resulted in the most dramatic reduction in BC dissimilarity between samples.Despite tighter clustering, PERMANOVA analyses revealed that the proportion of variability attributable to an individual (i.e., R 2 ) was 0.94 (p = 0.001), regardless of the filtering method.This suggests that all filtering methods were able to reduce variation in overall microbiome signatures, without losing important biological signals such as the individual providing the sample.

Discussion
While gut microbiome research has significantly expanded over the last decade, there are no broadly accepted consensus protocols that would ensure the reproducibility and reliability of study outcomes.The Microbiome Quality Control (MBQC) project and  (Goodrich et al., 2014;Sinha et al., 2015;Sinha et al., 2017).The estimates of microbiome composition that rely on PCR and sequencing technologies are sources of methodological artifacts that may lead to the detection of spurious OTUs (Acinas et al., 2005;Sipos et al., 2007;Aird et al., 2011;Goodrich et al., 2014) and thus complicate downstream analyses given their overwhelming contributions to the diversity of a sample (Ley et al., 2008;Goodrich et al., 2014).However, to distinguish spurious and rare OTUs is challenging, especially in situations where only one subsample per subject is analyzed.To circumvent this problem, low-abundance OTUs are often excluded from the analysis.However, filtering approaches vary among studies, and there is no agreed-upon consensus as to what filtering threshold should be employed.
To address the methodological question of how best to filter and the implications of filtering in study outcome measures, we sequenced three replicates from 12 human specimens to determine the reliability of OTU detection.Only 44% of the OTUs were shared by all three replicates, and over 40% of the OTUs were found only in one of the replicates.OTUs that were detected sporadically within triplicates had very low abundances, usually with <10 copies, and had low accuracy of quantification, as  assessed by CV.In contrast, OTUs with >0.1% abundance within a sample were reliably detected with good accuracy of quantification.Overall, our data indicate that OTUs with lesser abundance had greater CVs, suggesting lower detection accuracy.The CV is commonly used to determine intra-and inter-assay variability (Reed et al., 2002) to assess pipetting techniques, user effect, batch effect, and lab to lab variability.In such studies, a CV < 0.1 for intraassay variability and CV <0.15 for inter-assay variability is considered excellent.Although the same scale may not be applicable to evaluate the accuracy of OTU detection by 16S sequencing because the sources of quantification errors are different, the increasing CV values with decreasing OTU abundance provides better understanding of limitation of lowabundance OTU detection by 16S sequencing.Amplicon sequencing of soil samples found even smaller OTU overlap in three replicates (8.2%), suggesting low reproducibility of amplicon-sequencing-based microbiome composition (Zhou et al., 2011).Others have attributed such low reproducibility to several factors, one of them being non-representative subsampling (Zhou et al., 2008).Beyond analytic variability, some variability may also be due to inadequate mixing of fecal sample before taking subsamples for DNA extraction that are not representative and would disproportionally affect rare taxa.
In general, the approaches we used to filter low-abundance OTUs improved the reliability of microbiota composition estimates, but there are several factors to consider when choosing the appropriate filtering methods.We found that methods based on OTU abundance in the whole dataset (e.g., pooling all sequences from all samples) provided good reliability (% OTUs detected in all three replicates), but they are affected by study sample size, and thus, the threshold for OTU removal may differ among studies.For example, when filtering OTUs with <0.1% abundance, the threshold is 10,000 copies for a dataset with 10,000,000 sequences but only 1,000 copies in smaller dataset with 1,000,000 sequences.Therefore, the microbiota composition estimates of the same sample will be different in different datasets.
Removing OTUs based on the number of copies within individual samples also improved the reliability of replicates while retaining higher proportions of rare OTUs.This method is independent of sample size and would thus be more consistent from study to study.Our data suggest that OTUs with <10 copies in the samples are less reliably and accurately detected and are randomly distributed within triplicates.Therefore, it is reasonable to remove these OTUs from analyses.
Filtering methods significantly affected beta-diversity.Removing OTUs based on the copy number in individual samples retained relatively larger beta-diversity compared to filtering by relative abundance in the dataset.The distance and relative position in the Bray-Curtis dissimilarity plot among individuals are dependent on the filtering method, further underlying the significance of analytical approach, which may contribute to low reproducibility and reliability of gut microglial studies.Although the most optimal filtering method would depend on overall study goals, based on our analysis, we would recommend censoring OTUs with <10 copies in individual samples during analysis of 16S rRNA gene sequencing.However, the best reliability of microbiota composition estimates is achieved when at least two replicates of the same specimen are analyzed; this approach would be recommended when low-abundance OTUs are important to consider.
The study has several strengths in that it analyzes repeated human microbiome samples and considers numerous ways of filtering and comparison of results.While sample size could be considered a limitation of this study, the analysis of 12 individuals with three replicates provided sample size with sufficient statistical power for the analyses.While many consider whole genome sequencing (WGS) the gold standard alternative to 16s sequencing, WGS has other analytic challenges and can be very cost prohibitive.The use of 16s amplicon sequencing is not suitable to annotate taxa at the species level, which is a limitation.However, it is a cost-effective method and will likely remain to be widely used to annotate the microbiome.

Conclusions
The aim of the study was to evaluate the reliability of OTU detection and to assess the impact of different filtering methods on microbiome alpha-and beta-diversity and composition.To increase the reliability of microbial composition, we advise removing OTUs with <10 copies in individual samples, particularly in studies where only one subsample per specimen is available for analysis.Excluding very low-abundance OTUs has a significant impact on alphadiversity metrics sensitive to the presence of rare species but had little impact on relative abundance of major phyla and families and alpha-diversity metrics accounting for both richness and evenness.
was supported by a Eunice Kennedy Shriver National Institute of Health and Human Development grant to the Center for Demography and Ecology at the University of Wisconsin-Madison (T32 HD007014).KM's time was supported by R01 AG061080 and a National Institute on Aging grant to the Center of Demography of Health and Aging at the University of Wisconsin-Madison (P30 AG17266).
(A)  The average distribution of OTUs based on the relative sequence abundances within samples.The pie chart shows average number of OTUs in each relative abundance category (blue, >1%; orange, 0.1-1%; gray, 0.01<0.1%;yellow, <0.01%) and proportion (%) of sequences representing each relative abundance category On average, 19 OTUs with >1% relative abundance comprised 80% of all sequences within sample; 48 OTUs with 0.1%-1% relative abundance comprised 15.5% of sequences.The majority of OTUs (194) with <0.1% relative abundance (gray, yellow) comprised 4.5% of sequences.(B) Coefficient of variation (CV) calculated based on the number of copies for each OTU.(C) The reliability and accuracy of OTU detection.OTUs with higher relative sequence abundance were more reliably and accurately detected within replicates than lower abundance OTUs.Nikodemova et al. 10.3389/fcimb.2023.1165295Frontiers in Cellular and Infection Microbiology frontiersin.org FIGURE 2The beta-diversity, based on Bray-Curtis dissimilarity for various filtering methods, including (A) no filtering, all OTUs included, (B) OTUs with relative abundance <0.1% removed, (C) OTUs with >10 copies in at least one sample in the dataset removed, (D) OTUs with one copy filtered from individual replicates, (E) OTUs with <10 copies filtered from individual replicates, (F) OTUs with <10 copies in at least one of three replicates filtered from all replicates, (G) OTUs not present in all three replicates filtered, and (H) a zoomed-in plot (B), where OTUs with relative abundance <0.1% were filtered.Each color represents an individual; three dots of the same color represent three replicates from the individual.

TABLE 1
The effect of filtering method on the reliability of OTU detection within triplicates.
*OTUs fulfilling the filtering criteria were removed from all samples.**OTUs were removed only from samples fulfilling the filtering criteria.

TABLE 3
The effects of filtering method on the relative sequence abundance of the top 5 phyla.

TABLE 2
The effect of filtering method on different a-diversity metrics.p<0.05 vs All OTUs included, **p<0.001vs All OTUs included. *