Comparison of Mothur and QIIME for the Analysis of Rumen Microbiota Composition Based on 16S rRNA Amplicon Sequences

López-García, Adrian; Pineda-Quiroga, Carolina; Atxaerandio, Raquel; Pérez, Adrian; Hernández, Itziar; García-Rodríguez, Aser; González-Recio, Oscar

doi:10.3389/fmicb.2018.03010

ORIGINAL RESEARCH article

Front. Microbiol., 13 December 2018

Sec. Systems Microbiology

Volume 9 - 2018 | https://doi.org/10.3389/fmicb.2018.03010

Comparison of Mothur and QIIME for the Analysis of Rumen Microbiota Composition Based on 16S rRNA Amplicon Sequences

1. Departamento de Mejora Genética Animal, Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria, Madrid, Spain
2. Departamento de Producción Animal, NEIKER-Tecnalia, Vitoria-Gasteiz, Spain
3. Departamento de Producción Agraria, Escuela Técnica Superior de Ingeniería Agronómica, Alimentaria y de Biosistemas, Universidad Politécnica de Madrid, Madrid, Spain

Abstract

Background: Microbiome studies need to analyze massive sequencing data, which requires the use of sophisticated bioinformatics pipelines. Up to date, several tools are available, although the literature is scarce on studies that compare the performance of different bioinformatics pipelines on rumen microbiota when 16S rRNA amplicons are analyzed. The impact of the pipeline on the outcome of the results is also unknown, mainly in terms of the output from studies using these tools as an intermediate phenotype (pseudophenotypes). This study compares two commonly used software (Quantitative Insights Into Microbial Ecology) (QIIME) and mothur, and two microbial gene data bases (GreenGenes and SILVA) for 16S rRNA gene analysis, using metagenome read data collected from rumen content of a cohort of dairy cows.

Results: We compared the relative abundance (RA) of the identified OTUs at the genus level. Both tools presented a high degree of agreement at identifying the most abundant genera: Bifidobacterium, Butyrivibrio, Methanobrevibacter, Prevotella, and Succiniclasticum (RA > 1%), regardless the database. There were no statistical differences between mothur and QIIME (P > 0.05) at estimating the overall RA of the most abundant (RA > 10%) genera, either using SILVA or GreenGenes. However, differences were found at RA < 10% (P < 0.05) when using GreenGenes as database, with mothur assigning OTUs to a larger number of genera and in larger RA for these less frequent microorganisms. With this database mothur resulted in larger richness (P < 0.05), more favorable rarefaction curves and a larger analytic sensitivity. These differences caused significant and relevant differences between tools at identifying the dissimilarity of microbiotas between pairs of animals. However, these differences were attenuated, but not erased, when SILVA was used as the reference database.

Conclusion: The findings showed that the SILVA database seemed a preferred reference dataset for classifying OTUs from rumen microbiota. If this database was used, both QIIME and mothur produced comparable richness and diversity, and also in the RA of most common rumen microbes. However, important differences were found for less common microorganisms which impacted on the beta diversity calculated between pipelines. This may have relevant implications at studying global rumen microbiota.

Introduction

Research on ruminal microbiota is becoming increasingly important in dairy cattle as the microbial communities and their genome expression are related to important traits as health condition (Zilber-Rosenberg and Rosenberg, 2008), feed enteric fermentation (Zhou et al., 2009, 2010), or methane emissions (Wallace et al., 2015; Kamke et al., 2016; Roehe et al., 2016). The differences in the microbiota composition have also been proposed as a predictor or proxy of the differences in complex traits and environmental phenotypes (Ross et al., 2013; Kamke et al., 2016). Improving these traits is relevant for farm profitability and sustainability (Basarab et al., 2013; Bell et al., 2013; González-Recio et al., 2014a). Further, there is increasing interest on inferring the host genetic influence on the microbiota composition (Goodrich et al., 2016; Roehe et al., 2016). Tools that accurately estimate the microbial composition are essential to associate microbiota to phenotype variability.

Advances in sequencing technologies allow for obtaining genomic information in a fast and affordable manner. Whole metagenome and rRNA amplicons sequencing provide useful information to characterize the microbial composition in a given environment. Metagenomic information from hypervariable regions in the 16S and 18S ribosomal RNA amplicons are so far preferred in microbiome research due to their lower cost and reasonable accuracy. The results of these kind of studies rely on computational tools that provide accurate characteristics from large data sets of DNA sequences from the community under investigation (Lindgreen et al., 2016). Several authors have reviewed the specifications of different bioinformatics tools to analyze 16S rRNA gene sequences (Lozupone et al., 2005; Nilakanta et al., 2014; Oulas et al., 2015). Among these tools, mothur (Schloss et al., 2009; Kozich et al., 2013) and Quantitative Insights Into Microbial Ecology (QIIME) (Caporaso et al., 2010) are currently two of the most used suits of tools to analyze sequencing information from rRNA amplicons. However, comparisons between these tools on real data sets are scarce. For instance, other authors performed a benchmark study in order to investigate the performance of several tools in terms of microbial taxonomy and function (Lindgreen et al., 2016). These authors applied the methods on synthetic whole sequence metagenomes, which aim to represent the complexities encountered in a non-specific environment. In that study, QIIME resulting on a high specificity at determining the genus level but low sensitivity, whereas mothur was not tested. A recent study evaluated QIIME and mothur in fecal samples collected from preterm infants, showing slight differences in terms of the effective number of genera, richness and relative abundance (RA) detected (Plummer and Twin, 2015). Up to the best of our knowledge, the performance of these tools has not been yet evaluated in aligning rumen metagenome samples to public amplicons databases. Rumen microbiota poses the difficulty that most species have not been yet isolated, and therefore gene data bases may lack of many of the species in the rumen.

The aim of this study was to compare the rumen microbiota composition resulted from two different software: mothur and QIIME, when aligned against GreenGenes (GG) or SILVA databases. The null hypothesis is that the software and data base used to determine the ruminal microbial composition do not impact the results and conclusion from rumen microbiota studies.

Results

This study used sequence data from the hypervariable region V4 of the 16S amplicon from ruminal content in 18 dairy cows. The libraries were generated by means of Nextera kit. The 250 bp paired-end sequencing reactions were performed on a MiSeq platform (Illumina, San Diego, CA, United States). The sequences were processed using the two softwares: QIIME package version 1.9.1 (Caporaso et al., 2010) and mothur version 1.39.5 (Kozich et al., 2013). The RA of 16S rRNA gene reads for each sample and bioinformatics tool was used to infer the taxonomical composition of the samples, taking into account the copy number of 16S genes calculated from each tool. Two reference panels were considered for this purpose: GreenGenes (GG) database (May 2013 version) and SILVA (release 132). The detailed pipeline from each software is shown in Figure 1.

FIGURE 1

After filtering and chimera removal, both tools used a similar number of sequences to cluster (P > 0.05), regardless the database used. In average, QIIME left 54,544 reads (SD = 9,041) per animal, whereas mothur worked with 53,790 reads per sample (SD = 7,709). However, mothur clustered these sequences in a larger number of OTUs regardless the database (Figure 2). Using QIIME with GG as reference database kept the lowest number of OTUs for classification.

FIGURE 2

Taxonomical Richness

The performance of each tool was evaluated by looking at the assignment of individual OTUs and the number of genera classified. The RA of genera in each sample was calculated after excluding those genera that appeared at RA < 0.1% across all samples.

Figure 3 shows the rarefaction curves from each tool. Mothur detected a larger number of OTUs (Figures 3A,B) and also of microbial taxa at the genus level (Figures 3C,D) (P < 0.01) in the samples than QIIME using both GG and SILVA databases. Opposite, QIIME classified a larger number of genera after filtering by RA > 0.1% (Table 1). Most of the additional genera encountered by mothur were in very low abundance.

FIGURE 3

Table 1

Reference database	Total genera assigned			Relative abundance
	QIIME	Mothur	Mothur and QIIME	QIIME	Mothur	Mothur and QIIME
GreenGenes	1	6	23	0.19 (-)¹	2.89 (9.67)	2.60 (8.30)
SILVA	13	3	52	0.28 (0.13)	1.90 (6.51)	1.79 (5.67)

Total of genera (N) and its average relative abundance (standard deviation within brackets) assigned by each tool (only by QIIME, only by mothur or by both).

Results from each reference data set are presented separately. ¹No standard deviation calculated with n = 1.

Classification

GreenGenes

On average, mothur clustered a significantly (P < 0.001) higher number of OTUs per sample than QIIME. In average per sample, QIIME could not assign 61% (SD = 2.7) of clustered OTUs to a known genus, considering known every genus not named as “unclassified,” “uncultured,” “ambiguous,” “unidentified,” “unknown,” or null, whereas mothur could not assign a larger proportion (67%, SD = 2.5) of OTUs. QIIME was less restrictive at assigning OTU to genus level (P < 0.001), which might be related with the higher initial number of OTUs clustered by mothur, as we mentioned before. With this database, mothur identified a total of 29 different genera appearing in more than one sample, whereas QIIME assigned 24. Twenty three of these genera were common to both pipelines. The former aligned sequence data to six additional known genera, although most of them appeared in an average RA lower than 0.5%. Three out of these six genera had low representativeness, appearing in less than four out of 18 samples. On the other hand, the only QIIME-exclusive genus, Bacillus, had a low average RA and low representativeness, appearing only in three samples. Table 1 shows the average RA of genera assigned by one or both tools, highlighting that reads that were assigned to a known genus by only one of the tools appeared in very low RA. Both tools were capable of assigning around 99% of reads to any known taxonomy rank belonging to either bacteria or archaea kingdoms.

A scatter plot of the RA estimated by each tool for each genus within sample are shown in Figure 4. A strong Pearson correlation (0.996; P < 0.001) was found between RA obtained from each tool. Although some small variability can be seen for some samples, there were not statistical differences in the overall RA between tools (P > 0.05). However, this disagreement was more evident for microorganisms at RA < 10%, for which significant differences were found between both tools (P < 0.05), and these differences were even higher at RA < 1% (P < 0.01), and the regression coefficient of RA from QIIME on RA from mothur differed from one, becoming even lower when subsetting the RA dataset (Table 2).

FIGURE 4

Table 2

Reference population	Coefficient	R²	Coefficient	R²
	GreenGenes		SILVA
All	1.10**	0.90**	1.14**	0.98**
Relative abundance < 10%	0.50**	0.31**	0.86**	0.84**
Relative abundance < 1%	-0.03**	0.05**	0.12**	0.14**

Regression (slope and regression coefficient estimates) of the relative abundance from QIIME on the relative abundance from mothur using GreenGenes or SILVA as reference data set.

^∗∗P < 0.001.

SILVA

Mothur also clustered a significantly higher number of OTUs in known taxa than QIIME (P < 0.001) when SILVA was the reference database even though mothur filtered out a larger number of reads, but the differences were more attenuated than using GG. These OTUs from mothur were nonetheless classified into a lower number of known taxa than using QIIME (Table 1). Both tools identified a total of 52 known genera. Mothur aligned sequence data to three additional exclusive known genera that appeared in more than 1 sample, and QIIME identified 13 genera that did not appeared in mothur. With SILVA as database, mothur could not assign an average of 36.1% per sample (SD = 1.37) of clustered OTUs to a known genus, but with QIIME only 9.1% (SD = 1.36) of OTUs were not assigned to known genera. Thus, mothur appeared to be much more restrictive (P < 0.001) at assigning OTUs to genus level when SILVA was used as the reference database.

Figure 5 shows a scatter plot of the RA estimated by each tool. As in the previous case for GG, a strong correlation (0.996; P < 0.001) was found between RA obtained from each tool. However, the regression coefficient of RA from QIIME on RA from mothur deviated from 1 at RAs < 10%, although smaller differences were observed compared to GG (Table 2). This suggests that mothur detects larger RA of microbes that are present in lower proportion in the rumen.

FIGURE 5

In summary, both tools were able to classify microorganism from the following genus: Prevotella, Butyrivibrio, Succiniclasticum, Methanobrevibacter, Treponema, Bifidobacterium, Pseudobutyrivibrio, Ruminococcus, Mogibacterium, Lachnospira, Acetobacter, Methanosphaera and Desulfovibrio, regardless the database. In addition, other microbes were not identified to genus level, but as members of Lachnospiraceae and Ruminococcaceae families, regardless the database. The OTUs unable to be classified by QIIME at the genus level were from the Paraprevotellaceae, Coriobacteriaceae, Prevotellaceae, and Succinivibrionaceae families when GG was used as the reference database and from the Christensenellaceae family when SILVA was used as the reference dataset. The OTUs that were not assigned to a genus level by mothur belonged to Enterobacteriaceae and Spirochaetaceae families using GG, and to Bacillaceae, Enterobacteriaceae, Erysipelotrichaceae, Family_XIII, Prevotellaceae, and Spirochaetaceae families using SILVA. Also, members from Bacteroidetes, Firmicutes, and Proteobacteria phyla were not assigned to family level when using mothur, regardless the database.

The genera that were identified exclusively by either mothur or QIIME are shown in Table 3. This table includes the reference database they were detected with, and previous studies reporting these microbes in rumen microbiota. Other genera were classified by both tools, but not for both databases. Among those with RA > 0.5% we found YRC22 and Clostridium when GG was the reference database, and Acetitomaculum, Saccharofermentans, Schwartzia, Candidatus_Saccharimonas and some groups from families Ruminococcaceae, Christensenellaceae, Rickenellaceae, Lachnospiraceae, and Prevotellaceae when SILVA was used. Five taxa were identified for any combination of tool and database that have not been reported in rumen so far: Eubacterium hallii group, Eubacterium_nodatum_group, Ruminococcaceae UCG-011, Ruminococcus gauvreanii group, and Prevotella P9.

Table 3

Genus	Function/activity in rumen	Reference database	Tool	Previous source(s)
p-75-a5	Detected in ruminal liquid fraction	GreenGenes	mothur	Jewell et al., 2015
SHD-231	Detected in rumen. Reduced in diets containing linseed diets	GreenGenes	mothur	de Carvalho et al., 2017
Lachnospira bacterium FD2005	Detected in rumen	SILVA	mothur	Azevedo et al., 2015.
Papillibacter	Detected in rumen. Cellulose- degrading bacteria	SILVA	mothur	Zhang et al., 2014
Ruminococcus	Cellulolytic bacteria	SILVA	QIIME	Wallace et al., 2015.
Bacillus	Amylolytic bacterium	GreenGenes, SILVA	QIIME	Gallo et al., 2016
Eubacterium_ cellulosolvens_group	Fibrolytic (Sika deer) Detected in sheep rumen	SILVA	QIIME	Li et al., 2013 Azevedo et al., 2015
Eubacterium_coprosta noligenes_group	Detected in rumen	SILVA	QIIME	Tong et al., 2018 Popova et al., 2017
Eubacterium_ ruminantium_group	Present in rumen with an appropriate balance of degradable protein and carbohydrates	SILVA	QIIME	Abdelmegeid et al., 2018
Eubacterium_ ventriosum_group	Present in forestomach (Alpacas and Sheep)	SILVA	QIIME	Abdelmegeid et al., 2018
Lachnospiraceae_ NK4A136_group	Detected in rumen	SILVA	QIIME	Azevedo et al., 2015
Roseburia	Adherent bacteria community involved in plant degradation	SILVA	QIIME	Huws et al., 2016

Genera identified exclusively by mothur or QIIME, their function or activity in the rumen (if known), the reference database it was identified from, and information source or reference.

Diversity

Beta-diversity was calculated to investigate the dissimilarity between rumen microbiotas within tool. Results clearly clustered by software at taxonomical levels of genus, family and Phylum, regardless the reference database used (Figure 6). This figure also shows that the dissimilarities between samples were larger between than within software at lower taxonomic levels (genus and family), whereas distances at the phylum level were similar between and within software.

FIGURE 6

Computation requirements were not reported here as they greatly depend on the computational strategy applied in terms of parallelization, available number of cores, and the bioinformatician’s creativity to design more efficient pipelines.

Discussion

The results of this study support previous research showing differences between bioinformatics tools analyzing 16S rRNA amplicons. The number of OTUs and the taxonomic classification resulted thereof was compared between mothur and QIIME.

The results herein show that the final number of taxa and their overall RAs are not statistically different between tools using SILVA as reference database, but beta-diversity between samples clustered together by software used. Mothur worked with a larger number of OTUs, and these were classified into a larger number of genera than by QIIME when GG was the reference database. Plummer and Twin (2015) showed larger richness (total number of different genera) using QIIME compared to mothur, using the same reference database for alignment, although they used human preterm gut samples. They also showed statistical differences between tools on the comparison for diversity within samples. Our results show that using more updated SILVA releases mitigated these differences in terms of richness and diversity, suggesting that not only the implemented pipeline/program strongly influences diversity results, but database should also be considered in microbiota analysis. Therefore, it must be pointed out that comparison between pipelines in terms of number of OTUs must be done within database, and in terms of number of OTUs remaining for classification. It is also worth to highlight that in the current study both pipelines utilize a reference database for chimera filtering as well as a differentiated OTU labeling, making the comparison for raw OTUs unfeasible. Analyzing OTUs instead of taxonomical levels might be of interest in some circumstances. The results at the taxa level showed differences between tools and databases, which may be extended at the more specific level of OTUs.

It must be pointed out that the objective of this study was not to determine what tool provides a more accurate picture of the true microbiota composition, since the latter is unknown in cultivated rumen samples. There is no gold standard microbiota with known composition as many of the microbes in the rumen cannot be isolated or have not been yet cultivated. Sequencing the 16S rRNA gene poses the limitation that closely related microbes can be indistinguishable as they harbor almost identical sequences at this amplicon, and the different tools handle these drawbacks differently. A favorable pipeline should maximize sensitivity with a minimum impair in specificity. According our results, we infer that mothur provides higher diversity than QIIME regardless the reference database. There were five taxonomy groups detected in our study that have not been reported in rumen microbiota before. The rumen microbiota is largely underrepresented in databases and most of them have not been cultured yet (Stewart et al., 2018). Therefore, we consider that these groups of microorganisms might be considered as new candidates, although it must be corroborated in future studies through deep sequencing analysis or culture isolation. If they are true positive, they may be potential candidates to create mock communities to challenge bioinformatics tools. Lindgreen et al. (2016) showed low sensitivity scores and an impaired prediction performance with QIIME using a benchmark metagenome. However, as noted by these authors, QIIME uses custom databases that only contain specific marker sequences such as 16S rRNA. Therefore, performance of QIIME in whole metagenome analyses cannot be extrapolated to 16S or 18S amplicons studies. In addition, mothur was not used in such a study.

There are two main differences between mothur and QIIME: the OTU clustering algorithm and the algorithm for taxonomic classification. The alignment and clustering processes differ between software, as well as the chimera detection. Mothur handles the taxonomic classification using a naïve Bayesian classifier using a pseudo-bootstrapping to generate a confidence score, which must be over 80% to assign a read to a given taxonomy (Wang et al., 2007). QIIME uses the usearch algorithm to find the closest match in a reference data base (Edgar, 2010), which has been reported problematic at identifying the closest reference because it is sensitive to the order of the reference sequences as they can be identical over the region being considered (Westcott and Schloss, 2015). Usearch shows a high level of sensitivity to detect reference sequences, however, the specificity of those matches was poor relative to the true best match. High error rates have been previously reported with GG, and could be substantially improved by randomizing the sequences (Westcott and Schloss, 2015). Further the poorly GreenGenes-aligned sequences artificially increases the distances between sequences, which may also impair the accuracy of the classification. QIIME uses a closed reference pick up strategy in a single step, which implies some difficulties at disentangling all dissimilarities with mothur. The way mothur is implemented here resembles a pseudo open-reference pick up, because there were a previous step of chimera filtering and a known reference database was used for classification. These arguments might explain the poorest performance of QIIME in our data set when GG was used as the reference data set.

This is a proof of principle analysis showing how the choice of bioinformatics pipeline and the reference data set can impact the analysis of 16S rRNA gene sequencing data from rumen microbiotas. Nonetheless, the bio-informatics tools could perform differently in samples from different sites as different body parts may host different taxonomic composition, making the algorithms more or less efficient at detecting the true composition.

In the light of the results obtained in this study we can conclude that the impact of the tool is relatively small in terms of richness as a more updated and comprehensive reference database is used. SILVA seems to be a preferred reference data set as a larger number of different genera were identified, and more consistent results were obtained between tools. SILVA is a more updated database, whereas GG has not been updated since 2013. However, differences were detected in terms of beta-diversity, and differences between pipelines were obtained for microbes in lower abundances, yet belonging to the core microbiome. In this sense, mothur showed larger sensitivity at detecting microorganisms that can potentially populate the cow rumen. This may be important, as differences in the RA of less frequent groups of microbes may be relevant. These differences affect the microbiota similarity between samples or individuals. In turn, this would affect the phenotypic variance of a complex trait explained by the microbiota using mixed models that accounted for the microbiota composition as a random effect with a covariance structure given by these similarities between samples. Performance of mixed model methodology under Best Linear Unbiased Prediction or any other Reproducing Kernel Hilbert Space scenario greatly depends on the structure of the covariance or kernel matrix used as reported in González-Recio et al. (2009, 2014b). Incorrect or improper microbiotas similarity matrices might bias the proportion of variance explained by the microbiota or genetic correlation estimates between host genome and its metagenome. Up to the best of our knowledge, there is no proof of concept to determine what tool provides a more suitable similarity matrix. The degree of aimed microbiota specificity may affect the choice of the pipeline. Mock communities that mimic the true composition of rumen microbiota are not yet available. This study also highlights the necessity to create benchmark samples with a known composition of cultivated ruminal microorganisms to evaluate different bioinformatics tools, as well as the convenience of including more rumen specific communities into the gene databases. In this sense, those mock samples could include the genera that have been detected by only one of the tools (Table 3). Moreover, It must be consider that the samples used in this study combined the four possible ruminal fractions and the RAs in the samples might differ from the true composition in the rumen. Nonetheless, this is not expected to affect the comparison between pipelines.

Materials and Methods

This study was carried out in accordance with Spanish Royal Decree 53/2013 for the protection of animals used for experimental and other scientific purposes. An ethics committee was not necessary in this case because it was conducted on pre-existing data from a previous trial based on routine management practices in commercial farms. Data used in this study were described in Gonzalez-Recio et al. (2018). In brief, samples were obtained from ruminal content from 18 cows from 2 breeds (10 Holstein and 8 Brown Swiss) allocated in the Fraisoro Farm School (Zizurkil, Gipuzkoa, Spain). Ruminal samples were collected from each dairy cow using a stomach tube connected to a mechanical pumping unit. About 100 ml of each ruminal extraction were placed into a container and were frozen immediately after the extraction and then stored at -20 ± 5°C until analysis. Samples were gradually thawed overnight at refrigeration (5 ± 3°C) and squeezed through four layers of sterile cheesecloth to separate solid (solids with a particle size smaller than the diameter of the sampling tube) from liquid digesta phases. This latter phase was subsequently separated into planktonic organisms and bacteria associated with the liquid fraction. The solid phase was separated in associated and adherent fractions. Fractionation procedures were carried out following the methodology described in Yu and Foster (2005). The four fractions were lyophilized and combined to obtain a unique sample with the four fractions represented proportionally (on dry matter basis).

After composition, DNA extraction was performed using the commercial Power Soil DNA Isolation kit (Mo Bio Laboratories, Inc., Carlsbad, CA, United States) following manufacturer’s instructions. The extracted DNA was subjected to paired-end Illumina sequencing of the V4 hypervariable region of the 16S rRNA. Universal bacterial 16S rRNA gene primers (515F: 5′-GTGCCAGCMGCCGCGGTAA-3′ and 806R: 5′-GGACTACHVHHHTWTCTAAT-3′ (Caporaso et al., 2011) were used to generate the bacterial amplicon libraries (expected amplicon size 250 bp). The libraries were generated by means of Nextera kit. The 250 bp paired-end sequencing reactions were performed on a MiSeq platform (Illumina, San Diego, CA, United States). Data are publicly available at http://www.ebi.ac.uk/ena/data/view/PRJEB26635.

Sequences were pre-processed using Trimmomatic tool (v 0.36) (Bolger et al., 2014). Sequences below 220 bp in length and average quality score below 30 on a window of 20 bases were discarded. In total, 3,261,168 reads were analyzed. The remaining sequence data were then processed using the two softwares: QIIME package version 1.9.1 (Caporaso et al., 2010) and mothur version 1.39.5 (Schloss et al., 2009; Kozich et al., 2013). In the case of QIIME, forward and backward reads were joined with join_paired_ends.py. Chimeras were identified and filtered using usearch method (Rognes et al., 2016). Finally, the tool was used to pick closed-reference OTUs from the GreenGenes database (May 2013 version) or SILVA database (Quast et al., 2013)¹ (release 132) and representative sequences with a 99% of similarity were kept. The pipeline for mothur also began by joining forward and backward reads. Chimeras and unique sequences were removed using UCHIME (Edgar et al., 2011). Sequences were then preclustered, and finally classified using the default method (naïve Bayesian classifier; Wang et al., 2007) on classify.seqs(), with the same cut-off for sequence identity and reference databases as above. OTUs were summarized at phylum, class, order, family, and genus. Phylogenetic groups with an abundance lower than 0.1% in all samples were excluded from the final analyses. The pipelines used can be found in a git-hub repository².

All statistical analyses were performed in R v3.5.1 (R Core Team, 2015). When Pearson correlation was calculated, the statistical significance was tested using the cor.test() command from the base package.

Filtering and Chimera Removal

Differences in the number of sequences left after chimera removal from each tool (mothur vs. QIIME) was computed using a least squared mean regression.

The linear model was:

where y was the vector of the number of reads left for each sample after filtering and chimera removal with either mothur or QIIME (n = 2 × 18), μ is the intercept, x is the incidence vector assigning each record to the corresponding tool (mothur vs. QIIME), β is the coefficient estimate, and e is the vector of residuals assumed to be independently and identically normally distributed. The level of significance was set to α = 0.05.

Richness and Relative Abundance

Differences between mothur vs. QIIME were computed using a simple generalized linear model. Sequence reads from each sample (n = 18) were analyzed with the mothur or QIIME pipelines, and using either SILVA or GG databases. The statistical analysis for the resulting richness and RAs were computed within database as follows:

where y was the vector of number of microbial taxa at the genus level (or their RA) assigned either with mothur or QIIME (n = 2 × 18) using GG, μ is the intercept, x is the incidence vector assigning each record to the corresponding tool (mothur vs. QIIME), β is the coefficient estimate, and e is the vector of residuals assumed to be independently and identically normally distributed. The level of significance was set to α = 0.05.

Further, the same statistical analysis was performed using the RAs obtained with SILVA as the reference data base.

Similarly, the number of unclassified reads from each tool within reference database were analyzed using the same model as above.

Dissimilarity Matrix and Principal Component Analyses

Non-metric multidimensional scaling (nMDS) was performed to explore the ruminal community structure, using the phyloseq package (v 1.24.2). The ordinate function was used to estimate dissimilarities using Bray–Curtis distances. Plot_ordination was used to plot these dissimilarities between mothur and QIIME pipelines with either SILVA or GreenGenes as the reference databases, grouping taxa by genus, family, and phylum levels.

Statements

Author contributions

AL-G, AP, and IH selected the steps in the pipelines for mothur and Qiime and analyzed the sequence files. AG-R, CP-Q, and RA executed the animal experiments, collected and analyzed the samples, discussed the results and helped to write the manuscript. AL-G and OG-R implemented the statistical analyses. OG-R designed the experiment and wrote the first draft of the manuscript. AL-G, CP-Q, RA, AG-R, and OG-R discuss the results. All authors read and approved the final manuscript.

Funding

Funding from grants INIA RTA2012-00065-C04, RTA2015-00022-C3-02, and LIFE SEEDCAPITAL-12 ENV/ES/590 is acknowledged.

Acknowledgments

The authors wish to thank Iurancha González for assistance during sampling and Ana Hurtado, Beatriz Oporto, and Medelin Ocejo for DNA preparations, as well as the staff from Escuela Agraria de Fraisoro (Basque Department of Environment, Rural Development, Agriculture and Fishery).

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Footnotes

1.^http://www.arb-silva.de

2.^https://github.com/alopgar/16S-analysis/tree/master/16S-mothur-and-QIIME-pipeline-comparison

References

1
AbdelmegeidM. K.ElolimyA. A.ZhouZ.LopreiatoV.McCannJ. C.LoorJ. J. (2018). Rumen-protected methionine during the peripartal period in dairy cows and its effects on abundance of major species of ruminal bacteria.J. Anim. Sci. Biotechnol.9:17. 10.1186/s40104-018-0230-8
2
AzevedoA. C.BentoC. B. P.RuizJ. C.QueirozM. V.MantovaniaH. C. (2015). Distribution and genetic diversity of bacteriocin gene clusters in rumen microbial genomes.Appl. Environ. Microbiol.207290–7304. 10.1128/AEM.01223-15
3
BasarabJ. A.BeaucheminK. A.BaronV. S.OminskiK. H.GuanL. L.MillerS. P.et al (2013). Reducing GHG emissions through genetic improvement for feed efficiency: effects on economically important traits and enteric methane production.Animal7(Suppl. 2), 303–315. 10.1017/S1751731113000888
4
BellM. J.EckardR. J.Haile-MariamM.PryceJ. E. (2013). The effect of changing cow production and fitness traits on net income and greenhouse gas emissions from Australian dairy systems.J. Dairy Sci.967918–7931. 10.3168/jds.2012-6289
5
BolgerA. M.LohseM.UsadelB. (2014). Trimmomatic: A flexible trimmer for Illumina Sequence Data.Bioinformatics302114–2120. 10.1093/bioinformatics/btu170
6
CaporasoJ. G.KuczynskiJ.StombaughJ.BittingerK.BushmanF. D.CostelloE. K.et al (2010). QIIME allows analysis of high-throughput community sequencing data.Nat. Methods7335–336. 10.1038/nmeth.f.303
7
CaporasoJ. G.LauberC. L.WaltersW. A.Berg-LyonsD.LozuponeC. A.TurnbaughP. J.et al (2011). Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample.Proc. Natl. Acad. Sci. U.S.A.108(Suppl.)4516–4522. 10.1073/pnas.1000080107
8
de CarvalhoI. P. C.FiorentiniG.de Souza CastagninoP.de JesusR. B.MessanaJ. D.Granja-SalcedoY. T.et al (2017). Supplementation with lipid sources alters the ruminal fermentation and duodenal flow of fatty acids in grazing Nellore steers.Anim. Feed Sci. Technol.227142–153. 10.1016/j.anifeedsci.2017.02.017
- CrossRef
- Google Scholar
9
EdgarR. C. (2010). Search and clustering orders of magnitude faster than BLAST.Bioinformatics262460–2461. 10.1093/bioinformatics/btq461
10
EdgarR. C.HaasB. J.ClementeJ. C.QuinceC.KnightR. (2011). UCHIME improves sensitivity and speed of chimera detection.Bioinformatics272194–2200. 10.1093/bioinformatics/btr381
11
GalloA.GiubertiG.DuvalS.MoschiniM.MasoeroF. (2016). Short communication: the effect of an exogenous enzyme with amylolytic activity on gas production and in vitro rumen starch degradability of small and large particles of corn or barley meals.J. Dairy Sci.993602–3606. 10.3168/jds.2015-9904
12
González-RecioO.GianolaD.RosaG. J.WeigelK. A.KranisA. (2009). Genome-assisted prediction of a quantitative trait measured in parents and progeny: application to food conversion rate in chickens.Genet. Sel. Evol.41:3. 10.1186/1297-9686-41-3
13
González-RecioO.PryceJ. E.Haile-MariamM.HayesB. J. (2014a). Incorporating heifer feed efficiency in the Australian selection index using genomic selection.J. Dairy Sci.973883–3893. 10.3168/jds.2013-7515
14
González-RecioO.RosaG. J. M. M.GianolaD. (2014b). Machine learning methods and predictive ability metrics for genome-wide prediction of complex traits.Livest. Sci.166217–231. 10.1016/j.livsci.2014.05.036
- CrossRef
- Google Scholar
15
Gonzalez-RecioO.ZubiriaI.García-RodríguezA.HurtadoA.AtxaerandioR. (2018). Short communication: signs of host genetic regulation in the microbiome composition in two dairy breeds: Holstein and Brown Swiss.J. Dairy Sci.1012285–2292. 10.3168/jds.2017-13179
16
GoodrichJ. K.DavenportE. R.BeaumontM.JacksonM. A.KnightR.OberC.et al (2016). Genetic determinants of the gut microbiome in UK twins.Cell Host Microbe19731–743. 10.1016/j.chom.2016.04.017
17
HuwsS. A.EdwardsJ. E.CreeveyC. J.Rees StevensP.LinW.GirdwoodS. E.et al (2016). Temporal dynamics of the metabolically active rumen bacteria colonizing fresh perennial ryegrass.FEMS Microbiol. Ecol.92:fiv137. 10.1093/femsec/fiv137
18
JewellK. A.McCormickC.OdtC. L.WeimerP. J.SuenG. (2015). Ruminal bacterial community composition in dairy cows is dynamic over the course of two lactations and correlates with feed efficiency.Appl. Environ. Microbiol.814697–4710. 10.1128/AEM.00720-15
19
KamkeJ.KittelmannS.SoniP.LiY.TavendaleM.GaneshS.et al (2016). Rumen metagenome and metatranscriptome analyses of low methane yield sheep reveals a Sharpea-enriched microbiome characterised by lactic acid formation and utilisation.Microbiome4:56. 10.1186/s40168-016-0201-2
20
KozichJ. J.WestcottS. L.BaxterN. T.HighlanderS. K.SchlossP. D. (2013). Development of a dual-index sequencing strategy and curation pipeline for analyzing amplicon sequence data on the MiSeq Illumina sequencing platform.Appl. Environ. Microbiol.795112–5120. 10.1128/AEM.01043-13
21
LiZ. P.LiuH. L.LiG. Y.BaoK.WangK. Y.XuC.et al (2013). Molecular diversity of rumen bacterial communities from tannin-rich and fiber-rich forage fed domestic Sika deer (Cervus nippon) in China.BMC Microbiol.13:151. 10.1186/1471-2180-13-151
22
LindgreenS.AdairK. L.GardnerP. P. (2016). An evaluation of the accuracy and speed of metagenome analysis tools.Sci. Rep.6:19233. 10.1038/srep19233
23
LozuponeC.KnightR.FirrellS.FoulkesM. A.JablonskiK. A.CollmanR.et al (2005). UniFrac: a new phylogenetic method for comparing microbial communities.Appl. Environ. Microbiol.718228–8235. 10.1128/AEM.71.12.8228-8235.2005
24
NilakantaH.DrewsK. L.FirrellS.FoulkesM. A.JablonskiK. A. (2014). A review of software for analyzing molecular sequences.BMC Res. Notes7:830. 10.1186/1756-0500-7-830
25
OulasA.PavloudiC.PolymenakouP.PavlopoulosG. A.PapanikolaouN.KotoulasG.et al (2015). Metagenomics: tools and insights for analyzing next-generation sequencing data derived from biodiversity studies.Bioinform. Biol. Insights975–88. 10.4137/BBI.S12462
26
PlummerE.TwinJ. (2015). A comparison of three bioinformatics pipelines for the analysis of preterm gut microbiota using 16S rRNA gene sequencing data.J. Proteomics Bioinform.8283–291. 10.4172/jpb.1000381
- CrossRef
- Google Scholar
27
PopovaM.McGovernE.McCabeM. S.MartinC.DoreauM.ArbreM.et al (2017) The structural and functional capacity of ruminal and cecal microbiota in growing cattle was unaffected by dietary supplementation of linseed oil and nitrate.Front. Microbiol.8:937. 10.3389/fmicb.2017.00937
- CrossRef
- Google Scholar
28
QuastC.PruesseE.YilmazP.GerkenJ.SchweerT.YarzaP.et al (2013). The SILVA ribosomal RNA gene database project: improved data processing and web-based tools.Nucleic Acids Res.41D590–D596. 10.1093/nar/gks1219
29
R Core Team. (2015). R: A Language and Environment for Statistical Computing.Available at:https://www.r-project.org/
- Google Scholar
30
RoeheR.DewhurstR. J.DuthieC.-A.RookeJ. A.McKainN.RossD. W.et al (2016). Bovine host genetic variation influences rumen microbial methane production with best selection criterion for low methane emitting and efficiently feed converting hosts based on metagenomic gene abundance.PLoS Genet.12:e1005846. 10.1371/journal.pgen.1005846
31
RognesT.FlouriT.NicholsB.QuinceC.MahéF. (2016). VSEARCH: a versatile open source tool for metagenomics.PeerJ4:e2584. 10.7717/peerj.2584
32
RossE. M.MoateP. J.MarettL. C.CocksB. G.HayesB. J. (2013). Metagenomic predictions: from microbiome to complex health and environmental phenotypes in humans and cattle.PLoS One8:e73056. 10.1371/journal.pone.0073056
33
SchlossP. D.WestcottS. L.RyabinT.HallJ. R.HartmannM.HollisterE. B.et al (2009). Introducing mothur: open-source. platform-independent, community-supported software for describing and comparing microbial communities.Appl. Environ. Microbiol.757537–7541. 10.1128/AEM.01541-09
34
StewartR. D.AuffretM. D.WarrA.WiserA. H.PressM. O.LangfordK. W.et al (2018). Assembly of 913 microbial genomes from metagenomic sequencing of the cow rumen.Nat. Commun.9:870. 10.1038/s41467-018-03317-6
35
TongJ.ZhangH.YangD.ZhangY.XiongB.JiangL. (2018) Illumina sequencing analysis of the ruminal microbiota in high-yield and low-yield lactating dairy cows.PLoS One13:e0198225. 10.1371/journal.pone.0198225
36
WallaceR. J.RookeJ. A.McKainN.DuthieC.-A.HyslopJ. J.RossD. W.et al (2015). The rumen microbial metagenome associated with high methane production in cattle.BMC Genomics16:839. 10.1186/s12864-015-2032-0
37
WangQ.GarrityG. M.TiedjeJ. M.ColeJ. R. (2007). Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy.Appl. Environ. Microbiol.735261–5267. 10.1128/AEM.00062-07
38
WestcottS. L.SchlossP. D. (2015). De novo clustering methods outperform reference-based methods for assigning 16S rRNA gene sequences to operational taxonomic units.PeerJ3:e1487. 10.7717/peerj.1487
39
YuZ.FosterR. (2005). Methods in Gut Microbial Ecology for Ruminants.Berlin: Springer.
- Google Scholar
40
ZhangR.ZhuW.ZhuW.LiuJ.MaoS. (2014). Effect of dietary forage sources on rumen microbiota, rumen fermentation and biogenic amines in dairy cows.J. Sci. Food Agric.941886–1895. 10.1002/jsfa.6508
41
ZhouM.Hernandez-SanabriaE.GuanL. L. (2009). Assessment of the microbial ecology of ruminal methanogens in cattle with different feed efficiencies.Appl. Environ. Microbiol.756524–6533. 10.1128/AEM.02815-08
42
ZhouM.Hernandez-SanabriaE.GuanL. L. (2010). Characterization of variation in rumen methanogenic communities under different dietary and host feed efficiency conditions, as determined by PCR-denaturing gradient gel electrophoresis analysis.Appl. Environ. Microbiol.763776–3786. 10.1128/AEM.00010-10
43
Zilber-RosenbergI.RosenbergE. (2008). Role of microorganisms in the evolution of animals and plants: the hologenome theory of evolution.FEMS Microbiol. Rev.32723–735. 10.1111/j.1574-6976.2008.00123.x

Summary

Keywords

mothur, QIIME, 16S rRNA, rumen microbiota, bovine

Citation

López-García A, Pineda-Quiroga C, Atxaerandio R, Pérez A, Hernández I, García-Rodríguez A and González-Recio O (2018) Comparison of Mothur and QIIME for the Analysis of Rumen Microbiota Composition Based on 16S rRNA Amplicon Sequences. Front. Microbiol. 9:3010. doi: 10.3389/fmicb.2018.03010

Received

31 July 2018

Accepted

21 November 2018

Published

13 December 2018

Volume

9 - 2018

Edited by

Garret Suen, University of Wisconsin-Madison, United States

Reviewed by

Irene Wagner-Doebler, Helmholtz-Zentrum für Infektionsforschung, Helmholtz-Gemeinschaft Deutscher Forschungszentren (HZ), Germany; Luciano Takeshi Kishi, São Paulo State University, Brazil; Kimberly Anne Dill-McFarland, University of British Columbia, Canada

Updates

This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Oscar González-Recio, gonzalez.oscar@inia.es; ogrecio@gmail.com

This article was submitted to Systems Microbiology, a section of the journal Frontiers in Microbiology

Disclaimer

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article or claim that may be made by its manufacturer is not guaranteed or endorsed by the publisher.

Systems Microbiology

ORIGINAL RESEARCH article

Comparison of Mothur and QIIME for the Analysis of Rumen Microbiota Composition Based on 16S rRNA Amplicon Sequences

Abstract

Introduction