Critical Issues in Mycobiota Analysis

Fungi constitute an important part of the human microbiota and play a significant role in health and disease development. Advances in the culture-independent analysis of microbial communities have broadened our understanding of the mycobiota; however, microbiota analysis tools have mainly been developed for bacteria (e.g., targeting the 16S rRNA gene) and often fall short when applied to fungal marker-gene based investigations (i.e., internal transcribed spacers, ITS). In the current paper we discuss all major steps of a fungal amplicon analysis, from DNA extraction from specimens to bioinformatics analysis of next-generation sequencing data. Specific points are discussed at each step, and special emphasis is placed on the bioinformatics challenges emerging during operational taxonomic unit (OTU) picking, a critical step in mycobiota analysis. Using an in silico ITS1 mock community, we demonstrate that standard analysis pipelines fall short when used with default settings, yielding erroneous fungal community representations. We highlight that switching OTU picking to a closed reference approach greatly enhances performance. Finally, recommendations are given on how to perform ITS based mycobiota analysis with the currently available measures.


Supplementary Materials and Methods
The following subsections describe in detail the protocols and workflows mentioned in the main text, including the commands, parameters, and settings used.
The tissue pellet was digested with ATL buffer (Qiagen, Chatsworth, CA, USA) and treated with proteinase K solution at 56°C overnight. Subsequently, samples were processed using columns as per the manufacturer's protocol.

Statistical analysis of qPCR data
Quantitative PCR data were tested for normal distribution with the Shapiro-Wilk test.
Data are given as mean ± standard error of the mean. Statistical analyses were performed with GraphPad Prism 5 software, using one-way ANOVA and Dunnett's post hoc test for multiple comparisons. P-values <0.05 were considered statistically significant.
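As an illustration of the summary statistics named above, the following minimal Python sketch computes the mean ± standard error of the mean for one group and the one-way ANOVA F statistic across groups. It is not the procedure used in the study, which relied on GraphPad Prism 5; the function names and the readout values are hypothetical, and the Shapiro-Wilk test, Dunnett's post hoc test, and P-value computation are deliberately left out.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for one group."""
    mean = statistics.mean(values)
    # SEM = sample standard deviation / sqrt(n)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


def one_way_anova_f(groups):
    """Return (F statistic, df between groups, df within groups)."""
    all_values = [v for group in groups for v in group]
    grand_mean = statistics.mean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(
        len(group) * (statistics.mean(group) - grand_mean) ** 2
        for group in groups
    )
    # Variation of single values around their own group mean
    ss_within = sum(
        (v - statistics.mean(group)) ** 2 for group in groups for v in group
    )
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical qPCR readouts (arbitrary units), not data from the study.
control = [1.0, 2.0, 3.0]
treatment_a = [2.0, 3.0, 4.0]
treatment_b = [5.0, 6.0, 7.0]

control_mean, control_sem = mean_sem(control)
f_stat, df_between, df_within = one_way_anova_f(
    [control, treatment_a, treatment_b]
)
print(f"control: {control_mean:.2f} ± {control_sem:.2f}")
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

The F statistic would then be compared against the F distribution with the reported degrees of freedom to obtain a P-value, which a statistics package does directly.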

In silico ITS1 mock community generation
The in silico ITS1 mock community is based on the publicly available UNITE ITS collection (version

Bioinformatical analysis of ITS1 fragments
To demonstrate the differences between de novo OTU picking strategies and closed reference based approaches, the same set of ITS1 fragments was analyzed with mothur (Schloss et al. 2009), QIIME (Caporaso et al. 2010), and MICCA (Albanese et al. 2015) in default, de novo, as well as in closed reference mode. Commands for each tool are given in the following sections.
Unless otherwise specified, standard values and settings were used with the applied commands.
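The conceptual difference between the two strategies can be sketched with a toy example. The Python code below is an illustration only and does not reproduce the algorithms of mothur, QIIME, or MICCA: the identity measure (fraction of matching positions in equal-length sequences), the function names, the sequences, and the 0.9 threshold are all assumptions chosen to keep the example small.

```python
def identity(a, b):
    """Fraction of matching positions; assumes equal-length toy sequences."""
    if len(a) != len(b):
        raise ValueError("toy identity needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def closed_reference(reads, reference, threshold):
    """Assign each read to its best reference hit at or above the threshold.

    Reads without a sufficiently similar reference are discarded (None).
    """
    assignment = {}
    for read_id, seq in reads.items():
        best_id, best_score = None, 0.0
        for ref_id, ref_seq in reference.items():
            score = identity(seq, ref_seq)
            if score > best_score:
                best_id, best_score = ref_id, score
        assignment[read_id] = best_id if best_score >= threshold else None
    return assignment


def de_novo_greedy(reads, threshold):
    """Greedy clustering: a read joins the first centroid it matches,
    otherwise it becomes the centroid of a new OTU."""
    centroids = {}
    assignment = {}
    for read_id, seq in reads.items():
        for otu_id, centroid in centroids.items():
            if identity(seq, centroid) >= threshold:
                assignment[read_id] = otu_id
                break
        else:
            otu_id = f"denovo{len(centroids)}"
            centroids[otu_id] = seq
            assignment[read_id] = otu_id
    return assignment


# Hypothetical toy data, not real ITS1 sequences.
reference = {"refA": "ACGTACGTAC", "refB": "TTTTGGGGCC"}
reads = {
    "r1": "ACGTACGTAC",  # identical to refA
    "r2": "ACGTACGTAA",  # one mismatch to refA
    "r3": "TTTTGGGGCC",  # identical to refB
    "r4": "CCCCCCCCCC",  # no close reference
}

closed = closed_reference(reads, reference, threshold=0.9)
denovo = de_novo_greedy(reads, threshold=0.9)
print(closed)
print(denovo)
```

In this sketch the closed reference run names every OTU after a database entry and drops the read without a reference match, whereas the de novo run keeps all reads but produces OTUs that still need a separate taxonomic assignment.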

mothur default de novo OTU picking
In general, the analysis followed the MiSeq SOP of Kozich et al. (2013, accessed May 2016), starting with align.seqs, since amplicons were already pre-processed (described within section 1.4). In the absence of an available reference sequence alignment, an alignment was manually created based on UNITE (version 6, 2014-12-30). Briefly, a pre-clustered and pre-formatted version of UNITE for mothur was processed with ITSx to extract ITS1 fragments only. The given taxonomic classification was used to select only one representative per species. According to this criterion, a subset of 5,699 ITS1 fragments remained and was finally aligned with muscle version 3.8.31 (Edgar 2004).
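The species-level selection step can be sketched as follows. This is a minimal Python illustration under stated assumptions: the text does not say how the representative of each species was chosen, so keeping the first sequence encountered is an assumption, and the function name, identifiers, and sequences are hypothetical.

```python
def one_representative_per_species(records):
    """Keep the first record seen for each species label.

    records: iterable of (sequence_id, species_label, sequence) tuples.
    Choosing the first occurrence is an assumption of this sketch.
    """
    seen = set()
    kept = []
    for seq_id, species, sequence in records:
        if species in seen:
            continue  # a representative of this species is already kept
        seen.add(species)
        kept.append((seq_id, species, sequence))
    return kept


# Hypothetical records; identifiers and sequences are made up.
records = [
    ("seq1", "Candida albicans", "ACGTACGT"),
    ("seq2", "Candida albicans", "ACGTACGA"),
    ("seq3", "Malassezia restricta", "TTGGCCAA"),
]

representatives = one_representative_per_species(records)
print([seq_id for seq_id, _, _ in representatives])
```

The reduced set would then be written back to FASTA and passed to the aligner.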

QIIME closed reference OTU picking
The in silico ITS1 mock sequences were analyzed with QIIME (version 1.8.0) using modified settings for closed reference OTU picking. Exact commands are given in the box below. Reads were clustered into OTUs using pick_otus.py with blast as method. For each OTU created by the closed reference approach, a custom R script added the taxonomic classification according to the given identifier and the corresponding taxonomy file sh_tax_qiime_ver7_97_22.08.2016.txt, for further comparison with the true annotation.
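The identifier-based annotation step amounts to a lookup, sketched here in Python. This is not the custom R script used in the study. It assumes that the taxonomy file holds one reference identifier and one lineage string per line, separated by a tab; the function names, identifiers, and lineages shown are hypothetical.

```python
def load_taxonomy(lines):
    """Parse 'identifier<TAB>lineage' lines into a dictionary."""
    taxonomy = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        ref_id, lineage = line.split("\t", 1)
        taxonomy[ref_id] = lineage
    return taxonomy


def annotate_otus(otu_ids, taxonomy):
    """Attach a lineage to each OTU identifier; unknown ones stay 'Unassigned'."""
    return {otu_id: taxonomy.get(otu_id, "Unassigned") for otu_id in otu_ids}


# Hypothetical taxonomy lines standing in for the content of the taxonomy file.
taxonomy_lines = [
    "REF_0001\tk__Fungi;p__Ascomycota;g__Candida;s__Candida_albicans\n",
    "REF_0002\tk__Fungi;p__Basidiomycota;g__Malassezia;s__Malassezia_restricta\n",
]

taxonomy = load_taxonomy(taxonomy_lines)
annotated = annotate_otus(["REF_0002", "REF_9999"], taxonomy)
print(annotated)
```

With a real file, the list of lines would be replaced by an open file handle, and the annotated lineages could then be compared against the known composition of the mock community.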

MICCA closed reference OTU picking
Pre-processed in silico ITS1 amplicons were used for closed reference OTU picking with MICCA as recommended within the tool documentation (http://micca.org/docs/latest/commands/otu.html, accessed September 2016). The QIIME formatted and pre-clustered (97% identity) UNITE database, version 7 (sh_refs_qiime_ver7_97_22.08.2016.fasta), was used as closed reference database. Taxonomic classification was assigned according to the given taxonomic reference database. The commands and settings used are listed below:
In contrast, for fungi from different phyla, conservation is detectable only around the 5.8S, which makes a multiple sequence alignment (MSA) of distinct fungal ITS fragments meaningless (supplementary data sheet S2C). For ITS fragments of the same genus, the ratio between conservation and variation even allows species discrimination (supplementary data sheet S2B).

Supplementary Tables and Files
Supplementary  supplementary_data_sheet_S5.zip