DAnIEL: A User-Friendly Web Server for Fungal ITS Amplicon Sequencing Data

Trillions of microbes representing all kingdoms of life reside in and on humans, where they play essential roles in host development and physiology. Over the last decade, more than a dozen publicly accessible online tools and servers have been developed for the analysis of bacterial sequences; however, the analysis of fungi is still in its infancy. Here, we present a web server dedicated to the comprehensive analysis of the human mycobiome for (i) translating raw sequencing reads into data tables and high-standard figures, (ii) integrating statistical analysis and machine learning with a manually curated relational database, and (iii) comparing the user’s uploaded datasets with publicly available datasets from the Sequence Read Archive. Using 1,266 publicly available internal transcribed spacer (ITS) samples, we demonstrate the utility of the DAnIEL web server on large-scale datasets and show the differences in fungal communities between human skin and soil sites.

Correlation networks of wound and dandruff human samples. SparCC correlation networks per sample group using A: the default threshold (absolute correlation coefficient > 0.2) and B: a lowered threshold (absolute correlation coefficient > 0.1) set via the interactive GUI. C: Distribution of network topology metrics over genera in the correlation network.

Boxplots are shown for steps that were run in parallel for each sample. It took 2.9 h to process the soil samples (N=300) and 62.9 h to process the human samples (N=1350).
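As an illustration of the thresholding behind the SparCC correlation networks of the wound and dandruff samples, the following minimal Python sketch keeps a pair of genera as an edge only if the absolute value of its correlation coefficient exceeds a cutoff (0.2 by default, 0.1 in panel B), and then computes node degree as one simple topology metric. It assumes the coefficients have already been estimated (it does not implement SparCC itself), ignores significance testing, and uses a hypothetical three-genus matrix; the function names `build_network` and `degree` are illustrative and not part of DAnIEL.

```python
# Minimal sketch: threshold a genus-by-genus correlation matrix into an
# undirected network and compute node degree. The coefficients below are
# hypothetical; estimating them (e.g. with SparCC) is out of scope here.

def build_network(genera, corr, threshold=0.2):
    """Return edges (genus_a, genus_b, r) with |r| > threshold, for a < b."""
    edges = []
    n = len(genera)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only: undirected graph
            r = corr[i][j]
            if abs(r) > threshold:
                edges.append((genera[i], genera[j], r))
    return edges


def degree(genera, edges):
    """Count how many edges touch each genus."""
    deg = {g: 0 for g in genera}
    for a, b, _ in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


genera = ["Malassezia", "Candida", "Aspergillus"]
corr = [
    [1.00, 0.15, -0.30],
    [0.15, 1.00, 0.05],
    [-0.30, 0.05, 1.00],
]

strict = build_network(genera, corr, threshold=0.2)  # default cutoff
loose = build_network(genera, corr, threshold=0.1)   # lowered cutoff
print(len(strict), len(loose))  # 1 2
print(degree(genera, loose))
```

Lowering the cutoff admits weaker associations (here the 0.15 pair), which is why the panel B network is denser than panel A.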

Fig. S5: Benchmarking taxon existence. DAnIEL was run on 10 samples consisting of simulated reads using the denoising methods DADA2 and PIPITS and the classification methods BLAST consensus (blast) and Naïve Bayes (nb). Contingency tables were calculated by counting the samples in which a taxon was both measured and simulated. DADA2 outperformed PIPITS in all metrics. NB classification outperformed BLAST in terms of specificity and precision, but not sensitivity.
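The presence/absence benchmark of Fig. S5 can be sketched as below, under one plausible reading of the contingency table: every (sample, taxon) pair counts as a true positive (simulated and measured), a false positive (measured only), a false negative (simulated only) or a true negative (neither), and sensitivity, specificity and precision are pooled over all pairs. The taxa, samples and helper names (`contingency`, `metrics`) are hypothetical.

```python
# Minimal sketch of presence/absence benchmarking. Each sample is given as
# the set of simulated (true) taxa and the set of taxa the pipeline reported.

def contingency(truth, measured, universe):
    """Count TP, FP, FN, TN over all (sample, taxon) pairs."""
    tp = fp = fn = tn = 0
    for true_set, meas_set in zip(truth, measured):
        for taxon in universe:
            t = taxon in true_set
            m = taxon in meas_set
            if t and m:
                tp += 1  # simulated and measured
            elif m:
                fp += 1  # measured but not simulated
            elif t:
                fn += 1  # simulated but missed
            else:
                tn += 1  # correctly absent
    return tp, fp, fn, tn


def metrics(tp, fp, fn, tn):
    """Standard detection metrics derived from the 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


universe = ["A", "B", "C", "D"]      # all taxa under consideration
truth = [{"A", "B"}, {"A", "C"}]     # simulated taxa per sample
measured = [{"A", "B", "D"}, {"A"}]  # reported taxa per sample

tp, fp, fn, tn = contingency(truth, measured, universe)
print(tp, fp, fn, tn)  # 3 1 1 3
print(metrics(tp, fp, fn, tn))
```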

Fig. S6: Benchmarking taxon abundance. DAnIEL was run on 10 samples consisting of simulated reads using the denoising methods DADA2 and PIPITS and the classification methods BLAST consensus (blast) and Naïve Bayes (nb). The difference between measured and true abundance was calculated for each sample and taxon. In addition, the L2 norm of the difference between the measured and true abundance profiles was calculated for each sample. DADA2 in combination with BLAST yielded the most accurate results.
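A minimal sketch of the per-sample abundance comparison in Fig. S6, assuming both profiles are relative abundances keyed by taxon name and that a taxon missing from one profile has abundance 0 there. The per-taxon differences and their L2 norm follow directly from these assumptions; the profiles and the `abundance_error` name are hypothetical.

```python
import math


def abundance_error(measured, true):
    """Per-taxon differences (measured - true) and their L2 norm.

    Both arguments map taxon name -> relative abundance; a taxon absent
    from one profile is treated as having abundance 0 in that profile.
    """
    taxa = set(measured) | set(true)
    diffs = {t: measured.get(t, 0.0) - true.get(t, 0.0) for t in taxa}
    l2 = math.sqrt(sum(d * d for d in diffs.values()))
    return diffs, l2


# Hypothetical profiles for a single simulated sample.
true_profile = {"A": 0.5, "B": 0.3, "C": 0.2}
measured_profile = {"A": 0.6, "B": 0.3, "D": 0.1}

diffs, l2 = abundance_error(measured_profile, true_profile)
print(round(l2, 3))  # 0.245
```

A perfect profile gives an L2 norm of 0, so lower values indicate a more accurate pipeline; missed taxa (C) and spurious taxa (D) both contribute.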

Fig. S7: Comparison between ASV and OTU profiling in the soil case study. Overall, 70 of the 73 genera found to be significantly differentially abundant with the ASV method were also significant with the OTU method. A: Alpha diversity. B/C: SparCC correlation coefficients of co-abundant genera per sample group; insignificant correlations are shown in grey. D: The phylogenetic tree shows whether a clade was present in either or both methods after keeping only the 10 taxa with the highest cumulative abundance for each denoising method. Boxplots show the differences in total-sum-scaled abundances across samples for each genus leaf in the tree. Values are z-scaled to a mean of 0 and a variance of 1 for each taxon separately.
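The two normalizations used for the boxplots in panel D, total sum scaling per sample followed by z-scaling per taxon, can be sketched as follows. This assumes a samples-by-genera count matrix and uses the sample standard deviation (as R's `scale()` does); whether DAnIEL uses the sample or population version is an assumption. A constant taxon is mapped to zeros to avoid division by zero. Function names and counts are hypothetical.

```python
import statistics


def total_sum_scale(counts):
    """Divide each sample's (row's) counts by that sample's total."""
    scaled = []
    for row in counts:
        total = sum(row)
        scaled.append([c / total for c in row])
    return scaled


def z_scale_columns(matrix):
    """Z-scale each taxon (column) to mean 0 and sample variance 1."""
    scaled_cols = []
    for col in zip(*matrix):
        mu = statistics.mean(col)
        sd = statistics.stdev(col)  # sample SD (n - 1), like R's scale()
        if sd == 0:
            scaled_cols.append([0.0] * len(col))  # constant taxon
        else:
            scaled_cols.append([(x - mu) / sd for x in col])
    # Transpose back to samples-by-taxa.
    return [list(row) for row in zip(*scaled_cols)]


# Hypothetical counts: 3 samples (rows) x 3 genera (columns).
counts = [[10, 30, 60], [20, 20, 60], [50, 25, 25]]
tss = total_sum_scale(counts)  # rows now sum to 1
z = z_scale_columns(tss)       # columns now have mean 0, SD 1
```

Scaling each taxon separately puts rare and dominant genera on a comparable axis, so the boxplots reflect relative shifts rather than absolute abundance.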

Fig. S8: Comparison between ASV and OTU profiling in the human case study. Overall, 36 of the 40 genera found to be significantly differentially abundant with the ASV method were also significant with the OTU method. A: Alpha diversity. B/C: SparCC correlation coefficients of co-abundant genera per sample group; insignificant correlations are shown in grey. D: The phylogenetic tree shows whether a clade was present in either or both methods after keeping only the 10 taxa with the highest cumulative abundance for each denoising method. Boxplots show the differences in total-sum-scaled abundances across samples for each genus leaf in the tree. Values are z-scaled to a mean of 0 and a variance of 1 for each taxon separately.