<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v2.3 20070202//EN" "journalpublishing.dtd">
<article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" article-type="research-article">
<front>
<journal-meta>
<journal-id journal-id-type="publisher-id">Front. Microbiol.</journal-id>
<journal-title>Frontiers in Microbiology</journal-title>
<abbrev-journal-title abbrev-type="pubmed">Front. Microbiol.</abbrev-journal-title>
<issn pub-type="epub">1664-302X</issn>
<publisher>
<publisher-name>Frontiers Media S.A.</publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="doi">10.3389/fmicb.2018.00297</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Microbiology</subject>
<subj-group>
<subject>Original Research</subject>
</subj-group>
</subj-group>
</article-categories>
<title-group>
<article-title>Linking Associations of Rare Low-Abundance Species to Their Environments by Association Networks</article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author" corresp="yes">
<name><surname>Karpinets</surname> <given-names>Tatiana V.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="author-notes" rid="fn001"><sup>&#x002A;</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/405634/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Gopalakrishnan</surname> <given-names>Vancheswaran</given-names></name>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
<xref ref-type="aff" rid="aff4"><sup>4</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Wargo</surname> <given-names>Jennifer</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
<xref ref-type="aff" rid="aff3"><sup>3</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Futreal</surname> <given-names>Andrew P.</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib>
<contrib contrib-type="author">
<name><surname>Schadt</surname> <given-names>Christopher W.</given-names></name>
<xref ref-type="aff" rid="aff2"><sup>2</sup></xref>
<xref ref-type="aff" rid="aff5"><sup>5</sup></xref>
<uri xlink:href="http://loop.frontiersin.org/people/23200/overview"/>
</contrib>
<contrib contrib-type="author">
<name><surname>Zhang</surname> <given-names>Jianhua</given-names></name>
<xref ref-type="aff" rid="aff1"><sup>1</sup></xref>
</contrib></contrib-group>
<aff id="aff1"><sup>1</sup><institution>Department of Genomic Medicine, The University of Texas MD Anderson Cancer Center</institution>, <addr-line>Houston, TX</addr-line>, <country>United States</country></aff>
<aff id="aff2"><sup>2</sup><institution>Biosciences Division, Oak Ridge National Laboratory</institution>, <addr-line>Oak Ridge, TN</addr-line>, <country>United States</country></aff>
<aff id="aff3"><sup>3</sup><institution>Department of Surgical Oncology, The University of Texas MD Anderson Cancer Center</institution>, <addr-line>Houston, TX</addr-line>, <country>United States</country></aff>
<aff id="aff4"><sup>4</sup><institution>Department of Epidemiology, Human Genetics and Environmental Sciences, University of Texas School of Public Health</institution>, <addr-line>Dallas, TX</addr-line>, <country>United States</country></aff>
<aff id="aff5"><sup>5</sup><institution>Department of Microbiology, University of Tennessee, Knoxville</institution>, <addr-line>Knoxville, TN</addr-line>, <country>United States</country></aff>
<author-notes>
<fn fn-type="edited-by"><p>Edited by: <italic>Michele Guindani, University of California, Irvine, United States</italic></p></fn>
<fn fn-type="edited-by"><p>Reviewed by: <italic>Rohita Sinha, University of Nebraska-Lincoln, United States; Stephen Woloszynek, Drexel University, United States</italic></p></fn>
<fn fn-type="corresp" id="fn001"><p>&#x002A;Correspondence: <italic>Tatiana V. Karpinets, <email>tvkarpinets@mdanderson.org</email></italic></p></fn>
<fn fn-type="other" id="fn002"><p>This article was submitted to Systems Microbiology, a section of the journal Frontiers in Microbiology</p></fn></author-notes>
<pub-date pub-type="epub">
<day>07</day>
<month>03</month>
<year>2018</year>
</pub-date>
<pub-date pub-type="collection">
<year>2018</year>
</pub-date>
<volume>9</volume>
<elocation-id>297</elocation-id>
<history>
<date date-type="received">
<day>20</day>
<month>10</month>
<year>2017</year>
</date>
<date date-type="accepted">
<day>08</day>
<month>02</month>
<year>2018</year>
</date>
</history>
<permissions>
<copyright-statement>Copyright &#x00A9; 2018 Karpinets, Gopalakrishnan, Wargo, Futreal, Schadt and Zhang.</copyright-statement>
<copyright-year>2018</copyright-year>
<copyright-holder>Karpinets, Gopalakrishnan, Wargo, Futreal, Schadt and Zhang</copyright-holder>
<license xlink:href="http://creativecommons.org/licenses/by/4.0/"><p>This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.</p></license>
</permissions>
<abstract>
<p>Studies of microbial communities by targeted sequencing of rRNA genes lead to recovering numerous rare low-abundance taxa with unknown biological roles. We propose to study associations of such rare organisms with their environments by a computational framework based on transformation of the data into qualitative variables. Namely, we analyze the sparse table of putative species or OTUs (operational taxonomic units) and samples generated in such studies, also known as an OTU table, by collecting statistics on co-occurrences of the species and on shared species richness across samples. Based on these statistics we build two association networks, of the rare putative species and of the samples respectively, using a known computational technique, Association networks (Anets), developed for analysis of qualitative data. Clusters of samples and clusters of OTUs are then integrated and combined with metadata of the study to produce a map of associated putative species in their environments. We tested and validated the framework on two types of microbiomes, those of human body sites and those of <italic>Populus</italic> tree root systems. We show that in both studies the associations of OTUs can separate samples according to environmental or physiological characteristics of the studied systems.</p>
</abstract>
<kwd-group>
<kwd>metagenome</kwd>
<kwd>microbiome</kwd>
<kwd>unsupervised analysis</kwd>
<kwd>alpha and beta diversity</kwd>
<kwd>sparse data</kwd>
<kwd>Anets</kwd>
<kwd>qualitative data</kwd>
</kwd-group>
<counts>
<fig-count count="8"/>
<table-count count="0"/>
<equation-count count="0"/>
<ref-count count="43"/>
<page-count count="16"/>
<word-count count="0"/>
</counts>
</article-meta>
</front>
<body>
<sec><title>Introduction</title>
<p>The rare low-abundance microbial species, which have been referred to as the &#x201C;rare biosphere&#x201D; (<xref ref-type="bibr" rid="B38">Sogin et al., 2006</xref>), have attracted increasing attention in the recent literature because of their unknown ecology and potential evolutionary and ecological importance (<xref ref-type="bibr" rid="B43">Youssef et al., 2010</xref>; <xref ref-type="bibr" rid="B28">Pedros-Alio, 2012</xref>; <xref ref-type="bibr" rid="B7">Coveley et al., 2015</xref>; <xref ref-type="bibr" rid="B23">Lynch and Neufeld, 2015</xref>; <xref ref-type="bibr" rid="B36">Sharon et al., 2015</xref>; <xref ref-type="bibr" rid="B20">Jousset et al., 2017</xref>). Although sequencing errors and undersampling of OTUs may contribute to the extent of the &#x201C;rare biosphere,&#x201D; the advent of new bioinformatics tools (<xref ref-type="bibr" rid="B33">Schloss and Westcott, 2011</xref>; <xref ref-type="bibr" rid="B29">Preheim et al., 2013</xref>; <xref ref-type="bibr" rid="B11">Edgar and Flyvbjerg, 2015</xref>; <xref ref-type="bibr" rid="B36">Sharon et al., 2015</xref>; <xref ref-type="bibr" rid="B5">Callahan et al., 2016</xref>), as well as experimental and technological approaches (<xref ref-type="bibr" rid="B20">Jousset et al., 2017</xref>), provides increasingly compelling evidence for the presence and complexity of these rare taxa. 
Biological explanations (<xref ref-type="bibr" rid="B28">Pedros-Alio, 2012</xref>; <xref ref-type="bibr" rid="B7">Coveley et al., 2015</xref>; <xref ref-type="bibr" rid="B23">Lynch and Neufeld, 2015</xref>; <xref ref-type="bibr" rid="B20">Jousset et al., 2017</xref>) and other factors, such as poor taxonomic resolution of short reads, especially for closely related species or those poorly represented in the genomic database, incomplete or inadequate sampling, dispersal limitation, spatial and temporal partitioning of the environment, and the nestedness of ecological mutualistic networks, may contribute to such results (<xref ref-type="bibr" rid="B3">Bascompte et al., 2003</xref>; <xref ref-type="bibr" rid="B43">Youssef et al., 2010</xref>; <xref ref-type="bibr" rid="B31">Rosindell et al., 2011</xref>; <xref ref-type="bibr" rid="B41">Unterseher et al., 2011</xref>; <xref ref-type="bibr" rid="B19">James et al., 2012</xref>; <xref ref-type="bibr" rid="B26">Mi et al., 2012</xref>; <xref ref-type="bibr" rid="B28">Pedros-Alio, 2012</xref>; <xref ref-type="bibr" rid="B39">Suweis et al., 2013</xref>).</p>
<p>Numerous rare OTUs are a typical output of 16S rRNA amplicon sequencing studies, especially those with many and diverse samples. The resultant sparse datasets present a challenge for common statistical tools. The data matrices produced by such studies usually comprise species-like groups (rows) and their abundances, calculated as the number of sequencing reads representing each species, across multiple samples (columns). The species-like groups are typically inferred by a conventional aggregation of sequences into OTUs based on a sequence identity threshold or, in more recent work, by amplicon sequence variants (ASVs) (<xref ref-type="bibr" rid="B5">Callahan et al., 2016</xref>; <xref ref-type="bibr" rid="B4">Callahan, 2017</xref>). In both cases, most species-like groups could represent specialist species; they are not only low in abundance in a given sample, but are also rare across samples and environments. Known computational tools for analyzing the sparse data often address the sparsity problem by filtering out very rare species or by collapsing species to a higher taxonomic level. Although the aggregation reduces the sparsity (dominance of zeros) of the data, OTU-level insights into the structure of the microbiome are lost. By excluding the rare OTUs, such as those found in fewer than 30% of samples, we may also lose information. It is not clear how extensive this loss might be.</p>
<p>In addition to sparsity, the 16S rRNA gene sequencing data pose other challenges, including their compositionality and dimensionality (a substantially greater number of OTUs than samples). The data compositionality means that we do not know the real OTU abundances and have to deal with proportions of species relative to their sum in each sample. Several methods have been proposed to address these challenges (<xref ref-type="bibr" rid="B25">McMurdie and Holmes, 2014</xref>; <xref ref-type="bibr" rid="B40">Tsilimigras and Fodor, 2016</xref>). The most recent methods proposed to infer species&#x2013;species relationships from the 16S rRNA amplicon datasets include Compositionality Corrected by REnormalization and PErmutation (CCREPE) (<xref ref-type="bibr" rid="B14">Faust et al., 2012</xref>), metagenomeSeq (<xref ref-type="bibr" rid="B27">Paulson et al., 2013</xref>), Sparse Correlations for Compositional data (SparCC) (<xref ref-type="bibr" rid="B15">Friedman and Alm, 2012</xref>), a mixture model framework (<xref ref-type="bibr" rid="B25">McMurdie and Holmes, 2014</xref>), SParse InversE Covariance Estimation for Ecological Association Inference (SpiecEasi) (<xref ref-type="bibr" rid="B22">Kurtz et al., 2015</xref>), and gCoda (<xref ref-type="bibr" rid="B13">Fang et al., 2017</xref>). Each of the tools addresses the dimensionality and compositionality challenges of the datasets using different computational approaches. The cumulative sum scaling normalization and the zero-inflated Gaussian distribution mixture model are used in metagenomeSeq to account for biases resulting from under-sampling when selecting the differentially abundant OTUs. The log-ratio transformation and the variance are used in SparCC to overcome compositionality of the data. 
The data dimensionality and compositionality are even more efficiently addressed by SpiecEasi and gCoda, which apply a data transformation borrowed from compositional data analysis and then infer the interaction graph from the transformed data by neighborhood selection or by sparse inverse covariance selection.</p>
<p>All of the abovementioned tools, however, analyze the OTU table after filtering out most rare OTUs (Supplementary Figures <xref ref-type="supplementary-material" rid="SM1">S1A</xref>&#x2013;<xref ref-type="supplementary-material" rid="SM1">D</xref>). In the case of SparCC, the filtering is the most stringent because the algorithm employs log-transformations of the read counts. The basic assumption of the approach is that all OTUs are present in the dataset; therefore small values must be assigned to undetected OTUs to include them in the analysis. The percentage of rare OTUs may be even greater in studies with a large number of samples or when sampling takes place in more diverse environments, such as the Human Microbiome Project (HMP) dataset and the <italic>Populus</italic> Root Microbiome (PRM) dataset (Supplementary Figures <xref ref-type="supplementary-material" rid="SM1">S1E,F</xref>). In this study we explore the biological role of the rare low-abundance OTUs in these two environments using existing data from human body sites (<xref ref-type="bibr" rid="B18">Human Microbiome Project Consortium, 2012</xref>) and from <italic>Populus</italic> roots (<xref ref-type="bibr" rid="B35">Shakya et al., 2013</xref>). To reduce the burden of filtering for the rare OTUs and to overcome the problem of compositionality, we treat the OTUs as qualitative variables and apply an analytical tool designed for the analysis of such datasets.</p>
</sec>
<sec><title>Results</title>
<sec><title>Approach</title>
<p>Our initial analysis of the Human and <italic>Populus</italic> microbiome datasets reveals that both datasets are in agreement with the well-known occupancy&#x2013;abundance relationship (<xref ref-type="bibr" rid="B16">Gaston, 1996</xref>), which positively links the species abundances and the number of sites/samples they occupy. We find that in both datasets, OTUs that are more common across samples are also more abundant, and OTUs that are rare across samples are usually less abundant (<bold>Figures <xref ref-type="fig" rid="F1">1A,B</xref></bold>). Notably, the number of common abundant OTUs is extremely small in the datasets. Considering this observation, we decided to treat the rare OTUs as qualitative data by replacing the putative species abundances with a presence/absence call (1/0 values). Although this approach loses information on abundances, the resulting dataset is no longer compositional. In addition, the transformation allows us to collect statistics on co-occurrences of species with each other and to quantify interdependencies of the species. The quantification is based on the assumption that rare OTUs (putative species) are associated because they depend upon one another in each studied environment. They may be dependent metabolically, when metabolites produced by one species are consumed by another species. They may also have similar optimal growth conditions or offer complementary functions that support the microbial community as a whole (<xref ref-type="bibr" rid="B20">Jousset et al., 2017</xref>). All these factors may lead to co-occurrences of the rare OTUs in the samples. We quantify the co-dependence of OTUs by calculating a co-occurrence profile of each OTU with all other OTUs in the data and by comparing the resulting profiles for each pair of OTUs. 
We performed the calculations by applying a previously developed statistical tool, Association Network (Anets) (<xref ref-type="bibr" rid="B21">Karpinets et al., 2012</xref>), used for discovering associations in qualitative datasets<sup><xref ref-type="fn" rid="fn01">1</xref></sup>, and refer to the resultant network as Anets-OTUs.</p>
<fig id="F1" position="float">
<label>FIGURE 1</label>
<caption><p>Occupancy&#x2013;abundance relationship. <bold>(A)</bold> Human Microbiome Project (HMP) dataset (43140 OTUs &#x00D7; 2910 Samples). <bold>(B)</bold> <italic>Populus</italic> Root Microbiome (PRM) dataset (24434 OTUs &#x00D7; 83 Samples).</p></caption>
<graphic xlink:href="fmicb-09-00297-g001.tif"/>
</fig>
<p>In addition to that network, we also build a network of samples, Anets-Samples, using the same algorithm. By combining both networks we produce a map where associated OTUs and associated samples are clustered according to their presence/absence. This map can be further compared with characteristics of the studied environments. An overview of this computational framework is shown in <bold>Figure <xref ref-type="fig" rid="F2">2</xref></bold> and details of the implementation are provided in Supplementary Data Sheet <xref ref-type="supplementary-material" rid="SM1">S1</xref>.</p>
<fig id="F2" position="float">
<label>FIGURE 2</label>
<caption><p>Computational framework used in the study to explore associations of rare species.</p></caption>
<graphic xlink:href="fmicb-09-00297-g002.tif"/>
</fig>
<p>We also used a simulated dataset (<bold>Figure <xref ref-type="fig" rid="F3">3A</xref></bold>) to illustrate and explain the computations underlying the proposed framework. In this simulated study, we have two synthetic microbial communities with four associated species (circles) in the first community and four associated species (triangles) in the second community. Species in each community are co-dependent, and therefore co-occur more often in their parent environment. We made 12 random samples of species from the communities and organized the sampling results as an OTU table (<bold>Figure <xref ref-type="fig" rid="F3">3B</xref></bold>) with species/OTUs in rows and samples in columns. All species identified in the samples are rare; they are found in only 2&#x2013;5 out of 12 samples. We therefore replaced the species abundances with presence/absence (1/0) values.</p>
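<p>The replacement of abundances with presence/absence calls described above can be sketched in a few lines of Python. This is a minimal illustration and not the Anets implementation: the function names, the dictionary layout, and the read counts are hypothetical, and any count greater than zero is assumed to mean that the OTU is present.</p>
<preformat>
```python
def to_presence_absence(otu_table):
    """Replace read counts with 1 (present) or 0 (absent)."""
    return {otu: [1 if count > 0 else 0 for count in counts]
            for otu, counts in otu_table.items()}


def occupancy(binary_table):
    """Number of samples in which each OTU is present."""
    return {otu: sum(calls) for otu, calls in binary_table.items()}


# Hypothetical read counts for three OTUs across six samples.
counts = {
    "otu_a": [12, 0, 3, 0, 0, 1],
    "otu_b": [0, 0, 7, 0, 2, 0],
    "otu_c": [5, 4, 0, 0, 0, 0],
}

binary = to_presence_absence(counts)
occ = occupancy(binary)
```
</preformat>
<p>The occupancy values give the number of samples in which each OTU is found, which is the quantity used above to call a species rare.</p>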
<fig id="F3" position="float">
<label>FIGURE 3</label>
<caption><p>Generating Anets-OTUs using the simulated study. <bold>(A)</bold> A simulated study of two synthetic microbial communities: four species shown by colored (red, green, blue, brown) circles (Community 1), and four different species shown by colored (red, green, blue, brown) triangles (Community 2). The same color of the species indicates their close taxonomic relationship. To introduce noise in sampling, two species from the second community were added to the first community, and one species from the first community was added to the second community. Six samples were taken to identify species in each community and to generate an OTU table with the species abundances. <bold>(B)</bold> OTU table of the simulated study. <bold>(C)</bold> The table of co-occurrences for each pair of OTUs. Values of the table show the number of samples where each pair of species co-occurs. <bold>(D)</bold> Pair-wise similarities of the co-occurrence profiles for each pair of species. Red colored associations were used to generate Anets-OTUs. <bold>(E)</bold> Anets-OTUs. <bold>(F)</bold> The table of the shared species richness for each pair of samples. Values of the table show how many OTUs are shared for each pair of samples. <bold>(G)</bold> Pair-wise similarities of the shared species richness profiles for each pair of samples. Red colored associations were used to generate Anets-Samples. <bold>(H)</bold> Anets-Samples. <bold>(I)</bold> A map of the associated species and samples.</p></caption>
<graphic xlink:href="fmicb-09-00297-g003.tif"/>
</fig>
<sec><title>Association Network of Species</title>
<p>To generate the Anets-OTUs we first transform the OTU table to produce a new table where both rows and columns consist of OTUs and each cell shows the number of samples where two OTUs co-occur in the data (<bold>Figure <xref ref-type="fig" rid="F3">3C</xref></bold>). The transformed table, therefore, gives us the co-occurrence profile of each OTU with the rest. We further use these profiles to infer pair-wise associations of the OTUs (<bold>Figure <xref ref-type="fig" rid="F3">3D</xref></bold>). Although the input of the approach is an OTU table with 1/0 values instead of counts, the statistics collected in the transformed table produce continuous variables. The Anets program provides three options to quantify the pair-wise similarities of the profiles. The options include Spearman correlation (default), Pearson correlation, and cosine (Jaccard index). While alternative similarity metrics may be appropriate for particular datasets, in these studies we found that the Pearson correlation coefficient was most robust for identifying association networks. We calculate the Pearson correlation to measure similarity of the profiles for each pair of OTUs and consider the OTUs associated if the correlation coefficient R &#x2265; 0.30. The selected pairs of OTUs predict the network (Anets-OTUs) of seven species with seven associations separated into two clusters/communities (<bold>Figure <xref ref-type="fig" rid="F3">3E</xref></bold>). The species inferred by the Anets-OTUs in each cluster correspond to the two communities provided in the mock study (<bold>Figure <xref ref-type="fig" rid="F3">3A</xref></bold>). Only one species, from Environment 1 of the study, was not recovered by the algorithm.</p>
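<p>The steps above (co-occurrence table, pair-wise Pearson correlation of the co-occurrence profiles, and selection of pairs with R of at least 0.30) can be sketched as follows. This is a pure-Python illustration under stated assumptions and not the Anets program: the function names and the toy presence/absence rows are hypothetical, each profile is assumed to include the diagonal entry (the OTU's own occupancy), and a constant profile is assigned a correlation of zero.</p>
<preformat>
```python
from itertools import combinations
from math import sqrt


def cooccurrence_table(binary_rows):
    """Entry [i][j] = number of samples where OTUs i and j are both present."""
    n = len(binary_rows)
    return [[sum(a * b for a, b in zip(binary_rows[i], binary_rows[j]))
             for j in range(n)] for i in range(n)]


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def association_edges(binary_rows, threshold=0.30):
    """Pairs of OTUs whose co-occurrence profiles correlate at or above threshold."""
    profiles = cooccurrence_table(binary_rows)
    edges = []
    for i, j in combinations(range(len(profiles)), 2):
        r = pearson(profiles[i], profiles[j])
        if r >= threshold:
            edges.append((i, j, r))
    return edges


# Hypothetical presence/absence rows: OTUs 0 and 1 share samples, as do OTUs 2 and 3.
rows = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
edges = association_edges(rows)
```
</preformat>
<p>In this toy table only the pairs (0, 1) and (2, 3) pass the threshold, so the resulting network splits into two clusters, analogous to the two communities of the simulated study.</p>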
<p>While the calculations described for this small illustrative dataset can be implemented in Excel, for real datasets with many samples and OTUs the calculations can be performed using the Anets program (<xref ref-type="bibr" rid="B21">Karpinets et al., 2012</xref>). The program also calculates the <italic>p</italic>-value for each association using Monte-Carlo simulation. The associated species, therefore, can be selected using a <italic>p</italic>-value threshold. The Anets-OTUs produced for the mock study is small and does not require clustering. For real datasets, different algorithms and software tools can be used to cluster the network as described in Supplementary Data Sheet <xref ref-type="supplementary-material" rid="SM1">S1</xref>.</p>
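<p>A Monte-Carlo <italic>p</italic>-value for a single association can be approximated along the following lines. The null model used by the Anets program is not described in this section, so the sketch makes an explicit assumption: each OTU's presence calls are shuffled across samples, which preserves occupancy, and the <italic>p</italic>-value is the fraction of shuffled tables whose profile correlation is at least as large as the observed one. All names, the toy data, and the number of permutations are hypothetical.</p>
<preformat>
```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def profile_correlation(rows, i, j):
    """Correlation between the co-occurrence profiles of OTUs i and j."""
    def profile(k):
        return [sum(a * b for a, b in zip(rows[k], other)) for other in rows]
    return pearson(profile(i), profile(j))


def monte_carlo_p_value(rows, i, j, n_permutations=200, seed=0):
    """Assumed null model: shuffle every OTU's calls across samples."""
    rng = random.Random(seed)
    observed = profile_correlation(rows, i, j)
    hits = 0
    for _ in range(n_permutations):
        shuffled = [rng.sample(row, len(row)) for row in rows]
        if profile_correlation(shuffled, i, j) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical presence/absence rows (OTUs in rows, samples in columns).
rows = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
p_value = monte_carlo_p_value(rows, 0, 1, n_permutations=200, seed=1)
```
</preformat>
<p>Fixing the seed makes the estimate reproducible; a real analysis would use many more permutations and a correction for the number of OTU pairs tested.</p>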
</sec>
<sec><title>Association Network of Samples</title>
<p>A similar algorithm was used to generate the associations of samples (<bold>Figures <xref ref-type="fig" rid="F3">3F</xref>&#x2013;<xref ref-type="fig" rid="F3">H</xref></bold>). In this case we transform the OTU table to produce a new table where both rows and columns consist of samples and each cell represents the number of shared OTUs for each pair of samples. The ecological interpretation of this number is the shared species richness for a pair of samples. We consider two samples associated if they have a similar profile of the shared species richness values across all samples in the dataset. Such indirect similarity can establish an association between a pair of samples even if the majority of species in the samples are not common. Computationally, the algorithm generating the Anets-Samples (<bold>Figures <xref ref-type="fig" rid="F3">3F</xref>&#x2013;<xref ref-type="fig" rid="F3">H</xref></bold>) is similar to the algorithm of the Anets-OTUs (<bold>Figures <xref ref-type="fig" rid="F3">3C</xref>&#x2013;<xref ref-type="fig" rid="F3">E</xref></bold>). As before, the transposed table is used to compute profiles of shared species richness values for the samples (<bold>Figure <xref ref-type="fig" rid="F3">3F</xref></bold>) followed by estimation of pair-wise correlations (<bold>Figure <xref ref-type="fig" rid="F3">3G</xref></bold>) and clustering (<bold>Figure <xref ref-type="fig" rid="F3">3H</xref></bold>). As <bold>Figure <xref ref-type="fig" rid="F3">3H</xref></bold> shows, the clustering recovers associations among 9 out of 12 samples in the illustrative study. The final step of the framework is the integration of the results obtained by Anets-OTUs and Anets-Samples by building a presence/absence map of the associated species and samples (<bold>Figure <xref ref-type="fig" rid="F3">3I</xref></bold>).</p>
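<p>The sample-side transformation and the final map can be sketched in the same style. The code below is an illustrative assumption and not the published implementation: it computes the shared species richness table from a hypothetical presence/absence table and then reorders rows and columns by cluster membership to form a presence/absence map. The cluster assignments are supplied by hand here; in the framework they would come from clustering the two networks.</p>
<preformat>
```python
def transpose(rows):
    """Turn an OTU-by-sample table into a sample-by-OTU table."""
    return [list(column) for column in zip(*rows)]


def shared_richness_table(otu_rows):
    """Entry [s][t] = number of OTUs present in both sample s and sample t."""
    samples = transpose(otu_rows)
    return [[sum(a * b for a, b in zip(s, t)) for t in samples]
            for s in samples]


def presence_absence_map(otu_rows, otu_clusters, sample_clusters):
    """Reorder rows and columns so clustered OTUs and samples sit together."""
    otu_order = [i for cluster in otu_clusters for i in cluster]
    sample_order = [j for cluster in sample_clusters for j in cluster]
    return [[otu_rows[i][j] for j in sample_order] for i in otu_order]


# Hypothetical presence/absence rows (OTUs in rows, samples in columns).
rows = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1, 1],
]
richness = shared_richness_table(rows)

# Hand-assigned clusters, standing in for the output of network clustering.
otu_clusters = [[2, 3], [0, 1]]
sample_clusters = [[3, 4, 5], [0, 1, 2]]
mapped = presence_absence_map(rows, otu_clusters, sample_clusters)
```
</preformat>
<p>Each row of the richness table is the shared species richness profile of one sample; correlating these profiles pair-wise, as was done for the OTU profiles, yields the edges of the sample network.</p>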
</sec>
</sec>
<sec><title>Applying the Approach to Experimental Datasets</title>
<p>To test our methodology, we employed the described framework to analyze two well-established and published experimental datasets, one from the study of the HMP (<xref ref-type="bibr" rid="B18">Human Microbiome Project Consortium, 2012</xref>) and one from the study of the PRM (<xref ref-type="bibr" rid="B35">Shakya et al., 2013</xref>). In each of these datasets, 16S or 28S rRNA amplicon sequencing was used to profile the microbiome in different environments. By applying our methodology in an unsupervised manner to build a map of associated OTUs and samples, we were able to test how well the inter-sample associations reproduced their observed phenotype in the environment, with the added advantage of studying associations of rare OTUs underlying the grouping of samples.</p>
<sec><title><italic>Populus</italic> Root Microbiome</title>
<p>The dataset (<xref ref-type="bibr" rid="B35">Shakya et al., 2013</xref>) includes 2999 fungal OTUs and 24435 bacterial OTUs identified in 84 samples associated with the roots of Eastern Cottonwood (<italic>Populus deltoides</italic>) trees. The samples were taken in May and in September from two geographical locations, Tennessee (TN) and North Carolina (NC), along two different rivers. The study also collected a set of soil properties and host characteristics for each of the 23 sampling locations; we used these metadata to examine their relationships with the associations of samples discovered by the Anets-Samples.</p>
<p>Examination of the OTU table from the study reveals that common species, or generalists (found in &#x223C;60% of samples), in <italic>Populus</italic> roots are represented by only 61 OTUs, or 0.22% of the total number of OTUs in the dataset. As expected, the majority of OTUs had low abundance and were rare (<bold>Figure <xref ref-type="fig" rid="F1">1B</xref></bold>). After applying the Anets-OTUs algorithm to the OTU table we found six large associations of OTUs (<italic>p</italic>-value &#x003C; 0.05). A further enrichment analysis (see section &#x201C;Materials and Methods&#x201D;) attributed each association to a location, TN or NC, and to a sampling season, May or September (<bold>Figure <xref ref-type="fig" rid="F4">4A</xref></bold>). This analysis revealed that communities of low-abundance OTUs underlie groups of samples defined by known environmental factors from the study. To further confirm the grouping we built a heat map of the associated OTUs (horizontal axis) across all samples (vertical axis) organized by geographical location and sampling season (<bold>Figure <xref ref-type="fig" rid="F4">4B</xref></bold>). While many rare Anets-OTUs are present across all samples, some of them often co-occur in samples from a particular location or season. The largest microbial association includes OTUs found in the <italic>Populus</italic> rhizosphere in any season and in any location. Some associations are more common for TN or NC, and some associations are more common in September or May. This pattern suggests a tight link between the identified associations of the rare OTUs and a particular environmental factor. We noticed, for example, that a fungal OTU representing the genus <italic>Inocybe</italic> was found only in the NC cluster. Indeed, species of this genus have been tied to their environments, rather than to their hosts, more strongly than other fungal species (<xref ref-type="bibr" rid="B8">Cripps, 1997</xref>). 
Our results are consistent with this experimental observation; they also indicate that the other fungal genera in the cluster, such as <italic>Ceratobasidium</italic>, have similar biological characteristics.</p>
<fig id="F4" position="float">
<label>FIGURE 4</label>
<caption><p>Associations of rare species and samples in PRM study. <bold>(A)</bold> Communities of associated fungal and bacterial OTUs discovered by the Anets-OTUs algorithm in the rhizosphere of <italic>Populus deltoides</italic>. Nodes in the network indicate OTUs and edges indicate pair-wise associations between them. The node color shows the community (cluster) assignment inferred by clustering. <bold>(B)</bold> Presence&#x2013;absence map of the associated OTUs; the cell color is red if the OTU is present in the sample and black if the OTU is absent. OTUs are grouped according to the microbial communities inferred by Anets-OTUs and sorted by mean abundance; samples are grouped according to clusters inferred by Anets-Samples and sorted by the shared richness. <bold>(C)</bold> Two associations of <italic>Populus</italic> rhizosphere samples with the shared species richness revealed by Anets-Samples; color indicates samples taken in NC (red) and in TN (green). <bold>(D)</bold> Hierarchical clustering of the soil properties; brackets indicate three clusters of soil samples with distinct soil properties: the green bracket indicates the cluster of soil samples that corresponds to the association of rhizosphere samples in TN, the red bracket indicates the cluster of soil samples that corresponds to the association of rhizosphere samples in NC, and the black bracket and black squares indicate samples that were not found to be associated by Anets-Samples.</p></caption>
<graphic xlink:href="fmicb-09-00297-g004.tif"/>
</fig>
<p>The analysis confirms that clustering at low taxonomic levels may be crucial in discriminating different environments. We find that although OTUs in each of the associations often belong to the same phyla, they are more distinct at lower taxonomic levels, such as order (Supplementary Tables <xref ref-type="supplementary-material" rid="SM1">S1A,B</xref>). For example, microbial communities of <italic>Populus</italic> roots in both locations, TN and NC, include the phylum <italic>Proteobacteria</italic>, with fewer OTUs in NC (Supplementary Table <xref ref-type="supplementary-material" rid="SM1">S1A</xref>). At the level of order, however, the <italic>Proteobacteria</italic> in NC had greater richness (10 orders) than in TN (seven orders), and included the <italic>Rhodocyclales, Syntrophobacterales, Rhodobacterales</italic>, and <italic>Burkholderiales</italic> orders that were not observed in TN. Microbial communities in both locations, TN and NC, also included numerous species from the phylum <italic>Acidobacteria.</italic> The microbial community in TN, however, was dominated by the order Solibacterales, a taxon that was not found in NC. This example demonstrates that analyzing the dataset at the level of OTUs, and collapsing them only after linking their associations to environments, may be a better strategy for exploring subtle differences among microbiomes in similar environments.</p>
<p>By applying the Anets-Samples algorithm to the OTU table we revealed two distinct clusters of samples in the PRM dataset (<bold>Figure <xref ref-type="fig" rid="F4">4C</xref></bold>). Within each cluster, all samples had similar profiles of the shared species richness across all samples (<italic>p</italic> &#x003C; 0.01). Furthermore, there was a clear association with metadata of the study, with the first cluster representing a subset of samples from TN, and the second cluster representing a subset of samples from NC. Eight samples did not associate with either cluster. These results mirror the results of <xref ref-type="bibr" rid="B35">Shakya et al. (2013)</xref>, who used variance partitioning of transformed datasets to show that watershed (TN vs. NC), season, and sampling site within a watershed, respectively, had the greatest effect on community structure, followed by other factors. To determine other environmental factors contributing to the separation of samples into two clusters we examined the variance partitioning of the bacterial OTUs within each cluster with respect to host and soil properties, geographic locations, seasons, and diversity of the corresponding fungal community. The analysis was performed the same way as in the original study (see section &#x201C;Materials and Methods&#x201D;). A large proportion of variance (67.8%) of the bacterial OTUs across all samples was unexplained in the original study, whereas only 9% of variance was explained by soil properties. In contrast, among the samples that were selected by the Anets-Samples as significantly associated, only 25% of variance remained unexplained, while the greatest proportion of the variance (30.1%) was attributed to the studied soil properties (Supplementary Figure <xref ref-type="supplementary-material" rid="SM1">S2</xref>). The expected proportion of the variance estimated by the permutation test, via a random selection of the same number of samples, would be only 19%.</p>
<p>To examine the effect of soil on the separation of samples in more detail we hierarchically clustered 16 soil properties measured in the study and found that the two associations discovered by the Anets-Samples in the <italic>Populus deltoides</italic> rhizosphere (<bold>Figure <xref ref-type="fig" rid="F4">4C</xref></bold>) correspond to two distinct soil clusters inferred from the soil properties (<bold>Figure <xref ref-type="fig" rid="F4">4D</xref></bold>). This relationship was not found in the original study and again suggests the importance of rare microbial species for differentiating subtle environmental conditions, in addition to the traditional methods that more heavily weight species abundance and dominant taxa. In the case of PRM we observe that the set of TN samples found as associated by Anets share relatively greater Zn, Mn, and Ca contents in the soil and a greater soil pH. The set of associated NC samples share relatively low values of these soil characteristics. Those samples, either from TN or NC, that are not identified by Anets-Samples as significantly associated have variable values of the soil properties as well as relatively greater sand content and lower clay and organic matter contents than the associated samples. The results point to the soil properties as a crucial factor underlying the similarity of microbial communities in the <italic>Populus deltoides</italic> rhizosphere.</p>
</sec>
<sec><title>Microbiomes of Human Body Sites</title>
<p>The HMP dataset has been characterized in several publications (<xref ref-type="bibr" rid="B14">Faust et al., 2012</xref>; <xref ref-type="bibr" rid="B30">Project, 2012</xref>; <xref ref-type="bibr" rid="B1">Aagaard et al., 2013</xref>) and includes samples obtained from 18 different body sites of 180 healthy men and women. As noted before (<bold>Figure <xref ref-type="fig" rid="F1">1A</xref></bold>), the majority of OTUs in the dataset are rare and have low abundance. Considering the large size of the OTU table produced in the study we started the analysis with the construction of the Anets-Samples (<bold>Figures <xref ref-type="fig" rid="F3">3F</xref>&#x2013;<xref ref-type="fig" rid="F3">H</xref></bold>) to find associations (clusters) of samples with similar profiles of the shared species richness and to discard outlier samples. Most samples (74%) in the dataset were found to be associated (<italic>p</italic>-value &#x003C; 0.01) with at least one other sample in the network. Visualization and clustering of the network using the Markov clustering algorithm (MCL) (<xref ref-type="bibr" rid="B42">Van Dongen, 2008</xref>) revealed seven large disconnected components and 206 clusters (Supplementary Figure <xref ref-type="supplementary-material" rid="SM1">S3</xref>). We next used an enrichment analysis (see section &#x201C;Materials and Methods&#x201D;) to annotate the inferred clusters by sample metadata (sex of the human subject, body site, and sub-site) and to assign significantly enriched body sites and sub-sites to the clusters. <bold>Figure <xref ref-type="fig" rid="F5">5A</xref></bold> shows components of the network comprised of oral and skin samples colored according to sub-sites. Samples that belonged to a particular sub-site tended to cluster together according to both the figure and the enrichment analysis. 
Thus, the Anets-Samples allowed us to predict the origin of samples from different oral sub-sites, such as keratinized gingiva, buccal mucosa, hard palate, saliva, throat, and tongue. There were also several distinct associations of samples originating from multiple skin sub-sites. Interestingly, one association of samples (cluster 16 in <bold>Figure <xref ref-type="fig" rid="F5">5A</xref></bold>) was comprised of male human subjects.</p>
<fig id="F5" position="float">
<label>FIGURE 5</label>
<caption><p>Associations of rare species and samples in the HMP study. <bold>(A)</bold> Associations of oral and skin samples. Samples in the networks are represented by filled circles colored according to the sampling sub-sites in the HMP study. Edges between circles indicate significant association between samples in terms of the shared species richness. Red and black ovals label associations predicted by clustering of the Anets-Samples. The name of each cluster was inferred by the enrichment analysis as described in Section &#x201C;Materials and Methods.&#x201D; Black ovals indicate clusters (2, 10, and 16) that were further analyzed by the Anets-OTUs algorithm. <bold>(B)</bold> Associations of rare species discovered by Anets-OTUs in the samples comprising clusters 2, 10, and 16. Small components of the network are not included. OTUs are represented by nodes (filled circles) where color indicates different clusters inferred by Markov clustering. Edges between nodes represent significant associations (<italic>p</italic> &#x003C; 0.001) between a pair of OTUs. The largest clusters are referred to as communities; they are labeled by black ovals and have associated bar charts showing the number of OTUs from the most abundant taxonomic ranks labeled as G (<italic>Genus</italic>) and O (<italic>Order</italic>). <bold>(C)</bold> Heat map of abundances (in terms of sequencing reads) of associating microbial OTUs (horizontal axis) in three distinct clusters of samples (vertical axis) collected from the human skin. OTUs are grouped according to the microbial communities inferred by Anets-OTUs and sorted by mean abundance; samples are grouped according to clusters inferred by Anets-Samples and sorted by the shared richness. Each cell shows the number of OTU reads. Color of cells in the map shows the number of reads representing the OTUs in the sample: 10 reads or more (dark orange), from 1 to 10 reads (light orange), and not represented by reads (gray). 
Cluster IDs indicated in <bold>(A,B)</bold> are shown in vertical and horizontal bars of the heat map respectively.</p></caption>
<graphic xlink:href="fmicb-09-00297-g005.tif"/>
</fig>
<p>We further focused the analysis on 314 skin samples that represent three distinct clusters, disconnected in the Anets-Samples, that are labeled by black ovals in <bold>Figure <xref ref-type="fig" rid="F5">5A</xref></bold>. To reveal communities of microbial OTUs discriminating these clusters we built the Anets-OTUs using, as input, an OTU table comprised of these 314 samples in columns and 43140 OTUs in rows. The generated Anets-OTUs included 412 associated OTUs (<italic>p</italic>-value &#x003C; 0.001), and subsequent clustering of the network revealed four major microbial communities (<bold>Figure <xref ref-type="fig" rid="F5">5B</xref></bold>). The enrichment analysis showed statistically significant links between the communities and the Anets-Samples clusters (Supplementary Table <xref ref-type="supplementary-material" rid="SM1">S2</xref>). The heat map generated from the initial dataset by extracting abundance values of the associating OTUs further confirmed the links (<bold>Figure <xref ref-type="fig" rid="F5">5C</xref></bold>). Importantly, the three distinct clusters of samples, originating from the skin of different human subjects, have significant differences in microbial communities at the OTU level, although most OTUs contributing to the difference belonged to the genus <italic>Propionibacterium.</italic> Indeed, microbial community 1, comprised of OTUs of the genus <italic>Propionibacterium</italic> (<bold>Figure <xref ref-type="fig" rid="F5">5B</xref></bold>), was significantly enriched in Anets-Samples clusters 2 and 10 (<bold>Figure <xref ref-type="fig" rid="F5">5A</xref></bold>), but not in Anets-Samples cluster 16 (<bold>Figure <xref ref-type="fig" rid="F5">5A</xref></bold>). Microbial community 2, comprised of a distinct set of OTUs from the same genus (<bold>Figure <xref ref-type="fig" rid="F5">5B</xref></bold>), was significantly enriched only in Anets-Samples cluster 2 (<bold>Figure <xref ref-type="fig" rid="F5">5A</xref></bold>). 
The third microbial community, comprised of OTUs of the genus <italic>Propionibacterium</italic> and the order <italic>Actinomycetales</italic> (<bold>Figure <xref ref-type="fig" rid="F5">5B</xref></bold>), was enriched in Anets-Samples cluster 10 (<bold>Figure <xref ref-type="fig" rid="F5">5A</xref></bold>), and the fourth microbial community (OTUs from the genera <italic>Staphylococcus</italic> and <italic>Propionibacterium</italic>) was enriched in Anets-Samples cluster 16, comprised of male human subjects. A <italic>p</italic>-value of 0.01 (Fisher exact test) was used as the significance threshold in the enrichment analysis. Thus, the OTU level clustering was important to discriminate microbial communities of the clustered samples.</p>
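<p>The enrichment test named above can be illustrated with a minimal sketch of a one-sided Fisher exact test, written as a hypergeometric upper tail. The function name and its arguments are hypothetical, and the one-sided form is an assumption, since the text names only the test and its 0.01 threshold.</p>
<preformat>
```python
from math import comb


def enrichment_p_value(k, cluster_size, annotated_total, population):
    """One-sided Fisher exact test: P(X >= k) under the hypergeometric null.

    k               -- samples in the cluster that carry the annotation
    cluster_size    -- number of samples in the cluster
    annotated_total -- samples carrying the annotation in the whole network
    population      -- total number of samples in the network
    (All argument names are illustrative, not taken from the Anets program.)
    """
    upper = min(cluster_size, annotated_total)
    denominator = comb(population, cluster_size)
    # Sum the probabilities of observing k, k + 1, ..., upper annotated samples.
    tail = sum(
        comb(annotated_total, i)
        * comb(population - annotated_total, cluster_size - i)
        for i in range(k, upper + 1)
    )
    return tail / denominator


def is_enriched(k, cluster_size, annotated_total, population, alpha=0.01):
    """Apply the 0.01 significance threshold quoted in the text."""
    return alpha >= enrichment_p_value(k, cluster_size, annotated_total, population)
```
</preformat>
<p>A cluster would be labeled with a body site, sub-site, or sex only when the returned value falls at or below the threshold; a two-sided test would need a different tail definition.</p>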
</sec>
</sec>
<sec><title>Validation of the Anets Algorithm</title>
<p>We use 1250 oral samples of HMP to investigate the robustness and limitations of the Anets algorithm, to compare it with other methods and to explore potential biases and confounding factors.</p>
<sec><title>Library Size as Potential Confounding Factor</title>
<p>The Library Size (LS) affects the number of identified rare species and, therefore, may introduce a technical bias in the OTU table if there are significant differences in LSs among studied environments. We explore this effect using known annotations of oral samples by subsites. Specifically, pair-wise comparisons were performed among all the subsites in terms of the library size and then in terms of the number of rare OTUs. We find that log-transformed values of the library size in the oral samples have a normal distribution (Supplementary Figure <xref ref-type="supplementary-material" rid="SM1">S4</xref>). Significant differences between average values (Wilcoxon test) were observed for 2 out of 15 pair-wise comparisons (Supplementary Figure <xref ref-type="supplementary-material" rid="SM1">S5</xref>), and only for one comparison, &#x201C;Tongue dorsum&#x201D; versus &#x201C;Hard palate,&#x201D; is the difference in LS also associated with a significantly different number of rare species (Supplementary Table <xref ref-type="supplementary-material" rid="SM1">S3</xref>). In general, most rare OTUs are the least abundant and the mean number of such OTUs is significantly different in 60% of subsite pairs (Supplementary Figure <xref ref-type="supplementary-material" rid="SM1">S6</xref> and Supplementary Table <xref ref-type="supplementary-material" rid="SM1">S3</xref>). When we consider less rare OTUs we find a significant increase in the mean abundance of the OTUs (Supplementary Figure <xref ref-type="supplementary-material" rid="SM1">S6</xref>) and a significant decrease in the percentage of subsite pairs that are significantly different in terms of the number of rare OTUs, from 60 (occupancy threshold 1%) to 40, 20, and 13% (occupancy threshold 5, 10, and 25%, respectively) (Supplementary Table <xref ref-type="supplementary-material" rid="SM1">S3</xref>). 
According to the results, the LS may be a confounding factor in the analysis of rare OTUs, although a different LS does not necessarily translate into a different number of rare species, at least for oral subsites. There is a clear trend for oral subsites to be less different in terms of the number of rare OTUs when we increase the occupancy threshold. This trend, however, is not associated with different LSs of the subsites.</p>
</sec>
<sec><title>Importance of Rare OTUs for Anets-Samples Construction</title>
<p>We further explore how important rare and common taxa are for the correct grouping of samples. We separated species identified in 1250 oral samples into two groups, rare (occupancy between 0.5 and 25% of samples) and common (occupancy > 25%). Then we generated three OTU tables, comprised of only rare OTUs, rare and common OTUs, and only common OTUs. We find that by considering only rare OTUs we reduce the resolution of the Principal coordinates analysis (PCoA) plot (Supplementary Figure <xref ref-type="supplementary-material" rid="SM1">S7A</xref>). In the case of Anets-Samples (Supplementary Figure <xref ref-type="supplementary-material" rid="SM1">S7B</xref>), we actually increased the resolution and were able to detect a batch effect among oral samples. The effect was probably masked by the presence of common species, because we did not observe it when we used an OTU table with only common OTUs (<bold>Figure <xref ref-type="fig" rid="F6">6D</xref></bold>) or with common and rare OTUs (<bold>Figure <xref ref-type="fig" rid="F6">6C</xref></bold>). In spite of the batch effect, the grouping of samples within the large batch (Supplementary Figure <xref ref-type="supplementary-material" rid="SM1">S7B</xref>) was consistent with the studied oral subsites, although not as evident as for Anets-Samples based on a combined set of rare and common OTUs (<bold>Figure <xref ref-type="fig" rid="F6">6C</xref></bold>). The PCoA plots generated for OTU tables by including or excluding the rare OTUs were rather similar (<bold>Figures <xref ref-type="fig" rid="F6">6A,B</xref></bold>), suggesting that we will not significantly affect the interpretation of the results by excluding rare species in the PCoA. 
However, by excluding the rare species when building Anets (<bold>Figure <xref ref-type="fig" rid="F6">6D</xref></bold>), we essentially decrease our chance to cluster samples according to subsites (<bold>Figures <xref ref-type="fig" rid="F6">6C,D</xref></bold>, right sides) and also decrease the number of associated samples (<italic>p</italic> > 0.05) from 1082 (87%) to 981 (78%). The results demonstrate the high sensitivity of the Anets algorithm to signals from both rare and more abundant OTUs. The result is not surprising. To build Anets we have to collect additional statistics on the co-occurrence of species with the rest and on the shared species richness to establish the pair-wise associations in Anets-Samples and in Anets-OTUs. By excluding some species, either less abundant or more abundant, we lose information important for the analysis and impair the results. Building Anets after filtering common species, however, may allow us to see biases obscured by the presence of common taxa.</p>
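<p>The split into rare and common OTUs by occupancy can be sketched as follows. This is an illustration only: the function name, the dictionary layout of the OTU table, and the treatment of the boundary values (0.5% and 25% counted as rare) are assumptions, because the text gives only the two thresholds.</p>
<preformat>
```python
def split_by_occupancy(otu_table, rare_min=0.005, rare_max=0.25):
    """Partition OTUs into rare and common groups by occupancy.

    otu_table -- dict mapping an OTU identifier to a list of read counts,
                 one count per sample (hypothetical layout).
    Occupancy is the fraction of samples in which the OTU has any reads.
    OTUs with occupancy below rare_min are placed in neither group.
    """
    rare, common = [], []
    for otu_id, counts in otu_table.items():
        present = sum(1 for count in counts if count > 0)
        occupancy = present / len(counts)
        if occupancy > rare_max:
            common.append(otu_id)
        elif occupancy >= rare_min:
            rare.append(otu_id)
    return rare, common


def subset_table(otu_table, otu_ids):
    """Build a reduced OTU table that keeps only the listed OTUs."""
    return {otu_id: otu_table[otu_id] for otu_id in otu_ids}
```
</preformat>
<p>The three tables compared in this section would then correspond to the rare subset, the common subset, and their union.</p>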
<fig id="F6" position="float">
<label>FIGURE 6</label>
<caption><p>Principal coordinates analysis (PCoA) plots and Anets-Samples for oral samples with or without rare OTUs. <bold>(A)</bold> PCoA plot generated by including rare OTUs. <bold>(B)</bold> PCoA plot generated by excluding rare OTUs. <bold>(C)</bold> Anets-Samples generated by including rare OTUs. <bold>(D)</bold> Anets-Samples generated by excluding rare OTUs. Large clusters (more than 10 samples) are bordered by rectangles.</p></caption>
<graphic xlink:href="fmicb-09-00297-g006.tif"/>
</fig>
</sec>
<sec><title>Topological Differences Between Networks Generated Using Anets and Unweighted UniFrac Distances</title>
<p>UniFrac is a widely used distance metric that incorporates phylogenetic information to compare microbial communities. All taxa, common and rare, are included in the calculation of the distance. The metric, therefore, may be an alternative way to construct the network of samples by incorporating the phylogenetic signals from rare species. We have compared the network of samples generated by Anets with those based on the Unweighted UniFrac (UUF) distances. The &#x2018;phyloseq&#x2019; package (<xref ref-type="bibr" rid="B24">McMurdie and Holmes, 2013</xref>) was used to calculate the UUF distance for each pair of oral samples. Two networks were generated with distance thresholds of 0.95 and 0.98. We chose these thresholds because we found it difficult to break the UniFrac-based networks into clusters, owing to their low clustering coefficients and high centralization compared with the Anets-Samples (Supplementary Table <xref ref-type="supplementary-material" rid="SM1">S4</xref>). We could increase the clustering coefficient and reduce centralization by increasing the distance threshold, but this also reduced the number of nodes in the UUF network. Using the looser threshold (0.95) we had 1243 nodes that were vastly interconnected by 68284 edges into one large cluster (Supplementary Figure <xref ref-type="supplementary-material" rid="SM1">S8</xref>). By increasing the distance threshold to 0.98 we generated a network with 868 samples and 6457 edges and a greater clustering coefficient (<bold>Figure <xref ref-type="fig" rid="F7">7B</xref></bold>). The generated clusters, however, were not as consistent with the annotation of subsites as in the case of the Anets-Samples network (<bold>Figure <xref ref-type="fig" rid="F7">7A</xref></bold>). 
Although in general all three networks showed the same trend of separation of subsites &#x2018;keratinized gingiva&#x2019; and &#x2018;buccal mucosa&#x2019; from &#x2018;saliva,&#x2019; &#x2018;tongue dorsum,&#x2019; and &#x2018;throat,&#x2019; it was easier to cluster the Anets-based network, and, importantly, many large clusters in the Anets network were enriched with samples originating from the same subsite (<bold>Figure <xref ref-type="fig" rid="F7">7A</xref></bold>, right side). The comparison reveals a distinct topology of the Anets network compared with the UUF-based networks and a better association of the topological structure with oral subsites. The more centralized topology of the UUF-based network may be suitable for a global overview of the samples. The Anets-based network may perform better if we want a greater level of detail and more granularity in grouping the samples.</p>
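<p>How a thresholded sample network and its clustering coefficient relate can be shown with a small sketch. It assumes that each pair of samples carries a single score and that pairs at or above the cutoff are linked; whether the study thresholds the UUF distance itself or a value derived from it is not stated here, so the direction of the cutoff, the function names, and the data layout are all assumptions. Nodes with fewer than two neighbors are given a coefficient of zero, which is one common convention.</p>
<preformat>
```python
from itertools import combinations


def threshold_network(pair_scores, cutoff):
    """Link two samples when their pairwise score is at or above the cutoff.

    pair_scores -- dict mapping a (sample_a, sample_b) tuple to a score
                   (hypothetical layout). Returns an adjacency dict of sets;
                   samples without any retained edge are not included.
    """
    adjacency = {}
    for (a, b), score in pair_scores.items():
        if score >= cutoff:
            adjacency.setdefault(a, set()).add(b)
            adjacency.setdefault(b, set()).add(a)
    return adjacency


def average_clustering(adjacency):
    """Mean local clustering coefficient over all nodes in the network."""
    if not adjacency:
        return 0.0
    total = 0.0
    for neighbors in adjacency.values():
        degree = len(neighbors)
        if degree >= 2:
            # Count neighbor pairs that are themselves connected.
            links = sum(
                1 for u, v in combinations(neighbors, 2) if v in adjacency[u]
            )
            total += 2.0 * links / (degree * (degree - 1))
    return total / len(adjacency)


def edge_count(adjacency):
    """Number of undirected edges in the network."""
    return sum(len(neighbors) for neighbors in adjacency.values()) // 2
```
</preformat>
<p>Raising the cutoff removes edges and drops samples left without neighbors, which is the trade-off between node count and clustering coefficient described above.</p>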
<fig id="F7" position="float">
<label>FIGURE 7</label>
<caption><p>Networks of oral samples and their clustering by the Markov clustering algorithm (MCL) with the same parameters. <bold>(A)</bold> The network was generated using Anets-Samples algorithm. The large clusters (more than 10 samples) are bordered by rectangles. <bold>(B)</bold> The network was generated using Unweighted UniFrac (UUF) distances as measure of pair-wise similarity of the samples (nodes) with the threshold 0.98.</p></caption>
<graphic xlink:href="fmicb-09-00297-g007.tif"/>
</fig>
</sec>
<sec><title>Robustness of the Anets Algorithm</title>
<p>Several different factors, including sampling strategy and sample handling, the choice of universal 16S rRNA gene PCR primers, DNA extraction methods, amplification artifacts such as chimeras, and the computational methods employed to produce the OTU table from sequencing reads, may contribute to different results in 16S rRNA gene profiling studies. All of them can affect the number of rare species and the produced Anets. To evaluate the robustness of the algorithm we explore changes in the structure of Anets based on OTU tables constructed by different processing pipelines, from different 16S rRNA gene variable regions used for sequencing, and from a different subset of oral samples. Namely, we consider three different OTU tables produced for oral samples by two commonly used 16S rRNA amplicon data processing pipelines, MOTHUR (<xref ref-type="bibr" rid="B34">Schloss et al., 2009</xref>) and QIIME (<xref ref-type="bibr" rid="B6">Caporaso et al., 2010</xref>), that utilize different algorithms to construct the OTU table. The MOTHUR OTU table was produced by a high quality-filtering MOTHUR pipeline (<xref ref-type="bibr" rid="B32">Schloss et al., 2011</xref>) with a low overall chimera rate. The formation of chimeric sequences is a well-known factor contributing to erroneous OTUs and to overestimated species richness (<xref ref-type="bibr" rid="B2">Ashelford et al., 2005</xref>). We also compared OTU tables generated by the QIIME pipeline from sequencing of 16S rRNA gene variable regions 1&#x2013;3, referred to as HMPv13(Q), and variable regions 3&#x2013;5, referred to as HMPv35(Q). These three OTU tables were generated for the same subset of 1250 oral samples. In addition, we included in the comparison an OTU table (QIIME pipeline, v35) produced for a different subset of 1025 oral samples. We refer to this table as HMPv35(Q) Validation. 
The tables were downloaded from the NIH Human Microbiome Project websites and were comprised of different numbers of OTUs, from 8640 OTUs in HMPv13(M) to 26399 OTUs in HMPv35(Q) Validation. Most OTUs (95&#x2013;97%) in the tables were rare OTUs (found in less than 25% of samples). The Anets-Samples was generated for each OTU table and visualized by Cytoscape using the same parameters. Comparison of the produced networks reveals not only their similar statistical characteristics (Supplementary Table <xref ref-type="supplementary-material" rid="SM1">S5</xref>), but also a similar trend in the grouping of samples among subsites (<bold>Figure <xref ref-type="fig" rid="F8">8</xref></bold>). The MOTHUR and QIIME networks, however, were surprisingly different in their ability to separate different subsites (<bold>Figures <xref ref-type="fig" rid="F8">8A,B</xref></bold>). The MOTHUR network performed well in separating tongue dorsum and throat from other subsites, but not as well in separating keratinized gingiva and buccal mucosa, while the QIIME v13 network performed better in separating keratinized gingiva and buccal mucosa from other subsites, and not as well for tongue dorsum and throat. The difference persists when we run Anets with different parameters. An interesting symmetrical structure, related to the batch effect, was revealed in the Anets-Samples produced for OTU table HMPv35(Q) (<bold>Figure <xref ref-type="fig" rid="F8">8C</xref></bold>). The upper part of the network represents samples sequenced by the J. Craig Venter Institute (JCVI) and the lower part represents samples sequenced by other sequencing centers. Importantly, each side of the network demonstrated a similar grouping of samples into subsites regardless of the batch effect. The network generated for a different subset of oral samples, HMPv35(Q) Validation, reveals a similar batch effect with separation of samples into subsites within each batch. 
Based on the results we conclude that the Anets algorithm recovers similar groupings of samples from OTU tables produced by two commonly used 16S rRNA amplicon data processing pipelines, regardless of the observed batch effects and type of sequencing (v13 or v35), as well as from an OTU table comprised of different samples from the same environments.</p>
<fig id="F8" position="float">
<label>FIGURE 8</label>
<caption><p>Anets-Samples generated for different OTU tables comprised of oral samples. <bold>(A)</bold> OTU table generated by QIIME pipeline from sequencing of 16S rRNA gene variable regions 1&#x2013;3. <bold>(B)</bold> OTU table generated by MOTHUR pipeline from sequencing of 16S rRNA gene variable regions 1&#x2013;3. <bold>(C)</bold> OTU table generated by QIIME pipeline from sequencing of 16S rRNA gene variable regions 3&#x2013;5. <bold>(D)</bold> OTU table generated by QIIME pipeline from sequencing of 16S rRNA gene variable regions 3&#x2013;5 of a distinct set of oral samples.</p></caption>
<graphic xlink:href="fmicb-09-00297-g008.tif"/>
</fig>
</sec>
</sec></sec>
<sec><title>Discussion</title>
<p>In this proof of concept study we aimed to demonstrate the use of the Anets-based computational framework for linking associations of rare OTUs to their environment. Results of the study demonstrate that a combination of the Anets-OTUs and Anets-Samples has the potential to serve as a powerful unsupervised method for discovering relationships and associations of rare species from phylogenetic marker gene datasets used in microbiome studies. Applying the framework to analyses of microbiomes in <italic>Populus</italic> roots and on human body sites, we were able to reproduce associations of samples in these complex environments and associations of species that were consistent with the existing metadata and the analyses described in the previous literature. In the case of human microbiomes we were able to identify associations of co-dependent rare OTUs and link them to sub-sites of the human body. Similar observations were reported by Ding and Schloss (<xref ref-type="bibr" rid="B9">Ding and Schloss, 2014</xref>) using the Dirichlet multinomial mixture models (<xref ref-type="bibr" rid="B17">Holmes et al., 2012</xref>).</p>
<p>An important observation from the analysis of <italic>Populus</italic> and human microbiomes by the approach is a close link between the rare microbial OTUs and specific environmental conditions. To explain the importance of rare putative species for classification of the environments we propose that the high-abundance OTUs are common among sampled environments because the environments have some common conditions stimulating outgrowth of the same putative species. The rare low-abundance OTUs are rare because each of these environments also has some specific conditions or microenvironments. These specific microenvironmental conditions may stimulate the growth of species represented by rare OTUs. Although they are rare, they may be crucial for recovering the micro-environmental differences in microbiomes of the environments. These rare OTUs, therefore, may be a better computational target for quantification of subtle differences among the most variable properties of the environments, and their presence/absence pattern can be used for additional comprehensive classification of samples from the environments. New approaches to &#x2018;denoising&#x2019; sequencing data that avoid collapsing OTUs to higher taxonomic levels or <italic>a priori</italic> OTU similarity thresholds, such as the ASV approach (<xref ref-type="bibr" rid="B4">Callahan, 2017</xref>), might also further increase the ability to recover the micro-environmental differences among samples.</p>
<p>Although the results show the importance of rare OTUs in discriminating oral subsites and in revealing batch effects, they do not prove that the rare OTUs are real. Further experimental studies are necessary to provide direct evidence of their existence. Models of microbial communities where a signal from rare species can be captured and compared with signals from common species would also be helpful to explore rare species and to validate the approach. There are, however, some challenges in developing a realistic model of microbial communities. Available computational tools, such as the &#x201C;SPIEC-EASI&#x201D; R package (<xref ref-type="bibr" rid="B22">Kurtz et al., 2015</xref>), generate synthetic OTU data using a random selection of species. The randomness contradicts the major assumption of the Anets algorithm that the selection of species in the sample is not random. In addition, the OTU tables simulated by a random selection do not necessarily conform to the occupancy&#x2013;abundance relationship (<xref ref-type="bibr" rid="B16">Gaston, 1996</xref>) observed in real settings.</p>
<p>The transformation of the OTU table into OTU presence/absence values for analysis by Anets places some limitations and constraints on the approach. One such constraint is the presence of many common OTUs, such as those found in more than &#x223C;75% of samples. The loss of abundance data is another limitation. The information can be important for understanding dominant taxa and their interdependencies with each other and with members of the rare biosphere. Another important condition for successful application of the approach is the species co-dependence in the studied environments. The condition is important to observe similar co-occurrence profiles for the associated OTUs and to simplify their clustering. Although this assumption is consistent with known metabolic and functional dependences of microbial species in different environments (<xref ref-type="bibr" rid="B20">Jousset et al., 2017</xref>), these dependences are not always the major factors that discriminate environments in a particular study.</p>
<p>Further studies are necessary to validate the proposed framework, to extend it by incorporating additional statistical tools, to provide guidelines on setting parameters for the Anets-Samples and Anets-OTUs, to explore different measures of similarity and their cutoffs, and to clarify limitations of the approach. Further work is also necessary to streamline all calculations in a package. At this point, the computations proposed in the framework are implemented by different programs, such as Anets (<xref ref-type="bibr" rid="B21">Karpinets et al., 2012</xref>), Cytoscape (<xref ref-type="bibr" rid="B37">Smoot et al., 2011</xref>), mcl (Markov clustering) (<xref ref-type="bibr" rid="B42">Van Dongen, 2008</xref>), as well as by simple in-house scripts written in R (see &#x201C;Operating Procedure to generate Anets&#x201D; in Supplementary Data Sheet <xref ref-type="supplementary-material" rid="SM1">S1</xref>). Importantly, the Anets program was implemented for a single processor to cope with data of small scale and complexity. The program will be slow in processing large OTU tables generated by increasingly complex datasets. It is important to increase the scalability of the algorithm by parallelizing independent computation steps and by designing an efficient representation of the sparse data for better memory management.</p>
<p>We have thus taken the first steps in incorporating the &#x201C;rare biosphere&#x201D; of microbial community data and linking its contribution to environmental and phenotypic characteristics via the Anets algorithm. More interesting relationships may be found by this approach as the rate of accumulation of microbial data in different environments continues to increase and the cost of sequencing continues to decrease. We believe that the Anets technique holds unexplored potential for an in-depth analysis of the data. The approach is useful to reveal inherent patterns in the data without <italic>a priori</italic> knowledge of factors influencing the microbial communities as well as to visualize the patterns as networks or maps.</p>
</sec>
<sec id="s1" sec-type="materials|methods">
<title>Materials and Methods</title>
<sec><title>Mock Dataset</title>
<p>The dataset was generated manually to illustrate the Anets approach, and represents an oversimplified case of two artificial environments populated by eight hypothetical species. The environments were randomly sampled in 12 locations, as described in more detail in <bold>Figure <xref ref-type="fig" rid="F3">3A</xref></bold>. The major goal of the dataset was to provide an intuitive illustration of the proposed framework.</p>
</sec>
<sec><title><italic>Populus</italic> Root Microbiome Dataset</title>
<p>The dataset was described by <xref ref-type="bibr" rid="B35">Shakya et al. (2013)</xref>. It includes 84 samples that represent a combined (fungal and bacterial) microbiome in the rhizosphere (46 samples) and endosphere (38 samples) of 23 mature <italic>Populus deltoides</italic> trees growing in Tennessee (11 trees) and North Carolina (12 trees), taken in May (23 rhizosphere samples and 21 endosphere samples) and in September (23 rhizosphere samples and 17 endosphere samples). Bacterial (16S rRNA) and fungal (28S rRNA) genes from the samples were sequenced to estimate the abundance of fungal and bacterial OTUs and their association with plant phenotypic, genotypic, and environmental parameters. We initially explored abundance&#x2013;occupancy relationships in the dataset using all rhizosphere and endosphere samples of the study (<bold>Figure <xref ref-type="fig" rid="F2">2</xref></bold>) and then focused our further analysis on the 46 rhizosphere samples. The OTU table for these samples was processed using the Anets tool in two ways: (1) to build the association network of OTUs, Anets-OTUs, and (2) to build the association network of samples, Anets-Samples. The Anets-Samples was generated using the Pearson correlation as the measure of association for each pair of samples and a <italic>p</italic>-value threshold of 0.01. The Anets-OTUs was generated using OTUs that occurred in 10 or more samples. This threshold was necessary to reduce the time and memory used by the Anets program for processing the data. The <italic>p</italic>-value threshold was set to 0.05. Markov clustering (<xref ref-type="bibr" rid="B42">Van Dongen, 2008</xref>) with the inflation value 1.8 was used to cluster the networks, and Cytoscape (<xref ref-type="bibr" rid="B37">Smoot et al., 2011</xref>) was used to visualize the networks. Soil properties for samples collected near 23 trees were analyzed using hierarchical clustering. 
All soil parameters were normalized before clustering using the mean value of each parameter and its standard deviation. The hierarchical clustering of soil samples was performed using Pearson correlation as the similarity metric and centroid linkage as the clustering method. The analysis was implemented using the Cluster 3 program (<xref ref-type="bibr" rid="B12">Eisen et al., 1998</xref>). Java TreeView<sup><xref ref-type="fn" rid="fn02">2</xref></sup> was used to visualize the clusters. The &#x2018;capscale&#x2019; function of the &#x2018;vegan&#x2019; R package (<xref ref-type="bibr" rid="B10">Dixon, 2003</xref>) was used to calculate variance partitioning in the same way as in the initial study (<xref ref-type="bibr" rid="B35">Shakya et al., 2013</xref>).</p>
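<p>The occupancy filter (OTUs found in 10 or more samples) and the normalization of soil parameters can be illustrated with a short Python sketch. It is not part of Anets or Cluster 3. The function names (filter_by_occupancy, standardize), the toy OTU table, and the pH values are hypothetical, and the sketch assumes that the normalization is a z-score computed with the sample standard deviation.</p>
<preformat><![CDATA[
```python
from statistics import mean, stdev


def filter_by_occupancy(otu_table, min_samples=10):
    """Keep OTUs detected (count > 0) in at least `min_samples` samples.

    `otu_table` maps an OTU identifier to a list of per-sample counts.
    """
    return {
        otu: counts
        for otu, counts in otu_table.items()
        if sum(1 for c in counts if c > 0) >= min_samples
    }


def standardize(values):
    """Center a parameter on its mean and scale by its sample standard deviation.

    Assumption: the normalization in the text is a z-score of this form.
    """
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


# Toy data: three hypothetical OTUs counted in 12 samples.
toy_table = {
    "OTU_a": [5, 0, 3, 1, 2, 4, 1, 1, 6, 2, 1, 3],  # present in 11 samples
    "OTU_b": [0, 0, 9, 0, 0, 0, 0, 2, 0, 0, 0, 0],  # present in 2 samples
    "OTU_c": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0],  # present in 10 samples
}
kept = filter_by_occupancy(toy_table, min_samples=10)
print(sorted(kept))  # ['OTU_a', 'OTU_c']

ph = [5.1, 5.6, 6.0, 6.4, 6.9]  # hypothetical soil pH values
z = standardize(ph)
print([round(v, 3) for v in z])
```
]]></preformat>
<p>In this toy table only the OTU found in two samples is removed; the standardized values have zero mean and unit standard deviation, so that parameters measured on different scales contribute comparably to the clustering.</p>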
</sec>
<sec><title>Human Microbiome Dataset</title>
<p>The dataset was downloaded from the HMP website <ext-link ext-link-type="uri" xlink:href="http://www.hmpdacc.org/HMQCP/">http://www.hmpdacc.org/HMQCP/</ext-link>. The dataset is based on the analysis of 16S rRNA gene variable regions 1&#x2013;3 (V13) and includes 2910 samples obtained from 18 different body sites of 180 healthy men and women. Each site was represented by 145&#x2013;190 samples, except the vagina (87&#x2013;89 samples). The data are described in more detail in the Human Microbiome Project Consortium publications (<xref ref-type="bibr" rid="B30">Project, 2012</xref>). The input for the analysis was the OTU table generated by the project from sequencing reads using the QIIME (Quantitative Insights Into Microbial Ecology) software (<xref ref-type="bibr" rid="B6">Caporaso et al., 2010</xref>). The table comprises 43140 OTUs and 2910 samples. For the cluster enrichment analysis we used publicly available sample metadata: sex of the participant and body site.</p>
<p>The downloaded OTU table was processed using the Anets-Samples algorithm to build the association network of samples. The network was generated using the Pearson correlation as the measure of association for each pair of samples and a <italic>p</italic>-value threshold of 0.01. The <italic>p</italic>-values were calculated using a Monte Carlo simulation approach as described previously (<xref ref-type="bibr" rid="B21">Karpinets et al., 2012</xref>). The network was visualized using an edge-weighted (by <italic>p</italic>-value) spring-embedded layout in Cytoscape (<xref ref-type="bibr" rid="B37">Smoot et al., 2011</xref>). The Anets-OTUs was generated for a subset of 314 skin samples selected by the analysis as significantly associated (clusters with IDs 2, 10, and 16 in <bold>Figure <xref ref-type="fig" rid="F2">2</xref></bold>). The OTU table of these samples was used as input for the Anets-OTUs algorithm with the following parameters: a minimum of 15 samples per OTU and a <italic>p</italic>-value threshold of 0.001. These stringent thresholds were needed to limit the memory use and processing time of the Anets program. Markov clustering (<xref ref-type="bibr" rid="B42">Van Dongen, 2008</xref>) with the inflation value 1.8 was used to cluster the networks, and Cytoscape (<xref ref-type="bibr" rid="B37">Smoot et al., 2011</xref>) was used to visualize the networks and the clustering results. An edge-weighted (by <italic>p</italic>-value) spring-embedded layout was used for the network visualization.</p>
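<p>The idea of scoring a pair of samples by Pearson correlation and assessing it by Monte Carlo simulation can be sketched in Python as follows. This is a generic permutation test, not a reproduction of the Anets procedure of Karpinets et al. (2012). The function names, the number of permutations, the one-sided formulation, and the two toy abundance profiles are all assumptions made for illustration.</p>
<preformat><![CDATA[
```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length abundance profiles."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Correlation is undefined for a constant profile; treat as no association.
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def monte_carlo_p_value(x, y, n_permutations=2000, seed=0):
    """Return (r, p), where p is the share of shuffles with r >= the observed r.

    Assumption: a one-sided permutation test stands in for the Monte Carlo
    simulation mentioned in the text.
    """
    rng = random.Random(seed)
    observed = pearson(x, y)
    shuffled = list(y)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if pearson(x, shuffled) >= observed:
            hits += 1
    # The add-one correction keeps the estimate strictly positive.
    return observed, (hits + 1) / (n_permutations + 1)


# Hypothetical OTU counts for two samples over the same ten OTUs.
sample_1 = [120, 30, 0, 4, 0, 15, 2, 0, 60, 1]
sample_2 = [110, 25, 1, 6, 0, 12, 0, 0, 70, 2]

r, p = monte_carlo_p_value(sample_1, sample_2)
edge = p <= 0.01  # threshold used for the Anets-Samples network in the text
print(round(r, 3), p, edge)
```
]]></preformat>
<p>A pair of samples would be connected by an edge when the estimated <italic>p</italic>-value does not exceed the chosen threshold. Note that the smallest attainable <italic>p</italic>-value is bounded by the number of permutations, so stricter thresholds require more simulations.</p>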
</sec>
<sec><title>Enrichment Analysis</title>
<p>The analysis was used to find samples enriched in each cluster of OTUs in the Anets-OTUs and to find phenotypic or environmental characteristics enriched in each cluster of samples in the Anets-Samples. In both cases the analysis was done using Fisher&#x2019;s exact test to examine the independence of rows and columns in a two-dimensional contingency table generated by the following algorithms.</p>
<p>We identified samples enriched in a cluster of OTUs (Anets-OTUs) by linking each clustered OTU to the sample and finding those samples that have the greatest representation by OTUs within the cluster. We used the fisher.test() function in R to test whether the number of OTUs representing a sample in the cluster is significantly greater than the number expected by randomly selecting OTUs in the cluster from the set of all associated OTUs, regardless of sample of origin. The set of all associated OTUs was defined as the unique OTUs significantly associated (<italic>p</italic>-value &#x003C; 0.05) with at least one other OTU in the Anets-OTUs. We classified the associated OTUs in two ways: whether the OTU belongs to the sample or not, and whether the OTU belongs to the cluster or not. Using this classification we created the contingency table with the number of the sample&#x2019;s OTUs in the cluster, the number of associated OTUs in the sample, the number of OTUs in the cluster that are not from the sample, and the number of associated OTUs that are not found in the sample. Because we performed several statistical tests simultaneously on the same data set, <italic>p</italic>-values calculated by Fisher&#x2019;s exact test were adjusted using the Bonferroni correction.</p>
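<p>The enrichment test can be illustrated with a minimal Python sketch of a one-sided Fisher&#x2019;s exact test followed by the Bonferroni correction. The sketch is not the R code used in the study. It assumes the standard 2 x 2 layout that follows from the two-way classification (in the sample or not, in the cluster or not), and the function names and counts are hypothetical.</p>
<preformat><![CDATA[
```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for over-representation.

    Assumed cell layout:
      a: OTUs in the sample and in the cluster
      b: OTUs in the sample but not in the cluster
      c: OTUs in the cluster but not in the sample
      d: OTUs in neither
    Returns the hypergeometric probability of seeing `a` or more
    sample OTUs in the cluster when the margins are fixed.
    """
    in_sample = a + b
    in_cluster = a + c
    total = a + b + c + d
    upper = min(in_sample, in_cluster)
    tail = sum(
        comb(in_sample, k) * comb(total - in_sample, in_cluster - k)
        for k in range(a, upper + 1)
    )
    return tail / comb(total, in_cluster)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests and cap the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical counts: 50 associated OTUs, 10 of them found in the sample,
# 12 of them in the cluster, and 8 shared by the sample and the cluster.
p_enrich = fisher_exact_greater(8, 2, 4, 36)

# Hypothetical second test with no enrichment, to show the adjustment.
p_flat = fisher_exact_greater(1, 1, 1, 1)
adjusted = bonferroni([p_enrich, p_flat])
print(p_enrich, p_flat, adjusted)
```
]]></preformat>
<p>With these counts the sample is strongly over-represented in the cluster, whereas the balanced table gives a <italic>p</italic>-value of 5/6. In R, a one-sided test of this kind corresponds to calling fisher.test() with alternative = "greater".</p>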
<p>Specific characteristics (such as soil conditions in the <italic>Populus</italic> rhizosphere dataset or body subsites in the HMP dataset) enriched in a cluster of samples (Anets-Samples) were identified by linking each sample to its characteristics and finding the characteristics represented by the greatest number of samples within the cluster. We used Fisher&#x2019;s exact test to assess whether the number of samples representing a characteristic within the cluster is significantly greater than the number expected by randomly selecting samples into the cluster from the set of all associated samples. In this case the background of the comparison was the set of all associated samples; for each cluster and each characteristic they were classified to create the contingency table as (i) representing the environmental/phenotypic characteristic or not and (ii) belonging to the cluster or not.</p>
</sec>
<sec><title>Generating Networks and Their Statistics for Validation</title>
<p>All datasets for validation were downloaded from the HMP website: from <ext-link ext-link-type="uri" xlink:href="https://www.hmpdacc.org/hmp/HMQCP/">https://www.hmpdacc.org/hmp/HMQCP/</ext-link> for 16S rRNA amplicon datasets processed by QIIME and from <ext-link ext-link-type="uri" xlink:href="https://www.hmpdacc.org/hmp/HMMCP/">https://www.hmpdacc.org/hmp/HMMCP/</ext-link> for datasets processed by the MOTHUR software package using a high stringency approach (<xref ref-type="bibr" rid="B32">Schloss et al., 2011</xref>). The &#x2018;phyloseq&#x2019; R package (<xref ref-type="bibr" rid="B24">McMurdie and Holmes, 2013</xref>) was used to download the datasets, to create OTU tables for oral samples for comparisons, to filter OTUs by occupancy, to generate the unweighted UniFrac (UUF) distances (default parameters), and to produce PCoA plots (distance measure was set to &#x2018;binary&#x2019;). The Anets-Samples were generated using Pearson correlation as the measure of similarity and a <italic>p</italic>-value threshold of 0.05. The networks were loaded into Cytoscape, visualized using the spring-embedded layout without edge weighting, and clustered using the MCL algorithm in the Cytoscape plugin &#x2018;clusterMaker2&#x2019;<sup><xref ref-type="fn" rid="fn03">3</xref></sup> with the inflation value set to 2.0. Another Cytoscape plugin, &#x2018;Network Analyzer&#x2019;<sup><xref ref-type="fn" rid="fn04">4</xref></sup>, was used to explore the topology of the networks and to produce their statistics.</p>
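<p>The &#x2018;binary&#x2019; distance used for the PCoA plots can be illustrated with a short Python sketch. It assumes that &#x2018;binary&#x2019; denotes the presence/absence distance defined for the dist() function in R, that is, the proportion of OTUs present in exactly one of two samples among the OTUs present in at least one of them. The function names and toy profiles are hypothetical, and the sketch does not call phyloseq.</p>
<preformat><![CDATA[
```python
def binary_distance(x, y):
    """Presence/absence distance between two OTU count profiles.

    Non-zero counts are treated as 'present'. The distance is the share of
    OTUs present in exactly one sample among OTUs present in at least one.
    Two empty profiles get distance 0.0 by convention in this sketch.
    """
    either = sum(1 for a, b in zip(x, y) if a != 0 or b != 0)
    only_one = sum(1 for a, b in zip(x, y) if (a != 0) != (b != 0))
    return only_one / either if either else 0.0


def pairwise_binary_distances(profiles):
    """Distance for every ordered pair of samples, keyed by (name, name)."""
    names = sorted(profiles)
    return {
        (a, b): binary_distance(profiles[a], profiles[b])
        for a in names
        for b in names
    }


# Hypothetical count profiles of three oral samples over four OTUs.
profiles = {
    "s1": [3, 0, 1, 0],
    "s2": [1, 2, 0, 0],
    "s3": [7, 0, 2, 0],
}
distances = pairwise_binary_distances(profiles)
print(distances[("s1", "s2")])  # 2 of 3 detected OTUs are unshared
print(distances[("s1", "s3")])  # same OTUs present, so the distance is 0.0
```
]]></preformat>
<p>Because only presence and absence enter the calculation, samples that share the same OTUs at very different abundances are at distance zero, which makes this measure sensitive to rare OTUs and insensitive to dominant ones.</p>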
</sec>
</sec>
<sec><title>Author Contributions</title>
<p>TK conceived the study. CWS contributed to the preparation, collection, and analysis of the data. JW, AF, CWS, and JZ provided mentoring, guidance, and advice throughout the study. TK, VG, JW, AF, CWS, and JZ contributed to writing the manuscript.</p>
</sec>
<sec><title>Conflict of Interest Statement</title>
<p>The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.</p>
</sec>
</body>
<back>
<fn-group>
<fn fn-type="financial-disclosure">
<p><bold>Funding.</bold> This work was made possible through support from the Moon Shots Program at the University of Texas MD Anderson Cancer Center and from the Genomic Science Program, United States Department of Energy, Office of Science, Biological and Environmental Research, as part of the Plant Microbe Interfaces Scientific Focus Area and the BioEnergy Science Center (BESC). Oak Ridge National Laboratory is managed by UT-Battelle LLC, for the United States Department of Energy under contract DE-AC05-00OR22725. The submitted manuscript has been authored by a contractor of the United States Government under contract DE-AC05-00OR22725.</p>
</fn>
</fn-group>
<ack>
<p>We would like to thank Dr. Michael Robeson and Dr. Migun Shakya for their comments, for editing the manuscript, and for helpful discussions. We thank Dr. Migun Shakya for providing us with a script to calculate variance partitioning in the same way as in his original study. We also thank the reviewers of the manuscript for their helpful comments and analysis suggestions, which greatly improved the manuscript.</p>
</ack>
<sec sec-type="supplementary material">
<title>Supplementary Material</title>
<p>The Supplementary Material for this article can be found online at: <ext-link ext-link-type="uri" xlink:href="https://www.frontiersin.org/articles/10.3389/fmicb.2018.00297/full#supplementary-material">https://www.frontiersin.org/articles/10.3389/fmicb.2018.00297/full#supplementary-material</ext-link></p>
<supplementary-material xlink:href="Data_Sheet_1.pdf" id="SM1" mimetype="application/pdf" xmlns:xlink="http://www.w3.org/1999/xlink"/>
</sec>
<ref-list>
<title>References</title>
<ref id="B1"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Aagaard</surname> <given-names>K.</given-names></name> <name><surname>Petrosino</surname> <given-names>J.</given-names></name> <name><surname>Keitel</surname> <given-names>W.</given-names></name> <name><surname>Watson</surname> <given-names>M.</given-names></name> <name><surname>Katancik</surname> <given-names>J.</given-names></name> <name><surname>Garcia</surname> <given-names>N.</given-names></name><etal/></person-group> (<year>2013</year>). <article-title>The human microbiome project strategy for comprehensive sampling of the human microbiome and why it matters.</article-title> <source><italic>FASEB J.</italic></source> <volume>27</volume> <fpage>1012</fpage>&#x2013;<lpage>1022</lpage>. <pub-id pub-id-type="doi">10.1096/fj.12-220806</pub-id> <pub-id pub-id-type="pmid">23165986</pub-id></citation></ref>
<ref id="B2"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ashelford</surname> <given-names>K. E.</given-names></name> <name><surname>Chuzhanova</surname> <given-names>N. A.</given-names></name> <name><surname>Fry</surname> <given-names>J. C.</given-names></name> <name><surname>Jones</surname> <given-names>A. J.</given-names></name> <name><surname>Weightman</surname> <given-names>A. J.</given-names></name></person-group> (<year>2005</year>). <article-title>At least 1 in 20 16S rRNA sequence records currently held in public repositories is estimated to contain substantial anomalies.</article-title> <source><italic>Appl. Environ. Microbiol.</italic></source> <volume>71</volume> <fpage>7724</fpage>&#x2013;<lpage>7736</lpage>. <pub-id pub-id-type="doi">10.1128/aem.71.12.7724-7736.2005</pub-id> <pub-id pub-id-type="pmid">16332745</pub-id></citation></ref>
<ref id="B3"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Bascompte</surname> <given-names>J.</given-names></name> <name><surname>Jordano</surname> <given-names>P.</given-names></name> <name><surname>Melian</surname> <given-names>C. J.</given-names></name> <name><surname>Olesen</surname> <given-names>J. M.</given-names></name></person-group> (<year>2003</year>). <article-title>The nested assembly of plant-animal mutualistic networks.</article-title> <source><italic>Proc. Natl. Acad. Sci. U.S.A.</italic></source> <volume>100</volume> <fpage>9383</fpage>&#x2013;<lpage>9387</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.1633576100</pub-id> <pub-id pub-id-type="pmid">12881488</pub-id></citation></ref>
<ref id="B4"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Callahan</surname> <given-names>B. J.</given-names></name></person-group> (<year>2017</year>). <article-title>Exact sequence variants should replace operational taxonomic units in marker-gene data analysis.</article-title> <source><italic>ISME J.</italic></source> <volume>11</volume> <fpage>2639</fpage>&#x2013;<lpage>2643</lpage>. <pub-id pub-id-type="doi">10.1038/ismej.2017.119</pub-id> <pub-id pub-id-type="pmid">28731476</pub-id></citation></ref>
<ref id="B5"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Callahan</surname> <given-names>B. J.</given-names></name> <name><surname>McMurdie</surname> <given-names>P. J.</given-names></name> <name><surname>Rosen</surname> <given-names>M. J.</given-names></name> <name><surname>Han</surname> <given-names>A. W.</given-names></name> <name><surname>Johnson</surname> <given-names>A. J. A.</given-names></name> <name><surname>Holmes</surname> <given-names>S. P.</given-names></name></person-group> (<year>2016</year>). <article-title>DADA2: high-resolution sample inference from Illumina amplicon data.</article-title> <source><italic>Nat. Methods</italic></source> <volume>13</volume> <fpage>581</fpage>&#x2013;<lpage>583</lpage>. <pub-id pub-id-type="doi">10.1038/Nmeth.3869</pub-id> <pub-id pub-id-type="pmid">27214047</pub-id></citation></ref>
<ref id="B6"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Caporaso</surname> <given-names>J. G.</given-names></name> <name><surname>Kuczynski</surname> <given-names>J.</given-names></name> <name><surname>Stombaugh</surname> <given-names>J.</given-names></name> <name><surname>Bittinger</surname> <given-names>K.</given-names></name> <name><surname>Bushman</surname> <given-names>F. D.</given-names></name> <name><surname>Costello</surname> <given-names>E. K.</given-names></name><etal/></person-group> (<year>2010</year>). <article-title>QIIME allows analysis of high-throughput community sequencing data.</article-title> <source><italic>Nat. Methods</italic></source> <volume>7</volume> <fpage>335</fpage>&#x2013;<lpage>336</lpage>. <pub-id pub-id-type="doi">10.1038/nmeth.f.303</pub-id></citation></ref>
<ref id="B7"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Coveley</surname> <given-names>S.</given-names></name> <name><surname>Elshahed</surname> <given-names>M. S.</given-names></name> <name><surname>Youssef</surname> <given-names>N. H.</given-names></name></person-group> (<year>2015</year>). <article-title>Response of the rare biosphere to environmental stressors in a highly diverse ecosystem (Zodletone spring, OK, USA).</article-title> <source><italic>PeerJ</italic></source> <volume>3</volume>:<issue>e1182</issue>. <pub-id pub-id-type="doi">10.7717/peerj.1182</pub-id> <pub-id pub-id-type="pmid">26312178</pub-id></citation></ref>
<ref id="B8"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Cripps</surname> <given-names>C. L.</given-names></name></person-group> (<year>1997</year>). <article-title>The genus Inocybe in Montana aspen stands.</article-title> <source><italic>Mycologia</italic></source> <volume>89</volume> <fpage>670</fpage>&#x2013;<lpage>688</lpage>. <pub-id pub-id-type="doi">10.2307/3761005</pub-id></citation></ref>
<ref id="B9"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Ding</surname> <given-names>T.</given-names></name> <name><surname>Schloss</surname> <given-names>P. D.</given-names></name></person-group> (<year>2014</year>). <article-title>Dynamics and associations of microbial community types across the human body.</article-title> <source><italic>Nature</italic></source> <volume>509</volume> <fpage>357</fpage>&#x2013;<lpage>360</lpage>. <pub-id pub-id-type="doi">10.1038/nature13178</pub-id> <pub-id pub-id-type="pmid">24739969</pub-id></citation></ref>
<ref id="B10"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Dixon</surname> <given-names>P.</given-names></name></person-group> (<year>2003</year>). <article-title>VEGAN, a package of R functions for community ecology.</article-title> <source><italic>J. Veg. Sci.</italic></source> <volume>14</volume> <fpage>927</fpage>&#x2013;<lpage>930</lpage>. <pub-id pub-id-type="doi">10.1111/j.1654-1103.2003.tb02228.x</pub-id></citation></ref>
<ref id="B11"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Edgar</surname> <given-names>R. C.</given-names></name> <name><surname>Flyvbjerg</surname> <given-names>H.</given-names></name></person-group> (<year>2015</year>). <article-title>Error filtering, pair assembly and error correction for next-generation sequencing reads.</article-title> <source><italic>Bioinformatics</italic></source> <volume>31</volume> <fpage>3476</fpage>&#x2013;<lpage>3482</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btv401</pub-id> <pub-id pub-id-type="pmid">26139637</pub-id></citation></ref>
<ref id="B12"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Eisen</surname> <given-names>M. B.</given-names></name> <name><surname>Spellman</surname> <given-names>P. T.</given-names></name> <name><surname>Brown</surname> <given-names>P. O.</given-names></name> <name><surname>Botstein</surname> <given-names>D.</given-names></name></person-group> (<year>1998</year>). <article-title>Cluster analysis and display of genome-wide expression patterns.</article-title> <source><italic>Proc. Natl. Acad. Sci. U.S.A.</italic></source> <volume>95</volume> <fpage>14863</fpage>&#x2013;<lpage>14868</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.95.25.14863</pub-id></citation></ref>
<ref id="B13"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Fang</surname> <given-names>H.</given-names></name> <name><surname>Huang</surname> <given-names>C.</given-names></name> <name><surname>Zhao</surname> <given-names>H.</given-names></name> <name><surname>Deng</surname> <given-names>M.</given-names></name></person-group> (<year>2017</year>). <article-title>gCoda: conditional dependence network inference for compositional data.</article-title> <source><italic>J. Comput. Biol.</italic></source> <volume>24</volume> <fpage>699</fpage>&#x2013;<lpage>708</lpage>. <pub-id pub-id-type="doi">10.1089/cmb.2017.0054</pub-id> <pub-id pub-id-type="pmid">28489411</pub-id></citation></ref>
<ref id="B14"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Faust</surname> <given-names>K.</given-names></name> <name><surname>Sathirapongsasuti</surname> <given-names>J. F.</given-names></name> <name><surname>Izard</surname> <given-names>J.</given-names></name> <name><surname>Segata</surname> <given-names>N.</given-names></name> <name><surname>Gevers</surname> <given-names>D.</given-names></name> <name><surname>Raes</surname> <given-names>J.</given-names></name><etal/></person-group> (<year>2012</year>). <article-title>Microbial co-occurrence relationships in the human microbiome.</article-title> <source><italic>PLoS Comput. Biol.</italic></source> <volume>8</volume>:<issue>e1002606</issue>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1002606</pub-id></citation></ref>
<ref id="B15"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Friedman</surname> <given-names>J.</given-names></name> <name><surname>Alm</surname> <given-names>E. J.</given-names></name></person-group> (<year>2012</year>). <article-title>Inferring correlation networks from genomic survey data.</article-title> <source><italic>PLoS Comput. Biol.</italic></source> <volume>8</volume>:<issue>e1002687</issue>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1002687</pub-id> <pub-id pub-id-type="pmid">23028285</pub-id></citation></ref>
<ref id="B16"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Gaston</surname> <given-names>K. J.</given-names></name></person-group> (<year>1996</year>). <article-title>The multiple forms of the interspecific abundance-distribution relationship.</article-title> <source><italic>OIKOS</italic></source> <volume>76</volume> <fpage>211</fpage>&#x2013;<lpage>220</lpage>. <pub-id pub-id-type="doi">10.2307/3546192</pub-id> <pub-id pub-id-type="pmid">17032375</pub-id></citation></ref>
<ref id="B17"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Holmes</surname> <given-names>I.</given-names></name> <name><surname>Harris</surname> <given-names>K.</given-names></name> <name><surname>Quince</surname> <given-names>C.</given-names></name></person-group> (<year>2012</year>). <article-title>Dirichlet multinomial mixtures: generative models for microbial metagenomics.</article-title> <source><italic>PLoS One</italic></source> <volume>7</volume>:<issue>e30126</issue>. <pub-id pub-id-type="doi">10.1371/journal.pone.0030126</pub-id> <pub-id pub-id-type="pmid">22319561</pub-id></citation></ref>
<ref id="B18"><citation citation-type="journal"><collab>Human Microbiome Project Consortium</collab> (<year>2012</year>). <article-title>Structure, function and diversity of the healthy human microbiome.</article-title> <source><italic>Nature</italic></source> <volume>486</volume> <fpage>207</fpage>&#x2013;<lpage>214</lpage>. <pub-id pub-id-type="doi">10.1038/nature11234</pub-id></citation></ref>
<ref id="B19"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>James</surname> <given-names>A.</given-names></name> <name><surname>Pitchford</surname> <given-names>J. W.</given-names></name> <name><surname>Plank</surname> <given-names>M. J.</given-names></name></person-group> (<year>2012</year>). <article-title>Disentangling nestedness from models of ecological complexity.</article-title> <source><italic>Nature</italic></source> <volume>487</volume> <fpage>227</fpage>&#x2013;<lpage>230</lpage>. <pub-id pub-id-type="doi">10.1038/nature11214</pub-id> <pub-id pub-id-type="pmid">22722863</pub-id></citation></ref>
<ref id="B20"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Jousset</surname> <given-names>A.</given-names></name> <name><surname>Bienhold</surname> <given-names>C.</given-names></name> <name><surname>Chatzinotas</surname> <given-names>A.</given-names></name> <name><surname>Gallien</surname> <given-names>L.</given-names></name> <name><surname>Gobet</surname> <given-names>A.</given-names></name> <name><surname>Kurm</surname> <given-names>V.</given-names></name><etal/></person-group> (<year>2017</year>). <article-title>Where less may be more: how the rare biosphere pulls ecosystems strings.</article-title> <source><italic>ISME J.</italic></source> <volume>11</volume> <fpage>853</fpage>&#x2013;<lpage>862</lpage>. <pub-id pub-id-type="doi">10.1038/ismej.2016.174</pub-id> <pub-id pub-id-type="pmid">28072420</pub-id></citation></ref>
<ref id="B21"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Karpinets</surname> <given-names>T. V.</given-names></name> <name><surname>Park</surname> <given-names>B. H.</given-names></name> <name><surname>Uberbacher</surname> <given-names>E. C.</given-names></name></person-group> (<year>2012</year>). <article-title>Analyzing large biological datasets with association networks.</article-title> <source><italic>Nucleic Acids Res.</italic></source> <volume>40</volume>:<issue>e131</issue>. <pub-id pub-id-type="doi">10.1093/nar/gks403</pub-id> <pub-id pub-id-type="pmid">22638576</pub-id></citation></ref>
<ref id="B22"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Kurtz</surname> <given-names>Z. D.</given-names></name> <name><surname>Muller</surname> <given-names>C. L.</given-names></name> <name><surname>Miraldi</surname> <given-names>E. R.</given-names></name> <name><surname>Littman</surname> <given-names>D. R.</given-names></name> <name><surname>Blaser</surname> <given-names>M. J.</given-names></name> <name><surname>Bonneau</surname> <given-names>R. A.</given-names></name></person-group> (<year>2015</year>). <article-title>Sparse and compositionally robust inference of microbial ecological networks.</article-title> <source><italic>PLoS Comput. Biol.</italic></source> <volume>11</volume>:<issue>e1004226</issue>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1004226</pub-id> <pub-id pub-id-type="pmid">25950956</pub-id></citation></ref>
<ref id="B23"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Lynch</surname> <given-names>M. D.</given-names></name> <name><surname>Neufeld</surname> <given-names>J. D.</given-names></name></person-group> (<year>2015</year>). <article-title>Ecology and exploration of the rare biosphere.</article-title> <source><italic>Nat. Rev. Microbiol.</italic></source> <volume>13</volume> <fpage>217</fpage>&#x2013;<lpage>229</lpage>. <pub-id pub-id-type="doi">10.1038/nrmicro3400</pub-id> <pub-id pub-id-type="pmid">25730701</pub-id></citation></ref>
<ref id="B24"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>McMurdie</surname> <given-names>P. J.</given-names></name> <name><surname>Holmes</surname> <given-names>S.</given-names></name></person-group> (<year>2013</year>). <article-title>phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data.</article-title> <source><italic>PLoS One</italic></source> <volume>8</volume>:<issue>e61217</issue>. <pub-id pub-id-type="doi">10.1371/journal.pone.0061217</pub-id> <pub-id pub-id-type="pmid">23630581</pub-id></citation></ref>
<ref id="B25"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>McMurdie</surname> <given-names>P. J.</given-names></name> <name><surname>Holmes</surname> <given-names>S.</given-names></name></person-group> (<year>2014</year>). <article-title>Waste not, want not: why rarefying microbiome data is inadmissible.</article-title> <source><italic>PLoS Comput. Biol.</italic></source> <volume>10</volume>:<issue>e1003531</issue>. <pub-id pub-id-type="doi">10.1371/journal.pcbi.1003531</pub-id> <pub-id pub-id-type="pmid">24699258</pub-id></citation></ref>
<ref id="B26"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Mi</surname> <given-names>X.</given-names></name> <name><surname>Swenson</surname> <given-names>N. G.</given-names></name> <name><surname>Valencia</surname> <given-names>R.</given-names></name> <name><surname>Kress</surname> <given-names>W. J.</given-names></name> <name><surname>Erickson</surname> <given-names>D. L.</given-names></name> <name><surname>Perez</surname> <given-names>A. J.</given-names></name><etal/></person-group> (<year>2012</year>). <article-title>The contribution of rare species to community phylogenetic diversity across a global network of forest plots.</article-title> <source><italic>Am. Nat.</italic></source> <volume>180</volume> <fpage>E17</fpage>&#x2013;<lpage>E30</lpage>. <pub-id pub-id-type="doi">10.1086/665999</pub-id> <pub-id pub-id-type="pmid">22673660</pub-id></citation></ref>
<ref id="B27"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Paulson</surname> <given-names>J. N.</given-names></name> <name><surname>Stine</surname> <given-names>O. C.</given-names></name> <name><surname>Bravo</surname> <given-names>H. C.</given-names></name> <name><surname>Pop</surname> <given-names>M.</given-names></name></person-group> (<year>2013</year>). <article-title>Differential abundance analysis for microbial marker-gene surveys.</article-title> <source><italic>Nat. Methods</italic></source> <volume>10</volume> <fpage>1200</fpage>&#x2013;<lpage>1202</lpage>. <pub-id pub-id-type="doi">10.1038/nmeth.2658</pub-id> <pub-id pub-id-type="pmid">24076764</pub-id></citation></ref>
<ref id="B28"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Pedros-Alio</surname> <given-names>C.</given-names></name></person-group> (<year>2012</year>). <article-title>The rare bacterial biosphere.</article-title> <source><italic>Ann. Rev. Mar. Sci.</italic></source> <volume>4</volume> <fpage>449</fpage>&#x2013;<lpage>466</lpage>. <pub-id pub-id-type="doi">10.1146/annurev-marine-120710-100948</pub-id> <pub-id pub-id-type="pmid">22457983</pub-id></citation></ref>
<ref id="B29"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Preheim</surname> <given-names>S. P.</given-names></name> <name><surname>Perrotta</surname> <given-names>A. R.</given-names></name> <name><surname>Martin-Platero</surname> <given-names>A. M.</given-names></name> <name><surname>Gupta</surname> <given-names>A.</given-names></name> <name><surname>Alm</surname> <given-names>E. J.</given-names></name></person-group> (<year>2013</year>). <article-title>Distribution-based clustering: using ecology to refine the operational taxonomic unit.</article-title> <source><italic>Appl. Environ. Microbiol.</italic></source> <volume>79</volume> <fpage>6593</fpage>&#x2013;<lpage>6603</lpage>. <pub-id pub-id-type="doi">10.1128/Aem.00342-13</pub-id> <pub-id pub-id-type="pmid">23974136</pub-id></citation></ref>
<ref id="B30"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Project</surname> <given-names>T. H. M.</given-names></name></person-group> (<year>2012</year>). <article-title>A framework for human microbiome research.</article-title> <source><italic>Nature</italic></source> <volume>486</volume> <fpage>215</fpage>&#x2013;<lpage>221</lpage>. <pub-id pub-id-type="doi">10.1038/nature11209</pub-id></citation></ref>
<ref id="B31"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Rosindell</surname> <given-names>J.</given-names></name> <name><surname>Hubbell</surname> <given-names>S. P.</given-names></name> <name><surname>Etienne</surname> <given-names>R. S.</given-names></name></person-group> (<year>2011</year>). <article-title>The unified neutral theory of biodiversity and biogeography at age ten.</article-title> <source><italic>Trends Ecol. Evol.</italic></source> <volume>26</volume> <fpage>340</fpage>&#x2013;<lpage>348</lpage>. <pub-id pub-id-type="doi">10.1016/j.tree.2011.03.024</pub-id> <pub-id pub-id-type="pmid">21561679</pub-id></citation></ref>
<ref id="B32"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schloss</surname> <given-names>P. D.</given-names></name> <name><surname>Gevers</surname> <given-names>D.</given-names></name> <name><surname>Westcott</surname> <given-names>S. L.</given-names></name></person-group> (<year>2011</year>). <article-title>Reducing the effects of PCR amplification and sequencing artifacts on 16S rRNA-based studies.</article-title> <source><italic>PLoS One</italic></source> <volume>6</volume>:<issue>e27310</issue>. <pub-id pub-id-type="doi">10.1371/journal.pone.0027310</pub-id> <pub-id pub-id-type="pmid">22194782</pub-id></citation></ref>
<ref id="B33"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schloss</surname> <given-names>P. D.</given-names></name> <name><surname>Westcott</surname> <given-names>S. L.</given-names></name></person-group> (<year>2011</year>). <article-title>Assessing and improving methods used in operational taxonomic unit-based approaches for 16S rRNA gene sequence analysis.</article-title> <source><italic>Appl. Environ. Microbiol.</italic></source> <volume>77</volume> <fpage>3219</fpage>&#x2013;<lpage>3226</lpage>. <pub-id pub-id-type="doi">10.1128/Aem.02810-10</pub-id> <pub-id pub-id-type="pmid">21421784</pub-id></citation></ref>
<ref id="B34"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Schloss</surname> <given-names>P. D.</given-names></name> <name><surname>Westcott</surname> <given-names>S. L.</given-names></name> <name><surname>Ryabin</surname> <given-names>T.</given-names></name> <name><surname>Hall</surname> <given-names>J. R.</given-names></name> <name><surname>Hartmann</surname> <given-names>M.</given-names></name> <name><surname>Hollister</surname> <given-names>E. B.</given-names></name><etal/></person-group> (<year>2009</year>). <article-title>Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities.</article-title> <source><italic>Appl. Environ. Microbiol.</italic></source> <volume>75</volume> <fpage>7537</fpage>&#x2013;<lpage>7541</lpage>. <pub-id pub-id-type="doi">10.1128/AEM.01541-09</pub-id> <pub-id pub-id-type="pmid">19801464</pub-id></citation></ref>
<ref id="B35"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Shakya</surname> <given-names>M.</given-names></name> <name><surname>Gottel</surname> <given-names>N.</given-names></name> <name><surname>Castro</surname> <given-names>H.</given-names></name> <name><surname>Yang</surname> <given-names>Z. K.</given-names></name> <name><surname>Gunter</surname> <given-names>L.</given-names></name> <name><surname>Labbe</surname> <given-names>J.</given-names></name><etal/></person-group> (<year>2013</year>). <article-title>A multifactor analysis of fungal and bacterial community structure in the root microbiome of mature <italic>Populus deltoides</italic> trees.</article-title> <source><italic>PLoS One</italic></source> <volume>8</volume>:<issue>e76382</issue>. <pub-id pub-id-type="doi">10.1371/journal.pone.0076382</pub-id> <pub-id pub-id-type="pmid">24146861</pub-id></citation></ref>
<ref id="B36"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sharon</surname> <given-names>I.</given-names></name> <name><surname>Kertesz</surname> <given-names>M.</given-names></name> <name><surname>Hug</surname> <given-names>L. A.</given-names></name> <name><surname>Pushkarev</surname> <given-names>D.</given-names></name> <name><surname>Blauwkamp</surname> <given-names>T. A.</given-names></name> <name><surname>Castelle</surname> <given-names>C. J.</given-names></name><etal/></person-group> (<year>2015</year>). <article-title>Accurate, multi-kb reads resolve complex populations and detect rare microorganisms.</article-title> <source><italic>Genome Res.</italic></source> <volume>25</volume> <fpage>534</fpage>&#x2013;<lpage>543</lpage>. <pub-id pub-id-type="doi">10.1101/gr.183012.114</pub-id> <pub-id pub-id-type="pmid">25665577</pub-id></citation></ref>
<ref id="B37"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Smoot</surname> <given-names>M. E.</given-names></name> <name><surname>Ono</surname> <given-names>K.</given-names></name> <name><surname>Ruscheinski</surname> <given-names>J.</given-names></name> <name><surname>Wang</surname> <given-names>P. L.</given-names></name> <name><surname>Ideker</surname> <given-names>T.</given-names></name></person-group> (<year>2011</year>). <article-title>Cytoscape 2.8: new features for data integration and network visualization.</article-title> <source><italic>Bioinformatics</italic></source> <volume>27</volume> <fpage>431</fpage>&#x2013;<lpage>432</lpage>. <pub-id pub-id-type="doi">10.1093/bioinformatics/btq675</pub-id> <pub-id pub-id-type="pmid">21149340</pub-id></citation></ref>
<ref id="B38"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Sogin</surname> <given-names>M. L.</given-names></name> <name><surname>Morrison</surname> <given-names>H. G.</given-names></name> <name><surname>Huber</surname> <given-names>J. A.</given-names></name> <name><surname>Mark Welch</surname> <given-names>D.</given-names></name> <name><surname>Huse</surname> <given-names>S. M.</given-names></name> <name><surname>Neal</surname> <given-names>P. R.</given-names></name><etal/></person-group> (<year>2006</year>). <article-title>Microbial diversity in the deep sea and the underexplored &#x201C;rare biosphere&#x201D;.</article-title> <source><italic>Proc. Natl. Acad. Sci. U.S.A.</italic></source> <volume>103</volume> <fpage>12115</fpage>&#x2013;<lpage>12120</lpage>. <pub-id pub-id-type="doi">10.1073/pnas.0605127103</pub-id> <pub-id pub-id-type="pmid">16880384</pub-id></citation></ref>
<ref id="B39"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Suweis</surname> <given-names>S.</given-names></name> <name><surname>Simini</surname> <given-names>F.</given-names></name> <name><surname>Banavar</surname> <given-names>J. R.</given-names></name> <name><surname>Maritan</surname> <given-names>A.</given-names></name></person-group> (<year>2013</year>). <article-title>Emergence of structural and dynamical properties of ecological mutualistic networks.</article-title> <source><italic>Nature</italic></source> <volume>500</volume> <fpage>449</fpage>&#x2013;<lpage>452</lpage>. <pub-id pub-id-type="doi">10.1038/nature12438</pub-id> <pub-id pub-id-type="pmid">23969462</pub-id></citation></ref>
<ref id="B40"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Tsilimigras</surname> <given-names>M. C.</given-names></name> <name><surname>Fodor</surname> <given-names>A. A.</given-names></name></person-group> (<year>2016</year>). <article-title>Compositional data analysis of the microbiome: fundamentals, tools, and challenges.</article-title> <source><italic>Ann. Epidemiol.</italic></source> <volume>26</volume> <fpage>330</fpage>&#x2013;<lpage>335</lpage>. <pub-id pub-id-type="doi">10.1016/j.annepidem.2016.03.002</pub-id> <pub-id pub-id-type="pmid">27255738</pub-id></citation></ref>
<ref id="B41"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Unterseher</surname> <given-names>M.</given-names></name> <name><surname>Jumpponen</surname> <given-names>A.</given-names></name> <name><surname>Opik</surname> <given-names>M.</given-names></name> <name><surname>Tedersoo</surname> <given-names>L.</given-names></name> <name><surname>Moora</surname> <given-names>M.</given-names></name> <name><surname>Dormann</surname> <given-names>C. F.</given-names></name><etal/></person-group> (<year>2011</year>). <article-title>Species abundance distributions and richness estimations in fungal metagenomics - lessons learned from community ecology.</article-title> <source><italic>Mol. Ecol.</italic></source> <volume>20</volume> <fpage>275</fpage>&#x2013;<lpage>285</lpage>. <pub-id pub-id-type="doi">10.1111/j.1365-294X.2010.04948.x</pub-id> <pub-id pub-id-type="pmid">21155911</pub-id></citation></ref>
<ref id="B42"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Van Dongen</surname> <given-names>S.</given-names></name></person-group> (<year>2008</year>). <article-title>Graph clustering via a discrete uncoupling process.</article-title> <source><italic>SIAM J. Matrix Anal. Appl.</italic></source> <volume>30</volume> <fpage>121</fpage>&#x2013;<lpage>141</lpage>. <pub-id pub-id-type="doi">10.1137/040608635</pub-id></citation></ref>
<ref id="B43"><citation citation-type="journal"><person-group person-group-type="author"><name><surname>Youssef</surname> <given-names>N. H.</given-names></name> <name><surname>Couger</surname> <given-names>M. B.</given-names></name> <name><surname>Elshahed</surname> <given-names>M. S.</given-names></name></person-group> (<year>2010</year>). <article-title>Fine-scale bacterial beta diversity within a complex ecosystem (Zodletone Spring, OK, USA): the role of the rare biosphere.</article-title> <source><italic>PLoS One</italic></source> <volume>5</volume>:<issue>e12414</issue>. <pub-id pub-id-type="doi">10.1371/journal.pone.0012414</pub-id> <pub-id pub-id-type="pmid">20865128</pub-id></citation></ref>
</ref-list>
<fn-group>
<fn id="fn01"><label>1</label><p><ext-link ext-link-type="uri" xlink:href="https://sourceforge.net/projects/Anets/">https://sourceforge.net/projects/Anets/</ext-link></p></fn>
<fn id="fn02"><label>2</label><p><ext-link ext-link-type="uri" xlink:href="http://sourceforge.net/projects/jtreeview/">http://sourceforge.net/projects/jtreeview/</ext-link></p></fn>
<fn id="fn03"><label>3</label><p><ext-link ext-link-type="uri" xlink:href="http://www.rbvi.ucsf.edu/cytoscape/clusterMaker2/">http://www.rbvi.ucsf.edu/cytoscape/clusterMaker2/</ext-link></p></fn>
<fn id="fn04"><label>4</label><p><ext-link ext-link-type="uri" xlink:href="http://apps.cytoscape.org/apps/networkanalyzer">http://apps.cytoscape.org/apps/networkanalyzer</ext-link></p></fn>
</fn-group>
</back>
</article>