Conservation of Species- and Trait-Based Modeling Network Interactions in Extremely Acidic Microbial Community Assembly

Understanding microbial interactions is essential to decipher the mechanisms of community assembly and their effects on ecosystem functioning, however, the conservation of species- and trait-based network interactions along environmental gradient remains largely unknown. Here, by using the network-based analyses with three paralleled data sets derived from 16S rRNA gene pyrosequencing, functional microarray, and predicted metagenome, we test our hypothesis that the network interactions of traits are more conserved than those of taxonomic measures, with significantly lower variation of network characteristics along the environmental gradient in acid mine drainage. The results showed that although the overall network characteristics remained similar, the structural variation was significantly lower at trait levels. The higher conserved individual node topological properties at trait level rather than at species level indicated that the responses of diverse traits remained relatively consistent even though different species played key roles under different environmental conditions. Additionally, the randomization tests revealed that it could not reject the null hypothesis that species-based correlations were random, while the tests suggested that correlation patterns of traits were non-random. Furthermore, relationships between trait-based network characteristics and environmental properties implied that trait-based networks might be more useful in reflecting the variation of ecosystem function. Taken together, our results suggest that deterministic trait-based community assembly results in greater conservation of network interaction, which may ensure ecosystem function across environmental regimes, emphasizing the potential importance of measuring the complexity and conservation of network interaction in evaluating the ecosystem stability and functioning.


INTRODUCTION
Fundamental mechanisms such as habitat filtering (e.g., resource limitation and abiotic stress), historical contingency (e.g., dispersal limitation, disturbance, and priority effect), and species interactions (e.g., competition and facilitation) are of particular interest in explaining the processes of community assembly (Chase, 2003;Fukami et al., 2005;Emerson and Gillespie, 2008;Fukami, 2015). These community assembly mechanisms have been the focus of tests of whether microbial biogeographic patterns conform to patterns similar to that of macroscopic animals and plants (Martiny et al., 2006). As a consequence of this review, a large number of studies have focused on the influences of contemporary environmental factors and the legacies of historical events on the spatial distribution of microbial communities. However, the importance of species interactions for microbial community assembly and how these interactions change along the environmental gradients remain largely unexplored.
Understanding community assembly of microbial communities is crucial, because the assembly mechanisms and the resulting microbial diversity patterns have important repercussions for ecosystem function. Through ecosystemwide interaction networks, microorganisms can facilitate and accomplish diverse ecological processes and biogeochemical cycling of matter, energy, and nutrients (Raes and Bork, 2008;Faust and Raes, 2012). Microbial ecologists seek to unravel how microbes form these complex networks, how these network interactions affect community assembly processes and how the changes of co-occurrence patterns will ultimately affect the ecosystem functioning. Recent studies have reconstructed and described the co-occurrence patterns of species and/or traits in diverse natural habitats (Chaffron et al., 2010;Zhou et al., 2010Zhou et al., , 2011Steele et al., 2011;Barberán et al., 2012;Gilbert et al., 2012;Widder et al., 2014;Aylward et al., 2015;Ma et al., 2016), offering initial implications about whether the organization and variation of microbial co-occurrence networks along environmental gradients contribute to and affect the ecosystem function. Nevertheless, understanding the conservation of network interaction in response to the environmental changes at both species and trait levels remains largely unsolved, and which may provide important insights into the underlying mechanisms of community assembly and ecosystem stability (Cadotte et al., 2012).
Community assembly can be driven by deterministic and stochastic processes simultaneously and so both trait convergence and species divergence can be observed within the same community (Fukami et al., 2005). Indeed, increasing evidence suggests that species and traits may show distinct assembly patterns in microbial communities (Green et al., 2008;Burke et al., 2011;Raes et al., 2011;Barberán et al., 2014). Thus, species and traits may exhibit inconsistent responses to the environmental changes, and traits are expected to show greater deterministic response than that of species and converge toward a common functional structure determined by environmental conditions (Fukami et al., 2005). Basically, there are at least three types of species/traits showing different distributions, such as linear distribution (i.e., S/T1 and S/T2), normal/logarithmic distribution (i.e., with peak/maximum at optima condition, S/T3 and S/T4) and stochastic fluctuation (S/T5) along a given environmental gradient (Figure 1A). In a previous model-based prediction of the dynamics of community composition and functional attributes (Kuang et al., 2016), we show that microbial assemblages dynamics under anthropogenic disturbance, such as in acid mine drainage (AMD) sites, are better predicted using functional genes than taxonomic composition. Moreover, our previous result suggests that a higher proportion of traits are predictable under a certain condition in AMD, while most of the species tend to show less deterministic pattern of their relative abundances (proportion is represented as line thickness in Figure 1A).
Correlations based on these relative species abundances or relative metabolic potentials are widely used to uncover biologically or biochemically meaningful relationships between different species/traits across environmental gradients (Weiss et al., 2016). Generally, a positive correlation reflects mutualistic interactions or correlated environmental responses, while negative correlations suggest antagonistic relationships such as resource competition or differences in fitness optima across gradients. The correlations (i.e., correlation coefficients) between species/traits can remain relatively constant or shift dramatically in different ranges of an environmental gradient because of their distinct environmental responses ( Figure 1B). Here, because higher proportion of traits, rather than species, respond more deterministic to environmental changes (i.e., most of the traits should be similar to T1∼T4 as represented in Figure 1A, whereas more species are similar to Supplementary Figure S5), we hypothesize that less variation of correlation coefficients can be found at trait level. It should be noted that some species and traits will disappear entirely under a certain environmental condition. Therefore, the different patterns of correlations we supposed here are based on those species/traits that can persistently exist in a large range of environmental gradient, and the data of these species and traits are used for subsequent comparisons of network node properties.
In this study, we use network-based modeling methods to assess the ecological characteristics of these correlations, including the features of nodes (i.e., individual species/traits) and edges (i.e., different relationships between nodes) within the network (Weiss et al., 2016). Since these network interactions describe the co-occurrence of different species or traits across different samples, and not their real physical interactions directly, they are mathematically calculated based on the species-or trait-based correlations (Fuhrman and Steele, 2008;Fuhrman, 2009;Zhou et al., 2010Zhou et al., , 2011Barberán et al., 2012). Thus, the individual node network properties reflect its relationships with other nodes and its topological position, while the overall network structure is determined by the general characteristics of different nodes within the network. Several key network indexes of nodes and overall topology are calculated and used for the statistical tests of our hypothesis. Finally, according to our hypothesis that shown above, we suppose that the molecular ecological network (MEN) properties (i.e., network indexes) exhibit relatively higher variation at the species level because of their dramatic changes of interspecies relationships along different ranges of the environmental gradient, but remain fairly conserved at the trait level ( Figure 1C).
Here, using AMD as a model system with low species richness (Denef et al., 2010), we tested the hypothesis by comprehensively comparing the species-and trait-based MENs in response to environmental changes across 40 environmental samples that were collected from diverse AMD sites across Southeast China (Kuang et al., 2013). These acidic, metal-rich and low-complexity environments exhibit a strong environmental gradient and harbor metabolically active acidophiles, and with a relatively comprehensive understanding of microbial diversity and metabolic abilities (Méndez-García et al., 2015), making them ideal systems for MENs analyses. Our results supported the hypothesis that the conservation of network interaction is significantly higher at trait level rather than at species level, and the trait-based network characteristics are more related to and affected by the environmental changes.

Sample Grouping According to the Environmental pH
The geochemical properties of these 40 AMD samples are distinct along a clear gradient of environmental pH (between 1.86 and 4.10) and our previous studies (Kuang et al., 2013(Kuang et al., , 2016 demonstrated that solution pH was the most dominant factor shaping the taxonomic and functional structures in these microbial communities, revealing the distinct strategies of acidic adaptation. While other environmental properties such as Fe 2+ and heavy metal ions were also varied at similar pH condition and influenced the microbial communities, indicating the differences of heavy metal resistance and utilization and competition of resources. In this study, we aimed to compare the structural conservation of MENs in response to the pH gradient and samples within a specific pH gradient were grouped together for the MENs construction. Specifically, we separated samples into 6 a priori pH groups (i.e., G1-G6) based on the solution pH, and kept a similar the sample size similar among different groups, ranging from 6 to 8 samples (the pH condition and sample size of each pH group were shown in Supplementary Table S1). To validate that whether this sample grouping can explicitly reveal the environmental gradient, we compared the differences of overall environmental properties in the entire data set and between two pH groups by permutational multivariate analysis of variance test (PERMANOVA, "adonis" function of vegan 2.3-0 in R; R Core Team, 2014; Oksanen et al., 2015) on the Euclidean distance matrixes, which are calculated by overall measured environmental properties including electrical conductivity (EC), dissolved oxygen (DO), total organic carbon (TOC), total phosphorus (P), and the concentrations of sulfate (SO 2− 4 ), ferric (Fe 3+ ), ferrous (Fe 2+ ), aluminum (Al), arsenic (As), cadmium (Cd), copper (Cu), lead (Pd), and zinc (Zn) (Supplementary Figure S1A; the overall environmental properties are available in Supplementary Table S2 in Kuang et al., 2013). Additionally, principal component analysis was used to link the general pattern of overall environmental properties to distinct pH condition. Meanwhile, similar PERMANOVA and principal component analysis were also performed for different biological data sets to show their variation that explained by our sample grouping along the pH gradient (see details of the biological data sets below). Further, Bray-Curtis similarities of community composition and functional structure were calculated using species-and traitbased data sets to show their variations between samples with different values of Group Difference ranging from 0 to 5. For example, similarities of samples with Group Difference of 0 revealed variation of samples in the same pH group, while similarities of samples with Group Difference of 5 showed the beta diversities of samples between G1 and G6 (i.e., samples with most distinct pH values and environmental properties).

Data Set
In this study, relationships between different species/traits along the pH gradient were estimated by conducting pairwise Pearson correlations using three paralleled data sets (Supplementary Figure S1B). The correlations between different species were calculated based on the relative OTU (operational taxonomic unit, defined at the 97% 16S rRNA similarity level) abundances derived from 16S rRNA gene pyrosequencing data (Kuang et al., 2013). This sequencing data has been deposited in the European Nucleotide Archive database (accession no. PRJEB9908).
Meanwhile, the correlations between different traits were evaluated based on two functional data sets. One of them was the metabolic potentials (i.e., signal intensities) of diverse GeoChip probes (GCps), which covering major functional genes involved in biogeochemical processes and stress toleration and adaptation based on functional microarray (GeoChip 4.0; Tu et al., 2014;Kuang et al., 2016). This GeoChip data set is publicly available at http://ieg.ou.edu/4download/.
The other functional data set was the abundances (i.e., copy numbers) of KEGG orthologs (KOs) from different KEGG (Kyoto Encyclopedia of Genes and Genomes) categories, including metabolism, genetic information processing, environmental information processing, and cellular processes (Kanehisa and Goto, 2000). We generated this predicted metagenomic content (i.e., composition of KOs) from the 16S rRNA gene sequence data of each sample through a recently developed computational approach called PICRUSt (Langille et al., 2013), in order to provide a complementary model-based functional profiling strategy for the trait-based network construction (Supplementary Figure S1B). PICRUSt is a bioinformatics program to infer the metagenome relying on the reference genomes that pre-annotated against KEGG database and output a table of KOs abundances. The accuracy of metagenome prediction is assessed by the nearest sequenced taxon index (NSTI, the sum of phylogenetic distances for each OTU to its closely nearest related microbe) and in general decreases with increasing NSTI (Langille et al., 2013). Our samples had good NSTI values of 0.11 ± 0.07 (mean ± SD) according to the comparison of NSTI values across various environmental microbiomes (Langille et al., 2013), providing a reliable data set for metagenome prediction by PICRUSt.

Network Construction
In order to form reliable correlations and comparable MENs along the environmental gradient, only OTUs/GCps/KOs found in >50% of samples in each pH group were selected for subsequent analyses (Supplementary Figure S1C). Our MENs were constructed following the mathematical and bioinformatics framework developed previously (Luo et al., 2007;Zhou et al., 2010Zhou et al., , 2011Deng et al., 2012). Briefly, profiles of OTUs/GCps/KOs were standardized to mean value of 0 and variance value of 1. The standardized data matrixes were used for subsequent correlation analysis, and pairwise Pearson correlation coefficients were calculated to measure the similarity between OTUs/GCps/KOs across different samples in each pH group. A threshold value, which was automatically identified based on the data structure itself using the random matrix theory (RMT)-based approach, was used to perform correction on multiple pairwise correlations. The optimal threshold value was determined when the nearest neighbor spacing distribution of eigenvalues follows Poisson distribution (Luo et al., 2007). The similarity matrices were then converted into adjacency matrices using the optimal thresholds. Finally, valid correlations (i.e., co-occurrence links between different OTUs/GCps/KOs) based on the framework described above were retained for MEN construction (Supplementary Figure S1C). In total, 18 networks across 6 pH groups were constructed including 6 OTUs-MENs, 6 GCps-MENs, and 6 KOs-MENs.

Network Analyses and Statistics
In this study, the coefficient of variation (CV) of the topological index, which is defined as the ratio of the standard deviation to the mean, was used to standardly assess the extent of conservation of MEN structure in response to the environmental changes (Supplementary Figure S1D). The overall network and individual node topological indexes were calculated based on the adjacency matrix (Zhou et al., 2010(Zhou et al., , 2011Deng et al., 2012).
For the overall topology, the average geodesic distance (avgGD), average clustering coefficient (avgCC) and modularity (Clauset et al., 2004) were estimated (see the detailed definitions in the legend of Supplementary Table S2) and their CVs were subsequently calculated across the pH groups. For a given network, the avgGD shows how close between nodes and the avgCC is used to measure the extent of module structure (Zhou et al., 2010(Zhou et al., , 2011. While a module is a group of nodes that interact strongly among themselves but little with others in other modules, and the modularity reflects how a network is modular (Zhou et al., 2010(Zhou et al., , 2011. Because only a single data point of each overall network index was estimated for each MEN, thus only one CV value could be calculated across 6 pH groups and the statistical significance could not be assessed between species-and trait-based MENs. In order to perform the t-tests to show whether the CVs of these overall network indexes are significantly different between species-and trait-based MENs, a total of 100 random networks were generated individually for each MEN (i.e., 18 MENs in total) using the Maslov-Sneppen procedure (Maslov and Sneppen, 2002). This method keeps the numbers of nodes and links unchanged but rewires all of the links' positions based on the corresponding MEN, therefore the random networks and the original one have same size and are comparable to each other. Accordingly, the average and standard deviation for the simulated CVs were obtained based on these random networks and the statistical comparisons of the CVs between species-and trait-based MENs were applied by t-test using the standard deviations derived from the corresponding random networks (Zhou et al., 2010(Zhou et al., , 2011. For the node properties, only shared nodes found in >3 networks were kept for the statistical analyses. In total, 48, 2,501, and 2,129 nodes were shared (e.g., OTUs that were found in >3 networks among 6 OTUs-MENs) among OTUs-, GCps-, and KOs-MENs, respectively. In this study, we used node connectivity (also called node degree) to describe the topological property of a node in a network. Generally, connectivity of a give node is the sum of links connecting this node with all other connected nodes, representing how strongly a node is connect to other nodes in the network (Zhou et al., 2010) and explicitly revealing a basic network feature of nodes. Since the node connectivity is dependent on the number of nodes in the network (i.e., network size) and the original value can't be directly compared between different networks, we calculated and ranked the node connectivity of each network, and normalized these ranks between 1 and 100 according to this formula: where Rank i norm is the normalized value of node i, Rank i org is the original value of node i and Max gives the maximum values for the rank of node across the network. The frequency distributions of their CVs across 6 pH groups were plotted based on different data sets and the statistical significance was tested by Wilcoxon test. Additionally, these normalized ranks were compared by cross-validation under lower and higher pH conditions. Their conservation was assessed by the distances (Dvalues) between normalized ranks of nodes and diagonal line. The normalized rank of a given node lying on the diagonal line reflects that it is completely constant under lower and higher pH conditions. The D-values for different data sets (i.e., 48, 2,501, and 2,129 normalized ranks in OTUs, GCps, and KOs data sets, respectively) were compared and tested by ANOVA. Furthermore, the pairwise correlation coefficients of every pair of OTUs/GCps/KOs were visualized along the pH gradient. We calculated the CV of each pair along the pH gradient and compared them among different data sets using ANOVA to estimate the difference of conservation of their cooccurrence patterns. We finally assess whether these correlation patterns were non-random by performing the randomization test. Specifically, we randomized the correlation coefficients for each column (i.e., a pH group, and a total of six columns for a data set) and calculated the CV of each row (i.e., six randomized correlation coefficients) and the mean CV for all rows. We repeated this randomization 999 times to obtain 999 mean CV values and form the null distribution for each data set. We then calculated the P-value as the rank of the mean observed CV relative to this null distribution to estimate that whether the observed correlation patterns were significantly different from random distribution pattern.

Random Grouping of Samples
To test whether the conservation of network interaction was actually related to the pH gradient, additional random grouping of our 40 AMD samples was also performed using GCps data set. All the methods for the network construction and indexes calculation that mentioned above were same except that samples were grouped along the pH gradient or randomly. Comparison of the overall topological properties of GCps-MENs were then conducted between pH-based sample grouping and random grouping.

Association of Network Characteristics with Environmental Properties
To decipher whether the changes of network characteristics are relevant to the environmental properties, their relationships were measured by Mantel tests. A connectivity score was defined as 101 subtract the normalized rank of each node in each MEN and set as zero when the value of normalized rank was missing. Thus, this connectivity score was ranged from 0 to 100 and reflected the network characteristic of nodes. Nodes with higher connectivity score suggest that they are module hubs with strong interaction with other nodes and locate in key topological positions in the network. Meanwhile, we used node significance to identify the correlations between the nodes and the environmental properties (Zhou et al., 2010(Zhou et al., , 2011. Specifically, the node significance was calculated as the square of Spearman correlations between relative abundances of OTUs/signal intensities of GCps/abundances of KOs and every standardized environmental property (i.e., a total of 14 environmental variables as mentioned above) for all shared nodes (i.e., 48, 2,501, and 2,129 shared nodes for OTUs-, GCps-, and KOs data sets, respectively) in each MEN. Higher node significance indicates higher correlation between a given node and a certain environmental variable. Finally, the Mantel test was performed to estimate the relationship between connectivity score and node significance based on their Euclidean distance matrices across 6 pH groups for each environmental variable in different data sets. Here, we attempt to understand the importance of network structure in the ecosystem functioning that referred by the state of the environmental properties, and use this Mantel test to examine whether the change of nodes' network topology is related to and affected by environmental properties. Generally, significant relationship implies that nodes with similar network topological characteristics will have similar correlations with an environmental variable.

Environmental Properties, Community Composition and Functional Structure across Different pH Groups
The pH values of our samples showed a clear gradient across different pH groups (see details in Section Materials and Methods and Supplementary Table S1; a total of 6 a priori pH groups were defined), which was demonstrated previously as the major factor shaping the taxonomic and functional structures (Kuang et al., 2013(Kuang et al., , 2016. Consistent with this, the overall measured environmental properties were significantly different in the entire data set (PERMANOVA, R 2 = 0.19, P < 0.05) and between groups with distinct pH values (e.g., G1 vs. G3) but remained similar between groups with narrow pH range (e.g., G1 vs. G2; Supplementary Table S1). Principal component analysis also suggested that the variation of these overall environmental properties was well explained by our sample grouping along the pH gradient (Supplementary Figure S2A). Additionally, similar trends were found for species-and trait-based data sets (Supplementary Figure S2B-D and Supplementary Table S1). Further, Bray-Curtis similarities of community composition and functional structure showed consistent patterns that both species-and trait-based similarities significantly decreased (P < 0.05) between samples with more distinct pH values and environmental properties (Supplementary Figure S3). These results suggested that the sample grouping used in this study could explicitly reveal the gradient of variation of environmental properties, with significant shifts in community composition and functional structure, implying that the structural variation of MENs across these pH groups may reflect their response to the changes of environmental conditions.

The Overall Network Structure and Its Variation of Different MENs
The structural conservation of overall network topology was assessed using species and trait data. In total, 48, 2,501, and 2,129 nodes were shared (e.g., OTUs that were found in >3 networks among 6 OTUs-MENs) among OTUs-, GCps-, and KOs-MENs, respectively. Several overall network topological indexes including the average geodesic distance (avgGD), average clustering coefficient (avgCC) and modularity were estimated and their CVs were calculated across 6 pH groups (Supplementary Table S2). The values of the overall network topological indexes across the environmental gradient were consistent without significant differences among different data sets, however, significant differences (P < 0.05) were found for their CVs with higher variation in OTUs-MENs compared to those in GCps-and KOs-MENs (Supplementary Table S2). Additional comparison of the overall network topologies of GCps-MENs revealed significantly higher variations in randomly grouped data set than that were grouped based on the pH gradient (Supplementary Table S3), suggesting that the conserved traitbased network interaction was actually related to the pH gradient. Although, previous studies have focused on how environmental changes (e.g., elevated CO 2 ) affect the overall network topology of species as well as functional gene network interactions in microbial communities (Zhou et al., 2010(Zhou et al., , 2011, little research has comprehensively compared their structural conservation across an environmental gradient. These results suggested that even though the overall topology remained similar, the structural variation between species-and trait-based MENs varied markedly under the same environmental gradient. This significantly lower overall structural variation at trait levels than that at species level revealed more conserved network interactions between traits during the environmental changes, and might be one of the underlying mechanisms to maintain the ecosystem stability.

Conservation of Node Topological Pattern among Different MENs
The node connectivity of each MEN was ranked and normalized before statistical analyses of individual node network characteristics. The frequency distributions of the CVs clearly showed a significant difference between speciesand trait-based MENs (P < 0.0001, Wilcoxon test; Figure 2). Specifically, the overall trend of CVs for GCps-and KOs-MENs remained similar (P = 0.064) but their values were significantly lower than those of OTUs-MENs. Consistently, cross-validation supported this pattern that higher percentage of nodes (GCps: 76% and KOs: 79%) revealed less variation (<20%, gray areas in Figure 3) of normalized ranks between lower and higher pH conditions with significant linear correlations (P < 0.0001, red lines in Figure 3) at trait levels than that at the species level (OTUs: 50%; non-significant linear correlation, P = 0.58). Additionally, similar results were also found that the D-values were significantly higher (P < 0.05, ANOVA) at species level compared to trait levels [D OTUs = 23.4 ± 17.8, D GCps = 13.2 ± 7.2 and D KOs = 11.2 ± 8.7 (mean ± SD)], suggesting significantly higher variation of normalized ranks at species level (Figure 3). These results indicated that the conservation of node connectivity for trait-based MENs were higher compared to those of species-based MENs. Furthermore, the comparison of conservation of node correlations clearly showed significantly higher mean CVs of correlation coefficients between OTUs than those between GCps or KOs (P < 0.001, ANOVA), while no significant difference was found between GCps and KOs (P = 0.25; Figure 4). A randomization test revealed a non-significant P-value (P = 0.76) for the correlation patterns of OTUs and indicated that the null hypothesis that species correlations were random could not be rejected. In contrast, a significant P-value (P < 0.001) was found and suggested a non-random correlation patterns for GCps and KOs. These patterns suggested that the correlations of traits are more conserved than those of species in response to the environmental changes. In summary, our findings revealed that the topological positions of diverse traits in different MENs were relatively conserved across environmental gradients even though different species played key roles under distinct environmental conditions.

Association of Network Characteristics with Environmental Properties
We finally examined whether the differences of network characteristics between species and trait levels could reflect their different correlations to the environmental properties. We used connectivity scores and node significance to assess the network characteristics and the correlations between nodes and environmental properties, respectively (see details in Section Materials and Methods). Mantel tests revealed stronger and more significant correlations between connectivity scores (i.e., network characteristics) and node significance (i.e., correlations between nodes and environmental properties) for all the environmental variables at trait levels than at species level (Supplementary Table S4). These results suggested that the trait-based network characteristics were more related to and affected by environmental conditions, possibly revealing more deterministic responses to the environmental changes.

DISCUSSION
Co-occurrence patterns not only show how particular organisms occur together under certain environmental conditions but also help to shed light on community assembly rules (Gotelli and McCabe, 2002). Our understanding of community assembly is based on the measures of the species diversity and composition, but species traits are increasingly being emphasized as important in explaining variation in community assembly and ecosystem function (Cadotte et al., 2013). Traits directly influence physiological and biochemical performance or species' fitness, and determine how species interact with one another or even the contributions of species to ecosystem function (McGill et al., 2006;Cadotte et al., 2011). By using the network-based approach, we explicitly showed distinct co-occurrence patterns between species and trait levels in an anthropogenically disturbed ecosystem. This study highlights the importance of interaction networks in understanding microbial community assembly, and more importantly, addresses how the conservation of network interaction responds to the environmental changes. Our results reveal that the conservation of trait-based network interaction stands in stark contrast to that of OTU networks, which often show high variation of network properties along environmental gradients (Widder et al., 2014) or across geographic locations (Ma et al., 2016). Further, this study clearly indicates that, although the co-occurrence (i.e., interacting relationships) at species level varied dramatically along the pH gradient, it remained strongly conserved at trait level. Such findings can be possibly explained by the fact that environmental conditions determine the available types of ecological niches that can be colonized randomly by whichever suitable species with similar traits, but that species composition within a functionally equivalent group depends on those species happen to arrive there first during the history of community assembly, that is trait-based The mean values of normalized rank of node connectivity under lower and higher pH were calculated using the data set of environmental group G1-G3 and G4-G6, respectively. Red lines show the best-fitted linear regression models, and the normalized ranks located within gray areas (OTUs: 50%; GCps: 76%; KOs: FIGURE 3 | Continued 79%) represent <20% of the difference between lower and higher pH conditions. D-values are the distances between normalized ranks of nodes and diagonal line. The D mean (mean ± SD) were calculated based on the normalized ranks of 48, 2,501, and 2,129 nodes in OTUs, GCps, and KOs data sets, respectively.
assembly is deterministic and species-based appears much more stochastic (Fukami et al., 2005;Burke et al., 2011;Fukami, 2015). Thus, this deterministic trait-based assembly results in a more conserved network interaction among different traits along the environmental gradient. Previous studies have reported that the metagenomic/functional composition is less variable than taxonomic composition in the human and ocean microbiomes (Human Microbiome Project Consortium, 2012;TARA Oceans Consortium, 2015). However, in our AMD samples, the variation of species-and trait-based communities can be well explained by the pH gradient (Supplementary Figure S2), and consistent patterns of their similarities were observed with significant decrease along the increase of environmental distance (Supplementary Figure S3), suggesting a stronger environmental filtering in this extreme environment. This result implied that the conserved trait-based network interaction was inherently due to the correlated or opposite environmental responses of diverse traits but not their similar functional structures. Furthermore, the test of random grouping in this study showed that the variation of the overall topology was significantly higher using the data set of random grouping than that of pH-based grouping (Supplementary Table S3). This result supports our idea that when the samples are randomly grouped, their trait distributions are no longer predictable and fluctuate stochastically among the groups. Therefore, this test, to some extent, resulted in trait distributions similar to those of relative species abundances, implying that different types of responses to pH is a key underlying mechanism explaining differences of network conservation between species and traits.
Ecosystem function broadly refers to the state of the system (e.g., stocks of materials like carbon, nitrogen, and nutrients) and the rates of processes involving fluxes of energy and matter that sustain the system, thus the environmental properties can be widely used to characterize ecosystem function (Jax, 2005). Clear evidence has revealed that changes of ecosystem functions like global CO 2 elevation (Zhou et al., 2010(Zhou et al., , 2011 and the shifts of fluvial network hydrology (Widder et al., 2014) and soil physiochemical features (Ma et al., 2016) apparently affect the phylogenetic or functional network topology. Our observation further indicated that, compared to species-based networks, trait-based networks might be more useful in reflecting the variation of ecosystem function. This implies that although global climate change factors fundamentally impact species diversity, composition and co-occurrence patterns, higher conservation of trait-based network interaction during environmental change might result in higher ecosystem stability.
In this study, the network structural conservation and interaction pattern of the predicted metagenome were similar to those based on functional microarray data, although this functional profiling was relying on the phylogenetic data. However, the limitation of interpreting PICRUSt predictions should be considered that the detected patterns depends on the reference genomes. The relatively insufficient metagenomes in extreme habitats such as hypersaline and acidic communities may cause the prediction accuracy to appear artificially lower (Langille et al., 2013). Therefore, other community functional measures like metatranscriptome and metaproteome are needed for testing our hypothesis in further study.
Computational exploration through microbial correlation networks is considered as a necessitating technique to study the microbial communities because of their enormous complexity (Weiss et al., 2016). However, the performance and limitations are different among these computational methods, which may cause different inferring correlation networks (Weiss et al., 2016). To test whether different approaches for network construction will affect our results, SparCC, which is particularly designed to deal with compositional data (Friedman and Alm, 2012, available at https://bitbucket.org/yonatanf/sparcc), was used complementarily for our analyses of network conservation (see details in Supplementary Text). Consistent results were found by using SparCC (Supplementary Figure S4, S5 and Supplementary   Table S5, S6), implying that our observed pattern of network conservation was insensitive to the approaches of network construction. Despite of this, it should be noted that our hypothesis was only tested by correlation-based methods, which may have several limitations, implying that other approaches like Bayesian network based on graphical model were still needed to further validate our findings (Friedman, 2004). Moreover, the statistical tests of overall network topologies were dependent on the random networks, which serving as a null model. However, the significant differences of the network properties between species-and trait-based networks may be an artifact due to the null model networks we used in this study (see detailed in Section Materials and Methods; Artzy-Randrup et al., 2004;Beber et al., 2012). Therefore, the interpretation of topological features between species-and trait-based networks should be addressed in future by using different null models.
In addition to the network construction, the comparison is another key challenge for interpreting the complexity and conservation among multiple biological networks. Although, we applied basic statistical analyses based on the indexes of the topological properties to assess the conservation of network characteristics, recent developed graph-theoretic algorithms provide more powerful performance to qualify the similarity and the conserved parts between networks (Pavlopoulos et al., 2011;Ali et al., 2014;Hu and Reinert, 2015;Yaveroglu et al., 2015). Future study using these advanced methods will help us to uncover the hidden features of interaction networks, and to better evaluate the dynamics of these features during the succession of the microbial communities.
In summary, we have linked the organization and conservation of species-and traits-based network interactions to the ecological processes of community assembly in AMD system, and emphasized that in addition to the calculation of diversity indexes based on species/trait richness and abundance, measuring the complexity and conservation of their network relationships and characteristics can be considered as an alternative way to provide a further understanding of ecosystem stability and functioning (Zhou et al., 1991). Ultimately, simultaneously illustrating the changes of diversity, composition and network interaction during the succession of microbial communities using species-and trait-based data will increase our knowledge of the modeling of ecosystem dynamics, and help with the engineering and manipulation of complex microbial communities that are relevant for waste water treatment, food production and the prevention and treatment of diseases (Faust and Raes, 2012).

DATA ACCESSIBILITY
The 16S rRNA gene pyrosequencing data reported in this paper have been deposited in the European Nucleotide Archive database (accession no. PRJEB9908). The GeoChip data set reported in this paper is publicly available at http://ieg.ou.edu/ 4download/.

AUTHOR CONTRIBUTIONS
JK, JZ, WS, and LH conceived the research. JK and MC drafted the manuscript. JK, JL, and LC carried out the field sampling. JK, MC, YC, and HS performed the analysis with advice from JL, LC, ZH, LH, JZ, and WS. JK, MC, YC, JZ, WS, and LH discussed the results and contributed to the revision of the final manuscript.